diff --git a/.github/PULL_REQUEST_TEMPLATE/README.md b/.github/PULL_REQUEST_TEMPLATE/README.md
index 0d82b22268a..b895a04a626 100644
--- a/.github/PULL_REQUEST_TEMPLATE/README.md
+++ b/.github/PULL_REQUEST_TEMPLATE/README.md
@@ -2,18 +2,24 @@

 This folder is for custom pull request templates. Templates are Markdown (\*.md) files.

-These custom templates can be used for example, by individual teams to have a custom pull request template with team specific testing or documentation instructions.
+These custom templates can be used for example, by individual teams to have a custom pull request
+template with team specific testing or documentation instructions.

-Read more in [Github's docs](https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository)
+Read more in
+[Github's docs](https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository)

-If you update the default PR template, you also need to update the commit metadata in github branch rulesets.
+If you update the default PR template, you also need to update the commit metadata in github branch
+rulesets.

 # How To Use This Folder

 To create a custom template, create a new markdown file in this folder.

-Then create a link of the form `https://github.com/mongodb/mongo/compare/main...my-branch?quick_pull=1&template=your_new_template.md`
+Then create a link of the form
+`https://github.com/mongodb/mongo/compare/main...my-branch?quick_pull=1&template=your_new_template.md`

-Share that link in your team docs to use for creating PRs. By selecting an unused values for `my-branch` it should show a branch selector when following the link.
+Share that link in your team docs to use for creating PRs. By selecting an unused values for
+`my-branch` it should show a branch selector when following the link.
-Read more in [Github's docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/using-query-parameters-to-create-a-pull-request) +Read more in +[Github's docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/using-query-parameters-to-create-a-pull-request) diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index 4372a97e2d3..d89f10230c0 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -1 +1,2 @@ -Anything in this description will be included in the commit message. Replace or delete this text before merging. Add links to testing in the comments of the PR. +Anything in this description will be included in the commit message. Replace or delete this text +before merging. Add links to testing in the comments of the PR. diff --git a/.prettierrc b/.prettierrc index 6499bdd0fb8..c0f5e9ae3f5 100644 --- a/.prettierrc +++ b/.prettierrc @@ -15,6 +15,13 @@ "parser": "yaml", "tabWidth": 4 } + }, + { + "files": "*.md", + "options": { + "proseWrap": "always", + "printWidth": 100 + } } ] } diff --git a/README.md b/README.md index bf66f6a8690..c2728e8d826 100644 --- a/README.md +++ b/README.md @@ -49,8 +49,7 @@ You can install compass using the `install_compass` script packaged with MongoDB $ ./install_compass ``` -This will download the appropriate MongoDB Compass package for your platform -and install it. +This will download the appropriate MongoDB Compass package for your platform and install it. ## Drivers @@ -88,9 +87,9 @@ https://www.mongodb.com/cloud/atlas ## LICENSE -MongoDB is free and the source is available. Versions released prior to -October 16, 2018 are published under the AGPL. All versions released after -October 16, 2018, including patch fixes for prior versions, are published -under the [Server Side Public License (SSPL) v1](LICENSE-Community.txt). 
-See individual files for details which will specify the license applicable
-to each file. Files subject to the SSPL will be noted in their headers.
+MongoDB is free and the source is available. Versions released prior to October 16, 2018 are
+published under the AGPL. All versions released after October 16, 2018, including patch fixes for
+prior versions, are published under the
+[Server Side Public License (SSPL) v1](LICENSE-Community.txt). See individual files for details
+which will specify the license applicable to each file. Files subject to the SSPL will be noted in
+their headers.
diff --git a/bazel/docs/architecture/ppc64le_build_from_source.md b/bazel/docs/architecture/ppc64le_build_from_source.md
index 41350e8aabe..2dda089afba 100644
--- a/bazel/docs/architecture/ppc64le_build_from_source.md
+++ b/bazel/docs/architecture/ppc64le_build_from_source.md
@@ -1,10 +1,13 @@
 # Building Bazel from Source to target the PPC64LE Architecture

-Bazel doesn't release to the PPC64LE architecture. To address this, MongoDB maintains our own Bazel build that we perform on our PPC64LE development systems.
+Bazel doesn't release to the PPC64LE architecture. To address this, MongoDB maintains our own Bazel
+build that we perform on our PPC64LE development systems.

 # JDK?

-Bazel usually comes with a built-in JDK. However, the tooling used to build the built-in JDK doesn't support PPC64LE. To get around this, an external JDK must be present on both the system compiling the Bazel executable itself as well as the host running Bazel as a build system.
+Bazel usually comes with a built-in JDK. However, the tooling used to build the built-in JDK doesn't
+support PPC64LE. To get around this, an external JDK must be present on both the system compiling
+the Bazel executable itself as well as the host running Bazel as a build system.
 On the MongoDB PPC64LE Evergreen static hosts and dev hosts, the OpenJDK 21 installation exists at:

diff --git a/bazel/docs/architecture/s390x_build_from_source.md b/bazel/docs/architecture/s390x_build_from_source.md
index 55553dd064e..e5097a9bc6f 100644
--- a/bazel/docs/architecture/s390x_build_from_source.md
+++ b/bazel/docs/architecture/s390x_build_from_source.md
@@ -1,10 +1,13 @@
 # Building Bazel from Source to target the S390X Architecture

-Bazel doesn't release to the S390X architecture. To address this, MongoDB maintains our own Bazel build that we perform on our S390X development systems.
+Bazel doesn't release to the S390X architecture. To address this, MongoDB maintains our own Bazel
+build that we perform on our S390X development systems.

 # JDK?

-Bazel usually comes with a built-in JDK. However, the tooling used to build the built-in JDK doesn't support S390X. To get around this, an external JDK must be present on both the system compiling the Bazel executable itself as well as the host running Bazel as a build system.
+Bazel usually comes with a built-in JDK. However, the tooling used to build the built-in JDK doesn't
+support S390X. To get around this, an external JDK must be present on both the system compiling the
+Bazel executable itself as well as the host running Bazel as a build system.

 On the MongoDB S390X Evergreen static hosts and dev hosts, the OpenJDK 21 installation exists at:

diff --git a/bazel/docs/best_practices.md b/bazel/docs/best_practices.md
index d532a8d562e..9ab534806cb 100644
--- a/bazel/docs/best_practices.md
+++ b/bazel/docs/best_practices.md
@@ -1,3 +1,4 @@
 # MongoDB Bazel Best Practices

-Please refer to https://bazel.build/configure/best-practices as a baseline. This doc will be updated with MongoDB-specific best practices as they're defined.
+Please refer to https://bazel.build/configure/best-practices as a baseline. This doc will be updated
+with MongoDB-specific best practices as they're defined.
diff --git a/bazel/docs/developer_workflow.md b/bazel/docs/developer_workflow.md index 019bb44157f..aebfa5ebbda 100644 --- a/bazel/docs/developer_workflow.md +++ b/bazel/docs/developer_workflow.md @@ -4,7 +4,8 @@ This document describes the Server Developer workflow for modifying Bazel build # Creating a new BUILD.bazel file -A build target is defined in the directory where its source code exists. To create a target that compiles **src/mongo/hello_world.cpp**, you would create **src/mongo/BUILD.bazel**. +A build target is defined in the directory where its source code exists. To create a target that +compiles **src/mongo/hello_world.cpp**, you would create **src/mongo/BUILD.bazel**. src/mongo/BUILD.bazel would contain: @@ -15,7 +16,8 @@ src/mongo/BUILD.bazel would contain: ], } -Once you've obtained bazel by running **python buildscripts/install_bazel.py**, you can then build this target via "bazel build": +Once you've obtained bazel by running **python buildscripts/install_bazel.py**, you can then build +this target via "bazel build": bazel build //src/mongo:hello_world @@ -23,13 +25,17 @@ Or run this target via "bazel run": bazel run //src/mongo:hello_world -The full target name is a combination between the directory of the BUILD.bazel file and the target name: +The full target name is a combination between the directory of the BUILD.bazel file and the target +name: //{BUILD.bazel dir}:{targetname} # Adding a New Header / Source File -Bazel makes use of static analysis wherever possible to improve execution and querying speed. As part of this, source and header files must not be declared dynamically (ex. glob, wildcard, etc). Instead, you'll need to manually add a reference to each header or source file you add into your build target. +Bazel makes use of static analysis wherever possible to improve execution and querying speed. As +part of this, source and header files must not be declared dynamically (ex. glob, wildcard, etc). 
+Instead, you'll need to manually add a reference to each header or source file you add into your +build target. mongo_cc_binary( name = "hello_world", @@ -44,13 +50,15 @@ Bazel makes use of static analysis wherever possible to improve execution and qu ## Adding a New Library -The DevProd Build Team created MongoDB-specific macros for the different types of build targets you may want to specify. These include: +The DevProd Build Team created MongoDB-specific macros for the different types of build targets you +may want to specify. These include: - mongo_cc_binary - mongo_cc_library - idl_generator -Creating a new library is similar to the steps above for creating a new binary. A new **mongo_cc_library** definition would be created in the BUILD.bazel file. +Creating a new library is similar to the steps above for creating a new binary. A new +**mongo_cc_library** definition would be created in the BUILD.bazel file. mongo_cc_library( name = "new_library", @@ -61,7 +69,9 @@ Creating a new library is similar to the steps above for creating a new binary. ## Declaring Dependencies -If a library or binary depends on another library, this must be declared in the **deps** section of the target. The syntax for referring to the library is the same syntax used in the bazel build/run command. +If a library or binary depends on another library, this must be declared in the **deps** section of +the target. The syntax for referring to the library is the same syntax used in the bazel build/run +command. mongo_cc_library( name = "new_library", @@ -82,16 +92,20 @@ If a library or binary depends on another library, this must be declared in the ## Running clang-tidy via Bazel -Note: This feature is still in development; see https://jira.mongodb.org/browse/SERVER-80396 for details) +Note: This feature is still in development; see https://jira.mongodb.org/browse/SERVER-80396 for +details) To run clang-tidy via Bazel, do the following: 1. 
To analyze all code, run `bazel build --config=clang-tidy src/...` -2. To analyze a single target (e.g.: `environment_buffer`), run the following command (note that `_with_debug` suffix on the target): `bazel build --config=clang-tidy src/mongo/db/commands:environment_buffer_with_debug` +2. To analyze a single target (e.g.: `environment_buffer`), run the following command (note that + `_with_debug` suffix on the target): + `bazel build --config=clang-tidy src/mongo/db/commands:environment_buffer_with_debug` Testing notes: -- If you want to test whether clang-tidy is in fact finding bugs, you can inject the following code into a `cpp` file to generate a `bugprone-incorrect-roundings` warning: +- If you want to test whether clang-tidy is in fact finding bugs, you can inject the following code + into a `cpp` file to generate a `bugprone-incorrect-roundings` warning: ``` const double f = 1.0; @@ -105,12 +119,24 @@ const int foo = (int)(f + 0.5); Follow this loop to figure out where the header needs to be added 1. Build directly with bazel to speed up the loop: `bazel build //src/...` -2. This will fail on the first missing header dependency, search the bazel build files for the library the header is defined on. Currently there are cases where headers are incorrectly located so you'll need to use your best judgement. If the header exists on some library, add that library as a dep, for example `scoped_timer.h` is part of `scope_timer` library so add `//src/mongo/db/exec:scoped_timer` to deps field (this will take care of `scoped_timer.h` transitive dependencies). If not add the header directly to the hdrs field of the library that's failing to compile. +2. This will fail on the first missing header dependency, search the bazel build files for the + library the header is defined on. Currently there are cases where headers are incorrectly located + so you'll need to use your best judgement. 
If the header exists on some library, add that library + as a dep, for example `scoped_timer.h` is part of `scope_timer` library so add + `//src/mongo/db/exec:scoped_timer` to deps field (this will take care of `scoped_timer.h` + transitive dependencies). If not add the header directly to the hdrs field of the library that's + failing to compile. 3. Build directly with bazel `bazel build //src/...` -4. If there is a cycle remove the dependency from Step #2, add the header as direct dependency to the hdrs field, and then start back at Step #1 +4. If there is a cycle remove the dependency from Step #2, add the header as direct dependency to + the hdrs field, and then start back at Step #1 ### The header I want to add is referenced in dozens or more locations, and adding it to the proper location requires a large refactor that is blocking critical work, what should I do? -If you've put in a significant amount of work to try to get a header added and have found to get it added to the right place (usually alongside the associated .cpp file, having all dependents add that library as a dep) will take a significant refactor, create a SERVER ticket explaining the problem, solution, and complexity required to resolve it. Then, open up src/mongo/BUILD.bazel and add the header to "core_headers" file group referencing your ticket in a TODO comment. +If you've put in a significant amount of work to try to get a header added and have found to get it +added to the right place (usually alongside the associated .cpp file, having all dependents add that +library as a dep) will take a significant refactor, create a SERVER ticket explaining the problem, +solution, and complexity required to resolve it. Then, open up src/mongo/BUILD.bazel and add the +header to "core_headers" file group referencing your ticket in a TODO comment. -This is very much a last resort and should only be done if the refactor will take a very significant amount of time and is blocking other work. 
+This is very much a last resort and should only be done if the refactor will take a very significant +amount of time and is blocking other work. diff --git a/bazel/docs/engflow_credential_setup.md b/bazel/docs/engflow_credential_setup.md index 43fa71e5b71..4c2874d2fb3 100644 --- a/bazel/docs/engflow_credential_setup.md +++ b/bazel/docs/engflow_credential_setup.md @@ -1,7 +1,9 @@ # EngFlow Certification Installation -MongoDB uses EngFlow to enable remote execution with Bazel. This dramatically speeds up the build process, but is only available to internal MongoDB employees. +MongoDB uses EngFlow to enable remote execution with Bazel. This dramatically speeds up the build +process, but is only available to internal MongoDB employees. -Bazel uses a wrapper script to check the credentials on each invocation, if for some reason thats not working, you can also manually perform this process with this command alternatively: +Bazel uses a wrapper script to check the credentials on each invocation, if for some reason thats +not working, you can also manually perform this process with this command alternatively: python buildscripts/engflow_auth.py diff --git a/bazel/docs/header_cycle_resolution.md b/bazel/docs/header_cycle_resolution.md index 7625d41ea8b..0bc54af7aac 100644 --- a/bazel/docs/header_cycle_resolution.md +++ b/bazel/docs/header_cycle_resolution.md @@ -1,8 +1,12 @@ # Header Relocation and Cycle Resolution 1. Locate all the targets that reference the header file in BUILD.bazel files. -2. Find an ideal target to declare the header under. This is usually under the target that features the .cpp file of the same name. Otherwise, the header can be placed in its own library. -3. Ensure that all the targets that need this header can depend on the target the header was moved to. -4. Run `bazel build //src/...` to check for build failures (look for failures related to dependency cycles). -5. 
If the build fails because of a dependency cycle, you may need to split up the dependent library or relocate the header. +2. Find an ideal target to declare the header under. This is usually under the target that features + the .cpp file of the same name. Otherwise, the header can be placed in its own library. +3. Ensure that all the targets that need this header can depend on the target the header was moved + to. +4. Run `bazel build //src/...` to check for build failures (look for failures related to dependency + cycles). +5. If the build fails because of a dependency cycle, you may need to split up the dependent library + or relocate the header. 6. Once the build succeeds, please create a PR and include `devprod-build` for review. diff --git a/bazel/docs/rbe_images.md b/bazel/docs/rbe_images.md index 5187ca34851..1490463ff6d 100644 --- a/bazel/docs/rbe_images.md +++ b/bazel/docs/rbe_images.md @@ -1,8 +1,7 @@ # Remote execution images -The Dockerfiles for remote execution images are autogenerated to pin all -versions and allow for updates at the same time. To repin the image hashes and -package versions: +The Dockerfiles for remote execution images are autogenerated to pin all versions and allow for +updates at the same time. To repin the image hashes and package versions: ```bash # With Bazel diff --git a/bazel/docs/toolchain.md b/bazel/docs/toolchain.md index ba14adacbed..3701277e861 100644 --- a/bazel/docs/toolchain.md +++ b/bazel/docs/toolchain.md @@ -1,16 +1,22 @@ # About -This documents some useful tools, concepts, and debugging strategies for bazel toolchains. -This information was gathered while developing the WASI SDK toolchain. +This documents some useful tools, concepts, and debugging strategies for bazel toolchains. This +information was gathered while developing the WASI SDK toolchain. 
# Concepts -[Toolchain](https://bazel.build/extending/toolchains#debugging-toolchains) and [Platform](https://bazel.build/extending/platforms) are the core relevant concepts. -Toolchains define the tools used to compile, and the platform defines either the execution platform (for the compilation/compiler tools) and target platform (for the binary). -Bazel tries to search for a toolchain based on these constraints. +[Toolchain](https://bazel.build/extending/toolchains#debugging-toolchains) and +[Platform](https://bazel.build/extending/platforms) are the core relevant concepts. Toolchains +define the tools used to compile, and the platform defines either the execution platform (for the +compilation/compiler tools) and target platform (for the binary). Bazel tries to search for a +toolchain based on these constraints. -We also made use of [transitions](https://bazel.build/rules/lib/builtins/transition) which allow bazel to reconfigure itself before building a target to avoid passing irrelevant or incorrect compiler flags (e.g. WASI SDK doesn't support shared objects). -Similarly, we used [actions](https://bazel.build/docs/cc-toolchain-config-reference#using-action-config) instead of the tool paths attribute because of, [possibly historical, lack of support for remote resources in tool paths](https://stackoverflow.com/questions/73504780/bazel-reference-binaries-from-packages-in-custom-toolchain-definition/73505313#73505313). +We also made use of [transitions](https://bazel.build/rules/lib/builtins/transition) which allow +bazel to reconfigure itself before building a target to avoid passing irrelevant or incorrect +compiler flags (e.g. WASI SDK doesn't support shared objects). 
Similarly, we used +[actions](https://bazel.build/docs/cc-toolchain-config-reference#using-action-config) instead of the +tool paths attribute because of, +[possibly historical, lack of support for remote resources in tool paths](https://stackoverflow.com/questions/73504780/bazel-reference-binaries-from-packages-in-custom-toolchain-definition/73505313#73505313). # Debugging tools @@ -20,13 +26,15 @@ Similarly, we used [actions](https://bazel.build/docs/cc-toolchain-config-refere bazel ... --toolchain_resolution_debug=.* ... ``` -The above flag can be used to debug toolchain resolution as bazel tries to automatically satisfy constraints. +The above flag can be used to debug toolchain resolution as bazel tries to automatically satisfy +constraints. ## Debugging Remote Resources -Toolchains may be remotely fetched, but the directory structure of the build environment after these remote resources are fetched may not be clear. -`bazel info` can be used to find the bazel directory and inspect it `bazel info output_base`. -Note: this may be different depending on your configuration and level of sandboxing. +Toolchains may be remotely fetched, but the directory structure of the build environment after these +remote resources are fetched may not be clear. `bazel info` can be used to find the bazel directory +and inspect it `bazel info output_base`. Note: this may be different depending on your configuration +and level of sandboxing. This is particularly useful when used in combination with the `find` command as shown below. @@ -42,10 +50,11 @@ Note: this command is directory dependent because output_base is per bazel insta bazel ... -s ... ``` -This will show verbose output such as cd actions and compiler/linker invocations. -Note: bazel may recast paths relative to the exec directory. +This will show verbose output such as cd actions and compiler/linker invocations. Note: bazel may +recast paths relative to the exec directory. 
## Debugging on Engflow -Engflow has a lot of helpful views showing remote execution stats and the remote file structure. -We don't intent to duplicate their documentation but be careful as some of their data (particularly remotely executed actions) may not be accurate immediately after execution. +Engflow has a lot of helpful views showing remote execution stats and the remote file structure. We +don't intent to duplicate their documentation but be careful as some of their data (particularly +remotely executed actions) may not be accurate immediately after execution. diff --git a/bazel/resmoke/README.md b/bazel/resmoke/README.md index 7baf3828f6f..c536ed4b7f2 100644 --- a/bazel/resmoke/README.md +++ b/bazel/resmoke/README.md @@ -38,18 +38,21 @@ resmoke_suite_test( ### Test Sharding -Test sharding allows you to split a large test suite across multiple parallel test executions, significantly reducing total test time. When `shard_count` is specified, Bazel will: +Test sharding allows you to split a large test suite across multiple parallel test executions, +significantly reducing total test time. When `shard_count` is specified, Bazel will: 1. Run the test target multiple times in parallel (up to the specified shard count) 2. Each shard receives a unique shard index (0 to N-1) 3. The resmoke runner uses these values to determine which subset of tests to run in each shard 4. Each shard produces its own test output and logs -Note: sharding is an alternative to the resmoke `--jobs` flag, which should not be used with `resmoke_suite_test`. +Note: sharding is an alternative to the resmoke `--jobs` flag, which should not be used with +`resmoke_suite_test`. ### Test Logs and Output Directory -Bazel creates a dedicated output directory for each test run under the `bazel-testlogs` symlink in your workspace root. +Bazel creates a dedicated output directory for each test run under the `bazel-testlogs` symlink in +your workspace root. 
For a test target `//jstests/suites/query-execution:core`, the outputs are like: @@ -78,7 +81,8 @@ bazel test //jstests/suites/query-execution:core --test_sharding_strategy=disabl #### Run with additional resmoke flags: -Any `--test_arg` in the bazel command will be propagated as a flag to resmoke.py. To modify the resmoke invocation with any of resmoke's flags, add them as `--test_arg`s. +Any `--test_arg` in the bazel command will be propagated as a flag to resmoke.py. To modify the +resmoke invocation with any of resmoke's flags, add them as `--test_arg`s. ``` # Runs all tests from the core suite with timeseries in their name, twice, with all feature flags enabled. diff --git a/bazel/toolchains/cc/mongo_wasm/README.md b/bazel/toolchains/cc/mongo_wasm/README.md index e2d3ea1f553..e93107dfa2d 100644 --- a/bazel/toolchains/cc/mongo_wasm/README.md +++ b/bazel/toolchains/cc/mongo_wasm/README.md @@ -11,7 +11,8 @@ To use the WASI SDK apply the `wasi_compatible` with a select statement: }) ``` -If your target is defined in terms of a traditional bazel C/C++ target you can use the WASI transition in order to ensure the bazel options are WASI compatible. +If your target is defined in terms of a traditional bazel C/C++ target you can use the WASI +transition in order to ensure the bazel options are WASI compatible. ```python load("//bazel/toolchains/cc/wasm/toolchain:with_wasi_config.bzl", "with_wasi_config") diff --git a/buildscripts/antithesis/test_composer/README.md b/buildscripts/antithesis/test_composer/README.md index 37a6df7844d..914b85c2711 100644 --- a/buildscripts/antithesis/test_composer/README.md +++ b/buildscripts/antithesis/test_composer/README.md @@ -17,8 +17,8 @@ For background on Antithesis, the base images, and the broader CI pipeline, see Scripts must be executable and live directly under the template directory (not in subdirectories). The prefix of the filename determines scheduling behavior. 
Any file that doesn't match a known -prefix — including files in subdirectories or files prefixed with `helper_` — is ignored by -Test Composer and can be used for shared logic. +prefix — including files in subdirectories or files prefixed with `helper_` — is ignored by Test +Composer and can be used for shared logic. ### Driver commands @@ -27,18 +27,18 @@ Run during fault injection periods. At least one driver or `anytime_*` command i - **`parallel_driver_`** — runs concurrently with other parallel drivers, including itself. Use for continuous client operations, parallel workloads, and availability checks under faults. -- **`singleton_driver_`** — runs as the only active driver in a history branch. - Use for porting existing integration tests or workloads that shouldn't overlap with other drivers. +- **`singleton_driver_`** — runs as the only active driver in a history branch. Use for + porting existing integration tests or workloads that shouldn't overlap with other drivers. -- **`serial_driver_`** — runs only when no other driver commands are active. - Use for validation steps and operations that require quiescence. +- **`serial_driver_`** — runs only when no other driver commands are active. Use for + validation steps and operations that require quiescence. ### Quiescent commands Run in the absence of faults. -- **`first_`** — optional one-time setup that runs once before any driver commands start. - Use for data initialization, schema setup, and bootstrapping. +- **`first_`** — optional one-time setup that runs once before any driver commands start. Use + for data initialization, schema setup, and bootstrapping. - **`eventually_`** — runs after driver commands start; halts all drivers and stops faults, creating a new history branch. Use for testing eventual consistency and post-recovery state. @@ -57,8 +57,8 @@ Run in the absence of faults. ### `basic_js_commands` Parallel JavaScript workload against a single `mongod`. 
All commands share retry logic defined in -[`js/commands.js`](basic_js_commands/js/commands.js) that handles transient network errors, -server selection failures, and retryable write errors. +[`js/commands.js`](basic_js_commands/js/commands.js) that handles transient network errors, server +selection failures, and retryable write errors. | Script | Function | Notes | | ------------------------------------------------ | ----------------------------- | --------------------------------------------------------------------------- | @@ -86,13 +86,13 @@ infrastructure for Test Composer. Both scripts use ## Best practices -- **Retry logic** — always handle transient network errors and server selection failures. - See [`commands.js`](basic_js_commands/js/commands.js) for a reusable retry wrapper. +- **Retry logic** — always handle transient network errors and server selection failures. See + [`commands.js`](basic_js_commands/js/commands.js) for a reusable retry wrapper. - **Randomize** — the more variation you introduce, the more state space Antithesis can explore. Antithesis controls and can reproduce the random seed, so interesting paths can be re-explored. - **Idempotency** — design scripts to tolerate being killed and restarted at any point. -- **Start simple** — begin with a `singleton_driver_*` to port an existing test, then evolve - toward parallel drivers as confidence grows. +- **Start simple** — begin with a `singleton_driver_*` to port an existing test, then evolve toward + parallel drivers as confidence grows. ## Running locally @@ -126,8 +126,8 @@ docker compose -f docker_compose//docker-compose.yml \ /opt/antithesis/test/v1/basic_js_commands/parallel_driver_mongod_aggregate.sh ``` -The `/scripts/print_connection_string.sh` helper used by each script is generated automatically -from the resmoke fixture's connection string and placed in the config image during the build step. 
+The `/scripts/print_connection_string.sh` helper used by each script is generated automatically from +the resmoke fixture's connection string and placed in the config image during the build step. ## Adding a new template diff --git a/buildscripts/bazel_rules_mongo/README.md b/buildscripts/bazel_rules_mongo/README.md index 55d60051138..fdfb2445d58 100644 --- a/buildscripts/bazel_rules_mongo/README.md +++ b/buildscripts/bazel_rules_mongo/README.md @@ -4,13 +4,19 @@ This directory is a bazel rule we use to ship common code between bazel repos # Using in your repo -1. Look at the latest version in [this](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/pyproject.toml) file +1. Look at the latest version in + [this](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/pyproject.toml) + file -2. Get the sha of the latest release at https://mdb-build-public.s3.amazonaws.com/bazel_rules_mongo/{version}/bazel_rules_mongo.tar.gz.sha256 +2. Get the sha of the latest release at + https://mdb-build-public.s3.amazonaws.com/bazel_rules_mongo/{version}/bazel_rules_mongo.tar.gz.sha256 -3. Get the link to the latest version at https://mdb-build-public.s3.amazonaws.com/bazel_rules_mongo/{version}/bazel_rules_mongo.tar.gz +3. Get the link to the latest version at + https://mdb-build-public.s3.amazonaws.com/bazel_rules_mongo/{version}/bazel_rules_mongo.tar.gz -4. Add this as a http archive to your repo and implement the dependencies listed in the [WORKSPACE](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/WORKSPACE.bazel) file. It will look something like this +4. Add this as a http archive to your repo and implement the dependencies listed in the + [WORKSPACE](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/WORKSPACE.bazel) + file. It will look something like this ``` # Poetry rules for managing Python dependencies @@ -50,7 +56,8 @@ poetry( ) ``` -5. 
Use the rule however you see fit! For example to add `bazel run codeowners` to your repo you can add the following to your root `BUILD.bazel` file +5. Use the rule however you see fit! For example to add `bazel run codeowners` to your repo you can + add the following to your root `BUILD.bazel` file ``` alias( @@ -61,5 +68,7 @@ alias( # Deploying -When you are ready for a new version to be released, bump the version in the [pyproject.toml](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/pyproject.toml) file. -This will be deployed the next time the `package_bazel_rules_mongo` task runs (nightly). You can schedule this earlier in the waterfall when your pr is merged if you want it quicker. +When you are ready for a new version to be released, bump the version in the +[pyproject.toml](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/pyproject.toml) +file. This will be deployed the next time the `package_bazel_rules_mongo` task runs (nightly). You +can schedule this earlier in the waterfall when your pr is merged if you want it quicker. diff --git a/buildscripts/cltcache/README.md b/buildscripts/cltcache/README.md index 131658c1cd8..cfc153354a7 100644 --- a/buildscripts/cltcache/README.md +++ b/buildscripts/cltcache/README.md @@ -3,4 +3,5 @@ This is cltcache.py.txt taken from CLTCACHE_URL = "https://raw.githubusercontent.com/freedick/cltcache/1.2.2/src/cltcache/cltcache.py" CLTCACHE_SHA256 = "30d9bf6d3615eab1826d5e24aea54873de034014c1e77506c9ff983e1e858b3c" -A small simple clang tidy cacher used with vscode which does not use bazel to run clang tidy. The extension is used to avoid linting and changing the file from its source. +A small simple clang tidy cacher used with vscode which does not use bazel to run clang tidy. The +extension is used to avoid linting and changing the file from its source. 
diff --git a/buildscripts/cost_model/README.md b/buildscripts/cost_model/README.md index 861e7f4539c..c7ab9651efb 100644 --- a/buildscripts/cost_model/README.md +++ b/buildscripts/cost_model/README.md @@ -18,7 +18,8 @@ source python3-venv/bin/activate (python3-venv) bazel build --config=opt install-devcore ``` -3. Run mongod instance (only for CBR calibration, because join_start.py manages mongod's lifecycle itself): +3. Run mongod instance (only for CBR calibration, because join_start.py manages mongod's lifecycle + itself): ```sh (python3-venv) bazel-bin/install-mongod/bin/mongod --setParameter internalMeasureQueryExecutionTimeInNanoseconds=true @@ -74,16 +75,21 @@ source cm/bin/activate ```sh (cm) python join_start.py ``` - To skip the constant calibration (warm scan, CPU, sequential I/O, random I/O) and only run the join algorithm comparison: + To skip the constant calibration (warm scan, CPU, sequential I/O, random I/O) and only run the + join algorithm comparison: ```sh (cm) python join_start.py --join-only ``` - To iterate quickly on cost model changes, reuse pre-recorded execution times from a previous full run. This skips actual query execution, only running `queryPlanner` explains to collect fresh cost estimates: + To iterate quickly on cost model changes, reuse pre-recorded execution times from a previous full + run. This skips actual query execution, only running `queryPlanner` explains to collect fresh cost + estimates: ```sh (cm) python join_start.py --execution-times join_output/join_times_in-cache.csv join_output/join_times_exceeds-cache.csv ``` -**Note:** For CBR calibration, the first time it will take a while since it has to generate the data. Afterwards, as long as you aren't modifying the collections, you can comment out `await generator.populate_collections()` in `start.py` - this will make it a lot faster. +**Note:** For CBR calibration, the first time it will take a while since it has to generate the +data. 
Afterwards, as long as you aren't modifying the collections, you can comment out +`await generator.populate_collections()` in `start.py` - this will make it a lot faster. 8. When done, deactivate the environment: diff --git a/buildscripts/docs/suites.md b/buildscripts/docs/suites.md index f31aba908a2..68bc7d8cdf8 100644 --- a/buildscripts/docs/suites.md +++ b/buildscripts/docs/suites.md @@ -1 +1,2 @@ -> Content moved to [buildscripts/resmokeconfig/suites/README.md](../../buildscripts/resmokeconfig/suites/README.md). +> Content moved to +> [buildscripts/resmokeconfig/suites/README.md](../../buildscripts/resmokeconfig/suites/README.md). diff --git a/buildscripts/mongo_gpg_builds/README.md b/buildscripts/mongo_gpg_builds/README.md index d160d324c3d..9b703b396ac 100644 --- a/buildscripts/mongo_gpg_builds/README.md +++ b/buildscripts/mongo_gpg_builds/README.md @@ -1,13 +1,14 @@ # mongo gpg builds -This directory contains a script to produce **portable `gpg` binaries** for all our supported linux platforms: +This directory contains a script to produce **portable `gpg` binaries** for all our supported linux +platforms: - **Linux** (`manylinux2014` glibc 2.17 baseline): `x86_64`, `aarch64`, `s390x`, `ppc64le` In particular, it builds gnupg-2.5.16 from source. -This script is used to generate the binaries that we use bring into bazel as a dependency to sign test extensions. -All artifacts are placed in the `dist/` directory. +This script is used to generate the binaries that we use bring into bazel as a dependency to sign +test extensions. All artifacts are placed in the `dist/` directory. 
--- @@ -61,8 +62,8 @@ ARCH=ppc64le PLATFORM=linux/ppc64le ./build_gpg_manylinux.sh ## 📜 License & Attribution -These scripts build **gpg** and its required dependencies from sources originally obtained from: -👉 and +These scripts build **gpg** and its required dependencies from sources originally obtained from: 👉 + and The exact sources can be obtained at the following URLs: diff --git a/buildscripts/mongo_rapidyaml_builds/README.md b/buildscripts/mongo_rapidyaml_builds/README.md index 7e3394e5e15..10866f2213a 100644 --- a/buildscripts/mongo_rapidyaml_builds/README.md +++ b/buildscripts/mongo_rapidyaml_builds/README.md @@ -1,12 +1,14 @@ # mongo rapidyaml wheel builds -This directory contains scripts to produce versioned `rapidyaml` wheels that can be uploaded to S3 and consumed directly instead of building from the git dependency in `pyproject.toml`. +This directory contains scripts to produce versioned `rapidyaml` wheels that can be uploaded to S3 +and consumed directly instead of building from the git dependency in `pyproject.toml`. The scripts default to the `rapidyaml` commit currently pinned in `pyproject.toml`: - `a5d485fd44719e1c03e059177fc1f695fc462b66` -They also require `RAPIDYAML_VERSION` to be set explicitly. The MongoDB fork does not currently publish git tags, so `setuptools-scm` cannot infer a stable release version on its own. +They also require `RAPIDYAML_VERSION` to be set explicitly. The MongoDB fork does not currently +publish git tags, so `setuptools-scm` cannot infer a stable release version on its own. All artifacts are written to `dist/`. @@ -47,11 +49,14 @@ RAPIDYAML_VERSION=0.9.0.post0 ARCH=ppc64le PLATFORM=linux/ppc64le ./build_rapidy ### macOS -Run the script on each target macOS architecture you want to publish. The script intentionally builds for the host arch only, which keeps wheel tags and interpreter usage straightforward. +Run the script on each target macOS architecture you want to publish. 
The script intentionally +builds for the host arch only, which keeps wheel tags and interpreter usage straightforward. -The script creates and uses a temporary virtualenv, so it works with Homebrew-managed Python installations that reject direct `pip install` into the system environment. +The script creates and uses a temporary virtualenv, so it works with Homebrew-managed Python +installations that reject direct `pip install` into the system environment. -It also leaves `Python.framework` external during delocation, so the wheel should be built with the same Python distribution family you expect consumers to use. +It also leaves `Python.framework` external during delocation, so the wheel should be built with the +same Python distribution family you expect consumers to use. ```bash RAPIDYAML_VERSION=0.9.0.post0 PYTHON_BIN=python3.13 ./build_rapidyaml_macos.sh @@ -67,15 +72,19 @@ $env:PYTHON_BIN = "C:\Python313\python.exe" .\build_rapidyaml_windows_x64.ps1 ``` -Note: `pyproject.toml` currently excludes `rapidyaml` on Windows, so a Windows wheel is only needed if that marker changes later. +Note: `pyproject.toml` currently excludes `rapidyaml` on Windows, so a Windows wheel is only needed +if that marker changes later. ## Build Behavior - The Linux script builds inside the appropriate `manylinux2014` image and runs `auditwheel repair`. -- The macOS script creates a temporary virtualenv, installs its build tooling there, and runs `delocate-wheel` while excluding `Python.framework` from bundling. +- The macOS script creates a temporary virtualenv, installs its build tooling there, and runs + `delocate-wheel` while excluding `Python.framework` from bundling. - The Windows script runs `delvewheel repair` after building. -- Every script clones the `mongodb-forks/rapidyaml` repo, checks out the requested ref, initializes submodules, builds a wheel, and performs a simple `import ryml` smoke test. 
-- Linux defaults to `cp313-cp313`, which matches the repo's current Python version. Override that when you need a wheel for a different interpreter. +- Every script clones the `mongodb-forks/rapidyaml` repo, checks out the requested ref, initializes + submodules, builds a wheel, and performs a simple `import ryml` smoke test. +- Linux defaults to `cp313-cp313`, which matches the repo's current Python version. Override that + when you need a wheel for a different interpreter. ## Environment Variables @@ -94,7 +103,8 @@ Note: `pyproject.toml` currently excludes `rapidyaml` on Windows, so a Windows w ## Consuming the Wheels -Once the wheels are uploaded, you can replace the current git dependency in `pyproject.toml` with URL-based entries scoped by platform markers. +Once the wheels are uploaded, you can replace the current git dependency in `pyproject.toml` with +URL-based entries scoped by platform markers. For example: diff --git a/buildscripts/mongo_rg_builds/README.md b/buildscripts/mongo_rg_builds/README.md index df51edad5ec..1e90cce9d4a 100644 --- a/buildscripts/mongo_rg_builds/README.md +++ b/buildscripts/mongo_rg_builds/README.md @@ -1,12 +1,14 @@ # mongo ripgrep builds -This directory contains scripts to produce **portable, high-performance `ripgrep` binaries** for all major platforms: +This directory contains scripts to produce **portable, high-performance `ripgrep` binaries** for all +major platforms: - **Linux** (`manylinux2014` glibc 2.17 baseline): `x86_64`, `aarch64`, `s390x`, `ppc64le` - **macOS** universal2 (`x86_64` + `arm64`) - **Windows** x86_64 (MSVC) -Each build uses **bundled static PCRE2**, **LTO**, and conservative CPU baselines to maximize portability. +Each build uses **bundled static PCRE2**, **LTO**, and conservative CPU baselines to maximize +portability. All artifacts are placed in the `dist/` directory. 
--- diff --git a/buildscripts/monitor_build_status/README.md b/buildscripts/monitor_build_status/README.md index 3fd75a33b6a..34379844fee 100644 --- a/buildscripts/monitor_build_status/README.md +++ b/buildscripts/monitor_build_status/README.md @@ -1,54 +1,79 @@ # Block-on-Red -> **TL;DR:** During times of high BF volume, code approvals and merging in 10gen/mongo master will be restricted to only allow changes that help reduce BFs, Bugs, Performance Regressions, and paying down technical debt. +> **TL;DR:** During times of high BF volume, code approvals and merging in 10gen/mongo master will +> be restricted to only allow changes that help reduce BFs, Bugs, Performance Regressions, and +> paying down technical debt. ### Motivation -The master branch should remain stable to develop the Server efficiently, and to be within 30 days of releasing at all times. If it becomes too unstable, or "too red," we want to aggressively focus on getting it back into the green. As a side benefit to releasability, a "greener" build should make patch build failures more meaningful. This will also reduce release time stress by having the release time period look and feel more like normal business. +The master branch should remain stable to develop the Server efficiently, and to be within 30 days +of releasing at all times. If it becomes too unstable, or "too red," we want to aggressively focus +on getting it back into the green. As a side benefit to releasability, a "greener" build should make +patch build failures more meaningful. This will also reduce release time stress by having the +release time period look and feel more like normal business. ### Strategy -Each team carries a quota (see below for details). When a team exceeds their quota - they enter a "code lockdown". +Each team carries a quota (see below for details). When a team exceeds their quota - they enter a +"code lockdown". 
-- **Team Level**: The intention here is to stop work with a small blast radius in the first instance, and address the releasability risk from that team and their owned code. -- **VP Level**: We roll the quotas up to a VP’s entire organization as the next step of "code lockdown". The expectation is that redirecting resources within a VP’s organization to help address BFs is likely more effective and less disruptive than a global freeze. -- **Global Level**: Finally, if the global quota is exceeded, the entire server organization enters a "code lockdown" until we meet the threshold for unfreezing. +- **Team Level**: The intention here is to stop work with a small blast radius in the first + instance, and address the releasability risk from that team and their owned code. +- **VP Level**: We roll the quotas up to a VP’s entire organization as the next step of "code + lockdown". The expectation is that redirecting resources within a VP’s organization to help + address BFs is likely more effective and less disruptive than a global freeze. +- **Global Level**: Finally, if the global quota is exceeded, the entire server organization enters + a "code lockdown" until we meet the threshold for unfreezing. ## Impact of a "Code Lockdown" ### Allowed Code Changes -During a "code lockdown," Code Owners are expected to only approve **work that closes BFs or helps us reduce/avoid the _next_ Blocking state**. i.e. aimed at fixing a BF, a class of BFs, bugs, performance regression, etc. +During a "code lockdown," Code Owners are expected to only approve **work that closes BFs or helps +us reduce/avoid the _next_ Blocking state**. i.e. aimed at fixing a BF, a class of BFs, bugs, +performance regression, etc. -If your PR does not meet this criteria, it may be pending for some time until the system becomes unblocked. There are of course reasonable exceptions, below. +If your PR does not meet this criteria, it may be pending for some time until the system becomes +unblocked. 
There are of course reasonable exceptions, below. ### Feature Work -**All feature work stops** during a "code lockdown." -In exceptional circumstances VPs can approve exceptions. +**All feature work stops** during a "code lockdown." In exceptional circumstances VPs can approve +exceptions. ### Non-feature Work -We understand that in many cases addressing the larger BF problem requires refactoring, modularity improvements, changes to our test and paying down other kinds of **technical debt**. During a "code lockdown" this work is **expressly permitted and mergeable** - with the guidance that teams index heavily on risk when deciding what to work on. If a piece of work feels like it makes the BF problem worse before it gets better, talk to your director about how to proceed. +We understand that in many cases addressing the larger BF problem requires refactoring, modularity +improvements, changes to our test and paying down other kinds of **technical debt**. During a "code +lockdown" this work is **expressly permitted and mergeable** - with the guidance that teams index +heavily on risk when deciding what to work on. If a piece of work feels like it makes the BF problem +worse before it gets better, talk to your director about how to proceed. Allowable Examples (not exclusive): - Refactoring components to make them more unit testable - Increasing code coverage through high quality tests that block PRs - Making the development loop faster (decreasing build times, fixing slow tests, etc) -- Improving guardrails that improve code quality (fixing clang-tidy warnings, compiler warnings, etc) +- Improving guardrails that improve code quality (fixing clang-tidy warnings, compiler warnings, + etc) -If a team is in a lockdown, but the rest of the org is not - their focus should likely skew towards work that expedites their lockdown exit. 
+If a team is in a lockdown, but the rest of the org is not - their focus should likely skew towards +work that expedites their lockdown exit. -If the org is in a lockdown, but a team doesn’t have BFs to work on - they should balance helping other teams with the work they’ve identified as addressing the underlying BF problem. +If the org is in a lockdown, but a team doesn’t have BFs to work on - they should balance helping +other teams with the work they’ve identified as addressing the underlying BF problem. -The higher the risk of the work, the more involvement the Staff+ engineers and the Director/VP should have in the decision about what is ok to merge and what isn’t. +The higher the risk of the work, the more involvement the Staff+ engineers and the Director/VP +should have in the decision about what is ok to merge and what isn’t. ### Code Owner Responsibilities -Code Owners should join the `#10gen-mongo-code-lockdown` Slack channel to receive daily updates on the status of the build. It produces daily metrics with instructions if there is a state change. +Code Owners should join the `#10gen-mongo-code-lockdown` Slack channel to receive daily updates on +the status of the build. It produces daily metrics with instructions if there is a state change. -If we change to a blocking state, code owners should use their discretion to only approve changes that are allowed (see above). If we exit the blocking state, code owners should approve PRs as usual. +If we change to a blocking state, code owners should use their discretion to only approve changes +that are allowed (see above). If we exit the blocking state, code owners should approve PRs as +usual. ## Quotas and State-Changes @@ -74,21 +99,31 @@ This shows relevant JIRA queries for a more live and interactive view of the sta ### BFs remaining open only on older branches -Some teams may fix a BF in master, but are "waiting for fix" on older branches, which keeps the BF counted against the thresholds. 
Guidance here is currently evolving. +Some teams may fix a BF in master, but are "waiting for fix" on older branches, which keeps the BF +counted against the thresholds. Guidance here is currently evolving. -If the build failure is not frequently occurring, it can be marked as P5-Trivial, and it won’t count towards your team’s build failures for the block merge. +If the build failure is not frequently occurring, it can be marked as P5-Trivial, and it won’t count +towards your team’s build failures for the block merge. -As we iterate on our processes for this, the `exclude-from-master-quota` label can be used to exclude BFs that should not be included in these quotas. The expectation is that this is an interim solution as we improve our processes especially around BFs that remain open pending backports. +As we iterate on our processes for this, the `exclude-from-master-quota` label can be used to +exclude BFs that should not be included in these quotas. The expectation is that this is an interim +solution as we improve our processes especially around BFs that remain open pending backports. Specifically: -- If a BF is only waiting for a backport on a branch older than master, apply the `exclude-from-master-quota` label to the ticket. -- If a BF is failing on master, not a serious bug (or a test-only issue that can't affect the real clients), not noisy, and we are choosing not to fix it, set the Priority to `P5 - Trivial` and apply the `keep-trivial` label. -- If a BF is failing on an older branch and we are choosing not to backport a fix, set the `Priority to P5 - Trivial` and apply the `keep-trivial-X.Y` label appropriately. +- If a BF is only waiting for a backport on a branch older than master, apply the + `exclude-from-master-quota` label to the ticket. 
+- If a BF is failing on master, not a serious bug (or a test-only issue that can't affect the real + clients), not noisy, and we are choosing not to fix it, set the Priority to `P5 - Trivial` and + apply the `keep-trivial` label. +- If a BF is failing on an older branch and we are choosing not to backport a fix, set the + `Priority to P5 - Trivial` and apply the `keep-trivial-X.Y` label appropriately. ## Contributing -For any new proposals, changes to thresholds, or concerns regarding their application, please escalate to your Director/VP. **We want advocacy from all levels to make this a successful change to our engineering culture.** +For any new proposals, changes to thresholds, or concerns regarding their application, please +escalate to your Director/VP. **We want advocacy from all levels to make this a successful change to +our engineering culture.** ### CLI @@ -100,7 +135,9 @@ python buildscripts/monitor_build_status/cli.py --help ### Testing locally -For Jira API authentication, use the `JIRA_AUTH_PAT` env variable. More about Jira Personal Access Tokens (PATs) can be found [here](https://wiki.corp.mongodb.com/pages/viewpage.action?pageId=218995581). +For Jira API authentication, use the `JIRA_AUTH_PAT` env variable. More about Jira Personal Access +Tokens (PATs) can be found +[here](https://wiki.corp.mongodb.com/pages/viewpage.action?pageId=218995581). Use your PAT to run the following and output its results: @@ -112,4 +149,6 @@ The above will _not_ send notifications to the Slack channel. ### Slack Notifications -Slack notifications use a webhook from the Devprod Correctness Slack app (rather than user credentials) for security. The webhook URL is read from the `mongo-code-lockdown-webhook` Evergreen expansion, which points to the `#10gen-mongo-code-lockdown` Slack channel. +Slack notifications use a webhook from the Devprod Correctness Slack app (rather than user +credentials) for security. 
The webhook URL is read from the `mongo-code-lockdown-webhook` Evergreen +expansion, which points to the `#10gen-mongo-code-lockdown` Slack channel. diff --git a/buildscripts/resmokeconfig/matrix_suites/README.md b/buildscripts/resmokeconfig/matrix_suites/README.md index 1a8a4a702d4..304ee57195c 100644 --- a/buildscripts/resmokeconfig/matrix_suites/README.md +++ b/buildscripts/resmokeconfig/matrix_suites/README.md @@ -3,27 +3,24 @@ ## Summary Matrix Suites are defined as a combination of explict -[suite files](../../../buildscripts/resmokeconfig/suites/README.md) -and a set of "overrides" for specific keys. The intention is -to avoid duplication of suite definitions as much as -possible with the eventual goal of having most suites be -fully composed of reusable sections. +[suite files](../../../buildscripts/resmokeconfig/suites/README.md) and a set of "overrides" for +specific keys. The intention is to avoid duplication of suite definitions as much as possible with +the eventual goal of having most suites be fully composed of reusable sections. ## Usage -Matrix suites behave like regular suites for all functionality in resmoke.py, -including `list-suites`, `find-suites` and `run --suites=[SUITE]`. +Matrix suites behave like regular suites for all functionality in resmoke.py, including +`list-suites`, `find-suites` and `run --suites=[SUITE]`. ## Writing a matrix suite mapping file. -Matrix suites consist of a mapping, and a set of overrides in -their eponymous directories. When you are done writing the mapping file, you must +Matrix suites consist of a mapping, and a set of overrides in their eponymous directories. When you +are done writing the mapping file, you must [generate the matrix suite file.](#generating-matrix-suites) -The "mappings" directory contains YAML files that each contain a suite definition. -Each suite definition includes `base_suite`, and a list of -modifiers. 
There is also an optional `description` field that will get output -with the local resmoke invocation. +The "mappings" directory contains YAML files that each contain a suite definition. Each suite +definition includes `base_suite`, and a list of modifiers. There is also an optional `description` +field that will get output with the local resmoke invocation. The fields of modifiers are the following: @@ -33,30 +30,29 @@ The fields of modifiers are the following: 4. extends Each modifier field is a dot-delimited-notation representing the file and field of the modification. -All modifier fields must be in a yaml file in the `overrides` directory -For example `encryption.mongodfixture_ese` would reference the `mongodfixture_ese` field -inside of the `encryption.yml` file inside of the `overrides` directory. +All modifier fields must be in a yaml file in the `overrides` directory For example +`encryption.mongodfixture_ese` would reference the `mongodfixture_ese` field inside of the +`encryption.yml` file inside of the `overrides` directory. ### overrides All fields referenced in the `overrides` section of the mappings file will overwrite the specified -fields in the `base_suite`. -The `overrides` modifier takes precidence over the `excludes` and `eval` modifiers. -The `overrides` list will be processed in order so order can matter if multiple override modifiers -try to overwrite the same field in the base_suite. +fields in the `base_suite`. The `overrides` modifier takes precidence over the `excludes` and `eval` +modifiers. The `overrides` list will be processed in order so order can matter if multiple override +modifiers try to overwrite the same field in the base_suite. ### excludes All fields referenced in the `excludes` section of the mappings file will append to the specified -`exclude` fields in the base suite. -The only two valid options in the referenced modifier field are `exclude_with_any_tags` and -`exclude_files`. 
They are appended in the order they are specified in the mappings file. +`exclude` fields in the base suite. The only two valid options in the referenced modifier field are +`exclude_with_any_tags` and `exclude_files`. They are appended in the order they are specified in +the mappings file. ### eval All fields referenced in the `eval` section of the mappings file will append to the specified -`config.shell_options.eval` field in the base suite. -They are appended in the order they are specified in the mappings file. +`config.shell_options.eval` field in the base suite. They are appended in the order they are +specified in the mappings file. ### extends @@ -69,9 +65,8 @@ modifiers), the key being extended must already exist and also be a list. The generated matrix suites live in the `buildscripts/resmokeconfig/matrix_suites/generated_suites` directory. These files may be edited for local testing but must remain consistent with the mapping files. There is a task in the commit queue that enforces this. To generate a new version of these -matrix suites, you may run -`buildscripts/resmoke.py generate-matrix-suites`. This command -will overwrite the current generated matrix suites on disk so make sure you do not have any unsaved +matrix suites, you may run `buildscripts/resmoke.py generate-matrix-suites`. This command will +overwrite the current generated matrix suites on disk so make sure you do not have any unsaved changes to these files. ## Validating matrix suites @@ -82,5 +77,4 @@ ensures that the files are validated. ## FAQ -For questions about the user or authorship experience, -please reach out in #server-testing. +For questions about the user or authorship experience, please reach out in #server-testing. 
diff --git a/buildscripts/resmokeconfig/suites/README.md b/buildscripts/resmokeconfig/suites/README.md index 2cf611a82db..271dea34341 100644 --- a/buildscripts/resmokeconfig/suites/README.md +++ b/buildscripts/resmokeconfig/suites/README.md @@ -2,7 +2,8 @@ Test "suites" are configuration files that group which tests to run, and how. -Yaml files enumerate the test files that the suite encompasses, as well as any test fixtures and their configurations to leverage, options for the shell, hooks, and more. +Yaml files enumerate the test files that the suite encompasses, as well as any test fixtures and +their configurations to leverage, options for the shell, hooks, and more. ## Minimal Example @@ -64,7 +65,8 @@ Example: test_kind: js_test ``` -See all supported kinds in [`buildscripts/resmokelib/testing/testcases`](../../../buildscripts/resmokelib/testing/testcases/README.md). +See all supported kinds in +[`buildscripts/resmokelib/testing/testcases`](../../../buildscripts/resmokelib/testing/testcases/README.md). ## `selector` @@ -89,25 +91,34 @@ File path(s) of test files to include. If a path without a glob is provided, it ### `selector.root` -A file containing glob patterns, one per line, typically used by test_kind cpp_unit_test (usually build/unittests.txt). Specifies which tests to consider for including into the suite. If no other options are specified, these are the tests that will be run. Glob patterns are supported (and common) here. +A file containing glob patterns, one per line, typically used by test_kind cpp_unit_test (usually +build/unittests.txt). Specifies which tests to consider for including into the suite. If no other +options are specified, these are the tests that will be run. Glob patterns are supported (and +common) here. ### `selector.include_files` -A list of strings representing glob patterns. Includes only this subset of tests in the suite. These files will be included even if they would otherwise be excluded by tags. 
Will error if a test specified here was not included in the roots. +A list of strings representing glob patterns. Includes only this subset of tests in the suite. These +files will be included even if they would otherwise be excluded by tags. Will error if a test +specified here was not included in the roots. ### `selector.exclude_files` -A list of strings representing glob patterns. Excludes this list of tests from the suite. These files will be excluded even if they would otherwise be included by tags. Will error if a test specified here was not included in the roots. +A list of strings representing glob patterns. Excludes this list of tests from the suite. These +files will be excluded even if they would otherwise be included by tags. Will error if a test +specified here was not included in the roots. ### `selector.include_with_any_tags` -A list of strings. Only jstests which define a list of tags which includes any of these tags will be included in the suite, unless otherwise excluded by filename. +A list of strings. Only jstests which define a list of tags which includes any of these tags will be +included in the suite, unless otherwise excluded by filename. To see all tags referenced across suites, run `./buildscripts/resmoke.py list-tags`. ### `selector.exclude_with_any_tags` -A list of strings. Any jstest which defines a list of tags which includes any of these tags will be excluded from the suite, unless otherwise included by filename. +A list of strings. Any jstest which defines a list of tags which includes any of these tags will be +excluded from the suite, unless otherwise included by filename. To see all tags referenced across suites, run `./buildscripts/resmoke.py list-tags`. @@ -118,9 +129,8 @@ Defines how the tests will be executed. ### `executor.config` This section contains additional configuration for each test. The structure of this can vary -significantly based on the `test_kind`. 
For specific information, you can look at the -implementation of the `test_kind` of concern in the `buildscripts/resmokelib/testing/testcases` -directory. +significantly based on the `test_kind`. For specific information, you can look at the implementation +of the `test_kind` of concern in the `buildscripts/resmokelib/testing/testcases` directory. Example: @@ -147,7 +157,9 @@ Any parameters (besides `global_vars`) will directly be passed to the mongo shel ##### `executor.config.shell_options.global_vars` -Will use this as the base for the string passed to `--eval`. Anything specified in `shell_options.eval` will be appended after these. Formats any objects so that they will evaluate properly as a string. +Will use this as the base for the string passed to `--eval`. Anything specified in +`shell_options.eval` will be appended after these. Formats any objects so that they will evaluate +properly as a string. `global_vars` allows for setting global variables. A `TestData` object is a special global variable that is used to hold testing data. Parts of `TestData` can be updated via `resmoke` command-line @@ -156,8 +168,8 @@ intelligently and made available to the `js_test` running. Behavior can vary on in general this is the order of precedence: (1) resmoke command-line (2) [suite].yml (3) runtime/default. -The mongo shell can also be invoked with flags & -named arguments. Flags must have the `''` value, such as in the case for `nodb` above. +The mongo shell can also be invoked with flags & named arguments. Flags must have the `''` value, +such as in the case for `nodb` above. `eval` can also be used to run generic javascript code in the shell. You can directly include javascript code, or you can put it in a separate script & `load` it. @@ -166,11 +178,12 @@ javascript code, or you can put it in a separate script & `load` it. Specify hooks to run before, after, and between individual tests to execute specified logic. 
-> Read more about hooks in [buildscripts/resmokelib/testing/hooks/README.md](../../../buildscripts/resmokelib/testing/hooks/README.md) +> Read more about hooks in +> [buildscripts/resmokelib/testing/hooks/README.md](../../../buildscripts/resmokelib/testing/hooks/README.md) -The hook name in the `.yml` must match its Python class name of the hook. Parameters can also be included in the `.yml` -and will be passed to the hook's constructor (the `hook_logger` & `fixture` parameters are -automatically included, so those should not be included in the `.yml`). +The hook name in the `.yml` must match its Python class name of the hook. Parameters can also be +included in the `.yml` and will be passed to the hook's constructor (the `hook_logger` & `fixture` +parameters are automatically included, so those should not be included in the `.yml`). Example: @@ -190,9 +203,11 @@ hooks: Specify a test fixture to run around the tests. -> Read more about fixtures in [buildscripts/resmokelib/testing/fixtures/README.md](../../../buildscripts/resmokelib/testing/fixtures/README.md). +> Read more about fixtures in +> [buildscripts/resmokelib/testing/fixtures/README.md](../../../buildscripts/resmokelib/testing/fixtures/README.md). -The `class` sub-field corresponds to the Python class name of a fixture. All other sub-fields are passed into the constructor of the fixture. These sub-fields will vary based on the fixture used. +The `class` sub-field corresponds to the Python class name of a fixture. All other sub-fields are +passed into the constructor of the fixture. These sub-fields will vary based on the fixture used. Example: @@ -238,4 +253,5 @@ Read more about [hooks](../../../buildscripts/resmokelib/testing/hooks/README.md #### `executor.archive.tests` -Specify a list of test files to archive on failure. Wildcard selection a valid. Set to `true` to archive _all_ tests. +Specify a list of test files to archive on failure. Wildcard selection a valid. 
Set to `true` to +archive _all_ tests. diff --git a/buildscripts/resmokelib/README.md b/buildscripts/resmokelib/README.md index dba16d1f244..e0e55a56ebb 100644 --- a/buildscripts/resmokelib/README.md +++ b/buildscripts/resmokelib/README.md @@ -2,11 +2,13 @@ Resmoke is MongoDB's integration test runner. -The JS Tests it can run live in the `jstests/` directory - reference its [README](../../jstests/README.md) to learn about their content. +The JS Tests it can run live in the `jstests/` directory - reference its +[README](../../jstests/README.md) to learn about their content. ## Build -Though the source is built with bazel, resmoke is not yet integrated. This means that the source has to be built prior to using resmoke, eg: +Though the source is built with bazel, resmoke is not yet integrated. This means that the source has +to be built prior to using resmoke, eg: ``` bazel build install-dist-test @@ -41,11 +43,13 @@ bazel build install-dist-test Generate a mongod.conf and mongos.conf using config fuzzer. ``` -Note: `bisect`, `setup-multiversion`, and `symbolize` commands have been moved to [`db-contrib-tool`](https://github.com/10gen/db-contrib-tool#readme). +Note: `bisect`, `setup-multiversion`, and `symbolize` commands have been moved to +[`db-contrib-tool`](https://github.com/10gen/db-contrib-tool#readme). ## Suites -Many of the above commands use the concept of a "suite". Loosely, suites group which tests run, and how. +Many of the above commands use the concept of a "suite". Loosely, suites group which tests run, and +how. Read more about suites [here](../../buildscripts/resmokeconfig/suites/README.md). @@ -59,43 +63,47 @@ The most typical approach is to run a particular JS test file given a suite, eg: buildscripts/resmoke.py run --suites=no_passthrough jstests/noPassthrough/shell/js/string.js ``` -That executes the content of that file, using the suite configuration as a fixture setup. 
The suite "no_passthrough" is associated with the file [buildscripts/resmokeconfig/suites/no_passthrough.yml](../../buildscripts/resmokeconfig/suites/no_passthrough.yml). +That executes the content of that file, using the suite configuration as a fixture setup. The suite +"no_passthrough" is associated with the file +[buildscripts/resmokeconfig/suites/no_passthrough.yml](../../buildscripts/resmokeconfig/suites/no_passthrough.yml). -Run has **100+ flags**! Use `resmoke run --help` to inspect them. To avoid risk of multiple sources of truth that can drift and become stale, **we do not attempt to document them all here** - they should each be self-descriptive and documented within the CLI help. +Run has **100+ flags**! Use `resmoke run --help` to inspect them. To avoid risk of multiple sources +of truth that can drift and become stale, **we do not attempt to document them all here** - they +should each be self-descriptive and documented within the CLI help. Below are very high-level descriptions for high-usage flags. ### Suites (`--suites`) -The run subcommand can run suites (list of tests and the MongoDB topology and -configuration to run them against), and explicitly named test files. +The run subcommand can run suites (list of tests and the MongoDB topology and configuration to run +them against), and explicitly named test files. -A single suite can be specified using the `--suite` flag, and multiple suites -can be specified by providing a comma separated list to the `--suites` flag. +A single suite can be specified using the `--suite` flag, and multiple suites can be specified by +providing a comma separated list to the `--suites` flag. Additional documentation on our suite configuration can be found in [buildscripts/resmokeconfig/suites/README.md](../../buildscripts/resmokeconfig/suites/README.md). ### Testable Installations (`--installDir`) -resmoke can run tests against any testable installation of MongoDB (such -as ASAN, Debug, Release). 
When possible, resmoke will automatically locate and -run with a locally built copy of MongoDB Server, so long as that build was -installed to a subdirectory of the root of the git repository, and there is -exactly one build. In other situations, the `--installDir` flag, passed to run -subcommand, can be used to indicate the location of the mongod/mongos binaries. +resmoke can run tests against any testable installation of MongoDB (such as ASAN, Debug, Release). +When possible, resmoke will automatically locate and run with a locally built copy of MongoDB +Server, so long as that build was installed to a subdirectory of the root of the git repository, and +there is exactly one build. In other situations, the `--installDir` flag, passed to run subcommand, +can be used to indicate the location of the mongod/mongos binaries. -As an alternative, you may instead prefer to use the resmoke.py wrapper script -located in the same directory as the mongod binary, which will automatically -set `installDir` for you. +As an alternative, you may instead prefer to use the resmoke.py wrapper script located in the same +directory as the mongod binary, which will automatically set `installDir` for you. -Note that this wrapper is unavailable in packaged installations of MongoDB -Server, such as those provided by Homebrew, and other package managers. If you -would like to run tests against a packaged installation, you must explicitly -pass `--installDir` to resmoke.py +Note that this wrapper is unavailable in packaged installations of MongoDB Server, such as those +provided by Homebrew, and other package managers. If you would like to run tests against a packaged +installation, you must explicitly pass `--installDir` to resmoke.py ### Resmoke test telemetry We capture telemetry from resmoke using open telemetry. -Using open telemetry (OTel) we capture more specific information about the internals of resmoke. This data is used for improvements specifically when running in evergreen. 
This data is captured on every resmoke invocation but only sent to honeycomb when running in evergreen. More info about how we use OTel in resmoke can be found [here](otel_resmoke.md). +Using open telemetry (OTel) we capture more specific information about the internals of resmoke. +This data is used for improvements specifically when running in evergreen. This data is captured on +every resmoke invocation but only sent to honeycomb when running in evergreen. More info about how +we use OTel in resmoke can be found [here](otel_resmoke.md). diff --git a/buildscripts/resmokelib/extensions/README.md b/buildscripts/resmokelib/extensions/README.md index 0b94ee3ba47..5a5e87731d3 100644 --- a/buildscripts/resmokelib/extensions/README.md +++ b/buildscripts/resmokelib/extensions/README.md @@ -1,10 +1,12 @@ # Extensions -This module provides utilities for setting up and configuring MongoDB extensions in resmoke test suites. +This module provides utilities for setting up and configuring MongoDB extensions in resmoke test +suites. ## Overview -Extensions are dynamically loaded shared objects (`.so` files) that provide additional functionality to MongoDB. The utilities in this folder can handle: +Extensions are dynamically loaded shared objects (`.so` files) that provide additional functionality +to MongoDB. The utilities in this folder can handle: 1. Discovering extension `.so` files in build directories 2. Generating `.conf` configuration files for extensions @@ -12,7 +14,8 @@ Extensions are dynamically loaded shared objects (`.so` files) that provide addi ## Configuration File Generation in Tests -Extension `.conf` files are YAML configuration files that tell the server how to load an extension. They contain: +Extension `.conf` files are YAML configuration files that tell the server how to load an extension. 
+They contain: - `sharedLibraryPath`: Path to the `.so` file - `extensionOptions`: Optional configuration parameters for the extension @@ -30,9 +33,11 @@ extensionOptions: The `generate_extension_configs.py` module creates `.conf` files: -1. Receives a list of `.so` file paths (either from automatic discovery via `find_and_generate_extension_configs.py`, or manually via `--so-files` command-line argument) +1. Receives a list of `.so` file paths (either from automatic discovery via + `find_and_generate_extension_configs.py`, or manually via `--so-files` command-line argument) 2. For each `.so`, creates a `.conf` file in the temp directory (`/tmp/mongo/extensions/`) -3. Looks up corresponding extension options from `src/mongo/db/extension/test_examples/configurations.yml`, if any are specified +3. Looks up corresponding extension options from + `src/mongo/db/extension/test_examples/configurations.yml`, if any are specified 4. Writes the config file with `sharedLibraryPath` and any `extensionOptions` ### Automatic Discovery and Generation diff --git a/buildscripts/resmokelib/generate_fuzz_config/README.md b/buildscripts/resmokelib/generate_fuzz_config/README.md index f1c600bbce1..7177d171b80 100644 --- a/buildscripts/resmokelib/generate_fuzz_config/README.md +++ b/buildscripts/resmokelib/generate_fuzz_config/README.md @@ -2,7 +2,10 @@ This is a testing feature of the mongod and mongos, built into resmoke.py! -The config fuzzer is a resmoke feature that randomizes various server parameters of both mongod and mongos on startup. These fuzzed parameters should not affect the correctness of any tests. Therefore, the config fuzzer can be enabled for any test or suite run with resmoke to ensure the database is resilient to abnormal server configurations. +The config fuzzer is a resmoke feature that randomizes various server parameters of both mongod and +mongos on startup. These fuzzed parameters should not affect the correctness of any tests. 
+Therefore, the config fuzzer can be enabled for any test or suite run with resmoke to ensure the +database is resilient to abnormal server configurations. More information can be displayed in the resmoke --help output: @@ -25,15 +28,22 @@ The bulk of the fuzzing logic is in [mongo_fuzzer_configs.py](./mongo_fuzzer_con ## How does it work? -The config fuzzer assigns random values to various tunable parameters. Server parameters and their ranges are specified manually by developers and are not discovered automatically in any way. +The config fuzzer assigns random values to various tunable parameters. Server parameters and their +ranges are specified manually by developers and are not discovered automatically in any way. -When the above resmoke flags are used, the [plugin](./plugin.py) implicitly enables the [FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py) hook for testing. +When the above resmoke flags are used, the [plugin](./plugin.py) implicitly enables the +[FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py) +hook for testing. ## Where and When does it run on evergreen? -The config fuzzer is represented as a handful of evergreen tasks with "_config_fuzzer_" in the name. Search "config_fuzzer" in the [etc/](../../../etc) directory to find all the evergreen tasks. +The config fuzzer is represented as a handful of evergreen tasks with "_config_fuzzer_" in the name. +Search "config_fuzzer" in the [etc/](../../../etc) directory to find all the evergreen tasks. -Arguably the simplest evergreen task, `config_fuzzer_jsCore`, runs the "core" (i.e. `jstests/core`) resmoke suite with the config fuzzer parameters to resmoke set, and excludes some incompatible tests ([src link](https://github.com/mongodb/mongo/blob/a2e7e83a135c3096de7f360b88de1b3cdc1caaf2/etc/evergreen_yml_components/tasks/resmoke/server_divisions/durable_transactions_and_availability/tasks.yml#L1956-L1975)). 
Here is a sampling of some of the task names: +Arguably the simplest evergreen task, `config_fuzzer_jsCore`, runs the "core" (i.e. `jstests/core`) +resmoke suite with the config fuzzer parameters to resmoke set, and excludes some incompatible tests +([src link](https://github.com/mongodb/mongo/blob/a2e7e83a135c3096de7f360b88de1b3cdc1caaf2/etc/evergreen_yml_components/tasks/resmoke/server_divisions/durable_transactions_and_availability/tasks.yml#L1956-L1975)). +Here is a sampling of some of the task names: - `config_fuzzer_concurrency_replication` - `config_fuzzer_concurrency_sharded_replication` @@ -41,7 +51,10 @@ Arguably the simplest evergreen task, `config_fuzzer_jsCore`, runs the "core" (i ## Reproducing a config fuzzer failure -In the Evergreen task view, click on the Logs tab, then Task Logs, and open in Parsely. Search for "Fuzzed" ([source link](https://github.com/mongodb/mongo/blob/ca1c935aca43ca2e028507e2a878d4e12f50355b/buildscripts/resmokelib/run/__init__.py#L352-L366)). The output will look similar to this: +In the Evergreen task view, click on the Logs tab, then Task Logs, and open in Parsely. Search for +"Fuzzed" +([source link](https://github.com/mongodb/mongo/blob/ca1c935aca43ca2e028507e2a878d4e12f50355b/buildscripts/resmokelib/run/__init__.py#L352-L366)). +The output will look similar to this:
Logs @@ -112,13 +125,22 @@ In the Evergreen task view, click on the Logs tab, then Task Logs, and open in P
-The log line starting with "resmoke.py invocation for local usage" and the one with "configFuzzSeed" provide an option `--configFuzzSeed=5583430894313922699` that can be used to generate the same fuzzed server parameters locally in resmoke. +The log line starting with "resmoke.py invocation for local usage" and the one with "configFuzzSeed" +provide an option `--configFuzzSeed=5583430894313922699` that can be used to generate the same +fuzzed server parameters locally in resmoke. ## Running the config fuzzer locally -Before running the Resmoke config fuzzer command, you need to obtain the necessary binaries. You can download them from the "Files" section of the `archive_dist_test` task in Evergreen (e.g., binaries from the `amazon2-arm64-compile` variant). Alternatively, if you don't require those specific binaries, you can use `db-contrib-tool` to download the binaries (e.g., by running `bazel run db-contrib-tool -- setup-repro-env master`). +Before running the Resmoke config fuzzer command, you need to obtain the necessary binaries. You can +download them from the "Files" section of the `archive_dist_test` task in Evergreen (e.g., binaries +from the `amazon2-arm64-compile` variant). Alternatively, if you don't require those specific +binaries, you can use `db-contrib-tool` to download the binaries (e.g., by running +`bazel run db-contrib-tool -- setup-repro-env master`). -To re-run a command locally that failed through the config fuzzer, you can navigate to the specific test that failed, and under files you can find a name titled "Resmoke.py Invocation for Local Usage". If you are replicating an older config fuzzer invocation, remove the command line argument "`--installDir=dist-test/bin`". A simple example command is shown below: +To re-run a command locally that failed through the config fuzzer, you can navigate to the specific +test that failed, and under files you can find a name titled "Resmoke.py Invocation for Local +Usage". 
If you are replicating an older config fuzzer invocation, remove the command line argument +"`--installDir=dist-test/bin`". A simple example command is shown below: ``` buildscripts/resmoke.py run jstests/noPassthrough/bulk_write_w0.js \ @@ -127,7 +149,12 @@ buildscripts/resmoke.py run jstests/noPassthrough/bulk_write_w0.js \ --configFuzzSeed=7956511060361033919 ``` -It is easiest to pipe the output to another text file and then to analyze the output through there. The format of the file is slightly different, as you will not be able to explicitly look up Fuzzed, but you can look up one of the fuzzed config parameters to find the list of fuzzed config parameter settings. A subset of a log from running the above command on [this version](https://github.com/mongodb/mongo/commit/856e4ecd8612b19c8ba281cf23450d74b5838650) of master yields is the following: +It is easiest to pipe the output to another text file and then to analyze the output through there. +The format of the file is slightly different, as you will not be able to explicitly look up Fuzzed, +but you can look up one of the fuzzed config parameters to find the list of fuzzed config parameter +settings. A subset of a log from running the above command on +[this version](https://github.com/mongodb/mongo/commit/856e4ecd8612b19c8ba281cf23450d74b5838650) of +master yields is the following: ``` js_test:bulk_write_w0] Skip waiting to connect to node with pid=2522712, port=20040 @@ -140,7 +167,8 @@ js_test:bulk_write_w0] Skip waiting to connect to node with pid=2522712, port=20 ## Adding a new parameter to be fuzzed to the config fuzzer -There are two broad categories of parameters in the config fuzzer, that each have two sub-categories of parameters: +There are two broad categories of parameters in the config fuzzer, that each have two sub-categories +of parameters: 1. 
mongo parameters - mongod parameters @@ -151,25 +179,43 @@ There are two broad categories of parameters in the config fuzzer, that each hav ### Adding new mongo parameters -Mongo parameters and their properties (e.g. min, max, default) are stored in [config_fuzzer_limits.py](./config_fuzzer_limits.py). +Mongo parameters and their properties (e.g. min, max, default) are stored in +[config_fuzzer_limits.py](./config_fuzzer_limits.py). -Below is a list of ways to fuzz configs which are supported without having to also change [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). -Please ensure that you add it correctly to the `mongod` or `mongos` subdictionary. +Below is a list of ways to fuzz configs which are supported without having to also change +[mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). Please ensure that you add it correctly to the +`mongod` or `mongos` subdictionary. -You need to specify if your parameter should be fuzzed at runtime, startup, or both by declaring the `fuzz_at` key for the parameter. The `fuzz_at` key should be a list that can contain the values `startup`, `runtime`, or both. The eligible values are specified in the `set_at` keys of the corresponding `.idl` files. +You need to specify if your parameter should be fuzzed at runtime, startup, or both by declaring the +`fuzz_at` key for the parameter. The `fuzz_at` key should be a list that can contain the values +`startup`, `runtime`, or both. The eligible values are specified in the `set_at` keys of the +corresponding `.idl` files. -For a parameter that is only fuzzed at startup, the fuzzer will generate a fuzzed value for the parameter and set it when starting up the server. +For a parameter that is only fuzzed at startup, the fuzzer will generate a fuzzed value for the +parameter and set it when starting up the server. 
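As a rough illustration of the startup case, the sketch below draws one value per parameter from a seeded generator, so that the same seed reproduces the same values (which is what makes a `--configFuzzSeed` value useful for local reproduction). The parameter names, the `LIMITS` dictionary, and `fuzz_startup_values()` are hypothetical and are not taken from `config_fuzzer_limits.py`; only the `min`, `max`, `fuzz_at`, and `period` keys come from the text above.

```python
import random

# Hypothetical entries; real names and ranges live in config_fuzzer_limits.py.
LIMITS = {
    "exampleStartupParam": {"min": 1, "max": 100, "fuzz_at": ["startup"]},
    "exampleRuntimeParam": {"min": 0, "max": 10, "fuzz_at": ["runtime"], "period": 5},
    "exampleBothParam": {"min": 10, "max": 20, "fuzz_at": ["startup", "runtime"], "period": 30},
}


def fuzz_startup_values(limits, seed):
    """Pick one value for every parameter whose fuzz_at list contains "startup"."""
    rng = random.Random(seed)
    values = {}
    # Sort the names so the draw order does not depend on dict construction order.
    for name in sorted(limits):
        spec = limits[name]
        if "startup" in spec["fuzz_at"]:
            values[name] = rng.randint(spec["min"], spec["max"])
    return values


first = fuzz_startup_values(LIMITS, seed=7956511060361033919)
second = fuzz_startup_values(LIMITS, seed=7956511060361033919)
print(first == second)  # the same seed yields the same startup values
```

Parameters whose `fuzz_at` list contains only `runtime` are skipped here, since they would be handled while the server is running rather than at startup.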
-For a parameter fuzzed at runtime, the fuzzer will generate a fuzzed value for the parameter while running the server based on a `period` key that is required for fuzzed runtime parameters. -The `period` key describes how often the parameter should be changed, in seconds. Every `period` seconds, the fuzzer will select a new random value for the parameter and use the setParameter command to update the value of the -parameter on every node in the cluster while the suite is running. This is perfomed by the [FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py) hook. +For a parameter fuzzed at runtime, the fuzzer will generate a fuzzed value for the parameter while +running the server based on a `period` key that is required for fuzzed runtime parameters. The +`period` key describes how often the parameter should be changed, in seconds. Every `period` +seconds, the fuzzer will select a new random value for the parameter and use the setParameter +command to update the value of the parameter on every node in the cluster while the suite is +running. This is perfomed by the +[FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py) +hook. -For parameters with complex fuzzing logic or interdependencies with other parameters, you can set `"custom_fuzz_value_assignment": True` to bypass the standard fuzzing logic. Parameters with this flag must be handled explicitly in the special handling functions (`generate_special_mongod_startup_parameters()` for startup parameters or `generate_special_runtime_parameters()` for runtime parameters). Note that parameter dependency logic is currently only supported for startup fuzzing - runtime fuzzing operates on individual parameters. See the section below on parameters requiring special handling for more details. 
+For parameters with complex fuzzing logic or interdependencies with other parameters, you can set +`"custom_fuzz_value_assignment": True` to bypass the standard fuzzing logic. Parameters with this +flag must be handled explicitly in the special handling functions +(`generate_special_mongod_startup_parameters()` for startup parameters or +`generate_special_runtime_parameters()` for runtime parameters). Note that parameter dependency +logic is currently only supported for startup fuzzing - runtime fuzzing operates on individual +parameters. See the section below on parameters requiring special handling for more details. -Let `choices = [choice1, choice2, ..., choiceN]` be an array of choices that the parameter can have as a value. -The parameters are added in order of priority chosen in the if-elif-else statement in `generate_normal_mongo_parameters()` -in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). -So, if you added the fields `default`, `min`, and `max` for a `param`, case 4 would get evaluated over case 5. +Let `choices = [choice1, choice2, ..., choiceN]` be an array of choices that the parameter can have +as a value. The parameters are added in order of priority chosen in the if-elif-else statement in +`generate_normal_mongo_parameters()` in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). So, if +you added the fields `default`, `min`, and `max` for a `param`, case 4 would get evaluated over +case 5. 1. `param = rng.uniform(min, max)` @@ -218,41 +264,59 @@ So, if you added the fields `default`, `min`, and `max` for a `param`, case 4 wo "param": {"default": default} ``` - > Note: For the default case, please add the value `"fuzz_at": ["startup"]` (the default value gets set at "startup"). + > Note: For the default case, please add the value `"fuzz_at": ["startup"]` (the default value + > gets set at "startup"). 
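The runtime behaviour described earlier in this section (a new value every `period` seconds) can be pictured with the sketch below. It is a simplified model, not the `FuzzRuntimeParameters` hook itself: `RuntimeSpec`, `due_updates()`, and the parameter names are hypothetical, and a real hook would send each new value with the `setParameter` command rather than return it.

```python
import random
from dataclasses import dataclass


@dataclass
class RuntimeSpec:
    """Hypothetical stand-in for one runtime-fuzzed entry (period is in seconds)."""

    min: int
    max: int
    period: int
    last_set: float = 0.0


def due_updates(specs, now, rng):
    """Return new values for every parameter whose period has elapsed at time `now`."""
    updates = {}
    for name in sorted(specs):
        spec = specs[name]
        if now - spec.last_set >= spec.period:
            updates[name] = rng.randint(spec.min, spec.max)
            spec.last_set = now
    return updates


specs = {
    "exampleFastParam": RuntimeSpec(min=0, max=10, period=1),
    "exampleSlowParam": RuntimeSpec(min=100, max=200, period=5),
}
rng = random.Random(42)
for tick in range(1, 6):
    # In resmoke, each update would become a setParameter call on every node.
    print(tick, due_updates(specs, now=float(tick), rng=rng))
```

Each parameter is treated independently in this model, which matches the note that dependency logic between parameters is only supported for startup fuzzing.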
-If you have a parameter that depends on another parameter being generated (see `throughputProbingInitialConcurrency` needing to be initialized before -`throughputProbingMinConcurrency` and `throughputProbingMaxConcurrency` as an example in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py)) or behavior that -differs from the above cases, please do the following steps: +If you have a parameter that depends on another parameter being generated (see +`throughputProbingInitialConcurrency` needing to be initialized before +`throughputProbingMinConcurrency` and `throughputProbingMaxConcurrency` as an example in +[mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py)) or behavior that differs from the above cases, +please do the following steps: -1. Add the parameter and the needed information to [config_fuzzer_limits.py](./config_fuzzer_limits.py) (ensure to correctly add to the `mongod` or `mongos` sub-dictionary), including `"custom_fuzz_value_assignment": True` to indicate it requires special handling +1. Add the parameter and the needed information to + [config_fuzzer_limits.py](./config_fuzzer_limits.py) (ensure to correctly add to the `mongod` or + `mongos` sub-dictionary), including `"custom_fuzz_value_assignment": True` to indicate it + requires special handling In [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py): -2. Add the parameter's special handling in `generate_special_mongod_startup_parameters()` or `generate_special_mongos_startup_parameters()` for startup parameters, or `generate_special_runtime_parameters()` for runtime parameters +2. Add the parameter's special handling in `generate_special_mongod_startup_parameters()` or + `generate_special_mongos_startup_parameters()` for startup parameters, or + `generate_special_runtime_parameters()` for runtime parameters -> Note: Parameter dependencies (where one parameter's value constrains another) are currently only supported for startup fuzzing. Runtime fuzzing handles parameters individually. 
+> Note: Parameter dependencies (where one parameter's value constrains another) are currently only +> supported for startup fuzzing. Runtime fuzzing handles parameters individually. -If you add a flow control parameter, please add the the parameter's name to `flow_control_params` in `generate_mongod_parameters`. +If you add a flow control parameter, please add the the parameter's name to `flow_control_params` in +`generate_mongod_parameters`. -> Note: The main distinction between min/max vs. lower-bound/upper_bound is there is some transformation involving the lower and upper bounds, -> while the min/max should be the true min/max of the parameters. You should also include the true min/max of the parameter so this can be logged. -> If the min/max is not inclusive, this is added as a note above the parameter. +> Note: The main distinction between min/max vs. lower-bound/upper_bound is there is some +> transformation involving the lower and upper bounds, while the min/max should be the true min/max +> of the parameters. You should also include the true min/max of the parameter so this can be +> logged. If the min/max is not inclusive, this is added as a note above the parameter. ### Adding new WiredTiger parameters -WiredTiger parameters and their properties (e.g. min, max, default) are stored in [config_fuzzer_wt_limits.py](./config_fuzzer_wt_limits.py). +WiredTiger parameters and their properties (e.g. min, max, default) are stored in +[config_fuzzer_wt_limits.py](./config_fuzzer_wt_limits.py). -> These _can not_ be fuzzed with the [FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py) hook because they are only set on startup (these parameters are used in the wt configuration string). 
+> These _can not_ be fuzzed with the +> [FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py) +> hook because they are only set on startup (these parameters are used in the wt configuration +> string). -Below is a list of ways to fuzz configs which are supported without having to also change [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). - -Please ensure that you add it correctly to the `wt` (eviction parameters) or `wt_table` subdictionary. - -Let `choices = [choice1, choice2, ..., choiceN]` be an array of choices that the parameter can have as a value. - -The parameters are added in order of priority chosen in the if-elif-else statement in `generate_normal_wt_parameters()` in +Below is a list of ways to fuzz configs which are supported without having to also change [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). +Please ensure that you add it correctly to the `wt` (eviction parameters) or `wt_table` +subdictionary. + +Let `choices = [choice1, choice2, ..., choiceN]` be an array of choices that the parameter can have +as a value. + +The parameters are added in order of priority chosen in the if-elif-else statement in +`generate_normal_wt_parameters()` in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). + 1. 
`param = rng.choices(choices)`, where choices is an array Add: @@ -281,25 +345,32 @@ The parameters are added in order of priority chosen in the if-elif-else stateme "param": {"min": min, "max": max} ``` -If you have a parameter that depends on another parameter being generated (see `eviction_target` needing to be initialized before -`eviction_trigger` as an example in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py)) or behavior that differs from the above cases, +If you have a parameter that depends on another parameter being generated (see `eviction_target` +needing to be initialized before `eviction_trigger` as an example in +[mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py)) or behavior that differs from the above cases, please do the following steps: -1. Add the parameter and the needed information to [config_fuzzer_wt_limits.py](./config_fuzzer_wt_limits.py) (ensure to correctly add to the `wt` or `wt_table` sub-dictionary) +1. Add the parameter and the needed information to + [config_fuzzer_wt_limits.py](./config_fuzzer_wt_limits.py) (ensure to correctly add to the `wt` + or `wt_table` sub-dictionary) In [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py): -2. Add the parameter to `excluded_normal_params` in `generate_eviction_configs()` or `generate_table_configs()` -3. Add the parameter's special handling in `generate_special_eviction_configs()` or `generate_special_table_configs()` +2. Add the parameter to `excluded_normal_params` in `generate_eviction_configs()` or + `generate_table_configs()` +3. Add the parameter's special handling in `generate_special_eviction_configs()` or + `generate_special_table_configs()` -> The main distinction between min/max vs. lower-bound/upper_bound is there is some transformation involving the lower and upper bounds, -> while the min/max should be the true min/max of the parameters. You should also include the true min/max of the parameter so this can be logged. 
-> If the min/max is not inclusive, this is added as a note above the parameter. +> The main distinction between min/max vs. lower-bound/upper_bound is there is some transformation +> involving the lower and upper bounds, while the min/max should be the true min/max of the +> parameters. You should also include the true min/max of the parameter so this can be logged. If +> the min/max is not inclusive, this is added as a note above the parameter. ## Exclusions - `jstests/libs/override_methods/config_fuzzer_incompatible_commands.js` - These commands are too impactful to run with the config fuzzer - The `does_not_support_config_fuzzer` jstest tag - - Tests with this tag may manually specify server parameters modified by the fuzzer or read global state that is modified in some way by the fuzzer. + - Tests with this tag may manually specify server parameters modified by the fuzzer or read global + state that is modified in some way by the fuzzer. - Just because a test is failing does not mean it is incompatible with the config fuzzer. diff --git a/buildscripts/resmokelib/hang_analyzer/README.md b/buildscripts/resmokelib/hang_analyzer/README.md index 5c3ca796ef7..264bdf2df0b 100644 --- a/buildscripts/resmokelib/hang_analyzer/README.md +++ b/buildscripts/resmokelib/hang_analyzer/README.md @@ -3,7 +3,9 @@ There are two main ways of running the core analyzer. 1. Running the core analyzer with local core dumps and binaries. -2. Running the core analyzer with core dumps and binaries from an evergreen task. Note that some analysis might fail if you are not on the same AMI (Amazon Machine Image) that the task was run on. +2. Running the core analyzer with core dumps and binaries from an evergreen task. Note that some + analysis might fail if you are not on the same AMI (Amazon Machine Image) that the task was run + on. 
To run the core analyzer with local core dumps and binaries: @@ -11,7 +13,9 @@ To run the core analyzer with local core dumps and binaries: python3 buildscripts/resmoke.py core-analyzer ``` -This will look for binaries in the build/install directory, and it will look for core dumps in the current directory. If your local environment is different you can include `--install-dir` and `--core-dir` in your invocation to specify other locations. +This will look for binaries in the build/install directory, and it will look for core dumps in the +current directory. If your local environment is different you can include `--install-dir` and +`--core-dir` in your invocation to specify other locations. To run the core analyzer with core dumps and binaries from an evergreen task: @@ -19,11 +23,15 @@ To run the core analyzer with core dumps and binaries from an evergreen task: python3 buildscripts/resmoke.py core-analyzer --task-id={task_id} ``` -This will download all of the core dumps and binaries from the task and put them into the configured `--working-dir`, this defaults to the `core-analyzer` directory. +This will download all of the core dumps and binaries from the task and put them into the configured +`--working-dir`, this defaults to the `core-analyzer` directory. -All of the task analysis will be added to the `analysis` directory inside the configured `--working-dir`. +All of the task analysis will be added to the `analysis` directory inside the configured +`--working-dir`. -Note: Currently the core analyzer only runs on linux. Windows uses the legacy hang analyzer but will be switched over when we run into issues or have time to do the transition. We have not tackled the problem of getting core dumps on macOS so we have no core dump analysis on that operating system. +Note: Currently the core analyzer only runs on linux. Windows uses the legacy hang analyzer but will +be switched over when we run into issues or have time to do the transition. 
We have not tackled the +problem of getting core dumps on macOS so we have no core dump analysis on that operating system. ### Getting core dumps @@ -37,28 +45,33 @@ sequenceDiagram Hang Analyzer ->> Core Dumps: Attach to pid and generate core dumps ``` -When a task times out, it hits the [timeout](https://github.com/mongodb/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2694) section in the defined evergreen config. -In this timeout section, we run [this](https://github.com/mongodb/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2302) task which runs the hang-analyzer with the following invocation: +When a task times out, it hits the +[timeout](https://github.com/mongodb/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2694) +section in the defined evergreen config. In this timeout section, we run +[this](https://github.com/mongodb/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2302) +task which runs the hang-analyzer with the following invocation: ``` python3 buildscripts/resmoke.py hang-analyzer -o file -o stdout -m exact -p python ``` -This tells the hang-analyzer to look for all of the python processes (we are specifically looking for resmoke) on the machine and to signal them. -When resmoke is [signaled](https://github.com/mongodb/mongo/blob/08a99b15eea7ae0952b2098710d565dd7f709ff6/buildscripts/resmokelib/sighandler.py#L25), it again invokes the hang analyzer with the specific pids of it's child processes. -It will look similar to this most of the time: +This tells the hang-analyzer to look for all of the python processes (we are specifically looking +for resmoke) on the machine and to signal them. 
When resmoke is +[signaled](https://github.com/mongodb/mongo/blob/08a99b15eea7ae0952b2098710d565dd7f709ff6/buildscripts/resmokelib/sighandler.py#L25), +it again invokes the hang analyzer with the specific pids of it's child processes. It will look +similar to this most of the time: ``` python3 buildscripts/resmoke.py hang-analyzer -o file -o stdout -k -c -d pid1,pid2,pid3 ``` -The things to note here are the `-k` which kills the process and `-c` which takes core dumps. -The resulting core dumps are put into the current running directory. +The things to note here are the `-k` which kills the process and `-c` which takes core dumps. The +resulting core dumps are put into the current running directory. #### When a test times out -An optional test timeout (`--testTimeout=N` seconds) can be used when running resmoke that will run the hang-analyzer on all processes related to that test. -When a test times out, it will analyze: +An optional test timeout (`--testTimeout=N` seconds) can be used when running resmoke that will run +the hang-analyzer on all processes related to that test. When a test times out, it will analyze: - The proccess the testcase created. - Any child of the testcase process. @@ -75,23 +88,31 @@ When a test times out, it will analyze: | |-mongo (ENV_MARKER=2, pgid 9) ``` -Caution: Should a process be created in a new process group as `bar` is in the above example, it may be missed on MacOS. If `foo` crashes/exits, `bar` is orphaned and reparented to the `init` process. It is no longer a "child" and it is not generally possible to read environment variables of arbitrary processes on MacOS with System Integrity Protection (SIP) enabled. +Caution: Should a process be created in a new process group as `bar` is in the above example, it may +be missed on MacOS. If `foo` crashes/exits, `bar` is orphaned and reparented to the `init` process. 
+It is no longer a "child" and it is not generally possible to read environment variables of +arbitrary processes on MacOS with System Integrity Protection (SIP) enabled. #### When a task fails normally -When a task fails normally, core dumps may also be generated by the linux kernel and put into the working directory. +When a task fails normally, core dumps may also be generated by the linux kernel and put into the +working directory. #### Note on archival/upload in Evergreen -We use a non-standard way of uploading core dumps to evergreen due to [timeout issues](https://jira.mongodb.org/browse/SERVER-73171) we were facing when archiving and uploading them normally through evergreen commands. -After investigation of the above issue, we found that compressing and uploading core dumps was slow for a couple reasons: +We use a non-standard way of uploading core dumps to evergreen due to +[timeout issues](https://jira.mongodb.org/browse/SERVER-73171) we were facing when archiving and +uploading them normally through evergreen commands. After investigation of the above issue, we found +that compressing and uploading core dumps was slow for a couple reasons: -1. Tarring all of the core dumps into one file takes up a lot of disk IO and disk IO was the bottleneck. +1. Tarring all of the core dumps into one file takes up a lot of disk IO and disk IO was the + bottleneck. 2. Gzip is single threaded. 3. Uploading a big file synchronously is not fast. -We made a [script](https://github.com/mongodb/mongo/blob/master/buildscripts/fast_archive.py) that gzips all of the core dumps in parallel and uploads them to S3 individually asynchronously. -This solved all of the problems listed above. +We made a [script](https://github.com/mongodb/mongo/blob/master/buildscripts/fast_archive.py) that +gzips all of the core dumps in parallel and uploads them to S3 individually asynchronously. This +solved all of the problems listed above. 
### Generating the core analyzer task @@ -104,18 +125,26 @@ sequenceDiagram Generated Task ->> Core Analyzer Output: Overwrite output with
core dump analysis ``` -In the [post task](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2665) section, we [define](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2184) the evergreen function used to generate the core analyzer task. -This [script](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/buildscripts/resmokelib/hang_analyzer/gen_hang_analyzer_tasks.py) runs on every task (passing or failing) and is independent of anything else that happened prior in the task and does all of the checks to ensure it should run. -These checks include: +In the +[post task](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2665) +section, we +[define](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2184) +the evergreen function used to generate the core analyzer task. This +[script](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/buildscripts/resmokelib/hang_analyzer/gen_hang_analyzer_tasks.py) +runs on every task (passing or failing) and is independent of anything else that happened prior in +the task and does all of the checks to ensure it should run. These checks include: 1. The task is being run on an operating system supported by the core analyzer. 2. The task has any core dumps uploaded and attached to it. 3. At least one of the binaries uploaded is from a binary we know how to process. -The output from this script is a json file in the format evergreen expects. -We then pass this json file into the `generate.tasks` evergreen command to generate the task. +The output from this script is a json file in the format evergreen expects. 
We then pass this json +file into the `generate.tasks` evergreen command to generate the task. -After the task is generated, we have [another script](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2213) that finds the task that was just generated and attaches it to the current task being ran. +After the task is generated, we have +[another script](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2213) +that finds the task that was just generated and attaches it to the current task being ran. -The reason we upload a temporary file to the original task is to attach that s3 file link to the task. -Evergreen does not currently have a way to attach files to a task after it was ran so we need to upload something while the original task is in progress. +The reason we upload a temporary file to the original task is to attach that s3 file link to the +task. Evergreen does not currently have a way to attach files to a task after it was ran so we need +to upload something while the original task is in progress. diff --git a/buildscripts/resmokelib/powercycle/README.md b/buildscripts/resmokelib/powercycle/README.md index 25e191f1b69..f96e14e92e0 100644 --- a/buildscripts/resmokelib/powercycle/README.md +++ b/buildscripts/resmokelib/powercycle/README.md @@ -1,17 +1,15 @@ # Powercycle README -Power cycling is the process of turning hardware off and then turning it on again. -Powercycle test is designed to work across two machines, one machine is a "server" -that controls and monitors the workflow and a "client" that runs Mongo server and -is remotely crashed by "server" regularly. +Power cycling is the process of turning hardware off and then turning it on again. 
Powercycle test +is designed to work across two machines, one machine is a "server" that controls and monitors the +workflow and a "client" that runs Mongo server and is remotely crashed by "server" regularly. -In evergreen the localhost that runs the task acts as a "server" and the remote -host which is created by `host.create` evergreen command acts as a "client". +In evergreen the localhost that runs the task acts as a "server" and the remote host which is +created by `host.create` evergreen command acts as a "client". -Powercycle test is the part of resmoke. Python 3.13+ with python venv is required to -run the resmoke (python3 from [mongodbtoolchain](http://mongodbtoolchain.build.10gen.cc/) -is highly recommended). Python venv can be set up by running in the root mongo repo -directory: +Powercycle test is the part of resmoke. Python 3.13+ with python venv is required to run the resmoke +(python3 from [mongodbtoolchain](http://mongodbtoolchain.build.10gen.cc/) is highly recommended). +Python venv can be set up by running in the root mongo repo directory: ``` python3 -m venv python3-venv @@ -48,20 +46,18 @@ buildscripts/resmokelib/powercycle/__init__.py ### Set up EC2 instance -1. `Evergreen host.create command` - in Evergreen the remote host is created with - the same distro as the localhost runs and some initial connections are made to ensure - it's up before further steps -2. `Resmoke powercycle setup-host command` - prepares remote host via ssh to run - the powercycle test: +1. `Evergreen host.create command` - in Evergreen the remote host is created with the same distro as + the localhost runs and some initial connections are made to ensure it's up before further steps +2. `Resmoke powercycle setup-host command` - prepares remote host via ssh to run the powercycle + test: ``` python buildscripts/resmoke.py powercycle setup-host ``` Powercycle setup-host operations are located in -`buildscripts/resmokelib/powercycle/setup/__init__.py`. 
-`expansions.yml` file is used to load the configuration to run operations which is -created by `expansions.write` command in Evergreen. +`buildscripts/resmokelib/powercycle/setup/__init__.py`. `expansions.yml` file is used to load the +configuration to run operations which is created by `expansions.write` command in Evergreen. It runs several operations via ssh: @@ -69,12 +65,12 @@ It runs several operations via ssh: - copy `buildscripts` and `mongoDB executables` from localhost to the remote host - set up python venv on the remote host - set up curator to collect system & process stats on the remote host -- install [NotMyFault](https://docs.microsoft.com/en-us/sysinternals/downloads/notmyfault) - to crash Windows (only on Windows) +- install [NotMyFault](https://docs.microsoft.com/en-us/sysinternals/downloads/notmyfault) to crash + Windows (only on Windows) Remote operation via ssh implementation is located in -`buildscripts/resmokelib/powercycle/lib/remote_operations.py`. -The following operations are supported: +`buildscripts/resmokelib/powercycle/lib/remote_operations.py`. The following operations are +supported: - `copy_to` - copy files from the localhost to the remote host - `copy_from` - copy files from the remote host to the localhost @@ -82,9 +78,8 @@ The following operations are supported: ### Run powercycle test -`Resmoke powercycle run command` - runs the powercycle test on the localhost -which runs remote operations on the remote host via ssh and local validation -checks: +`Resmoke powercycle run command` - runs the powercycle test on the localhost which runs remote +operations on the remote host via ssh and local validation checks: ``` python buildscripts/resmoke.py powercycle run \ @@ -95,26 +90,26 @@ python buildscripts/resmoke.py powercycle run \ ###### Resmoke powercycle run arguments -The arguments for resmoke powercycle run command are defined in `add_subcommand()` -function in `buildscripts/resmokelib/powercycle/__init__.py`. 
When powercycle test -runs remote operations on the remote host it calls the copied version of this script -on the remote host. Thus, some resmoke powercycle run command arguments are needed -for the remote call and shouldn't be used when calling the script on the localhost. +The arguments for resmoke powercycle run command are defined in `add_subcommand()` function in +`buildscripts/resmokelib/powercycle/__init__.py`. When powercycle test runs remote operations on the +remote host it calls the copied version of this script on the remote host. Thus, some resmoke +powercycle run command arguments are needed for the remote call and shouldn't be used when calling +the script on the localhost. -`--taskName` argument is used to get powercycle task configurations that are stored -in `buildscripts/resmokeconfig/powercycle/powercycle_tasks.yml` +`--taskName` argument is used to get powercycle task configurations that are stored in +`buildscripts/resmokeconfig/powercycle/powercycle_tasks.yml` -There is a known issue with `--setParameter` mongod options incorrectly processed -from `mongod_options` that is described in [SERVER-47621](https://jira.mongodb.org/browse/SERVER-47621) +There is a known issue with `--setParameter` mongod options incorrectly processed from +`mongod_options` that is described in [SERVER-47621](https://jira.mongodb.org/browse/SERVER-47621) ###### Powercycle test implementation The powercycle test main implementation is located in `main()` function in `buildscripts/resmokelib/powercycle/powercycle.py`. -The value of `--remoteOperation` argument is used to distinguish if we are running the script -on the localhost or on the remote host. -`remote_handler()` function performs the following remote operations: +The value of `--remoteOperation` argument is used to distinguish if we are running the script on the +localhost or on the remote host. 
`remote_handler()` function performs the following remote +operations: - `noop` - do nothing - `crash_server` - internally crash the server @@ -157,17 +152,17 @@ When running on localhost the powercycle test loops do the following steps: ### Save diagnostics -`Resmoke powercycle save-diagnostics command` - copies powercycle diagnostics -files from the remote host to the localhost (mainly used by Evergreen): +`Resmoke powercycle save-diagnostics command` - copies powercycle diagnostics files from the remote +host to the localhost (mainly used by Evergreen): ``` python buildscripts/resmoke.py powercycle save-diagnostics ``` Powercycle save-diagnostics operations are located in -`buildscripts/resmokelib/powercycle/save_diagnostics/__init__.py`. -`expansions.yml` file is used to load the configuration to run operations which is -created by `expansions.write` command in Evergreen. +`buildscripts/resmokelib/powercycle/save_diagnostics/__init__.py`. `expansions.yml` file is used to +load the configuration to run operations which is created by `expansions.write` command in +Evergreen. It runs several operations via ssh: @@ -188,15 +183,14 @@ It runs several operations via ssh: ### Remote hang analyzer (optional) -`Resmoke powercycle remote-hang-analyzer command` - runs hang analyzer on the -remote host (mainly used by Evergreen): +`Resmoke powercycle remote-hang-analyzer command` - runs hang analyzer on the remote host (mainly +used by Evergreen): ``` $python buildscripts/resmoke.py powercycle remote-hang-analyzer ``` -Powercycle remote-hang-analyzer command calls resmoke hang analyzer on the -remote host and is located in -`buildscripts/resmokelib/powercycle/remote_hang_analyzer/__init__.py` -`expansions.yml` file is used to load the configuration to run this command which is -created by `expansions.write` command in Evergreen. 
+Powercycle remote-hang-analyzer command calls resmoke hang analyzer on the remote host and is +located in `buildscripts/resmokelib/powercycle/remote_hang_analyzer/__init__.py` `expansions.yml` +file is used to load the configuration to run this command which is created by `expansions.write` +command in Evergreen. diff --git a/buildscripts/resmokelib/testing/fixtures/README.md b/buildscripts/resmokelib/testing/fixtures/README.md index 6d792f78131..2b188dbea1c 100644 --- a/buildscripts/resmokelib/testing/fixtures/README.md +++ b/buildscripts/resmokelib/testing/fixtures/README.md @@ -4,24 +4,39 @@ Fixtures define a specific topology that tests run against. ## Supported Fixtures -Specify any of the following as the `fixture` in your [Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config: +Specify any of the following as the `fixture` in your +[Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config: -- [`BulkWriteFixture`](./bulk_write.py) - Fixture which provides JSTests with a set of clusters to run tests against. -- [`ExternalFixture`](./external.py) - Fixture which provides JSTests capability to connect to external (non-resmoke) cluster. -- [`ExternalShardedClusterFixture`](./shardedcluster.py) - Fixture to interact with external sharded cluster fixture. -- [`MongoDFixture`](./standalone.py) - Fixture which provides JSTests with a standalone mongod to run against. -- [`MongoTFixture`](./mongot.py) - Fixture which provides JSTests with a mongot to run alongside a mongod. -- [`MultiReplicaSetFixture`](./multi_replica_set.py) - Fixture which provides JSTests with a set of replica sets to run against. -- [`MultiShardedClusterFixture`](./multi_sharded_cluster.py) - Fixture which provides JSTests with a set of sharded clusters to run against. -- [`ReplicaSetFixture`](./replicaset.py) - Fixture which provides JSTests with a replica set to run against. 
-- [`ShardedClusterFixture`](./shardedcluster.py) - Fixture which provides JSTests with a sharded cluster to run against. - - Used when the MongoDB deployment is started by the JavaScript test itself with `MongoRunner`, `ReplSetTest`, or `ShardingTest`. -- [`YesFixture`](./yesfixture.py) - Fixture which spawns several `yes` executables to generate lots of log messages. +- [`BulkWriteFixture`](./bulk_write.py) - Fixture which provides JSTests with a set of clusters to + run tests against. +- [`ExternalFixture`](./external.py) - Fixture which provides JSTests capability to connect to + external (non-resmoke) cluster. +- [`ExternalShardedClusterFixture`](./shardedcluster.py) - Fixture to interact with external sharded + cluster fixture. +- [`MongoDFixture`](./standalone.py) - Fixture which provides JSTests with a standalone mongod to + run against. +- [`MongoTFixture`](./mongot.py) - Fixture which provides JSTests with a mongot to run alongside a + mongod. +- [`MultiReplicaSetFixture`](./multi_replica_set.py) - Fixture which provides JSTests with a set of + replica sets to run against. +- [`MultiShardedClusterFixture`](./multi_sharded_cluster.py) - Fixture which provides JSTests with a + set of sharded clusters to run against. +- [`ReplicaSetFixture`](./replicaset.py) - Fixture which provides JSTests with a replica set to run + against. +- [`ShardedClusterFixture`](./shardedcluster.py) - Fixture which provides JSTests with a sharded + cluster to run against. + - Used when the MongoDB deployment is started by the JavaScript test itself with `MongoRunner`, + `ReplSetTest`, or `ShardingTest`. +- [`YesFixture`](./yesfixture.py) - Fixture which spawns several `yes` executables to generate lots + of log messages. ## Interfaces - [`Fixture`](./interface.py) - Base class for all fixtures. -- [`MultiClusterFixture`](./interface.py) - Base class for fixtures that may consist of multiple independent participant clusters. 
- - The participant clusters can function independently without coordination, but are bound together only for some duration as they participate in some process such as a migration. The participant clusters are fixtures themselves. +- [`MultiClusterFixture`](./interface.py) - Base class for fixtures that may consist of multiple + independent participant clusters. + - The participant clusters can function independently without coordination, but are bound together + only for some duration as they participate in some process such as a migration. The participant + clusters are fixtures themselves. - [`NoOpFixture`](./interface.py) - A Fixture implementation that does not start any servers. - [`ReplFixture`](./interface.py) - Base class for all fixtures that support replication. diff --git a/buildscripts/resmokelib/testing/hooks/README.md b/buildscripts/resmokelib/testing/hooks/README.md index 0fb3c178084..8d42678189b 100644 --- a/buildscripts/resmokelib/testing/hooks/README.md +++ b/buildscripts/resmokelib/testing/hooks/README.md @@ -4,84 +4,145 @@ Hooks are a mechanism to run routines _around_ the tests, at the test content bo ## Supported hooks -Specify any of the following as the `hooks` in your [Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config: +Specify any of the following as the `hooks` in your +[Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config: -- [`AnalyzeShardKeysInBackground`](./analyze_shard_key.py) - A hook for running `analyzeShardKey` commands while a test is running. -- [`AntithesisLogging`](./antithesis_logging.py) - Prints antithesis commands before & after test run. +- [`AnalyzeShardKeysInBackground`](./analyze_shard_key.py) - A hook for running `analyzeShardKey` + commands while a test is running. +- [`AntithesisLogging`](./antithesis_logging.py) - Prints antithesis commands before & after test + run. 
- [`BackgroundInitialSync`](./initialsync.py) - Background Initial Sync - - After every test, this hook checks if a background node has finished initial sync and if so validates it, tears it down, and restarts it. - - This test accepts a parameter `n` that specifies a number of tests after which it will wait for replication to finish before validating and restarting the initial sync node. - - This requires the ReplicaSetFixture to be started with `start_initial_sync_node=True`. If used at the same time as `CleanEveryN`, the `n` value passed to this hook should be equal to the `n` value for `CleanEveryN`. -- [`CheckClusterIndexConsistency`](./cluster_index_consistency.py) - Checks that indexes are the same across chunks for the same collections. -- [`CheckMetadataConsistencyInBackground`](./metadata_consistency) - Check the metadata consistency of a sharded cluster. -- [`CheckOrphansDeleted`](./orphans.py) - Check if the range deleter failed to delete any orphan documents. -- [`CheckReplDBHashInBackground`](./dbhash_background.py) - A hook for comparing the dbhashes of all replica set members while a test is running. + - After every test, this hook checks if a background node has finished initial sync and if so + validates it, tears it down, and restarts it. + - This test accepts a parameter `n` that specifies a number of tests after which it will wait for + replication to finish before validating and restarting the initial sync node. + - This requires the ReplicaSetFixture to be started with `start_initial_sync_node=True`. If used + at the same time as `CleanEveryN`, the `n` value passed to this hook should be equal to the `n` + value for `CleanEveryN`. +- [`CheckClusterIndexConsistency`](./cluster_index_consistency.py) - Checks that indexes are the + same across chunks for the same collections. +- [`CheckMetadataConsistencyInBackground`](./metadata_consistency) - Check the metadata consistency + of a sharded cluster. 
+- [`CheckOrphansDeleted`](./orphans.py) - Check if the range deleter failed to delete any orphan + documents. +- [`CheckReplDBHashInBackground`](./dbhash_background.py) - A hook for comparing the dbhashes of all + replica set members while a test is running. - [`CheckReplDBHash`](./dbhash.py) - Check if the dbhashes match. -- [`CheckReplOplogs`](./oplog.py) - Check that `local.oplog.rs` matches on the primary and secondaries. -- [`CheckReplPreImagesConsistency`](./preimages_consistency.py) - Check that `config.system.preimages` is consistent between the primary and secondaries. -- [`CheckRoutingTableConsistency`](./routing_table_consistency.py) - Verifies the absence of corrupted entries in config.chunks and config.collections. -- [`CheckShardFilteringMetadata`](./shard_filtering_metadata.py) - Inspect filtering metadata on shards +- [`CheckReplOplogs`](./oplog.py) - Check that `local.oplog.rs` matches on the primary and + secondaries. +- [`CheckReplPreImagesConsistency`](./preimages_consistency.py) - Check that + `config.system.preimages` is consistent between the primary and secondaries. +- [`CheckRoutingTableConsistency`](./routing_table_consistency.py) - Verifies the absence of + corrupted entries in config.chunks and config.collections. +- [`CheckShardFilteringMetadata`](./shard_filtering_metadata.py) - Inspect filtering metadata on + shards - [`CleanEveryN`](./cleanup.py) - Restart the fixture after it has ran `n` tests. -- [`CleanupConcurrencyWorkloads`](./cleanup_concurrency_workloads.py) - Drop all databases, except those that have been excluded. - - For concurrency tests that run on different DBs, drop all databases except ones in `exclude_dbs`. For tests that run on the same DB, drop all databases except ones in `exclude_dbs` and the DB used by the test/workloads. For tests that run on the same collection, drop all collections in all databases except for `exclude_dbs` and the collection used by the test/workloads. 
+- [`CleanupConcurrencyWorkloads`](./cleanup_concurrency_workloads.py) - Drop all databases, except + those that have been excluded. + - For concurrency tests that run on different DBs, drop all databases except ones in + `exclude_dbs`. For tests that run on the same DB, drop all databases except ones in + `exclude_dbs` and the DB used by the test/workloads. For tests that run on the same collection, + drop all collections in all databases except for `exclude_dbs` and the collection used by the + test/workloads. - On mongod-related fixtures, this will clear the dbpath - [`ClusterParameter`](./cluster_parameter.py) - Sets the specified cluster server parameter. -- [`ContinuousAddRemoveShard`](./add_remove_shards.py) - Continuously adds and removes shards at regular intervals. If running with `configsvr` transitions, will transition in/out of config shard mode. -- [`ContinuousInitialSync`](./continuous_initial_sync.py) - Periodically initial sync nodes then step them up. -- [`ContinuousStepdown`](./stepdown.py) - regularly connect to replica sets and send a `replSetStepDown` command. -- [`ContinuousTransition`](./replicaset_transition_to_and_from_csrs.py) - connects to replica sets and transitions them from replica set to CSRS node in the background. -- [`DoReconfigInBackground`](./reconfig_background.py) - A hook for running a safe reconfig against a replica set while a test is running. -- [`DropConfigCacheCollections`](./drop_config_cache_collections.py) - A hook for dropping random entries of config.cache.collections in shards. -- [`DropSessionsCollection`](./drop_sessions_collection.py) - A hook for dropping and recreating config.system.sessions while tests are running. +- [`ContinuousAddRemoveShard`](./add_remove_shards.py) - Continuously adds and removes shards at + regular intervals. If running with `configsvr` transitions, will transition in/out of config shard + mode. 
+- [`ContinuousInitialSync`](./continuous_initial_sync.py) - Periodically initial sync nodes then + step them up. +- [`ContinuousStepdown`](./stepdown.py) - regularly connect to replica sets and send a + `replSetStepDown` command. +- [`ContinuousTransition`](./replicaset_transition_to_and_from_csrs.py) - connects to replica sets + and transitions them from replica set to CSRS node in the background. +- [`DoReconfigInBackground`](./reconfig_background.py) - A hook for running a safe reconfig against + a replica set while a test is running. +- [`DropConfigCacheCollections`](./drop_config_cache_collections.py) - A hook for dropping random + entries of config.cache.collections in shards. +- [`DropSessionsCollection`](./drop_sessions_collection.py) - A hook for dropping and recreating + config.system.sessions while tests are running. - [`DropUserCollections`](./drop_user_collections.py) - Drops all user collections. - [`EnableSpuriousWriteConflicts`](./enable_spurious_write_conflicts.py) - Toggles write conflicts. -- [`FCVUpgradeDowngradeInBackground`](./fcv_upgrade_downgrade.py) - A hook to run background FCV upgrade and downgrade against test servers while a test is running. -- [`FuzzRuntimeParameters`](./fuzz_runtime_parameters.py) - Regularly connect to nodes and sends them a `setParameter` command; uses the [Config Fuzzer](../../../../buildscripts/resmokelib/generate_fuzz_config/README.md). -- [`FuzzRuntimeStress`](./fuzz_runtime_stress.py) - Test hook that periodically changes the amount of stress the system is experiencing. +- [`FCVUpgradeDowngradeInBackground`](./fcv_upgrade_downgrade.py) - A hook to run background FCV + upgrade and downgrade against test servers while a test is running. +- [`FuzzRuntimeParameters`](./fuzz_runtime_parameters.py) - Regularly connect to nodes and sends + them a `setParameter` command; uses the + [Config Fuzzer](../../../../buildscripts/resmokelib/generate_fuzz_config/README.md). 
+- [`FuzzRuntimeStress`](./fuzz_runtime_stress.py) - Test hook that periodically changes the amount + of stress the system is experiencing. - [`FuzzerRestoreSettings`](./fuzzer_restore_settings.py) - Cleans up unwanted changes from fuzzer. -- [`GenerateAndCheckPerfResults`](./generate_and_check_perf_results.py) - Combine JSON results from individual benchmarks and check their reported values against any thresholds set for them. - - Combines test results from individual benchmark files to a single file. This is useful for generating the json file to feed into the Evergreen performance visualization plugin. +- [`GenerateAndCheckPerfResults`](./generate_and_check_perf_results.py) - Combine JSON results from + individual benchmarks and check their reported values against any thresholds set for them. + - Combines test results from individual benchmark files to a single file. This is useful for + generating the json file to feed into the Evergreen performance visualization plugin. - [`HelloDelays`](./hello_failures.py) - Sets Hello fault injections. - [`IntermediateInitialSync`](./initialsync.py) - Intermediate Initial Sync - - This hook accepts a parameter `n` that specifies a number of tests after which it will start up a node to initial sync, wait for replication to finish, and then validate the data. + - This hook accepts a parameter `n` that specifies a number of tests after which it will start up + a node to initial sync, wait for replication to finish, and then validate the data. - This requires the ReplicaSetFixture to be started with 'start_initial_sync_node=True'. - [`LagOplogApplicationInBackground`](./secondary_lag.py) - Toggles secondary oplog application lag. - [`LibfuzzerHook`](./cpp_libfuzzer.py) - Merges inputs after a fuzzer run. -- [`MagicRestoreEveryN`](./magic_restore.py) - Open a backup cursor and run magic restore process after `n` tests have run. 
+- [`MagicRestoreEveryN`](./magic_restore.py) - Open a backup cursor and run magic restore process + after `n` tests have run. - Requires the use of `MagicRestoreFixture`. -- [`PeriodicKillSecondaries`](./periodic_kill_secondaries.py) - Periodically kills the secondaries in a replica set. - - Also verifies that the secondaries can reach the SECONDARY state without having connectivity to the primary after an unclean shutdown. -- [`PeriodicStackTrace`](./periodic_stack_trace.py) - Test hook that sends the stacktracing signal to mongo processes at randomized intervals. -- [`QueryableServerHook`](./queryable_server_hook.py) - Starts the queryable server before each test for queryable restores. Restarts the queryable server between tests. -- [`RotateExecutionControlParams`](./rotate_execution_control_params.py) - Periodically rotates 'executionControlConcurrencyAdjustmentAlgorithm' and deprioritization server parameters to random valid values. -- [`RunChangeStreamsInBackground`](./change_streams.py) - Run in the background full cluster change streams while a test is running. Open and close the change stream every `1..10` tests (random using `config.RANDOM_SEED`). -- [`RunDBCheckInBackground`](./dbcheck_background.py) - A hook for running `dbCheck` on a replica set while a test is running. - - This includes dbhashes for all non-local databases and non-replicated system collections that match on the primary and secondaries. - - It also will check the performance results against any thresholds that are set for each benchmark. If no thresholds are set for a test, this hook should always pass. -- [`RunQueryStats`](./run_query_stats.py) - Runs `$queryStats` after every test, and clears the query stats store before every test. +- [`PeriodicKillSecondaries`](./periodic_kill_secondaries.py) - Periodically kills the secondaries + in a replica set. 
+ - Also verifies that the secondaries can reach the SECONDARY state without having connectivity to + the primary after an unclean shutdown. +- [`PeriodicStackTrace`](./periodic_stack_trace.py) - Test hook that sends the stacktracing signal + to mongo processes at randomized intervals. +- [`QueryableServerHook`](./queryable_server_hook.py) - Starts the queryable server before each test + for queryable restores. Restarts the queryable server between tests. +- [`RotateExecutionControlParams`](./rotate_execution_control_params.py) - Periodically rotates + 'executionControlConcurrencyAdjustmentAlgorithm' and deprioritization server parameters to random + valid values. +- [`RunChangeStreamsInBackground`](./change_streams.py) - Run in the background full cluster change + streams while a test is running. Open and close the change stream every `1..10` tests (random + using `config.RANDOM_SEED`). +- [`RunDBCheckInBackground`](./dbcheck_background.py) - A hook for running `dbCheck` on a replica + set while a test is running. + - This includes dbhashes for all non-local databases and non-replicated system collections that + match on the primary and secondaries. + - It also will check the performance results against any thresholds that are set for each + benchmark. If no thresholds are set for a test, this hook should always pass. +- [`RunQueryStats`](./run_query_stats.py) - Runs `$queryStats` after every test, and clears the + query stats store before every test. - [`SimulateCrash`](./simulate_crash.py) - A hook to simulate crashes. - [`ValidateCollections`](./validate.py) - Run full validation. -- [`ValidateCollectionsInBackground`](./validate_background.py) - A hook to run background collection validation against test servers while a test is running. - - This will run on all collections in all databases on every stand-alone node, primary replica-set node, or primary shard node. 
-- [`ValidateDirectSecondaryReads`](./validate_direct_secondary_reads.py) - Only supported in suites that use `ReplicaSetFixture`. - - To be used with `set_read_preference_secondary.js` and `implicit_enable_profiler.js` in suites that read directly from secondaries in a replica set. Check the profiler collections of all databases at the end of the suite to verify that each secondary only ran the read commands it got directly from the shell. +- [`ValidateCollectionsInBackground`](./validate_background.py) - A hook to run background + collection validation against test servers while a test is running. + - This will run on all collections in all databases on every stand-alone node, primary replica-set + node, or primary shard node. +- [`ValidateDirectSecondaryReads`](./validate_direct_secondary_reads.py) - Only supported in suites + that use `ReplicaSetFixture`. + - To be used with `set_read_preference_secondary.js` and `implicit_enable_profiler.js` in suites + that read directly from secondaries in a replica set. Check the profiler collections of all + databases at the end of the suite to verify that each secondary only ran the read commands it + got directly from the shell. - [`WaitForReplication`](./wait_for_replication.py) - Wait for replication to complete. ## Interfaces -All hooks inherit from the [`buildscripts.resmokelib.testing.hooks.interface.Hook`](./interface.py) parent class and can override any subset of the following empty base methods: +All hooks inherit from the [`buildscripts.resmokelib.testing.hooks.interface.Hook`](./interface.py) +parent class and can override any subset of the following empty base methods: - `before_suite` - `before_test` - `after_test` - `after_suite` -At least 1 base method must be overridden, otherwise the hook will not do anything at all. During test suite execution, each hook runs its custom logic in the respective scenarios. 
Some customizable tasks that hooks can perform include: _validating data, deleting data, performing cleanup_, etc. +At least 1 base method must be overridden, otherwise the hook will not do anything at all. During +test suite execution, each hook runs its custom logic in the respective scenarios. Some customizable +tasks that hooks can perform include: _validating data, deleting data, performing cleanup_, etc. -- [`BGHook`](./bghook.py) - A hook that repeatedly calls `run_action()` in a background thread for the duration of the test suite. -- [`DataConsistencyHook`](./jsfile.py) - A hook for running a static JavaScript file that checks data consistency of the server. - - If the mongo shell process running the JavaScript file exits with a non-zero return code, then an `errors.ServerFailure` exception is raised to cause resmoke.py's test execution to stop. +- [`BGHook`](./bghook.py) - A hook that repeatedly calls `run_action()` in a background thread for + the duration of the test suite. +- [`DataConsistencyHook`](./jsfile.py) - A hook for running a static JavaScript file that checks + data consistency of the server. + - If the mongo shell process running the JavaScript file exits with a non-zero return code, then + an `errors.ServerFailure` exception is raised to cause resmoke.py's test execution to stop. - [`Hook`](./interface.py) - Common interface all Hooks will inherit from. - [`JSHook`](./jsfile.py) - A hook interface with a static JavaScript file to execute. -- [`PerClusterDataConsistencyHook`](./jsfile.py) - A hook that runs on each independent cluster of the fixture. +- [`PerClusterDataConsistencyHook`](./jsfile.py) - A hook that runs on each independent cluster of + the fixture. - The independent cluster itself may be another fixture. 
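As a rough illustration of the Interfaces section in the hooks README above, a hook that overrides two of the four base methods might look like the sketch below. The `Hook` class here is a simplified stand-in defined inline so the example runs on its own. The real `buildscripts.resmokelib.testing.hooks.interface.Hook` has its own constructor and method signatures, which this diff does not show, so the argument names (`test`, `test_report`), the `CountTestsHook` class, and the `run_suite` driver are all assumptions made for illustration.

```python
class Hook:
    """Simplified stand-in for the real Hook interface.

    Assumption: the real base class has a different constructor and
    possibly different method signatures.
    """

    def before_suite(self, test_report):
        pass

    def before_test(self, test, test_report):
        pass

    def after_test(self, test, test_report):
        pass

    def after_suite(self, test_report):
        pass


class CountTestsHook(Hook):
    """Hypothetical hook that counts started tests and records finished ones."""

    def __init__(self):
        self.started = 0
        self.finished = []

    def before_test(self, test, test_report):
        self.started += 1

    def after_test(self, test, test_report):
        self.finished.append(test)


def run_suite(hooks, tests):
    """Toy driver showing the order in which the four base methods fire."""
    report = {}
    for hook in hooks:
        hook.before_suite(report)
    for test in tests:
        for hook in hooks:
            hook.before_test(test, report)
        # The test itself would run here.
        for hook in hooks:
            hook.after_test(test, report)
    for hook in hooks:
        hook.after_suite(report)
```

Because `before_suite` and `after_suite` are left as the empty base methods, `CountTestsHook` only acts around individual tests, which matches the README's point that a hook may override any subset of the four methods.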
diff --git a/buildscripts/resmokelib/testing/testcases/README.md b/buildscripts/resmokelib/testing/testcases/README.md index 707f219af91..ffdd7d6c633 100644 --- a/buildscripts/resmokelib/testing/testcases/README.md +++ b/buildscripts/resmokelib/testing/testcases/README.md @@ -1,33 +1,52 @@ # TestCases -TestCases extend Python-based `unittest.TestCase` objects that resmoke can run as different "kinds" of tests. +TestCases extend Python-based `unittest.TestCase` objects that resmoke can run as different "kinds" +of tests. ## Supported TestCases -Specify any of the following as the `test_kind` in your [Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config: +Specify any of the following as the `test_kind` in your +[Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config: -- `all_versions_js_test`: [`AllVersionsJSTestCase`](./jstest.py) - Alias for JSTestCase for multiversion passthrough suites. - - It runs with all combinations of versions of replica sets and sharded clusters. The distinct name is picked up by task generation. +- `all_versions_js_test`: [`AllVersionsJSTestCase`](./jstest.py) - Alias for JSTestCase for + multiversion passthrough suites. + - It runs with all combinations of versions of replica sets and sharded clusters. The distinct + name is picked up by task generation. - `benchmark_test`: [`BenchmarkTestCase`](./benchmark_test.py) - A Benchmark test to execute. -- `bulk_write_cluster_js_test`: [`BulkWriteClusterTestCase`](./bulk_write_cluster_js_test.py) - A test to execute with connection data for multiple clusters passed through TestData. -- `cpp_integration_test`: [`CPPIntegrationTestCase`](./cpp_integration_test.py) - A C++ integration test to execute. -- `cpp_libfuzzer_test`: [`CPPLibfuzzerTestCase`](./cpp_libfuzzer_test.py) - A C++ libfuzzer test to execute. 
+- `bulk_write_cluster_js_test`: [`BulkWriteClusterTestCase`](./bulk_write_cluster_js_test.py) - A + test to execute with connection data for multiple clusters passed through TestData. +- `cpp_integration_test`: [`CPPIntegrationTestCase`](./cpp_integration_test.py) - A C++ integration + test to execute. +- `cpp_libfuzzer_test`: [`CPPLibfuzzerTestCase`](./cpp_libfuzzer_test.py) - A C++ libfuzzer test to + execute. - `cpp_unit_test`: [`CPPUnitTestCase`](./cpp_unittest.py) - A C++ unit test to execute. - `db_test`: [`DBTestCase`](./dbtest.py) - A dbtest to execute. -- `fsm_workload_test`: [`FSMWorkloadTestCase`](./fsm_workload_test.py) - A wrapper for several copies of a `_SingleFSMWorkloadTestCase` to execute. -- `js_test`: [`JSTestCase`](./jstest.py) - A wrapper for several copies of a `_SingleJSTestCase` to execute - - Around **75% of all suites use the `js_test` kind**. See [jstests/README.md](../../../../jstests/README.md) for specific guidance. +- `fsm_workload_test`: [`FSMWorkloadTestCase`](./fsm_workload_test.py) - A wrapper for several + copies of a `_SingleFSMWorkloadTestCase` to execute. +- `js_test`: [`JSTestCase`](./jstest.py) - A wrapper for several copies of a `_SingleJSTestCase` to + execute + - Around **75% of all suites use the `js_test` kind**. See + [jstests/README.md](../../../../jstests/README.md) for specific guidance. - `json_schema_test`: [`JSONSchemaTestCase`](./json_schema_test.py) - A JSON Schema test to execute. -- `magic_restore_js_test`: [`MagicRestoreTestCase`](./magic_restore_js_test.py) - A test to execute for running tests in a try/catch block. -- `mongos_test`: [`MongosTestCase`](./mongos_test.py) - A TestCase which runs a mongos binary with the given parameters. -- `multi_stmt_txn_passthrough`: [`MultiStmtTxnTestCase`](./multi_stmt_txn_test.py) - Test case for multi statement transactions. -- `parallel_fsm_workload_test`: [`ParallelFSMWorkloadTestCase`](./fsm_workload_test.py) - An FSM workload to execute. 
-- `pretty_printer_test`: [`PrettyPrinterTestCase`](./pretty_printer_testcase.py) - A pretty printer test to execute. +- `magic_restore_js_test`: [`MagicRestoreTestCase`](./magic_restore_js_test.py) - A test to execute + for running tests in a try/catch block. +- `mongos_test`: [`MongosTestCase`](./mongos_test.py) - A TestCase which runs a mongos binary with + the given parameters. +- `multi_stmt_txn_passthrough`: [`MultiStmtTxnTestCase`](./multi_stmt_txn_test.py) - Test case for + multi statement transactions. +- `parallel_fsm_workload_test`: [`ParallelFSMWorkloadTestCase`](./fsm_workload_test.py) - An FSM + workload to execute. +- `pretty_printer_test`: [`PrettyPrinterTestCase`](./pretty_printer_testcase.py) - A pretty printer + test to execute. - `py_test`: [`PyTestCase`](./pytest.py) - A python test to execute. -- `query_tester_self_test`: [`QueryTesterSelfTestCase`](./query_tester_self_test.py) - A QueryTester self-test to execute. -- `query_tester_server_test`: [`QueryTesterServerTestCase`](./query_tester_server_test.py) - A QueryTester server test to execute. -- `sdam_json_test`: [`SDAMJsonTestCase`](./sdam_json_test.py) - Server Discovery and Monitoring JSON test case. -- `server_selection_json_test`: [`ServerSelectionJsonTestCase`](./server_selection_json_test.py) - Server Selection JSON test case. +- `query_tester_self_test`: [`QueryTesterSelfTestCase`](./query_tester_self_test.py) - A QueryTester + self-test to execute. +- `query_tester_server_test`: [`QueryTesterServerTestCase`](./query_tester_server_test.py) - A + QueryTester server test to execute. +- `sdam_json_test`: [`SDAMJsonTestCase`](./sdam_json_test.py) - Server Discovery and Monitoring JSON + test case. +- `server_selection_json_test`: [`ServerSelectionJsonTestCase`](./server_selection_json_test.py) - + Server Selection JSON test case. - `sleep_test`: [`SleepTestCase`](./sleeptest.py) - SleepTestCase class. 
- `tla_plus_test`: [`TLAPlusTestCase`](./tla_plus_test.py) - A TLA+ specification to model-check. @@ -36,26 +55,36 @@ Specify any of the following as the `test_kind` in your [Suite](../../../../buil Top level interfaces: - [`TestCase`](./interface.py) - A test case to execute. The `run_test` method must be implemented. -- [`ProcessTestCase`](./interface.py) - Base class for TestCases that executes an external process. The `_make_process` method must be implemented. +- [`ProcessTestCase`](./interface.py) - Base class for TestCases that executes an external process. + The `_make_process` method must be implemented. Subclasses: -- [`JSRunnerFileTestCase`](./jsrunnerfile.py) - A test case with a static JavaScript runner file to execute. -- [`MultiClientsTestCase`](./jstest.py) - A wrapper for several copies of a SingleTestCase to execute. +- [`JSRunnerFileTestCase`](./jsrunnerfile.py) - A test case with a static JavaScript runner file to + execute. +- [`MultiClientsTestCase`](./jstest.py) - A wrapper for several copies of a SingleTestCase to + execute. - [`TestCaseFactory`](./interface.py) - Convenience interface to initialize and build test cases ## Fixture TestCases -These are testcases that are used to coordinate fixture lifecycles via resmoke's internal `FixtureTestCaseManager`. +These are testcases that are used to coordinate fixture lifecycles via resmoke's internal +`FixtureTestCaseManager`. -> NOTE This design does lead to seeing "extra" tests in a run, where a fixture sets up, your `N` tests are run, and the fixture tears down, so you see `N+2` "tests" passing via resmoke. +> NOTE This design does lead to seeing "extra" tests in a run, where a fixture sets up, your `N` +> tests are run, and the fixture tears down, so you see `N+2` "tests" passing via resmoke. - [`FixtureTestCase`](./fixture.py) - Base class for the fixture test cases. - [`FixtureSetupTestCase`](./fixture.py) - TestCase for setting up a fixture. 
- [`FixtureTeardownTestCase`](./fixture.py) - TestCase for tearing down a fixture. -- [`FixtureAbortTestCase`](./fixture.py) - TestCase for killing/aborting a fixture. Intended for use before archiving a failed test. - - When resmoke detects that a test has failed (and [archiving](../../../../buildscripts/resmokeconfig/suites/README.md#executorarchive) is configured​), it dynamically generates a new `FixtureAbortTestCase` for immediate execution. This test case sends a `SIGABRT` to each running mongod process. +- [`FixtureAbortTestCase`](./fixture.py) - TestCase for killing/aborting a fixture. Intended for use + before archiving a failed test. + - When resmoke detects that a test has failed (and + [archiving](../../../../buildscripts/resmokeconfig/suites/README.md#executorarchive) is + configured​), it dynamically generates a new `FixtureAbortTestCase` for immediate execution. + This test case sends a `SIGABRT` to each running mongod process. ## Testing TestCases -Self-tests for the testcases themselves can be found in [buildscripts/tests/resmokelib/testing/testcases/](../../../../buildscripts/tests/resmokelib/testing/testcases/) +Self-tests for the testcases themselves can be found in +[buildscripts/tests/resmokelib/testing/testcases/](../../../../buildscripts/tests/resmokelib/testing/testcases/) diff --git a/buildscripts/s3_binary/README.md b/buildscripts/s3_binary/README.md index 449f9b7b686..c8ae5a1853a 100644 --- a/buildscripts/s3_binary/README.md +++ b/buildscripts/s3_binary/README.md @@ -1,33 +1,55 @@ # S3 Binary -This is a small utility to help safely manage tool binaries that are stored in MongoDB's S3 bucket for the purpose of using in this repository's build, test, or release processes. +This is a small utility to help safely manage tool binaries that are stored in MongoDB's S3 bucket +for the purpose of using in this repository's build, test, or release processes. 
### Security -Any time a binary is pulled down from the internet and executed, there is risk that the binary has been modified unintentionally. This tool creates a hash of the binary that the developer is uploads and stores a record of it in a programmatically accessible Python script (see `buildscripts/s3_binary/hashes.py`). When a tool uses the S3 binary, this interface forces a checksum of the binary before the binary is run, verifying the result against the value stored in `hashes.py` and stopping execution if it doesn't match. +Any time a binary is pulled down from the internet and executed, there is risk that the binary has +been modified unintentionally. This tool creates a hash of the binary that the developer is uploads +and stores a record of it in a programmatically accessible Python script (see +`buildscripts/s3_binary/hashes.py`). When a tool uses the S3 binary, this interface forces a +checksum of the binary before the binary is run, verifying the result against the value stored in +`hashes.py` and stopping execution if it doesn't match. ### Hermetic Guarantee -The other risk of relying on a binary stored in S3 is that if the binary is changed, that it will change the results of previously run tests or builds in continuous integration. This is not ideal since there are often cases where an old commit needs to be re-ran to reproduce user issues. Storing the hash in the repository and preventing modifications prevents accidental compatibility breaks of previous commits. +The other risk of relying on a binary stored in S3 is that if the binary is changed, that it will +change the results of previously run tests or builds in continuous integration. This is not ideal +since there are often cases where an old commit needs to be re-ran to reproduce user issues. Storing +the hash in the repository and preventing modifications prevents accidental compatibility breaks of +previous commits. 
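The checksum step described under Security can be sketched roughly as follows. This is not the actual `s3_binary` implementation, which the diff does not show. `KNOWN_HASHES`, `sha256_of_file`, `verify_binary`, and the example S3 path are hypothetical names chosen for illustration; only the idea of comparing a sha256 digest against a value recorded in the repository comes from the README.

```python
import hashlib

# Hypothetical stand-in for the mapping kept in hashes.py: S3 path -> sha256.
KNOWN_HASHES = {
    "s3://example-bucket/tool/v1/tool-v1_linux": hashlib.sha256(b"tool bytes").hexdigest(),
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_binary(s3_path, local_path, known_hashes=KNOWN_HASHES):
    """Raise ValueError unless the downloaded file matches the recorded hash."""
    expected = known_hashes.get(s3_path)
    if expected is None:
        raise ValueError(f"no recorded hash for {s3_path}")
    actual = sha256_of_file(local_path)
    if actual != expected:
        raise ValueError(
            f"hash mismatch for {s3_path}: expected {expected}, got {actual}"
        )
```

Refusing to proceed when no hash is recorded, rather than skipping the check, is a design choice of this sketch: it keeps an unregistered binary from being executed silently.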
### Example Usage -Scenario: You have a developer tool called db-contrib-tool that you want to build into a binary, and then use that binary as part of a test process in 10gen/mongo. To use the s3_binary tool you would: +Scenario: You have a developer tool called db-contrib-tool that you want to build into a binary, and +then use that binary as part of a test process in 10gen/mongo. To use the s3_binary tool you would: 1. Create your binaries and put them into a single directory on your local system, ex: - /tmp/db-contrib-tool/db-contrib-tool-v1_windows.exe - /tmp/db-contrib-tool/db-contrib-tool-v1_linux + /tmp/db-contrib-tool/db-contrib-tool-v1_windows.exe /tmp/db-contrib-tool/db-contrib-tool-v1_linux -2. Invoke bazel run buildscripts/s3_binary:upload -- /tmp/db-contrib-tool s3://mdb-build-public/db-contrib-tool/v1 +2. Invoke bazel run buildscripts/s3_binary:upload -- /tmp/db-contrib-tool + s3://mdb-build-public/db-contrib-tool/v1 -3. Follow the prompts, this will then update your local `buildscripts/s3_binary/hashes.py` file mapping the s3 path of each binary to its sha256 hash. +3. Follow the prompts, this will then update your local `buildscripts/s3_binary/hashes.py` file + mapping the s3 path of each binary to its sha256 hash. -4. Update your test code to call: `download_s3_binary(f"s3://mdb-build-public/db-contrib-tool/v1/db-contrib-tool-v1_{os}{ext}")`. This will then automatically verify the download matches the hash at runtime. +4. Update your test code to call: + `download_s3_binary(f"s3://mdb-build-public/db-contrib-tool/v1/db-contrib-tool-v1_{os}{ext}")`. + This will then automatically verify the download matches the hash at runtime. -5. Create a commit with your new code that adds in the `download_s3_binary` call and the `buildscripts/s3_binary/hashes.py` modifications. +5. Create a commit with your new code that adds in the `download_s3_binary` call and the + `buildscripts/s3_binary/hashes.py` modifications. -The case above covers usage in Python. 
If using another language like starlark for Bazel dependencies, you would follow the same flow but copy the hashes into the starlark code instead of relying off of hashes.py. Please retain the modifications to hashes.py regardless to make it easy to use your binaries in python. +The case above covers usage in Python. If using another language like starlark for Bazel +dependencies, you would follow the same flow but copy the hashes into the starlark code instead of +relying off of hashes.py. Please retain the modifications to hashes.py regardless to make it easy to +use your binaries in python. ### Future Additions -In general, it's less error prone to have the entire flow of building, uploading, and using a binary all happen in an automated pipeline without developer interaction. In the future, this tool will be updated to be easily invocable from a continuous integration pipeline that performs the build and either returns the hashes to the user to be later committed, or automatically submits a PR to update them. +In general, it's less error prone to have the entire flow of building, uploading, and using a binary +all happen in an automated pipeline without developer interaction. In the future, this tool will be +updated to be easily invocable from a continuous integration pipeline that performs the build and +either returns the hashes to the user to be later committed, or automatically submits a PR to update +them. diff --git a/buildscripts/smoke_tests/README.md b/buildscripts/smoke_tests/README.md index 90fc77925fb..602fd2b2a64 100644 --- a/buildscripts/smoke_tests/README.md +++ b/buildscripts/smoke_tests/README.md @@ -55,8 +55,8 @@ bazel test --test_output=summary --test_tag_filters=-intermediate_debug,server-p ## Storage Execution -The smoke test suites for storage execution are divided up into components. 
The smoke test suite -for all of the components that storage execution owns can be run with the following: +The smoke test suites for storage execution are divided up into components. The smoke test suite for +all of the components that storage execution owns can be run with the following: ``` bazel test --test_output=summary --test_tag_filters=-intermediate_debug,server-bsoncolumn,server-collection-write-path,server-external-sorter,server-index-builds,server-key-string,server-storage-engine-integration,server-timeseries-bucket-catalog,server-tracking-allocators,server-ttl //... @@ -76,7 +76,8 @@ There are currently no smoke test integration tests for this component. ### Server-Collection-Write-Path -The unit and integration tests for the server-collection-write-path component can be run with the following: +The unit and integration tests for the server-collection-write-path component can be run with the +following: ``` bazel test --test_output=summary --test_tag_filters=-intermediate_debug,server-collection-write-path //... @@ -112,7 +113,8 @@ There are currently no smoke test integration tests for this component. ### Server-Storage-Engine-Integration -The unit and integration tests for the server-storage-engine-integration component can be run with the following: +The unit and integration tests for the server-storage-engine-integration component can be run with +the following: ``` bazel test --test_output=summary --test_tag_filters=-intermediate_debug,server-storage-engine-integration //... 
diff --git a/buildscripts/tests/resmoke_end2end/README.md b/buildscripts/tests/resmoke_end2end/README.md index 7fcd4d1f89b..091d9fa2121 100644 --- a/buildscripts/tests/resmoke_end2end/README.md +++ b/buildscripts/tests/resmoke_end2end/README.md @@ -10,7 +10,8 @@ mongodb_repo_root$ source python3-venv/bin/activate (python3-venv) mongodb_repo_root$ python buildscripts/resmoke.py run --suites resmoke_end2end_tests ``` -- Finer grained control of tests can also be run with by invoking python's unittest main by hand. E.g: +- Finer grained control of tests can also be run with by invoking python's unittest main by hand. + E.g: ``` (python3-venv) mongodb_repo_root$ python -m unittest -v buildscripts.tests.resmoke_end2end.test_resmoke.TestTestSelection.test_at_sign_as_replay_file diff --git a/docs/antithesis/README.md b/docs/antithesis/README.md index af2595c1481..8758d8da29b 100644 --- a/docs/antithesis/README.md +++ b/docs/antithesis/README.md @@ -4,24 +4,26 @@ Antithesis is a third party vendor with an environment that can perform network fuzzing. We can upload images containing `docker-compose.yml` files, which represent various MongoDB topologies, to -the Antithesis Docker registry. Antithesis runs `docker-compose up` from these images to spin up -the corresponding multi-container application in their environment and run a test suite. Network -fuzzing is performed on the topology while the test suite runs & a report is generated by -Antithesis identifying bugs. Check out -https://github.com/mongodb/mongo/wiki/Testing-MongoDB-with-Antithesis to see an example of how we -use Antithesis today. +the Antithesis Docker registry. Antithesis runs `docker-compose up` from these images to spin up the +corresponding multi-container application in their environment and run a test suite. Network fuzzing +is performed on the topology while the test suite runs & a report is generated by Antithesis +identifying bugs. 
Check out https://github.com/mongodb/mongo/wiki/Testing-MongoDB-with-Antithesis to +see an example of how we use Antithesis today. ## Base Images The `base_images` directory consists of the building blocks for creating a MongoDB test topology. -These images are uploaded to the Antithesis Docker registry [nightly](https://github.com/mongodb/mongo/blob/6cf8b162a61173eb372b54213def6dd61e1fd684/etc/evergreen_yml_components/variants/ubuntu/test_dev_master_and_lts_branches_only.yml#L28) during the -[`antithesis image build and push`](https://github.com/mongodb/mongo/blob/020632e3ae328f276b2c251417b5a39389af6141/etc/evergreen_yml_components/definitions.yml#L2823) function. +These images are uploaded to the Antithesis Docker registry +[nightly](https://github.com/mongodb/mongo/blob/6cf8b162a61173eb372b54213def6dd61e1fd684/etc/evergreen_yml_components/variants/ubuntu/test_dev_master_and_lts_branches_only.yml#L28) +during the +[`antithesis image build and push`](https://github.com/mongodb/mongo/blob/020632e3ae328f276b2c251417b5a39389af6141/etc/evergreen_yml_components/definitions.yml#L2823) +function. ### mongo_binaries -This image contains the latest `mongo`, `mongos` and `mongod` binaries. It can be used to -start a `mongod` instance, `mongos` instance or execute `mongo` commands. This is the main building -block for creating the System Under Test topology. +This image contains the latest `mongo`, `mongos` and `mongod` binaries. It can be used to start a +`mongod` instance, `mongos` instance or execute `mongo` commands. This is the main building block +for creating the System Under Test topology. ### workload @@ -36,16 +38,16 @@ buildscript/resmoke.py run --suite antithesis_concurrency_sharded_with_stepdowns **Every topology must have 1 workload container.** -Note: During `workload` image build, `evergreen/antithesis_image_build_and_push.sh` runs, which generates -"antithesis compatible" test suites and prepends them with `antithesis_`. 
These are the test suites -that can run in antithesis and are available from within the `workload` container. +Note: During `workload` image build, `evergreen/antithesis_image_build_and_push.sh` runs, which +generates "antithesis compatible" test suites and prepends them with `antithesis_`. These are the +test suites that can run in antithesis and are available from within the `workload` container. ### Dockerfile This assembles an image with the necessary files for spinning up the corresponding topology. It consists of a `docker-compose.yml`, a `logs` directory, a `scripts` directory and a `data` -directory. If this is structured properly, you should be able to copy the files & directories -from this image and run `docker-compose up` to set up the desired topology. +directory. If this is structured properly, you should be able to copy the files & directories from +this image and run `docker-compose up` to set up the desired topology. Example from what `buildscripts/resmokelib/testing/docker_cluster_image_builder.py` generates: @@ -67,8 +69,8 @@ therefore use `FROM scratch`. ### docker-compose.yml -This describes how to construct the corresponding topology using the -`mongo-binaries` and `workload` images. +This describes how to construct the corresponding topology using the `mongo-binaries` and `workload` +images. Example from `buildscripts/antithesis/topologies/sharded_cluster/docker-compose.yml`: @@ -162,15 +164,15 @@ networks: Each container must have a `command` in `docker-compose.yml` that runs an init script. The init script belongs in the `scripts` directory, which is included as a volume. The `command` should be -set like so: `/bin/bash /scripts/[script_name].sh` or `python3 /scripts/[script_name].py`. This is -a requirement for the topology to start up properly in Antithesis. +set like so: `/bin/bash /scripts/[script_name].sh` or `python3 /scripts/[script_name].py`. This is a +requirement for the topology to start up properly in Antithesis. 
When creating `mongod` or `mongos` instances, route the logs like so: -`--logpath /var/log/mongodb/mongodb.log` and utilize `volumes` -- as in `database1`. -This enables us to easily retrieve logs if a bug is detected by Antithesis. +`--logpath /var/log/mongodb/mongodb.log` and utilize `volumes` -- as in `database1`. This enables us +to easily retrieve logs if a bug is detected by Antithesis. -The `ipv4_address` should be set to `10.20.20.130` or higher if you do not want that container to -be affected by network fuzzing. For instance, you would likely not want the `workload` container +The `ipv4_address` should be set to `10.20.20.130` or higher if you do not want that container to be +affected by network fuzzing. For instance, you would likely not want the `workload` container to be affected by network fuzzing -- as shown in the example above. Use the `evergreen-latest-master` tag for all images. This is updated automatically in @@ -182,20 +184,26 @@ Take a look at `buildscripts/antithesis/topologies/sharded_cluster/scripts/mongo how to use util methods from `buildscripts/antithesis/topologies/sharded_cluster/scripts/utils.py` to set up the desired topology. You can also use simple shell scripts as in the case of `buildscripts/antithesis/topologies/sharded_cluster/scripts/database_init.py`. These init scripts -must not end in order to keep the underlying container alive. You can use an infinite while -loop for `python` scripts or you can use `tail -f /dev/null` for shell scripts. +must not end in order to keep the underlying container alive. You can use an infinite while loop for +`python` scripts or you can use `tail -f /dev/null` for shell scripts. ## How do I create a new topology for Antithesis testing? This should be done with care to ensure we are using our limited resources efficiently. -Create a new task extending the `antithesis_task_template`, tagged with `antithesis`, passing the specified `suite` to the `antithesis image build and push` task. 
See other examples to get started. +Create a new task extending the `antithesis_task_template`, tagged with `antithesis`, passing the +specified `suite` to the `antithesis image build and push` task. See other examples to get started. ## How do I test my suite in antithesis? -If you provide the evergreen parameter `schedule_antithesis_tests` to your evergreen patch, once we build the antithesis images in your evergreen patch we send antithesis an api request to run your newly created images for an hour. You will get emailed the report when it finishes running in antithesis. +If you provide the evergreen parameter `schedule_antithesis_tests` to your evergreen patch, once we +build the antithesis images in your evergreen patch we send antithesis an api request to run your +newly created images for an hour. You will get emailed the report when it finishes running in +antithesis. -Important Note: This will happen for every antithesis task you schedule in your patch. Please do not schedule more than 1 or 2 tasks with this parameter at a time or it will use up a lot of our testing time allocated with antithesis. +Important Note: This will happen for every antithesis task you schedule in your patch. Please do not +schedule more than 1 or 2 tasks with this parameter at a time or it will use up a lot of our testing +time allocated with antithesis. `evergreen patch --param schedule_antithesis_tests=true` @@ -203,10 +211,10 @@ Important Note: This will happen for every antithesis task you schedule in your ### Normal resmoke testing -Antithesis constantly runs your resmoke suite with one random test from the suite at a time. -We support this out-of-the-box with most resmoke suites that use python fixtures. -This is very similar to how tests run in evergreen. -Your antithesis tasks in evergreen will default to this if the `antithesis_test_composer_dir` var is not specified on the task. +Antithesis constantly runs your resmoke suite with one random test from the suite at a time. 
We +support this out-of-the-box with most resmoke suites that use python fixtures. This is very similar +to how tests run in evergreen. Your antithesis tasks in evergreen will default to this if the +`antithesis_test_composer_dir` var is not specified on the task. ### Test Composer @@ -222,4 +230,5 @@ Evergreen configuration details, see ## Additional Resources -If you are interested in leveraging Antithesis feel free to reach out to #ask-devprod-correctness or #server-testing on Slack. +If you are interested in leveraging Antithesis feel free to reach out to #ask-devprod-correctness or +#server-testing on Slack. diff --git a/docs/baton.md b/docs/baton.md index 5af0719f7e8..66628540f0d 100644 --- a/docs/baton.md +++ b/docs/baton.md @@ -1,11 +1,10 @@ # Server-Internal Baton Pattern -Batons are lightweight job queues in _mongod_ and _mongos_ processes that allow -recording the intent to execute a task (e.g., polling on a network socket) and -deferring its execution to a later time. Batons, often by reusing `Client` -threads and through the _Waitable_ interface, move the execution of scheduled -tasks out of the line, potentially hiding the execution cost from the critical -path. A total of four baton classes are available today: +Batons are lightweight job queues in _mongod_ and _mongos_ processes that allow recording the intent +to execute a task (e.g., polling on a network socket) and deferring its execution to a later time. +Batons, often by reusing `Client` threads and through the _Waitable_ interface, move the execution +of scheduled tasks out of the line, potentially hiding the execution cost from the critical path. A +total of four baton classes are available today: - [Baton][baton] - [DefaultBaton][defaultBaton] @@ -14,72 +13,74 @@ path. A total of four baton classes are available today: ## Baton Basics -All baton implementations extend _Baton_. They are tightly associated with an -`OperationContext` and its `Client` thread. 
An `OperationContext` that belongs -to a `ServiceContext` with a `TransportLayer` uses an `AsioNetworkingBaton`, -else a `DefaultBaton`. The baton is accessed through the `OperationContext` with -a call to `OperationContext::getBaton()`. +All baton implementations extend _Baton_. They are tightly associated with an `OperationContext` and +its `Client` thread. An `OperationContext` that belongs to a `ServiceContext` with a +`TransportLayer` uses an `AsioNetworkingBaton`, else a `DefaultBaton`. The baton is accessed through +the `OperationContext` with a call to `OperationContext::getBaton()`. -Each baton implementation exposes an interface to allow scheduling tasks on the -baton, to demand the awakening of the baton on client socket disconnect, and to -create a _SubBaton_. A _SubBaton_, for any of the baton types, is essentially a -handle to a local object that proxies scheduling requests to its underlying baton -until it is detached (e.g., through destruction of its handle). +Each baton implementation exposes an interface to allow scheduling tasks on the baton, to demand the +awakening of the baton on client socket disconnect, and to create a _SubBaton_. A _SubBaton_, for +any of the baton types, is essentially a handle to a local object that proxies scheduling requests +to its underlying baton until it is detached (e.g., through destruction of its handle). -Additionally, a _NetworkingBaton_ enables consumers of a transport layer to -execute I/O themselves, rather than delegating it to other threads. They are -special batons that are able to poll network sockets, which is not feasible -through other baton types. This is essential for minimizing context switches and -improving the readability of stack traces. +Additionally, a _NetworkingBaton_ enables consumers of a transport layer to execute I/O themselves, +rather than delegating it to other threads. They are special batons that are able to poll network +sockets, which is not feasible through other baton types. 
This is essential for minimizing context +switches and improving the readability of stack traces. -A baton runs automatically when blocking on its associated `OperationContext` -with a call to `OperationContext::waitForConditionOrInterrupt()`. Many different -apis that take in or use an _Interruptible_ will eventually call into this method -(e.g. `Future::get(...)`, `OperationContext::sleepUntil(...)`, etc.). +A baton runs automatically when blocking on its associated `OperationContext` with a call to +`OperationContext::waitForConditionOrInterrupt()`. Many different apis that take in or use an +_Interruptible_ will eventually call into this method (e.g. `Future::get(...)`, +`OperationContext::sleepUntil(...)`, etc.). ### DefaultBaton -DefaultBaton is the most basic baton implementation. This baton provides the -platform to execute tasks while a client thread awaits an event or a timeout, -essentially paving the way towards utilizing idle cycles of client threads for -useful work. Tasks can be scheduled on this baton through its associated -`OperationContext` and using `OperationContext::getBaton()::schedule(...)`. +DefaultBaton is the most basic baton implementation. This baton provides the platform to execute +tasks while a client thread awaits an event or a timeout, essentially paving the way towards +utilizing idle cycles of client threads for useful work. Tasks can be scheduled on this baton +through its associated `OperationContext` and using `OperationContext::getBaton()::schedule(...)`. -Note that because _Baton_ extends an _OutOfLineExecutor_, it can be used as the -executor to run work on an `ExecutorFuture`. +Note that because _Baton_ extends an _OutOfLineExecutor_, it can be used as the executor to run work +on an `ExecutorFuture`. ### AsioNetworkingBaton -The AsioNetworkingBaton can schedule and run tasks similarly to the _DefaultBaton_, -but it also implements the _NetworkingBaton_ interface to provide a networking -reactor. 
It can register sessions to monitor and will utilize `poll(2)` and -`eventfd(2)` to wait until I/O can be performed on the socket or until interrupted. +The AsioNetworkingBaton can schedule and run tasks similarly to the _DefaultBaton_, but it also +implements the _NetworkingBaton_ interface to provide a networking reactor. It can register sessions +to monitor and will utilize `poll(2)` and `eventfd(2)` to wait until I/O can be performed on the +socket or until interrupted. -This baton is primarily used for egress networking where it gets scheduled to send -off a command after a connection is made (see the relevant code [here][asioNetworkingBatonScheduling]). -This means that the AsioNetworkingBaton will normally perform socket I/O without -needing to poll. It only registers a session for polling if another read or -write is needed on the socket (e.g. [registering a session during socket read][asioNetworkingBatonPollingSetup]). +This baton is primarily used for egress networking where it gets scheduled to send off a command +after a connection is made (see the relevant code [here][asioNetworkingBatonScheduling]). This means +that the AsioNetworkingBaton will normally perform socket I/O without needing to poll. It only +registers a session for polling if another read or write is needed on the socket (e.g. [registering +a session during socket read][asioNetworkingBatonPollingSetup]). -In order for an egress session to use the baton, it must be specified as an -argument to `TaskExecutor::scheduleRemoteCommand(...)`. +In order for an egress session to use the baton, it must be specified as an argument to +`TaskExecutor::scheduleRemoteCommand(...)`. Note that this baton is only available for Linux. ## Example -For an example of scheduling a task on the `OperationContext` baton, see -[here][example]. +For an example of scheduling a task on the `OperationContext` baton, see [here][example]. 
## Considerations -Since any task scheduled on a baton is intended for out-of-line execution, it -must be non-blocking and preferably short-lived to ensure forward progress. +Since any task scheduled on a baton is intended for out-of-line execution, it must be non-blocking +and preferably short-lived to ensure forward progress. -[baton]: https://github.com/mongodb/mongo/blob/5906d967c3144d09fab6a4cc1daddb295df19ffb/src/mongo/db/baton.h#L61-L178 -[defaultBaton]: https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/db/default_baton.h#L46-L75 -[networkingBaton]: https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/transport/baton.h#L61-L96 -[asioNetworkingBaton]: https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/transport/baton_asio_linux.h#L60-L529 -[asioNetworkingBatonScheduling]: https://github.com/mongodb/mongo/blob/46b8c49b4e13cc4c8389b2822f9e30dd73b81d6e/src/mongo/executor/network_interface_tl.cpp#L910 -[asioNetworkingBatonPollingSetup]: https://github.com/mongodb/mongo/blob/eab4ec41cc2b28bf0a38eb813f9690e1bfa6c9a6/src/mongo/transport/asio/asio_session_impl.cpp#L666-L696 -[example]: https://github.com/mongodb/mongo/blob/262e5a961fa7221bfba5722aeea2db719f2149f5/src/mongo/s/multi_statement_transaction_requests_sender.cpp#L91-L99 +[baton]: + https://github.com/mongodb/mongo/blob/5906d967c3144d09fab6a4cc1daddb295df19ffb/src/mongo/db/baton.h#L61-L178 +[defaultBaton]: + https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/db/default_baton.h#L46-L75 +[networkingBaton]: + https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/transport/baton.h#L61-L96 +[asioNetworkingBaton]: + https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/transport/baton_asio_linux.h#L60-L529 +[asioNetworkingBatonScheduling]: + 
https://github.com/mongodb/mongo/blob/46b8c49b4e13cc4c8389b2822f9e30dd73b81d6e/src/mongo/executor/network_interface_tl.cpp#L910 +[asioNetworkingBatonPollingSetup]: + https://github.com/mongodb/mongo/blob/eab4ec41cc2b28bf0a38eb813f9690e1bfa6c9a6/src/mongo/transport/asio/asio_session_impl.cpp#L666-L696 +[example]: + https://github.com/mongodb/mongo/blob/262e5a961fa7221bfba5722aeea2db719f2149f5/src/mongo/s/multi_statement_transaction_requests_sender.cpp#L91-L99 diff --git a/docs/branching/README.md b/docs/branching/README.md index f4e681fc8e1..b0a3e84952a 100644 --- a/docs/branching/README.md +++ b/docs/branching/README.md @@ -1,6 +1,7 @@ # Branching -This document describes branching task regarding file updates in `10gen/mongo` repository that should be done on a new branch immediately after a branch cut. +This document describes branching task regarding file updates in `10gen/mongo` repository that +should be done on a new branch immediately after a branch cut. ## Table of contents @@ -14,11 +15,14 @@ This document describes branching task regarding file updates in `10gen/mongo` r ### GitHub App credentials -Add GitHub app credentials (app id and key) in the new project settings, eg. https://spruce.corp.mongodb.com/project/mongodb-mongo-v8.3/settings/github-app-settings (additional MANA permissions may be required, else coordinate with Release team contacts). +Add GitHub app credentials (app id and key) in the new project settings, eg. +https://spruce.corp.mongodb.com/project/mongodb-mongo-v8.3/settings/github-app-settings (additional +MANA permissions may be required, else coordinate with Release team contacts). ## 2. Create working branch -To save time during the branch cut these branching changes could be done beforehand, but not too early to avoid extra file conflicts, and then rebased on a new `vX.Y` branch. 
+To save time during the branch cut these branching changes could be done beforehand, but not too +early to avoid extra file conflicts, and then rebased on a new `vX.Y` branch. Create a working branch from `master` or from a new `vX.Y` branch if it already exists: @@ -30,13 +34,16 @@ git checkout -b vX.Y-branching-task ## 2. Update files -**IMPORTANT!** All of these changes should be a separate commit, but they should be pushed together in the same commit-queue task. +**IMPORTANT!** All of these changes should be a separate commit, but they should be pushed together +in the same commit-queue task. -The reason they should be pushed as separate commits is in the case of needing to revert one aspect of this entire task. +The reason they should be pushed as separate commits is in the case of needing to revert one aspect +of this entire task. > See [8.2 branching PR](https://github.com/mongodb/mongo/pull/38920/commits) for reference. -Some have some automated steps you can run, but please double-check their edits. Initialize the version here, used throughout: +Some have some automated steps you can run, but please double-check their edits. Initialize the +version here, used throughout: ```sh VERSION=8.3 @@ -51,7 +58,9 @@ sed -i "s/master/v$VERSION/g" copy.bara.sky sed -i 's/branch = "master"/branch = "v'"$VERSION"'"/' buildscripts/sync_repo_with_copybara.py ``` -For each file [`copy.bara.sky`](../../copy.bara.sky) and [`sync_repo_with_copybara.py`](../../buildscripts/sync_repo_with_copybara.py), the "master" branch references should be replaced with the new branch name. +For each file [`copy.bara.sky`](../../copy.bara.sky) and +[`sync_repo_with_copybara.py`](../../buildscripts/sync_repo_with_copybara.py), the "master" branch +references should be replaced with the new branch name. 
### Evergreen YAML configurations @@ -63,16 +72,23 @@ Run the following automation and verify results: sed -i "s/suffix\"] = \"latest\"/suffix\"] = \"v$VERSION-latest\"/g" buildscripts/generate_version_expansions.py ``` -In the file [`buildscripts/generate_version_expansions.py`](../../buildscripts/generate_version_expansions.py), the "latest" suffixes should be replaced with the new branch name. +In the file +[`buildscripts/generate_version_expansions.py`](../../buildscripts/generate_version_expansions.py), +the "latest" suffixes should be replaced with the new branch name. #### 2. Nightly YAML -[`etc/evergreen_nightly.yml`](../../etc/evergreen_nightly.yml) will be used as YAML configuration in the new `mongodb-mongo-vX.Y` evergreen project. +[`etc/evergreen_nightly.yml`](../../etc/evergreen_nightly.yml) will be used as YAML configuration in +the new `mongodb-mongo-vX.Y` evergreen project. -This will move some build variants from `etc/evergreen.yml` to continue running on a new branch project. More information about build variants after branching is [here](../evergreen-testing/yaml_configuration/buildvariants.md#build-variants-after-branching). +This will move some build variants from `etc/evergreen.yml` to continue running on a new branch +project. More information about build variants after branching is +[here](../evergreen-testing/yaml_configuration/buildvariants.md#build-variants-after-branching). -- Copy over commit-queue aliases and patch aliases from [`etc/evergreen.yml`](../../etc/evergreen.yml) -- Update "include" section: comment out or uncomment file includes as instructions in the comments suggest. +- Copy over commit-queue aliases and patch aliases from + [`etc/evergreen.yml`](../../etc/evergreen.yml) +- Update "include" section: comment out or uncomment file includes as instructions in the comments + suggest. #### 3. 
Burn-in tasks @@ -82,7 +98,12 @@ Run the following automation and verify results: sed -i '/burn_in_tag_include_build_variants/{N;N;N;d;}' etc/evergreen_yml_components/variants/misc/misc.yml ``` -In the file [`etc/evergreen_yml_components/variants/misc/misc.yml`](../../etc/evergreen_yml_components/variants/misc/misc.yml), build variant names in the ["burn_in_tag_include_build_variants" expansion](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/etc/evergreen_yml_components/variants/misc/misc.yml#L21) that are _not_ included in [`etc/evergreen_nightly.yml`](../../etc/evergreen_nightly.yml) are _removed_. +In the file +[`etc/evergreen_yml_components/variants/misc/misc.yml`](../../etc/evergreen_yml_components/variants/misc/misc.yml), +build variant names in the +["burn_in_tag_include_build_variants" expansion](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/etc/evergreen_yml_components/variants/misc/misc.yml#L21) +that are _not_ included in [`etc/evergreen_nightly.yml`](../../etc/evergreen_nightly.yml) are +_removed_. #### 4. Suggested to Required @@ -94,7 +115,9 @@ sed -i 's@display_name: "\* Amazon Linux 2023 arm64 Enterprise"@display_name: "! sed -i 's/tags: \["suggested", "forbid_tasks_tagged_with_experimental"\]/tags: ["required", "forbid_tasks_tagged_with_experimental"]/g' etc/evergreen_yml_components/variants/amazon/test_dev.yml ``` -For the variant `enterprise-amazon-linux2023-arm64` in [`etc/evergreen_yml_components/variants/amazon/test_dev.yml`](../../etc/evergreen_yml_components/variants/amazon/test_dev.yml), replace: +For the variant `enterprise-amazon-linux2023-arm64` in +[`etc/evergreen_yml_components/variants/amazon/test_dev.yml`](../../etc/evergreen_yml_components/variants/amazon/test_dev.yml), +replace: - "\*" with "!" 
in their display names - "suggested" variant tag with "required" @@ -116,10 +139,12 @@ sed -i 's/!.incompatible_all_feature_flags/!.requires_all_feature_flags/g' $FILE For the build variant names: -- in [`etc/evergreen_yml_components/variants/windows/test_dev.yml`](../../etc/evergreen_yml_components/variants/windows/test_dev.yml): +- in + [`etc/evergreen_yml_components/variants/windows/test_dev.yml`](../../etc/evergreen_yml_components/variants/windows/test_dev.yml): - `enterprise-windows-all-feature-flags-required` - `enterprise-windows-all-feature-flags-non-essential` -- in [`etc/evergreen_yml_components/variants/sanitizer/test_dev.yml`](../../etc/evergreen_yml_components/variants/sanitizer/test_dev.yml): +- in + [`etc/evergreen_yml_components/variants/sanitizer/test_dev.yml`](../../etc/evergreen_yml_components/variants/sanitizer/test_dev.yml): - `linux-debug-aubsan-lite-all-feature-flags-required` @@ -130,9 +155,12 @@ For the build variant names: #### 6. Sys-perf YAML -[`etc/system_perf.yml`](../../etc/system_perf.yml) will be used as YAML configuration for a new `sys-perf-X.Y` evergreen project +[`etc/system_perf.yml`](../../etc/system_perf.yml) will be used as YAML configuration for a new +`sys-perf-X.Y` evergreen project -> Ensure that [DSI](https://github.com/10gen/dsi/blob/master/evergreen/system_perf/README.md#branching) has been updated with new branches +> Ensure that +> [DSI](https://github.com/10gen/dsi/blob/master/evergreen/system_perf/README.md#branching) has been +> updated with new branches Run the following automation and verify results: @@ -146,8 +174,13 @@ sed -i "s@evergreen/system_perf/master/variants.yml@evergreen/system_perf/$VERSI In the file [`etc/system_perf.yml`](../../etc/system_perf.yml), the following should be reflected: - Remove `evergreen/system_perf/master/master_variants.yml` from "include" section -- With the exception of `base.yml`, update all other entries that contain `master` in the path to contain `X.Y` in the path instead. 
(e.g. `evergreen/system_perf/master/variants.yml` should become `evergreen/system_perf/X.Y/variants.yml`). -- Update the [evergreen project variable](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-and-Distro-Settings#variables) `compile_project` in the new sys-perf-X.Y evergreen project to point to the new mongodb-mongo-vX.Y branch +- With the exception of `base.yml`, update all other entries that contain `master` in the path to + contain `X.Y` in the path instead. (e.g. `evergreen/system_perf/master/variants.yml` should become + `evergreen/system_perf/X.Y/variants.yml`). +- Update the + [evergreen project variable](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-and-Distro-Settings#variables) + `compile_project` in the new sys-perf-X.Y evergreen project to point to the new mongodb-mongo-vX.Y + branch #### 7. Evergreen project validation @@ -157,7 +190,10 @@ Run the following automation and verify results: sed -i 's/RELEASE_BRANCH = False/RELEASE_BRANCH = True/g' buildscripts/validate_evg_project_config.py ``` -In file [`buildscripts/validate_evg_project_config.py`](../../buildscripts/validate_evg_project_config.py), the `RELEASE_BRANCH` variable should be set to `True` to leverage a specialized shortcut conditional to `evaluate` the project, not `validate`. +In file +[`buildscripts/validate_evg_project_config.py`](../../buildscripts/validate_evg_project_config.py), +the `RELEASE_BRANCH` variable should be set to `True` to leverage a specialized shortcut conditional +to `evaluate` the project, not `validate`. #### 8. Coverity @@ -167,7 +203,8 @@ Run the following automation and verify results: sed -i "s/stream: mongo.master/stream: mongo.v$VERSION/g" etc/coverity.yml ``` -In the file [`etc/coverity.yml`](../../etc/coverity.yml), the "stream" should be updated to the new branch. +In the file [`etc/coverity.yml`](../../etc/coverity.yml), the "stream" should be updated to the new +branch. 
#### Finally: format and lint @@ -179,7 +216,8 @@ Run linters and formatters and fix anything that couldn't be autofixed. ## 3. Test changes -In case working branch was created from `master` branch, rebase it on a new `vX.Y` branch and fix file conflicts if any. +In case working branch was created from `master` branch, rebase it on a new `vX.Y` branch and fix +file conflicts if any. Schedule required patch on a new `mongodb-mongo-vX.Y` project: @@ -187,7 +225,8 @@ Schedule required patch on a new `mongodb-mongo-vX.Y` project: evergreen patch -p mongodb-mongo-vX.Y -a required ``` -If patch results reveal that some steps are missing or outdated in this file, make sure to update the branching documentation on a "master" branch accordingly. +If patch results reveal that some steps are missing or outdated in this file, make sure to update +the branching documentation on a "master" branch accordingly. ## 4. Merge changes diff --git a/docs/building.md b/docs/building.md index 0430d6ecaa5..d73acbd93ea 100644 --- a/docs/building.md +++ b/docs/building.md @@ -1,8 +1,7 @@ # Building MongoDB -Please note that prebuilt binaries are available on -[mongodb.org](http://www.mongodb.org/downloads) and may be the easiest -way to get started, rather than building from source. +Please note that prebuilt binaries are available on [mongodb.org](http://www.mongodb.org/downloads) +and may be the easiest way to get started, rather than building from source. To build MongoDB, you will need: @@ -20,13 +19,13 @@ To build MongoDB, you will need: - On Ubuntu, the lzma library is required. Install `liblzma-dev` - On Amazon Linux, the xz-devel library is required. `yum install xz-devel` - Python 3.13 -- About 13 GB of free disk space for the core binaries (`mongod`, - `mongos`, and `mongo`). +- About 13 GB of free disk space for the core binaries (`mongod`, `mongos`, and `mongo`). -If using a newer version of a C++ compiler than listed above, it may work. 
However the versions listed above have been verified to work. +If using a newer version of a C++ compiler than listed above, it may work. However the versions +listed above have been verified to work. -MongoDB supports the following architectures: arm64, ppc64le, s390x, -and x86-64. More detailed platform instructions can be found below. +MongoDB supports the following architectures: arm64, ppc64le, s390x, and x86-64. More detailed +platform instructions can be found below. ## Quick (re)Start @@ -45,23 +44,21 @@ If you only want to build the database server `mongod`: $ bazel build install-mongod -**_Note_**: For C++ compilers that are newer than the supported -version, the compiler may issue new warnings that cause MongoDB to -fail to build since the build system treats compiler warnings as -errors. To ignore the warnings, pass the switch -`--disable_warnings_as_errors=True` to the bazel command. +**_Note_**: For C++ compilers that are newer than the supported version, the compiler may issue new +warnings that cause MongoDB to fail to build since the build system treats compiler warnings as +errors. To ignore the warnings, pass the switch `--disable_warnings_as_errors=True` to the bazel +command. $ bazel build install-mongod --disable_warnings_as_errors=True -If you want to build absolutely everything (`mongod`, `mongo`, unit -tests, etc): +If you want to build absolutely everything (`mongod`, `mongo`, unit tests, etc): $ bazel build --build_tag_filters=mongo_binary //src/mongo/... 
## Bazel Targets -The following targets can be named on the bazel command line to build and -install a subset of components: +The following targets can be named on the bazel command line to build and install a subset of +components: - `install-mongod` - `install-mongos` @@ -69,16 +66,15 @@ install a subset of components: - `install-dist` (includes all server components) - `install-devcore` (includes `mongod`, `mongos`, and `jstestshell` (formerly `mongo` shell)) -**_NOTE_**: The `install-core` and `install-dist` targets are _not_ -guaranteed to be identical. The `install-core` target will only ever include a -minimal set of "core" server components, while `install-dist` is intended -for a functional end-user installation. If you are testing, you should use the -`install-devcore` or `install-dist` targets instead. +**_NOTE_**: The `install-core` and `install-dist` targets are _not_ guaranteed to be identical. The +`install-core` target will only ever include a minimal set of "core" server components, while +`install-dist` is intended for a functional end-user installation. If you are testing, you should +use the `install-devcore` or `install-dist` targets instead. ## Where to find Binaries -The build system will produce an installation tree into `bazel-bin/install`, as well -individual install target trees like `bazel-bin/`. +The build system will produce an installation tree into `bazel-bin/install`, as well individual +install target trees like `bazel-bin/`. ## Windows @@ -97,8 +93,6 @@ To install dependencies on Debian or Ubuntu systems: ## OS X -Install Xcode 16.4 or newer. Make sure macOS 15.5 platform -is installed. +Install Xcode 16.4 or newer. Make sure macOS 15.5 platform is installed. 
-Install llvm and lld, version 19 from brew: -brew install llvm@19 lld@19 +Install llvm and lld, version 19 from brew: brew install llvm@19 lld@19 diff --git a/docs/change_streams.md b/docs/change_streams.md index 1f33ac33518..70804a41d2e 100644 --- a/docs/change_streams.md +++ b/docs/change_streams.md @@ -5,25 +5,23 @@ current version of master, if not explicitly stated otherwise. Implementation de versions may vary slightly. Change streams are a convenient way for an application to monitor changes made to the data in a -deployment. -The events produced by change streams are called "change events". The event data is produced from -the oplog(s) of the deployment. -The events that are emitted by change streams include +deployment. The events produced by change streams are called "change events". The event data is +produced from the oplog(s) of the deployment. The events that are emitted by change streams include - DML events: emitted for operations that insert, update, replace, or delete individual documents. - DDL events: emitted for operations that create, drop, or modify collections, databases, or views. -- Data placement events: emitted for operations that define or modify the placement of data inside - a sharded cluster. +- Data placement events: emitted for operations that define or modify the placement of data inside a + sharded cluster. - Cluster topology events: emitted for operations that add or remove shards in a sharded cluster. Which exact event types are emitted by a change stream depends on the change stream configuration and the deployment type. Change streams are mainly used by customer applications and tools to keep track of changes to the -data in a deployment, in order to relay these updates to external systems. -Some of MongoDB's own tools and components are also based on change streams, e.g. _mongosync_ (C2C), -Atlas Search, Atlas Stream Processing, and the resharding process. 
-The component that opens a change stream and pulls events from it is called the "consumer". +data in a deployment, in order to relay these updates to external systems. Some of MongoDB's own +tools and components are also based on change streams, e.g. _mongosync_ (C2C), Atlas Search, Atlas +Stream Processing, and the resharding process. The component that opens a change stream and pulls +events from it is called the "consumer". ## Change Stream Guarantees @@ -31,17 +29,16 @@ Change Streams provide various guarantees: - Ordering: change streams deliver events in the order they originally occurred within the target namespace (e.g., collection, database, or entire cluster). The order is based on the sequence in - which the operations were applied to the oplog. - In a sharded cluster, the events from multiple oplogs will be merged deterministically into a - single, ordered stream of change events. + which the operations were applied to the oplog. In a sharded cluster, the events from multiple + oplogs will be merged deterministically into a single, ordered stream of change events. - Durability and reproducability: change streams are based on the internal oplog, which is part of the deployment's replication mechanism. Change streams only deliver events after they have been committed to a majority of nodes and durably persisted, ensuring they will not be rolled back. - Exactly-once delivery: every event in a change stream is emitted exactly once, and no event that matches the change stream filter is skipped. - Resumability: change stream consumption can be interrupted due to transient errors (e.g. network - issues, node failures, application errors), but it can be resumed from the exact point where - the consumption stopped. This is made possible by the resume token (`_id` field) that accompanies + issues, node failures, application errors), but it can be resumed from the exact point where the + consumption stopped. 
This is made possible by the resume token (`_id` field) that accompanies every change event, which acts as a bookmark. This allows to the consumer to continue processing changes from the last known position without missing events. @@ -71,9 +68,8 @@ opened against standalone _mongod_ instances, as there is no oplog to generate t standalone mode. In replica set deployments, the change stream can be opened directly on any replica set member of -the deployment. -In sharded cluster deployments, the change stream must be opened against any of the deployment's -_mongos_ processes. +the deployment. In sharded cluster deployments, the change stream must be opened against any of the +deployment's _mongos_ processes. A change stream is opened by executing an `aggregate` command with a pipeline that contains at least the `$changeStream` pipeline stage. @@ -115,9 +111,8 @@ db.getSiblingDB("testDB").runCommand({ ``` The `aggregate` parameter must be set to `1` for database-level change streams, and the command must -be executed inside the desired database. -The internal namespace that is used by database-level change streams is `.$cmd.aggregate` -(where `` is the actual name of the database). +be executed inside the desired database. The internal namespace that is used by database-level +change streams is `.$cmd.aggregate` (where `` is the actual name of the database). ### Opening an All-Cluster Change Stream @@ -161,9 +156,8 @@ into smaller fragments, in order to avoid running into `BSONObjectTooLarge` erro ### Change Stream Start Time When opening a change stream without specifying an explicit point in time, the change stream will be -opened using the current time, and will report only change events that happened after that point -in time. -The current time here is +opened using the current time, and will report only change events that happened after that point in +time. 
The current time here is - the time of the latest majority-committed operation for replica set change streams, or - the value of the cluster's vector clock for sharded cluster change streams. @@ -174,9 +168,8 @@ parameter is specified as a logical timestamp. ### Resuming Change Streams -Change streams allow the consumer to resume the change stream after an error occurred. -To support resumability, change streams report a "resume token" inside the `_id` field of every -emitted event. +Change streams allow the consumer to resume the change stream after an error occurred. To support +resumability, change streams report a "resume token" inside the `_id` field of every emitted event. To resume a change stream after an error occurred, the resume token of a previously consumed event can be passed in one of the parameters `resumeAfter` or `startAfter` when opening a change stream. @@ -198,8 +191,7 @@ with a different `$match` expression may lead to different events being returned the event with the original resume token not being found in the new change stream. The resume tokens that are emitted by change streams are string values that contain a hexadecimal -encoding of the internal resume token data. -The internal resume token data contains +encoding of the internal resume token data. The internal resume token data contains - the cluster time of an event. - the version of the resume token format. @@ -212,11 +204,13 @@ The internal resume token data contains Resume tokens are versioned. Currently only version 2 is supported. Future versions may introduce new resume token versions. Client applications should treat resume -tokens as opaque identifiers and should not make any assumptions about the format or internals -or resume tokens, nor should they rely on the internal implementation details of resume tokens. 
+tokens as opaque identifiers and should not make any assumptions about the format or internals or +resume tokens, nor should they rely on the internal implementation details of resume tokens. -Resume tokens are serialized and deserialized by the [ResumeToken](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/resume_token.h#L148) -class. The resume token internal data is stored in [ResumeTokenData](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/resume_token.h#L51). +Resume tokens are serialized and deserialized by the +[ResumeToken](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/resume_token.h#L148) +class. The resume token internal data is stored in +[ResumeTokenData](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/resume_token.h#L51). #### Resume Token Types @@ -225,12 +219,12 @@ There are two types of resume tokens: - event resume tokens - high watermark resume tokens -The former stem from actual change events. -High watermark token are a special kind of change stream resume token that represent a logical -position in the global change stream ordered only by cluster time, not a specific event. +The former stem from actual change events. High watermark token are a special kind of change stream +resume token that represent a logical position in the global change stream ordered only by cluster +time, not a specific event. -High watermark tokens sort strictly before any real event token at the same cluster time. -That is, a high‑watermark token for time T sorts ahead of all events whose cluster time >= T. +High watermark tokens sort strictly before any real event token at the same cluster time. That is, a +high‑watermark token for time T sorts ahead of all events whose cluster time >= T. 
#### Decoding Resume Tokens @@ -267,43 +261,42 @@ by the consumer or the change stream runs into an error. Also, unused cursors ar garbage-collected after a period of inactivity. When opening a change stream on a sharded cluster, the targeted `mongos` instance will open the -required cursors on the relevant shards of the cluster and also the config server. Here, the `mongos` -instance will also automatically open additional cursors in case new shards are added to the -cluster. All this is abstracted from the consumer of the change stream. The consumer of the change -stream will only see a single cursor and interact with _mongos_, which handles the complexity of -managing the underlying shard cursors. +required cursors on the relevant shards of the cluster and also the config server. Here, the +`mongos` instance will also automatically open additional cursors in case new shards are added to +the cluster. All this is abstracted from the consumer of the change stream. The consumer of the +change stream will only see a single cursor and interact with _mongos_, which handles the complexity +of managing the underlying shard cursors. If a change stream cursor can be successfully established, the cursor id is returned to the consumer. The consumer can then use the cursor id to pull change events from the change stream by issuing follow-up `getMore` commands to this cursor. -If a change stream cursor cannot be successfully opened, the initial `aggregate` command will -return an error, and the returned cursor id will be `0`. In this case, no events can be consumed -from the change stream, and the consumer needs to resolve the error. +If a change stream cursor cannot be successfully opened, the initial `aggregate` command will return +an error, and the returned cursor id will be `0`. In this case, no events can be consumed from the +change stream, and the consumer needs to resolve the error. 
### Change Stream errors When a change stream is opened at a specific point in time, it is validated that the oplog of all -participating nodes actually contains data for this point in time. -If the oplog does not contain any data for the exact point in time or before, it would be possible -that the requested data has already fallen off the oplog. -In case no oplog entry can be found that is at least as old as the specified timetamp, opening the -change stream will fail with error code `OplogQueryMinTsMissing`. -This validation happens for all change streams, regardless if the start timestamp is specified via -the `resumeAfter`, `startAfter` or `startAtOperationTime` parameters, or if the start time is -implied from the current time. -An exception in which opening a change stream at a later point in time than the timestamp of the -first present oplog entry is permitted is for new shard primaries. -New shard primary can be added to an existing cluster at any point in time. When a new shard primary -is added, its first oplog entry will be a no-op entry with `msg` == `initiating set` (on ASC) or -`msg` == `new primary` (on DSC). +participating nodes actually contains data for this point in time. If the oplog does not contain any +data for the exact point in time or before, it would be possible that the requested data has already +fallen off the oplog. In case no oplog entry can be found that is at least as old as the specified +timetamp, opening the change stream will fail with error code `OplogQueryMinTsMissing`. This +validation happens for all change streams, regardless if the start timestamp is specified via the +`resumeAfter`, `startAfter` or `startAtOperationTime` parameters, or if the start time is implied +from the current time. An exception in which opening a change stream at a later point in time than +the timestamp of the first present oplog entry is permitted is for new shard primaries. 
New shard +primary can be added to an existing cluster at any point in time. When a new shard primary is added, +its first oplog entry will be a no-op entry with `msg` == `initiating set` (on ASC) or `msg` == +`new primary` (on DSC). -The code for this can be found [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/classic/collection_scan.cpp#L195-L227). +The code for this can be found +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/classic/collection_scan.cpp#L195-L227). Another common error is `ChangeStreamHistoryLost`. This error is raised when a change stream is opened with a resume token that cannot be found (anymore) in any of the participating nodes' oplogs. -This can either happen when the resume event has actually fallen off the oplog, or, when a -change stream is resumed with the resume token from another change stream with a different `$match` +This can either happen when the resume event has actually fallen off the oplog, or, when a change +stream is resumed with the resume token from another change stream with a different `$match` expression. In this case, the new change stream may filter out the resume event due to the different `$match` expression, so it cannot be found anymore. @@ -342,9 +335,9 @@ request: - `maxTimeMS`: maximum server-side waiting time for producing events. The `getMore` command will fill the response with up to `batchSize` results if that many events are -available. A response can also contain less events than the specified `batchSize`. -Regardless of the specified batch size, the maximum response size limit of 16MB will be honored, in -order to prevent responses from getting too large. +available. A response can also contain less events than the specified `batchSize`. Regardless of the +specified batch size, the maximum response size limit of 16MB will be honored, in order to prevent +responses from getting too large. 
A change stream response is returned to the consumer when @@ -353,14 +346,13 @@ A change stream response is returned to the consumer when would make it exceed the 16MB size limit. In case the change stream cursor has reached the end of the oplog and there are currently no events -to return, the response will be returned immediately if it already contains at least one event. -If the response is empty, the change stream will wait for at most `maxTimeMS` for new oplog entries -to arrive. -If no new oplog entries arrive within `maxTimeMS`, an empty response will be returned. If new oplog -entries arrive within `maxTimeMS` and at least one of them matches the change stream's filter, the -matching event will be returned immediately. If oplog entries arrive but do not match the change -stream's filter, the change stream will wait for matching oplog entries until `maxTimeMS` is fully -expired. +to return, the response will be returned immediately if it already contains at least one event. If +the response is empty, the change stream will wait for at most `maxTimeMS` for new oplog entries to +arrive. If no new oplog entries arrive within `maxTimeMS`, an empty response will be returned. If +new oplog entries arrive within `maxTimeMS` and at least one of them matches the change stream's +filter, the matching event will be returned immediately. If oplog entries arrive but do not match +the change stream's filter, the change stream will wait for matching oplog entries until `maxTimeMS` +is fully expired. ### Generic Event layout @@ -379,8 +371,8 @@ The following generic fields are added for change streams that were opened with - `collectionUUID`: UUID of the collection for which the event occurred, if applicable. - `operationDescription`: populated for DDL events. -Most other fields are event type-specific, so they are only present for specific events. -A few such fields include: +Most other fields are event type-specific, so they are only present for specific events. 
A few such +fields include: - `documentKey`: the `_id` value of the affected document, populated for DML events. May contain the shard key values for sharded collections. @@ -389,9 +381,11 @@ A few such fields include: value than `default`. - `updateDescription` / `rawUpdateDescription`: contains details for "update" events. -The majority of change stream event fields are emitted by the `ChangeStreamDefaultEventTransformation` -object [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_event_transform.cpp#L321). This object is called by the `ChangeStreamEventTransform` -stage [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_transform_stage.cpp#L75). +The majority of change stream event fields are emitted by the +`ChangeStreamDefaultEventTransformation` object +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_event_transform.cpp#L321). +This object is called by the `ChangeStreamEventTransform` stage +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_transform_stage.cpp#L75). A custom `$project` stage in the change stream pipeline can be used to suppress certain fields. @@ -401,8 +395,8 @@ Emitted change events can get large, especially if they contain pre- or post-ima the events can exceed the maximum BSON object size of 16MB, which can lead to `BSONObjectTooLarge` errors when trying to process these change stream events. -To split large change stream events into multiple smaller chunks, change stream consumers can add -a `$changeStreamSplitLargeEvent` stage as the last step of their change stream pipeline, e.g. 
+To split large change stream events into multiple smaller chunks, change stream consumers can add a +`$changeStreamSplitLargeEvent` stage as the last step of their change stream pipeline, e.g. ```js db.getSiblingDB("testDB").runCommand({ @@ -419,8 +413,10 @@ db.getSiblingDB("testDB").runCommand({ }); ``` -The splitting is performed by the `ChangeStreamSplitLargeEventStage` stage [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_split_large_event_stage.cpp#L72), -using [this helper function](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_split_event_helpers.cpp#L63). +The splitting is performed by the `ChangeStreamSplitLargeEventStage` stage +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_split_large_event_stage.cpp#L72), +using +[this helper function](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_split_event_helpers.cpp#L63). The change stream consumer is responsible for assembling the split event fragments into a single event later. @@ -434,10 +430,9 @@ close the change stream cursor in specific situations: - the target collection is renamed - the parent database of the target collection is dropped - in database-level change streams, the change stream is invalidated if the target database is - dropped. - In case a change stream gets invalidated by any of the above situations, it will emit a special - "invalidate" event to inform the consumer that further processing is not possible. - There are no "invalidate" events in all-cluster change streams. + dropped. In case a change stream gets invalidated by any of the above situations, it will emit a + special "invalidate" event to inform the consumer that further processing is not possible. 
There + are no "invalidate" events in all-cluster change streams. Issuing of change stream invalidate events is implemented in the `ChangeStreamCheckInvalidateStage` [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_invalidate_stage.cpp#L106-L157). @@ -445,12 +440,13 @@ Issuing of change stream invalidate events is implemented in the `ChangeStreamCh ## Change Stream Parameters The behavior of change streams can be controlled via various parameters that can be passed with the -initial `aggregate` command used to open the change stream. -The parameters are defined in an [IDL file](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream.idl#L84). +initial `aggregate` command used to open the change stream. The parameters are defined in an +[IDL file](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream.idl#L84). The parameters that are provided when opening the change stream are automatically validated using mechanisms provided by the IDL framework. Additional validation of the change stream parameters is -performed [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream.cpp#L391). +performed +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream.cpp#L391). Invalid change stream parameters are immediately rejected with appropriate errors. ### `fullDocument` @@ -466,17 +462,16 @@ The following values are possible: may not be the same version of the document that was present when the "update" change event was originally recorded. If no document can be found by the lookup, the `fullDocument` field will contain `null`. 
-- `whenAvailable`: the `fullDocument` field will be populated with the post-image for the event. - The post-image is generated on the fly from a stored pre-image and applying a delta update from - the event on top of it. If no post-image is available, the `fullDocument` field will contain - `null`. +- `whenAvailable`: the `fullDocument` field will be populated with the post-image for the event. The + post-image is generated on the fly from a stored pre-image and applying a delta update from the + event on top of it. If no post-image is available, the `fullDocument` field will contain `null`. - `required`: populates the `fullDocument` field with the post-image for the event. Post-images are generated in the same way as in `whenAvailable`. If no post-image can be generated, this will abort the change stream with a `NoMatchingDocument` error. -The latter two options rely on pre-images to be enabled for the target collection(s). -When pre-images are enabled, they are written synchronously with the regular "update" oplog entry, -and change stream events aren’t returned until both have been majority-committed. +The latter two options rely on pre-images to be enabled for the target collection(s). When +pre-images are enabled, they are written synchronously with the regular "update" oplog entry, and +change stream events aren’t returned until both have been majority-committed. Post-images for "update" events are added to change events by the `ChangeStreamAddPostImage` stage [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_post_image_stage.cpp#L84). @@ -506,29 +501,25 @@ parameters are: #### `showExpandedEvents` (public) The `showExpandedEvents` flag can be used to make a change stream return both additional event types -and additional fields. -The flag defaults to `false`. In this mode, change streams will only return DML events and no DDL -events. 
-When setting `showExpandedEvents` to `true`, change streams will also emit events for various DDL -operations. -In addition, setting `showExpandedEvents` will make change streams return the additional fields -`collectionUUID` (for various change stream event types) and `updateDescription.disambiguatedPaths` -(for update events). +and additional fields. The flag defaults to `false`. In this mode, change streams will only return +DML events and no DDL events. When setting `showExpandedEvents` to `true`, change streams will also +emit events for various DDL operations. In addition, setting `showExpandedEvents` will make change +streams return the additional fields `collectionUUID` (for various change stream event types) and +`updateDescription.disambiguatedPaths` (for update events). #### `matchCollectionUUIDForUpdateLookup` (public) The `matchCollectionUUIDForUpdateLookup` field can be used to ensure that "updateLookup" operations are performed on the correct collection in case multiple collections with the same name have existed -over time. -This is relevant, because change streams can be opened retroactively on collections that were already -dropped and may have been recreated with the same name but different contents afterwards. +over time. This is relevant, because change streams can be opened retroactively on collections that +were already dropped and may have been recreated with the same name but different contents +afterwards. The flag defaults to `false`. In this case, "updateLookup" operations will not verify that the looked-up document is actually from the same collection "generation" as the change event the -document was looked up for. -If set to `true`, "updateLookup" operations will compare the collection UUID of the change event -with the UUID of the collection. If there is a UUID mismatch, the returned `fullDocument` field of -the event will be set to `null`. +document was looked up for. 
If set to `true`, "updateLookup" operations will compare the collection +UUID of the change event with the UUID of the collection. If there is a UUID mismatch, the returned +`fullDocument` field of the event will be set to `null`. #### `allChangesForCluster` (public) @@ -539,29 +530,28 @@ automatically when opening an all-cluster change stream. The `showSystemEvents` flag can be used to make change streams return events for collections inside the `system` namespace. These are not emitted by default. Setting `showSystemEvents` to `true` will -also include events related to system collections in the change stream. -The flag defaults to `false` and is internal. +also include events related to system collections in the change stream. The flag defaults to `false` +and is internal. #### `showMigrationEvents` (internal) The `showMigrationEvents` flag can be used to make change streams return DML events that are happening during chunk migrations. If set to `true`, insert and delete events related to chunk -migrations will be reported as if they were regular events. -The flag defaults to `false` and is internal. +migrations will be reported as if they were regular events. The flag defaults to `false` and is +internal. #### `showCommitTimestamp` (internal) The `showCommitTimestamp` flag can be used to include the transaction commit timestamp inside DML -events that were part of a prepared transaction. -The flag defaults to `true` and is internal. It is used by the resharding. +events that were part of a prepared transaction. The flag defaults to `true` and is internal. It is +used by the resharding. #### `showRawUpdateDescription` (internal) The `showRawUpdateDescription` flag can be used to make change streams emit the raw, internal format -used for "update" oplog entries. -If set to `true`, emitted change stream "update" events will contain a `rawUpdateDescription` field. -The default is `false`. 
In this case, emitted change stream "update" events will contain the regular -`updateDescription` field. +used for "update" oplog entries. If set to `true`, emitted change stream "update" events will +contain a `rawUpdateDescription` field. The default is `false`. In this case, emitted change stream +"update" events will contain the regular `updateDescription` field. #### `allowToRunOnConfigDB` (internal) @@ -572,9 +562,9 @@ server to keep track of shard additions and removals in the deployment. #### `$_passthroughToShard` (internal) In sharded cluster deployments, all change streams are supposed to be opened on _mongos_. _mongos_ -will open the required cursors to the data shards and the config server on the consumer's behalf. -If the consumer only wants to target a specific shard of the cluster, they can use the `$_passthroughToShard` -aggregation parameter to limit the change stream to a single shard. +will open the required cursors to the data shards and the config server on the consumer's behalf. If +the consumer only wants to target a specific shard of the cluster, they can use the +`$_passthroughToShard` aggregation parameter to limit the change stream to a single shard. For example, to open a collection-level change stream targeting only one of the cluster's shards (identified by the value in `shardId`), the following example code can be used: @@ -592,8 +582,8 @@ db.getSiblingDB("testDB").runCommand({ }); ``` -Using `$_passthroughToShard` will bypass the regular cluster shard targeting for change streams -and open a replica set change stream pipeline (only) on the targeted shard. The change events that +Using `$_passthroughToShard` will bypass the regular cluster shard targeting for change streams and +open a replica set change stream pipeline (only) on the targeted shard. The change events that mongos retrieves from the single shard will be returned as is, without using a merge pipeline on _mongos_. @@ -609,23 +599,26 @@ stream against a _mongos_ instance. 
The _mongos_ instance will then use the clus information to open the cursors on the config server and the data shards on behalf of the consumer. Because of the ordering guarantee provided by change streams, _mongos_ must wait until all cursors have either responded with events, or ran into a timeout and reported that currently no more events -are available for them. -The latter is why change streams in a sharded cluster can have higher latency than change streams -in replica sets. +are available for them. The latter is why change streams in a sharded cluster can have higher +latency than change streams in replica sets. For sharded cluster change streams, the merging of the multiple streams of change events from the -different cursors is performed by the [`AsyncResultsMerger`](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/s/query/exec/async_results_merger.h#L100). +different cursors is performed by the +[`AsyncResultsMerger`](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/s/query/exec/async_results_merger.h#L100). ## Change Stream Pipeline Building -A change stream pipeline issued by a consumer contains the `$changeStream` meta stage. -This stage is expanded internally into multiple `DocumentSource`s [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_pipeline_helpers.cpp#L171). +A change stream pipeline issued by a consumer contains the `$changeStream` meta stage. This stage is +expanded internally into multiple `DocumentSource`s +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_pipeline_helpers.cpp#L171). 
-The change stream `DocumentSource`s are located in the `src/mongo/db/pipeline` directory [here](https://github.com/mongodb/mongo/tree/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline), among other `DocumentSource`s that -are not related to change streams. -The `DocumentSource`s are only used for pipeline building and optimization, but they are converted -into execution `Stage`s later when the change stream is executed. -These `Stage`s are located in the `src/mongo/db/exec/agg` directory [here](https://github.com/mongodb/mongo/tree/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg). +The change stream `DocumentSource`s are located in the `src/mongo/db/pipeline` directory +[here](https://github.com/mongodb/mongo/tree/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline), +among other `DocumentSource`s that are not related to change streams. The `DocumentSource`s are only +used for pipeline building and optimization, but they are converted into execution `Stage`s later +when the change stream is executed. These `Stage`s are located in the `src/mongo/db/exec/agg` +directory +[here](https://github.com/mongodb/mongo/tree/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg). 
### Replica Set Pipelines @@ -634,13 +627,14 @@ On a replica set, the `$changeStream` stage is expanded into the following inter - `$_internalChangeStreamOplogMatch` - `$_internalChangeStreamUnwindTransaction` - `$_internalChangeStreamTransform` -- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level change - streams) +- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level + change streams) - `$_internalChangeStreamCheckResumability` -- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to `off`) +- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to + `off`) - `$_internalChangeStreamAddPostImage` (only present if `fullDocument` is not set to `default`) -- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token is - not a high water mark token) +- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token + is not a high water mark token) - user-defined `$match` expression (only present if the user's change stream pipeline contains a `$match` stage) - user-defined `$project` expression (only present if the user's change stream pipeline contains a @@ -648,8 +642,8 @@ On a replica set, the `$changeStream` stage is expanded into the following inter - `$_internalChangeStreamSplitLargeEvent` (only present if the change stream is opened with the `$changeStreamSplitLargeEvent` pipeline step) -The change stream pipeline on replica sets will also contain a `$match` stage to filter out all non-DML -change events in case `showExpandedEvents` is not set. +The change stream pipeline on replica sets will also contain a `$match` stage to filter out all +non-DML change events in case `showExpandedEvents` is not set. 
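The conditional stage list for replica sets can be read as a small function of the change stream options. The following is an illustrative sketch only, not the server's implementation (which is C++). The function name `sketchReplicaSetStages` and all option names (`level`, `resumeTokenIsHighWaterMark`, `userHasMatch`, `userHasProject`, `usesSplitLargeEvent`) are made up for this example. It covers only the stages named in the list, and it leaves out the non-DML `$match` filter because the text does not state its position in the pipeline.

```javascript
// Illustrative sketch only: mirrors the documented stage list, not the server's C++ code.
// All option names below are hypothetical and exist only for this example.
function sketchReplicaSetStages(opts = {}) {
    const {
        level = "collection", // "collection", "database" or "cluster"
        fullDocumentBeforeChange = "off",
        fullDocument = "default",
        resumeTokenIsHighWaterMark = true, // assumption: true when no explicit resume point is given
        userHasMatch = false,
        userHasProject = false,
        usesSplitLargeEvent = false,
    } = opts;

    // These three stages appear unconditionally at the start of the list.
    const stages = [
        "$_internalChangeStreamOplogMatch",
        "$_internalChangeStreamUnwindTransaction",
        "$_internalChangeStreamTransform",
    ];

    // Only collection-level and database-level change streams check for invalidation.
    if (level === "collection" || level === "database") {
        stages.push("$_internalChangeStreamCheckInvalidate");
    }

    stages.push("$_internalChangeStreamCheckResumability");

    if (fullDocumentBeforeChange !== "off") {
        stages.push("$_internalChangeStreamAddPreImage");
    }
    if (fullDocument !== "default") {
        stages.push("$_internalChangeStreamAddPostImage");
    }
    if (!resumeTokenIsHighWaterMark) {
        stages.push("$_internalChangeStreamEnsureResumeTokenPresent");
    }

    // User-defined stages follow the internal ones, in the order given in the list.
    if (userHasMatch) {
        stages.push("$match");
    }
    if (userHasProject) {
        stages.push("$project");
    }
    if (usesSplitLargeEvent) {
        stages.push("$_internalChangeStreamSplitLargeEvent");
    }
    return stages;
}

// Example: a cluster-wide change stream that requests post-images via `updateLookup`.
console.log(sketchReplicaSetStages({level: "cluster", fullDocument: "updateLookup"}));
```

Writing the conditions out this way makes it easier to see which change stream options add a stage and which ones only change the behavior of a stage that is always present.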
### Sharded Cluster Pipelines @@ -659,10 +653,11 @@ following internal stages: - `$_internalChangeStreamOplogMatch` - `$_internalChangeStreamUnwindTransaction` - `$_internalChangeStreamTransform` -- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level change - streams) +- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level + change streams) - `$_internalChangeStreamCheckResumability` -- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to `off`) +- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to + `off`) - `$_internalChangeStreamAddPostImage` (only present if `fullDocument` is not set to `default`) - user-defined `$match` expression (only present if the user's change stream pipeline contains a `$match` stage) @@ -674,8 +669,8 @@ following internal stages: --- - `$_internalChangeStreamHandleTopologyChange` -- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token is - not a high water mark token) +- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token + is not a high water mark token) Additionally, the change stream pipeline on a sharded cluster will contain a `$match` stage to filter out all non-DML change events in case `showExpandedEvents` is not set. @@ -685,9 +680,9 @@ After building the initial pipeline stages, _mongos_ will split the pipeline int - a part that is executed on data shards ("shard pipeline") and - a part that is executed on _mongos_ ("merge pipeline"). -The pipeline split point is above the `$_internalChangeStreamHandleTopologyChange` stage. -_mongos_ will also add a `$mergeCursors` stage that aggregates the responses from different shards -and the config server into a single, sorted stream. +The pipeline split point is above the `$_internalChangeStreamHandleTopologyChange` stage. 
_mongos_ +will also add a `$mergeCursors` stage that aggregates the responses from different shards and the +config server into a single, sorted stream. #### Data Shard Pipeline @@ -696,15 +691,16 @@ The shard pipeline will look like this: - `$_internalChangeStreamOplogMatch` - `$_internalChangeStreamUnwindTransaction` - `$_internalChangeStreamTransform` -- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level change - streams) +- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level + change streams) - `$_internalChangeStreamCheckResumability` -- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to `off`) +- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to + `off`) - `$_internalChangeStreamAddPostImage` (only present if `fullDocument` is not set to `default`) - user-defined `$match` expression (only present if the user's change stream pipeline contains a `$match` stage) -- user-defined `$project` expression (only present if the change stream pipeline contains a `$project` - stage) +- user-defined `$project` expression (only present if the change stream pipeline contains a + `$project` stage) - `$_internalChangeStreamSplitLargeEvent` (only present if the change stream is opened with the `$changeStreamSplitLargeEvent` pipeline step) @@ -714,16 +710,18 @@ The merge pipeline on _mongos_ will look like this: - `$mergeCursors` - `$_internalChangeStreamHandleTopologyChange` -- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token is - not a high water mark token) +- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token + is not a high water mark token) ### Details of individual Pipeline Stages #### `$_internalChangeStreamOplogMatch` -This stage is responsible for reading data from the oplog and filtering out irrelevant 
events. -The `DocumentSourceChangeStreamOplogMatch` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_oplog_match.h#L61). -The oplog filter for the stage is built [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_oplog_match.cpp#L79). +This stage is responsible for reading data from the oplog and filtering out irrelevant events. The +`DocumentSourceChangeStreamOplogMatch` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_oplog_match.h#L61). +The oplog filter for the stage is built +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_oplog_match.cpp#L79). There is no `Stage` equivalent for `DocumentSourceChangeStreamOplogMatch`, as it will be turned into a `$cursor` stage for execution. @@ -731,28 +729,35 @@ a `$cursor` stage for execution. #### `$_internalChangeStreamUnwindTransaction` This stage is responsible for "unwinding" (expanding) multiple operations that are contained in an -"applyOps" oplog entry into individual events. -The `DocumentSourceChangeStreamUnwindTransaction` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_unwind_transaction.h#L71). -The `ChangeStreamUnwindTransactionStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_unwind_transaction.cpp#L83). +"applyOps" oplog entry into individual events. 
The `DocumentSourceChangeStreamUnwindTransaction` +code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_unwind_transaction.h#L71). +The `ChangeStreamUnwindTransactionStage` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_unwind_transaction.cpp#L83). #### `$_internalChangeStreamTransform` This stage is responsible for converting oplog entries into change events. It will build a change -event document for every oplog entry that enters this stage. -Event fields are added based on the change stream configuration. -The `DocumentSourceChangeStreamTransform` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_transform.h#L60). -The `ChangeStreamTransformStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_transform_stage.cpp#L75). -The actual event transformation happens inside `ChangeStreamDefaultEventTransformation` [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_event_transform.cpp#L321). +event document for every oplog entry that enters this stage. Event fields are added based on the +change stream configuration. The `DocumentSourceChangeStreamTransform` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_transform.h#L60). +The `ChangeStreamTransformStage` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_transform_stage.cpp#L75). 
+The actual event transformation happens inside `ChangeStreamDefaultEventTransformation` +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_event_transform.cpp#L321). #### `$_internalChangeStreamCheckInvalidate` This stage is responsible for creating change stream "invalidate" events and is only added for -collection-level and database-level change streams. -The `DocumentSourceChangeStreamCheckInvalidate` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_check_invalidate.h#L65). -The `ChangeStreamCheckInvalidate` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_invalidate_stage.cpp#L106). +collection-level and database-level change streams. The `DocumentSourceChangeStreamCheckInvalidate` +code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_check_invalidate.h#L65). +The `ChangeStreamCheckInvalidate` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_invalidate_stage.cpp#L106). When an invalidate event is encountered, the stage will first emit an "invalidate" event, and then -throws a `ChangeStreamInvalidated` exception on the next call. The [`ChangeStreamInvalidatedInfo`](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_invalidation_info.h#L47). +throws a `ChangeStreamInvalidated` exception on the next call. The +[`ChangeStreamInvalidatedInfo`](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_invalidation_info.h#L47). exception type contains the error code `ChangeStreamInvalidated`. 
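The "emit first, throw on the next call" behavior of the invalidate check can be sketched with a generator. This is a simplified illustration, not the server's code: the names `sketchCheckInvalidate`, `ChangeStreamInvalidatedError` and the `isInvalidating` predicate are hypothetical, and passing the triggering event through before the "invalidate" event is an assumption of this sketch.

```javascript
// Hypothetical error class standing in for the `ChangeStreamInvalidated` error code.
class ChangeStreamInvalidatedError extends Error {
    constructor() {
        super("change stream invalidated");
        this.codeName = "ChangeStreamInvalidated";
    }
}

// Wraps an iterable of change events. `isInvalidating` is a caller-supplied predicate
// (an assumption of this sketch) that decides which events invalidate the stream.
function* sketchCheckInvalidate(events, isInvalidating) {
    for (const event of events) {
        yield event; // assumption: the triggering event itself is passed through first
        if (isInvalidating(event)) {
            yield {operationType: "invalidate"};
            // Reached only when the consumer asks for another event after "invalidate".
            throw new ChangeStreamInvalidatedError();
        }
    }
}

// Usage: the third call returns "invalidate", the fourth call throws.
const stream = sketchCheckInvalidate(
    [{operationType: "insert"}, {operationType: "drop"}, {operationType: "insert"}],
    (event) => event.operationType === "drop",
);
try {
    for (const event of stream) {
        console.log(event.operationType);
    }
} catch (err) {
    console.log("stream ended with", err.codeName);
}
```

Splitting the two steps across consecutive calls lets the consumer receive the "invalidate" event as a regular result before the stream terminates with an error.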
#### `$_internalChangeStreamCheckResumability` @@ -761,18 +766,22 @@ This stage checks if the oplog has enough history to resume the change stream, a events up to the given resume point. If no data for the resume point can be found in the oplog anymore, it will throw a `ChangeStreamHistoryLost` error. -The `DocumentSourceChangeStreamCheckResumability` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_check_resumability.h#L79). -The `ChangeStreamCheckResumabilityStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_resumability_stage.cpp#L68). +The `DocumentSourceChangeStreamCheckResumability` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_check_resumability.h#L79). +The `ChangeStreamCheckResumabilityStage` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_resumability_stage.cpp#L68). #### `$_internalChangeStreamAddPreImage` This stage is responsible for adding pre-image data to "update", "replace" and "delete" events. It is only added to change stream pipelines if the `fullDocumentBeforeChange` parameter is not set to -`off`. -If enabled, the stage relies on the pre-images stored in the system's pre-image system collection. +`off`. If enabled, the stage relies on the pre-images stored in the system's pre-image system +collection. -The `DocumentSourceChangeStreamAddPreImage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_add_pre_image.h#L67). 
-The `ChangeStreamAddPreImageStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_pre_image_stage.cpp#L67). +The `DocumentSourceChangeStreamAddPreImage` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_add_pre_image.h#L67). +The `ChangeStreamAddPreImageStage` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_pre_image_stage.cpp#L67). #### `$_internalChangeStreamAddPostImage` @@ -780,23 +789,24 @@ This stage is responsible for adding post-image data to "update" events. It is o stream pipelines if the `fullDocument` parameter is not set to `default`. If `fullDocument` is set to `updateLookup`, the stage will perform a lookup for the current version -of a document that was updated by an "update" event, and store it in the `fullDocument` field of -the "update" event if present. The lookup is performed using the `_id` value of the document from -the change event. As the lookup is executed at a different point in time than when the change event -was recorded, it is possible that the lookup finds a different version of the document than the one -that was active when the change event was recorded. This can happen if the document was updated -again between the change event and the lookup. The lookup may also find no document at all if the -document was deleted after the "update" event, but before the lookup. -In case the lookup cannot find a document with the requested `_id`, it will populate the -`fullDocument` field with a value of `null`. +of a document that was updated by an "update" event, and store it in the `fullDocument` field of the +"update" event if present. The lookup is performed using the `_id` value of the document from the +change event. 
As the lookup is executed at a different point in time than when the change event was +recorded, it is possible that the lookup finds a different version of the document than the one that +was active when the change event was recorded. This can happen if the document was updated again +between the change event and the lookup. The lookup may also find no document at all if the document +was deleted after the "update" event, but before the lookup. In case the lookup cannot find a +document with the requested `_id`, it will populate the `fullDocument` field with a value of `null`. If `fullDocument` is set to `whenAvailable` or `required`, the stage will make use of the stored pre-image of the document in the system's pre-image system collection. It will fetch the pre-image and then apply the delta that is stored in the "update" change event on top of it, and store the result in the `fullDocument` field. -The `DocumentSourceChangeStreamAddPostImage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_add_post_image.h#L63). -The `ChangeStreamAddPostImageStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_post_image_stage.cpp#L84). +The `DocumentSourceChangeStreamAddPostImage` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_add_post_image.h#L63). +The `ChangeStreamAddPostImageStage` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_post_image_stage.cpp#L84). #### `$_internalChangeStreamEnsureResumeTokenPresent` @@ -805,18 +815,22 @@ the change stream parameters is actually in the stream. The stage is only presen stream resume token is not a high water mark token. 
If the resume token cannot be found in the stream, it will throw a `ChangeStreamFatalError`. -The `DocumentSourceChangeStreamEnsureResumeTokenPresent` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_ensure_resume_token_present.h#L51). -The `ChangeStreamEnsureResumeTokenPresent` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_ensure_resume_token_present_stage.cpp#L67). +The `DocumentSourceChangeStreamEnsureResumeTokenPresent` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_ensure_resume_token_present.h#L51). +The `ChangeStreamEnsureResumeTokenPresent` code is +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_ensure_resume_token_present_stage.cpp#L67). #### `$_internalChangeStreamHandleTopologyChange` This stage is only present in sharded cluster change streams and is always part of the _mongos_ merge pipeline. The stage is responsible for opening additional cursors to shards that have been -added to the cluster. It will handle "insert" events into the `config.shards` collection that -were observed from the config server. +added to the cluster. It will handle "insert" events into the `config.shards` collection that were +observed from the config server. -The `DocumentSourceChangeStreamHandleTopologyChange` code can be found [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_handle_topology_change.h#L63). -The `ChangeStreamHandleTopologyChangeStage` code can be found [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_handle_topology_change_stage.cpp#L121). 
+The `DocumentSourceChangeStreamHandleTopologyChange` code can be found +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_handle_topology_change.h#L63). +The `ChangeStreamHandleTopologyChangeStage` code can be found +[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_handle_topology_change_stage.cpp#L121). ## Missing documentation (to be completed) diff --git a/docs/command_dispatch.md b/docs/command_dispatch.md index 3b607c7729b..e47e716820b 100644 --- a/docs/command_dispatch.md +++ b/docs/command_dispatch.md @@ -1,75 +1,70 @@ # Command Dispatch -Command dispatch refers to the general process by which client requests are -taken from the network, parsed, sanitized, then finally run on databases. +Command dispatch refers to the general process by which client requests are taken from the network, +parsed, sanitized, then finally run on databases. ## Service Entry Points -[Service entry points][service_entry_point_h] fulfill the transition from the -transport layer into command implementations. For each incoming connection -from a client (in the form of a [session][session_h] object), a new dedicated -thread is spawned then detached, and is also assigned a new [session workflow] -[session_workflow_h], responsible for maintaining the workflow of a -single client connection during its lifetime. Central to the entry point is the -`handleRequest()` function, which manages the server-side logic of processing -requests and returns a response message indicating the result of the -corresponding request message. 
This function is currently implemented by several -subclasses of the parent `ServiceEntryPoint` in order to account for the -differences in processing requests between the shard and router roles -- these -distinctions are reflected in the `ServiceEntryPointRouterRole` and -`ServiceEntryPointShardRole` subclasses (see [here][service_entry_point_router_role_h] -and [here][service_entry_point_shard_role.h]). +[Service entry points][service_entry_point_h] fulfill the transition from the transport layer into +command implementations. For each incoming connection from a client (in the form of a +[session][session_h] object), a new dedicated thread is spawned then detached, and is also assigned +a new [session workflow] [session_workflow_h], responsible for maintaining the workflow of a single +client connection during its lifetime. Central to the entry point is the `handleRequest()` function, +which manages the server-side logic of processing requests and returns a response message indicating +the result of the corresponding request message. This function is currently implemented by several +subclasses of the parent `ServiceEntryPoint` in order to account for the differences in processing +requests between the shard and router roles -- these distinctions are reflected in the +`ServiceEntryPointRouterRole` and `ServiceEntryPointShardRole` subclasses (see +[here][service_entry_point_router_role_h] and [here][service_entry_point_shard_role.h]). ## Strategy -One area in which the _mongos_ entry point differs from its _mongod_ counterpart -is in its usage of the [Strategy class][strategy_h]. `Strategy` operates as a -legacy interface for processing client read, write, and command requests; there -is a near 1-to-1 mapping between its constituent functions and request types -(e.g. `writeOp()` for handling write operation requests, `getMore()` for a -getMore request, etc.). 
These functions comprise the backbone of the _mongos_ -entry point's `handleRequest()` -- that is to say, when a valid request is -received, it is sieved and ultimately passed along to the appropriate Strategy -class member function. The significance of using the Strategy class specifically -with the _mongos_ entry point is that it [facilitates query routing to -shards][mongos_router] in _addition_ to running queries against targeted -databases (see [s/transaction_router.h][transaction_router_h] for finer -details). +One area in which the _mongos_ entry point differs from its _mongod_ counterpart is in its usage of +the [Strategy class][strategy_h]. `Strategy` operates as a legacy interface for processing client +read, write, and command requests; there is a near 1-to-1 mapping between its constituent functions +and request types (e.g. `writeOp()` for handling write operation requests, `getMore()` for a getMore +request, etc.). These functions comprise the backbone of the _mongos_ entry point's +`handleRequest()` -- that is to say, when a valid request is received, it is sieved and ultimately +passed along to the appropriate Strategy class member function. The significance of using the +Strategy class specifically with the _mongos_ entry point is that it [facilitates query routing to +shards][mongos_router] in _addition_ to running queries against targeted databases (see +[s/transaction_router.h][transaction_router_h] for finer details). ## Commands -The [Command class][commands_h] serves as a means of cataloging a server command -as well as ascribing various attributes and behaviors to commands via the [type -system][template_method_pattern], that will likely be used during the lifespan -of a particular server. Construction of a Command should only occur during -server startup. When a new Command is constructed, that Command is stored in a -global `CommandRegistry` object for future reference. 
There are two kinds of -Command subclasses: `BasicCommand` and `TypedCommand`. +The [Command class][commands_h] serves as a means of cataloging a server command as well as +ascribing various attributes and behaviors to commands via the [type +system][template_method_pattern], that will likely be used during the lifespan of a particular +server. Construction of a Command should only occur during server startup. When a new Command is +constructed, that Command is stored in a global `CommandRegistry` object for future reference. There +are two kinds of Command subclasses: `BasicCommand` and `TypedCommand`. -A major distinction between the two is in their implementation of the `parse()` -member function. `parse()` takes in a request and returns a handle to a single -invocation of a particular Command (represented by a `CommandInvocation`), that -can then be used to run the Command. The `BasicCommand::parse()` is a naive -implementation that merely forwards incoming requests to the Invocation and -makes sure that the Command does not support document sequences. The -implementation of `TypedCommand::parse()`, on the other hand, varies depending -on the Request type parameter the Command takes in. Since the `TypedCommand` -accepts requests generated by IDL, the parsing function associated with a usable -Request type must allow it to be parsed as an IDL command. In handling requests, -both the _mongos_ and _mongod_ entry points interact with the Command subclasses -through the `CommandHelpers` struct in order to parse requests and ultimately -run them as Commands. +A major distinction between the two is in their implementation of the `parse()` member function. +`parse()` takes in a request and returns a handle to a single invocation of a particular Command +(represented by a `CommandInvocation`), that can then be used to run the Command. 
The +`BasicCommand::parse()` is a naive implementation that merely forwards incoming requests to the +Invocation and makes sure that the Command does not support document sequences. The implementation +of `TypedCommand::parse()`, on the other hand, varies depending on the Request type parameter the +Command takes in. Since the `TypedCommand` accepts requests generated by IDL, the parsing function +associated with a usable Request type must allow it to be parsed as an IDL command. In handling +requests, both the _mongos_ and _mongod_ entry points interact with the Command subclasses through +the `CommandHelpers` struct in order to parse requests and ultimately run them as Commands. ## Admission control -To ensure stability of our servers, we have implemented different admission control mechanisms to prevent data-nodes from becoming overloaded with operations. When implementing a new command, it's important to decide whether the command will be subject to one of the admission controls in place and understand the resulting outcomes. +To ensure stability of our servers, we have implemented different admission control mechanisms to +prevent data-nodes from becoming overloaded with operations. When implementing a new command, it's +important to decide whether the command will be subject to one of the admission controls in place +and understand the resulting outcomes. -For example, user commands may be subject to Ingress Admission Control, which happens in the [ServiceEntryPoint][IngressControl]. -For information on admission control and how to implement admission control into a new command, please see [Admission Control README][ACReadMe] +For example, user commands may be subject to Ingress Admission Control, which happens in the +[ServiceEntryPoint][IngressControl]. 
For information on admission control and how to implement +admission control into a new command, please see [Admission Control README][ACReadMe] ## See Also -For details on transport internals, including ingress networking, see [this document][transport_internals]. +For details on transport internals, including ingress networking, see [this +document][transport_internals]. [service_entry_point_h]: ../src/mongo/transport/service_entry_point.h [session_h]: ../src/mongo/transport/session.h @@ -85,4 +80,5 @@ For details on transport internals, including ingress networking, see [this docu [template_method_pattern]: https://en.wikipedia.org/wiki/Template_method_pattern [transport_internals]: ../src/mongo/transport/README.md [ACReadMe]: ../src/mongo/db/admission/README.md -[IngressControl]: https://github.com/mongodb/mongo/blob/a86c7f5de2a5de4d2f49e40e8970754ec6a5ba6c/src/mongo/db/service_entry_point_shard_role.cpp#L1803 +[IngressControl]: + https://github.com/mongodb/mongo/blob/a86c7f5de2a5de4d2f49e40e8970754ec6a5ba6c/src/mongo/db/service_entry_point_shard_role.cpp#L1803 diff --git a/docs/contexts.md b/docs/contexts.md index 0624d745731..c1122e5a176 100644 --- a/docs/contexts.md +++ b/docs/contexts.md @@ -14,9 +14,9 @@ dynamically extensible. A `ServiceContext` represents all of the state of a single Mongo server process, which may be either a `mongod` or a `mongos`. It creates and manages the previously mentioned `Client`s and `OperationContext`s, as well as a `TransportLayer` for performing network operations, a -`PeriodicRunner` for running housekeeping tasks periodically, a `StorageEngine` for interacting -with the actual database itself, and a set of time sources. In general, every Mongo server process -has a single `ServiceContext`, known as the _global_ `ServiceContext`. Typical uses of the global +`PeriodicRunner` for running housekeeping tasks periodically, a `StorageEngine` for interacting with +the actual database itself, and a set of time sources. 
In general, every Mongo server process has a +single `ServiceContext`, known as the _global_ `ServiceContext`. Typical uses of the global `ServiceContext` outside of server initialization and shutdown include looking up `Client` or `OperationContext` information for a particular thread or operation, or killing one or more running operations during, e.g., a primary replica step-down. The global `ServiceContext` is created during @@ -28,16 +28,16 @@ The `ServiceContext` associated with a given `Client` object can be fetched in a using [`Client::getServiceContext()`][client-get-service-context-url] when possible. As of time of writing, every server process only maintains a single `ServiceContext`, but preferring `Client::getServiceContext()` or `ServiceContext::getCurrentServiceContext()` over -[`ServiceContext::getGlobalServiceContext()`][get-global-service-context-url] will allow us to -more easily maintain multiple `ServiceContext`s per server process if desired in the future. +[`ServiceContext::getGlobalServiceContext()`][get-global-service-context-url] will allow us to more +easily maintain multiple `ServiceContext`s per server process if desired in the future. ## [`Client`][client-url] Each logical connection to a Mongo service is managed by a `Client` object, where a logical -connection may be a user or an internal process that needs to run a command or query on the database. -Construction of a `Client` object is typically performed with a call to `makeClient` on the global -`ServiceContext`, which can then be attached to any thread of execution, or with a call to -[`Client::initThread`][client-init-thread-url] which constructs a `Client` on the global +connection may be a user or an internal process that needs to run a command or query on the +database. 
Construction of a `Client` object is typically performed with a call to `makeClient` on +the global `ServiceContext`, which can then be attached to any thread of execution, or with a call +to [`Client::initThread`][client-init-thread-url] which constructs a `Client` on the global `ServiceContext` and binds it to the current thread. All operations executed by the `Client` will take place on that `Client`’s associated thread serially over the network connection managed by the `Session` object that was passed into the `Client`’s constructor. If no `Session` is passed to the @@ -70,13 +70,13 @@ operations. The semantics of the `Client` lock are summarized in the table below [`Client::cc()`][client-cc-url] may be used to get the `Client` object associated with the currently executing thread. Prefer passing `Client` objects as parameters over calls to `Client::cc()` when -possible. A [`ThreadClient`][thread-client-url] is an RAII-style class which may be used to construct -and bind a `Client` to the current running thread and automatically unbind it once the `ThreadClient` -goes out of scope. An [`AlternativeClientRegion`][acr-url] is another RAII-style class which may be -used to temporarily bind a `Client` object to the currently running thread (holding any currently -bound `Client` in reserve), rebinding the current thread’s old `Client` to the current thread upon -falling out of scope. [`ClientStrand`][client-strand-url] functions similarly, but also provides an -`Executor` interface for binding a `Client` to an arbitrary thread. +possible. A [`ThreadClient`][thread-client-url] is an RAII-style class which may be used to +construct and bind a `Client` to the current running thread and automatically unbind it once the +`ThreadClient` goes out of scope. 
An [`AlternativeClientRegion`][acr-url] is another RAII-style +class which may be used to temporarily bind a `Client` object to the currently running thread +(holding any currently bound `Client` in reserve), rebinding the current thread’s old `Client` to +the current thread upon falling out of scope. [`ClientStrand`][client-strand-url] functions +similarly, but also provides an `Executor` interface for binding a `Client` to an arbitrary thread. ## [`OperationContext`][operation-context-url] @@ -92,23 +92,37 @@ performed asynchronously. ### Interruptibility -`OperationContext`s implement the [`Interruptible`][interruptible-url] interface, which allows them to -be killed by their associated `Client`s (or, by proxy, their owning `ServiceContext`). See -[this comment block][opctx-interruptible-comment-block-url] for more details on when and how +`OperationContext`s implement the [`Interruptible`][interruptible-url] interface, which allows them +to be killed by their associated `Client`s (or, by proxy, their owning `ServiceContext`). See [this +comment block][opctx-interruptible-comment-block-url] for more details on when and how `OperationContext`s are interrupted. 
-[service-context-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/service_context.h#L141 -[decorable-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/util/decorable.h -[client-get-service-context-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L117 -[get-global-service-context-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/service_context.h#L755 -[client-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h -[client-init-thread-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L75 -[client-cc-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L372 -[thread-client-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L320 -[acr-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L347 -[client-strand-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client_strand.h -[operation-context-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/operation_context.h +[service-context-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/service_context.h#L141 +[decorable-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/util/decorable.h +[client-get-service-context-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L117 +[get-global-service-context-url]: + 
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/service_context.h#L755 +[client-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h +[client-init-thread-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L75 +[client-cc-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L372 +[thread-client-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L320 +[acr-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L347 +[client-strand-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client_strand.h +[operation-context-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/operation_context.h [kill-op-url]: https://docs.mongodb.com/manual/reference/command/killOp/ -[baton-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/baton.h -[interruptible-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/util/interruptible.h -[opctx-interruptible-comment-block-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/operation_context.cpp#L281 +[baton-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/baton.h +[interruptible-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/util/interruptible.h +[opctx-interruptible-comment-block-url]: + https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/operation_context.cpp#L281 diff --git a/docs/cpp_style.md b/docs/cpp_style.md index 5ccd3f27da2..6240958f338 100644 
--- a/docs/cpp_style.md +++ b/docs/cpp_style.md @@ -1,128 +1,111 @@ # MongoDB Server C++ Style Guide -This document describes common conventions used in the MongoDB server codebase. -The document is about C++, but there are a few places where JavaScript style is -discussed as well. +This document describes common conventions used in the MongoDB server codebase. The document is +about C++, but there are a few places where JavaScript style is discussed as well. -A firmly established style guide can make source files unsurprising as they are -more easily navigable and regular in shape. +A firmly established style guide can make source files unsurprising as they are more easily +navigable and regular in shape. -Style rules can eliminate wasted time on minor issues in code reviews. An author -should endeavor to be style-compliant before sending a pull request for review. -This should accelerate code reviews and establish consistent expectations on code. +Style rules can eliminate wasted time on minor issues in code reviews. An author should endeavor to +be style-compliant before sending a pull request for review. This should accelerate code reviews and +establish consistent expectations on code. -The guide is carefully considered by very experienced C++ engineers. C++ code -can be complex, and there are subtle correctness and maintainability risks that -can arise from certain antipatterns addressed by the guide. Style adherence -enables code authors and their reviewers to productively write safer code -without having to first rediscover those problems for themselves. +The guide is carefully considered by very experienced C++ engineers. C++ code can be complex, and +there are subtle correctness and maintainability risks that can arise from certain antipatterns +addressed by the guide. Style adherence enables code authors and their reviewers to productively +write safer code without having to first rediscover those problems for themselves. 
## Feedback (MongoDB internal) This is maintained by the Server Programmability team. -- Use `#server-programmability` on Slack for discussion and clarifications. - Contributors outside of MongoDB can use Jira instead. -- For change proposals, please feel free to add entries to the - MongoDB C++ Style Guide Proposals document pinned to that channel. -- Jira and PRs are fine for small fixes unrelated to C++ style, such as - typos, formatting, phrasing, and comments. +- Use `#server-programmability` on Slack for discussion and clarifications. Contributors outside of + MongoDB can use Jira instead. +- For change proposals, please feel free to add entries to the MongoDB C++ Style Guide Proposals + document pinned to that channel. +- Jira and PRs are fine for small fixes unrelated to C++ style, such as typos, formatting, phrasing, + and comments. ## Style ## Names of Identifiers -There's some truth in the old joke that naming is the hardest problem in -programming. It's impossible to write catch-all rules for naming, but we can set -guidelines with the intention of avoiding friction in reviews and having some -expectation of general consistency across our codebase. +There's some truth in the old joke that naming is the hardest problem in programming. It's +impossible to write catch-all rules for naming, but we can set guidelines with the intention of +avoiding friction in reviews and having some expectation of general consistency across our codebase. -- Types use `TitleCase`. First letter of each word is uppercase. Following - letters are lowercase. +- Types use `TitleCase`. First letter of each word is uppercase. Following letters are lowercase. -- Functions and variables use `camelCase`. First letter of each word after the - first is uppercase. The first letter of each word, except the first, is - uppercase. +- Functions and variables use `camelCase`. First letter of each word after the first is uppercase. 
+ The first letter of each word, except the first, is uppercase. -- Namespaces use `snake_case`. No uppercase letters, and words separated by underscores. - (See "[Namespaces](#namespaces)" section below). +- Namespaces use `snake_case`. No uppercase letters, and words separated by underscores. (See + "[Namespaces](#namespaces)" section below). -- Spelling: Take care to avoid misspellings in names. - This is more than aesthetic. It is easier on readers. - Misspelled names can harm confidence in code quality. - Misspelled names might be skipped by code searches. - Our convention is to use US English spelling. +- Spelling: Take care to avoid misspellings in names. This is more than aesthetic. It is easier on + readers. Misspelled names can harm confidence in code quality. Misspelled names might be skipped + by code searches. Our convention is to use US English spelling. -- Identifier names should be short but clear. Long sentence-like names - become a laborious comparison exercise for readers, and can form a "wall of - text" that can bury significant C++ keywords and operators. Local variable names - can be particularly brief without causing confusion, provided that the enclosing - functions remain compact and focused. +- Identifier names should be short but clear. Long sentence-like names become a laborious comparison + exercise for readers, and can form a "wall of text" that can bury significant C++ keywords and + operators. Local variable names can be particularly brief without causing confusion, provided that + the enclosing functions remain compact and focused. -- Repetition and redundancy in names should be avoided. A function name doesn't - need to restate the types of its arguments, for example. The arguments can - usually speak for themselves, but explicit disambiguation may be desirable in - some cases. +- Repetition and redundancy in names should be avoided. A function name doesn't need to restate the + types of its arguments, for example. 
The arguments can usually speak for themselves, but explicit + disambiguation may be desirable in some cases. -- Word abbreviations should be used carefully. When used, they should be applied - very consistently and documented well. This keeps users from having to guess - which words are abbreviated and which are not. +- Word abbreviations should be used carefully. When used, they should be applied very consistently + and documented well. This keeps users from having to guess which words are abbreviated and which + are not. -- Private members are usually named with a leading underscore (e.g. `_detail`). - This applies to data members more consistently than to functions. Identifiers - with a leading underscore followed by an uppercase letter are reserved by - C++, and must not be used. Therefore, the leading `_` should not be used with - private types and typedefs. Double underscores `__` must be avoided as well. - See [article](https://devblogs.microsoft.com/oldnewthing/20230109-00/?p=107685). +- Private members are usually named with a leading underscore (e.g. `_detail`). This applies to data + members more consistently than to functions. Identifiers with a leading underscore followed by an + uppercase letter are reserved by C++, and must not be used. Therefore, the leading `_` should not + be used with private types and typedefs. Double underscores `__` must be avoided as well. See + [article](https://devblogs.microsoft.com/oldnewthing/20230109-00/?p=107685). ### Constants -Constants are either ordinary variables `varName` or with a `k` as a prefix -word, like `kVarName`. You'll see both in the codebase and either is acceptable. -You may also find some older code using `MACRO_STYLE` for constants. -That should not be used in new code outside of macros. +Constants are either ordinary variables `varName` or with a `k` as a prefix word, like `kVarName`. +You'll see both in the codebase and either is acceptable. 
You may also find some older code using +`MACRO_STYLE` for constants. That should not be used in new code outside of macros. ### Test Access -Some entities are defined in an API purely to facilitate test access and -testability. We conventionally tack a `_forTest` suffix (or a `ForTest` suffix -for types) onto its name as an indicator that it should not be used by non-test -code. +Some entities are defined in an API purely to facilitate test access and testability. We +conventionally tack a `_forTest` suffix (or a `ForTest` suffix for types) onto its name as an +indicator that it should not be used by non-test code. ## Class Definitions -While class and struct are largely equivalent in C++, this codebase uses a -convention where structs are used for simple collections of data -(possibly with methods), while classes are used for new abstractions. As a rule, -all data in a struct should be public and all data in a class should be private. -If you are unsure which to use, consider whether there are any invariants that -need to be upheld, either within or between members. If there are not, then a -struct may be appropriate. +While class and struct are largely equivalent in C++, this codebase uses a convention where structs +are used for simple collections of data (possibly with methods), while classes are used for new +abstractions. As a rule, all data in a struct should be public and all data in a class should be +private. If you are unsure which to use, consider whether there are any invariants that need to be +upheld, either within or between members. If there are not, then a struct may be appropriate. -If a type is a struct or struct-like class, then consider omitting all -constructors and letting it be a [C++ aggregate](https://en.cppreference.com/w/cpp/language/aggregate_initialization), which allows some flexibility -in initialization syntax. 
+If a type is a struct or struct-like class, then consider omitting all constructors and letting it +be a [C++ aggregate](https://en.cppreference.com/w/cpp/language/aggregate_initialization), which +allows some flexibility in initialization syntax. -If a type has invariant-preserving constructors, special behaviors, and internal -private details, it's not a `struct`. It's subjective, but structs should be a -mostly straightforward aggregation of data members. +If a type has invariant-preserving constructors, special behaviors, and internal private details, +it's not a `struct`. It's subjective, but structs should be a mostly straightforward aggregation of +data members. -Consider a somewhat canonical example of a `Date`, consisting of `year`, -`month`, `dayOfMonth`. The valid range of a `dayOfMonth` depends on `year` and -`month`, so this type either has an invariant, or it has to be allowed to be in -an invalid state. If the invariants of this type are enforced by the type's -constructors and setters, then it should be a `class`. +Consider a somewhat canonical example of a `Date`, consisting of `year`, `month`, `dayOfMonth`. The +valid range of a `dayOfMonth` depends on `year` and `month`, so this type either has an invariant, +or it has to be allowed to be in an invalid state. If the invariants of this type are enforced by +the type's constructors and setters, then it should be a `class`. -It's possible to leave such a `Date` type as a `struct` and enforce these -invariants from the outside through careful discipline among its users. This is -what C APIs have to do. We should prefer using data encapsulation and -`class` for such complex objects. +It's possible to leave such a `Date` type as a `struct` and enforce these invariants from the +outside through careful discipline among its users. This is what C APIs have to do. We should prefer +using data encapsulation and `class` for such complex objects. 
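As a rough illustration of the `Date` discussion above, the sketch below contrasts an invariant-free aggregate with a class whose constructor enforces the relationship between its members. The names `Point` and `daysInMonth`, and the use of `std::invalid_argument` to report a bad date, are assumptions made for this example; they are not taken from the MongoDB codebase, which has its own error-handling conventions.

```cpp
#include <stdexcept>

// Hypothetical aggregate: no invariants within or between members, so a
// struct with public data and no constructors is appropriate.
struct Point {
    int x;
    int y;
};

// Hypothetical helper: days in a month of the Gregorian calendar.
// Callers must pass a month in the range [1, 12].
inline int daysInMonth(int year, int month) {
    static constexpr int kDays[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return (month == 2 && leap) ? 29 : kDays[month - 1];
}

// The valid range of dayOfMonth depends on year and month, so the type has an
// invariant. It is a class: data is private and the constructor enforces it.
class Date {
public:
    // Non-unary constructor, so it is not marked explicit.
    Date(int year, int month, int dayOfMonth)
        : _year(year), _month(month), _dayOfMonth(dayOfMonth) {
        // The month check runs first, so daysInMonth only sees a valid month.
        if (month < 1 || month > 12 || dayOfMonth < 1 ||
            dayOfMonth > daysInMonth(year, month)) {
            throw std::invalid_argument("invalid date");
        }
    }

    int year() const { return _year; }
    int month() const { return _month; }
    int dayOfMonth() const { return _dayOfMonth; }

private:
    int _year;
    int _month;
    int _dayOfMonth;
};
```

With these definitions, `Point p{3, 4};` uses aggregate initialization and any combination of values is acceptable, whereas `Date(2023, 2, 29)` is rejected at construction, so no `Date` object can exist in an invalid state.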
### Order of Class Members -Within a class or struct definition, try to stick to this ordering by default. A -consistent convention makes it easier for a reader to quickly understand and -navigate a class declaration. +Within a class or struct definition, try to stick to this ordering by default. A consistent +convention makes it easier for a reader to quickly understand and navigate a class declaration. Group public API at the top, and details at the bottom. @@ -145,13 +128,12 @@ Within each of these visibility sections, there's a preferred order of declarati - Member functions - Data members -As always, technical concerns override style, and this order sometimes cannot be -exactly followed for technical reasons, but it should be the predominant -weakly-binding preference when laying out a class in the absence of motivation -to diverge from it. -Private data members have a leading underscore followed by a camel case name like `_fooBarBaz`. -Protected members may or may not have a leading underscore, depending on how -logically internal they are. This convention doesn't apply to types. +As always, technical concerns override style, and this order sometimes cannot be exactly followed +for technical reasons, but it should be the predominant weakly-binding preference when laying out a +class in the absence of motivation to diverge from it. Private data members have a leading +underscore followed by a camel case name like `_fooBarBaz`. Protected members may or may not have a +leading underscore, depending on how logically internal they are. This convention doesn't apply to +types. ### Naming of Class Members @@ -176,41 +158,39 @@ private: ### User-facing Names That Include Units (not strictly a C++ issue) -This section applies to names that users can see, like BSON field names or -server parameters, but not necessarily to C++ identifiers. 
+This section applies to names that users can see, like BSON field names or server parameters, but +not necessarily to C++ identifiers. -In things like `serverStatus`, include the units in the field name if there is -any chance of ambiguity. For example, `writtenMB` or `timeMs`. +In things like `serverStatus`, include the units in the field name if there is any chance of +ambiguity. For example, `writtenMB` or `timeMs`. -- For bytes: use `MB` and show in megabytes unless you know it will be tiny. - Note you can use a float so `0.1MB` is fine to show. +- For bytes: use `MB` and show in megabytes unless you know it will be tiny. Note you can use a + float so `0.1MB` is fine to show. - Durations: - - Use milliseconds by default. - Prefer the suffix `Millis`, but be aware that `Ms` is also used. - - Use `Secs` and a floating point number for times that are - expected to be very long. + - Use milliseconds by default. Prefer the suffix `Millis`, but be aware that `Ms` is also used. + - Use `Secs` and a floating point number for times that are expected to be very long. - For microseconds, use `Micros` as the suffix (e.g., `timeMicros`). ## Documentation -- API docs should appear directly above the thing being documented and use `/**` or `///` style comments. +- API docs should appear directly above the thing being documented and use `/**` or `///` style + comments. -- If it fits, a comment can be to the right of a variable with `///< doc`. - (See [Doxygen syntax](https://www.doxygen.nl/manual/docblocks.html#memberdoc)). - The `<` is important, as it tells tooling such as clangd to bind backwards to the preceding - decl rather than the following one. +- If it fits, a comment can be to the right of a variable with `///< doc`. (See + [Doxygen syntax](https://www.doxygen.nl/manual/docblocks.html#memberdoc)). The `<` is important, + as it tells tooling such as clangd to bind backwards to the preceding decl rather than the + following one. 
-- We don't run Doxygen or recommend other Doxygen markup, this style of comment - delimiter distinguishes API docs from other comments. +- We don't run Doxygen or recommend other Doxygen markup, this style of comment delimiter + distinguishes API docs from other comments. -- Use complete, grammatical sentences for API docs. Reviewers should pay attention - to the clarity of documentation as it would appear to a reasonably-experienced - server engineer who may not be a domain expert on the code. +- Use complete, grammatical sentences for API docs. Reviewers should pay attention to the clarity of + documentation as it would appear to a reasonably-experienced server engineer who may not be a + domain expert on the code. -- Avoid overly conversational tone, unnecessary personal references (like "I", - or "Pat"), slang, or jargon. Comments should strive for professionalism, but - without rigid formality. +- Avoid overly conversational tone, unnecessary personal references (like "I", or "Pat"), slang, or + jargon. Comments should strive for professionalism, but without rigid formality. - Comment syntax @@ -230,42 +210,38 @@ void complexFunction(int x, int y) { } ``` -- Give the right amount of information. Make some attempt to give the gist of - complex processes. Avoid being unnecessarily vague to avoid explanation - that would be helpful to the consumer of the API. Conversely, try to avoid - going too much into implementation details in doc-comments (or at least - clearly state when doing so using words like "currently") unless those details - are part of the API that consumers should rely on. +- Give the right amount of information. Make some attempt to give the gist of complex processes. + Avoid being unnecessarily vague to avoid explanation that would be helpful to the consumer of the + API. 
Conversely, try to avoid going too much into implementation details in doc-comments (or at + least clearly state when doing so using words like "currently") unless those details are part of + the API that consumers should rely on. -- Comments should be descriptive rather than imperative, e.g. - "Frobnicates the widget", not "Frobnicate the widget". The subject of the - initial sentence is assumed to be the thing being documented and should - generally be omitted, e.g. don't say "This function frobnicates the widget". +- Comments should be descriptive rather than imperative, e.g. "Frobnicates the widget", not + "Frobnicate the widget". The subject of the initial sentence is assumed to be the thing being + documented and should generally be omitted, e.g. don't say "This function frobnicates the widget". ```c++ /** Calculates the sum. (GOOD: descriptive verb) */ /** Calculate the sum. (BAD: imperative verb) */ ``` -There's no need to be very formal about their formatting or use elaborate -Doxygen/Javadoc etc tags. A smattering of text-like markdown is good. Some IDE -features or other tooling might pick up on it, but it shouldn't interfere with the -primary use case of viewing the comments as text while browsing a header file. +There's no need to be very formal about their formatting or use elaborate Doxygen/Javadoc etc tags. +A smattering of text-like markdown is good. Some IDE features or other tooling might pick up on it, +but it shouldn't interfere with the primary use case of viewing the comments as text while browsing +a header file. -Reader attention is a precious resource, so try to write concise comments, and -obvious things need not get a comment. Comments should be adding information. -Do not restate the name and signature, unless there is a subtle detail that -should be highlighted. +Reader attention is a precious resource, so try to write concise comments, and obvious things need +not get a comment. Comments should be adding information. 
Do not restate the name and signature, +unless there is a subtle detail that should be highlighted. -Assume the reader knows the language. Special member functions like the copy -constructor do not need comments saying what they are. `operator==` should only -get a comment if there is something interesting about it like omitting a member, -or being order-sensitive. +Assume the reader knows the language. Special member functions like the copy constructor do not need +comments saying what they are. `operator==` should only get a comment if there is something +interesting about it like omitting a member, or being order-sensitive. -Most classes and functions should default to having at least a 1-liner comment, -but sometimes context and good naming can make even that a redundant formality -to be omitted. While this is a subjective decision, remember that later readers -will need more hints than the original implementers. +Most classes and functions should default to having at least a 1-liner comment, but sometimes +context and good naming can make even that a redundant formality to be omitted. While this is a +subjective decision, remember that later readers will need more hints than the original +implementers. ```c++ /** @@ -280,18 +256,17 @@ will need more hints than the original implementers. ### TODOs -To cite a ticket as a TODO in the code, use this format, with a short reason for -the link. A Jira bot will create reminders when the cited target ticket is -resolved. The target of the TODO cannot be the current ticket. Suppose -SERVER-12345 was a ticket to fix the frobber, and we're documenting some +To cite a ticket as a TODO in the code, use this format, with a short reason for the link. A Jira +bot will create reminders when the cited target ticket is resolved. The target of the TODO cannot be +the current ticket. 
Suppose SERVER-12345 was a ticket to fix the frobber, and we're documenting some workaround code: ```c++ // TODO(SERVER-12345): Remove this code when the frobber works again. ``` -In comments, a function may be referred to using just its name `foo`, or by `foo()`, -or `foo(int,int)`, depending on context and whether the other forms are ambiguous. +In comments, a function may be referred to using just its name `foo`, or by `foo()`, or +`foo(int,int)`, depending on context and whether the other forms are ambiguous. ## C++ Code @@ -300,104 +275,96 @@ conventions. This section presents more substantial technical issues. ### Minimal Syntax -If a keyword or operator is a "noise" word with no technical benefit, omit it. -The philosophy here is that it's better to write the code as plainly as -possible. Code should not look like it's doing something special when it isn't. +If a keyword or operator is a "noise" word with no technical benefit, omit it. The philosophy here +is that it's better to write the code as plainly as possible. Code should not look like it's doing +something special when it isn't. Some examples of "noise" syntax: -- Redundantly marking members and bases as `public`, `protected` or `private`, - etc when they already are. +- Redundantly marking members and bases as `public`, `protected` or `private`, etc when they already + are. - Marking a function decl to be `extern` (they're already extern). - Using `virtual` on a function that's already `override` or `final` (see "[Overriding Virtuals](#overriding-virtuals)"). ### Constructors -Constructors that can be called with single arguments should be `explicit`, -unless implicit conversion is desired, in which case use `explicit(false)` to -explicitly show that intent. -Non-unary constructors should NOT be `explicit` unless it is important to -disable bare braced initialization. If a constructor takes a variable number of arguments -such that it is possibly unary, make it `explicit`. 
+Constructors that can be called with single arguments should be `explicit`, unless implicit +conversion is desired, in which case use `explicit(false)` to explicitly show that intent. Non-unary +constructors should NOT be `explicit` unless it is important to disable bare braced initialization. +If a constructor takes a variable number of arguments such that it is possibly unary, make it +`explicit`. ### `= default` -Prefer `= default;` when needed over defining an empty or trivial function body `{}`. -But where possible, it is usually better to omit the declarations for lifetime methods -entirely and let the compiler declare them implicitly. +Prefer `= default;` when needed over defining an empty or trivial function body `{}`. But where +possible, it is usually better to omit the declarations for lifetime methods entirely and let the +compiler declare them implicitly. -Consider that for some classes it may be useful to declare a function normally -in a `.h` file and provide `= default;` as the implementation in a `.cpp` file. +Consider that for some classes it may be useful to declare a function normally in a `.h` file and +provide `= default;` as the implementation in a `.cpp` file. ### Noexcept -The `noexcept` feature is easy to overuse. Do not use it solely as "documentation" -since it affects runtime behavior. It's a large topic, covered in the [Exception -Architecture](https://github.com/mongodb/mongo/blob/master/docs/exception_architecture.md#using-noexcept) +The `noexcept` feature is easy to overuse. Do not use it solely as "documentation" since it affects +runtime behavior. It's a large topic, covered in the +[Exception Architecture](https://github.com/mongodb/mongo/blob/master/docs/exception_architecture.md#using-noexcept) document. ### Overriding Virtuals -Use `override` wherever it can be used. Tighten this to `final` when necessary, -and where further overrides would introduce opportunities to break base class -guarantees. 
+Use `override` wherever it can be used. Tighten this to `final` when necessary, and where further +overrides would introduce opportunities to break base class guarantees. Each declaration should have at most one `virtual`, `override`, or `final`. -Like many style rules, there are rare technical situations to bend this rule. In -this case it can be used to force compilation errors on unintentional hiding. +Like many style rules, there are rare technical situations to bend this rule. In this case it can be +used to force compilation errors on unintentional hiding. -If a class is known to be a leaf in a hierarchy of polymorphic types, annotating -the class with `final` can be a useful optimization to enable its `virtual` -functions to be devirtualized in some contexts. +If a class is known to be a leaf in a hierarchy of polymorphic types, annotating the class with +`final` can be a useful optimization to enable its `virtual` functions to be devirtualized in some +contexts. ### Rules For `.h` Files - Use `#pragma once` as an include guard, as the first line after the copyright notice. -- No unnamed namespaces in headers at all. - (See the "Namespaces" section below). +- No unnamed namespaces in headers at all. (See the "Namespaces" section below). -- Use `inline` or `extern` on namespace-scope variables in headers, so that each - translation unit does not get its own copy. Note that `inline` variables - provide some init order guarantees which may add a small startup cost, so - define them as `constexpr` or `constinit` if possible. +- Use `inline` or `extern` on namespace-scope variables in headers, so that each translation unit + does not get its own copy. Note that `inline` variables provide some init order guarantees which + may add a small startup cost, so define them as `constexpr` or `constinit` if possible. -- Keep complex code out of headers. If a function is not performance sensitive, and it - is longer than a few lines, put it in the corresponding .cpp file. 
This practice - should help to reduce the number of include statements needed in headers, - which is good for modularity and for compilation speed. That said, simple - getters and setters should generally be inline. +- Keep complex code out of headers. If a function is not performance sensitive, and it is longer + than a few lines, put it in the corresponding .cpp file. This practice should help to reduce the + number of include statements needed in headers, which is good for modularity and for compilation + speed. That said, simple getters and setters should generally be inline. ### Rules For `.cpp` Files -Entities with "external linkage" are usable from outside the .cpp file where -they are defined. It's the default linkage for functions, variables, and types -defined at namespace scope, making this unintentional exporting a common error -in C++. +Entities with "external linkage" are usable from outside the .cpp file where they are defined. It's +the default linkage for functions, variables, and types defined at namespace scope, making this +unintentional exporting a common error in C++. -Export with intent. Avoid defining anything with external linkage unless it's -declared in the header. We don't want to have surprising link-time name -collisions or other multi-definition problems as the codebase evolves. -When code has no more callers, it can be readily identified as dead code if it has -internal linkage. +Export with intent. Avoid defining anything with external linkage unless it's declared in the +header. We don't want to have surprising link-time name collisions or other multi-definition +problems as the codebase evolves. When code has no more callers, it can be readily identified as +dead code if it has internal linkage. -Use either unnamed namespaces or `static` to make definitions with "internal -linkage". These are private to the .cpp file in which they appear. -(See "[Linkage](https://en.cppreference.com/w/cpp/language/storage_duration#Linkage)"). 
+Use either unnamed namespaces or `static` to make definitions with "internal linkage". These are +private to the .cpp file in which they appear. (See +"[Linkage](https://en.cppreference.com/w/cpp/language/storage_duration#Linkage)"). ### API Conventions #### Integer Ranks -We don't typically use the `long` or `long long` integer ranks, except in the -BSON API or when interfacing with third_party or system APIs. In particular, we -should never use plain `long` directly unless required by some outside API since -it is 32 bits on some of our supported platforms. We use `int`, `size_t`, and -the explicit width typedefs `int32_t`, `uint32_t`, `int64_t`, `uint64_t`, etc. -Prefer `size_t` for string/array/container/sequence sizes and indexes, since -that's what C++ does. +We don't typically use the `long` or `long long` integer ranks, except in the BSON API or when +interfacing with third_party or system APIs. In particular, we should never use plain `long` +directly unless required by some outside API since it is 32 bits on some of our supported platforms. +We use `int`, `size_t`, and the explicit width typedefs `int32_t`, `uint32_t`, `int64_t`, +`uint64_t`, etc. Prefer `size_t` for string/array/container/sequence sizes and indexes, since that's +what C++ does. #### `const` @@ -405,34 +372,30 @@ that's what C++ does. - `const` is not required on local variables. -- Making `const` data members of a movable class can lead to problems with - move and assign operations, and is usually not necessary. On the other hand, - it can be useful for types that are never moved or copied. In particular, for - types that are accessed concurrently it is useful to mark members that are - not modified after construction as `const` because they cannot participate in - data races. +- Making `const` data members of a movable class can lead to problems with move and assign + operations, and is usually not necessary. 
On the other hand, it can be useful for types that are + never moved or copied. In particular, for types that are accessed concurrently it is useful to + mark members that are not modified after construction as `const` because they cannot participate + in data races. -- Don't use `volatile` qualifications. It's an oft-misunderstood feature and - only appropriate in very precise technical scenarios. +- Don't use `volatile` qualifications. It's an oft-misunderstood feature and only appropriate in + very precise technical scenarios. ### Strings -- We do not use `std::string_view`. Use `StringData` from `base/string_data.h` instead. - For interoperability with functions that accept or return `std::string_view` - (e.g. `std::string`), use the pair of conversion functions - `toStdStringViewForInterop` and `toStringDataForInterop`. +- We do not use `std::string_view`. Use `StringData` from `base/string_data.h` instead. For + interoperability with functions that accept or return `std::string_view` (e.g. `std::string`), use + the pair of conversion functions `toStdStringViewForInterop` and `toStringDataForInterop`. -- Working with `char*` strings can be notoriously error-prone. Convert such data to - `StringData` or `std::string` for safety, or use utilities in `util/str.h` for - this sort of thing. +- Working with `char*` strings can be notoriously error-prone. Convert such data to `StringData` or + `std::string` for safety, or use utilities in `util/str.h` for this sort of thing. ### Performing String Formatting -There are at least two kinds of generic string formatting available. We have -stream-oriented formatting with `StringBuilder` and its wrapper `str::stream()` -(using a stripped-down `std::ostream`-like API), and newer `libfmt` formatting -(using Python-like syntax). We do not use `std::format`. `sprintf`-style -formatting is very rarely used. +There are at least two kinds of generic string formatting available. 
We have stream-oriented +formatting with `StringBuilder` and its wrapper `str::stream()` (using a stripped-down +`std::ostream`-like API), and newer `libfmt` formatting (using Python-like syntax). We do not use +`std::format`. `sprintf`-style formatting is very rarely used. ```c++ #include @@ -446,15 +409,13 @@ formatting is very rarely used. ### Output Parameters -Use pointers or mutable references as "in/out" or "output" parameters, -but prefer returning values to using pure output parameters. -Mutable references used to be banned, but this is no longer the case, and -they are now encouraged for many cases, especially if the callee will not -require the reference to be valid after returning. That said, some types, -such as `OperationContext` are conventionally passed by pointer. -It is best to stick to established conventions for such types to avoid -needing a lot of additional `&opCtx` and `*opCtx` noise at call sites -between functions using different conventions. +Use pointers or mutable references as "in/out" or "output" parameters, but prefer returning values +to using pure output parameters. Mutable references used to be banned, but this is no longer the +case, and they are now encouraged for many cases, especially if the callee will not require the +reference to be valid after returning. That said, some types, such as `OperationContext` are +conventionally passed by pointer. It is best to stick to established conventions for such types to +avoid needing a lot of additional `&opCtx` and `*opCtx` noise at call sites between functions using +different conventions. ```c++ void appendData(const std::string& tag, std::vector& out) { @@ -479,17 +440,15 @@ void appendData(const std::string& tag, std::vector& out) { } // namespace foo ``` -- Do not use "using directives" (i.e. `using namespace foo;`) for arbitrary - namespaces as a naming shortcut. 
Some namespaces are designed to be used this - way in restricted contexts, but still never at namespace-scope in header - files. These carefully curated namespaces contain only a few definitions. - Examples of these limited exceptional namespaces would include: +- Do not use "using directives" (i.e. `using namespace foo;`) for arbitrary namespaces as a naming + shortcut. Some namespaces are designed to be used this way in restricted contexts, but still never + at namespace-scope in header files. These carefully curated namespaces contain only a few + definitions. Examples of these limited exceptional namespaces would include: - - The `std::literals`, `fmt::literals`, and similar namespaces that hold - user-defined literal operators. Using directives are necessary for importing - user-defined literals. - - The `std::placeholders` namespace containing `_1`, `_2`, for use with the - `std::bind` API (which we have banned anyway). + - The `std::literals`, `fmt::literals`, and similar namespaces that hold user-defined literal + operators. Using directives are necessary for importing user-defined literals. + - The `std::placeholders` namespace containing `_1`, `_2`, for use with the `std::bind` API (which + we have banned anyway). As an alternative, a namespace _alias_ may help to declutter local scopes. @@ -498,32 +457,30 @@ void appendData(const std::string& tag, std::vector& out) { namespace bfs = boost::filesystem; ``` -- No unnamed namespaces in headers at all. - They can produce subtle correctness risks, particularly in the form of +- No unnamed namespaces in headers at all. They can produce subtle correctness risks, particularly + in the form of [ODR (One Definition Rule)](https://en.cppreference.com/w/cpp/language/definition#One_Definition_Rule) violations. -- In .cpp files, use unnamed namespaces to strip definitions of their linkage. - Headers should generally only be declaring entitiees with external linkage. 
+- In .cpp files, use unnamed namespaces to strip definitions of their linkage. Headers should + generally only be declaring entitiees with external linkage. -- Most server code should be in the `mongo` namespace, and we have several - sub-namespaces nested within that, often used to help organize code by team, by - project, or by large feature. +- Most server code should be in the `mongo` namespace, and we have several sub-namespaces nested + within that, often used to help organize code by team, by project, or by large feature. -- Defining a new nested namespace as an API point is cheap, but can be a little - fiddly for users if we have too many of them, so they should be substantial and - relatively coarse-grained (a handful per team). +- Defining a new nested namespace as an API point is cheap, but can be a little fiddly for users if + we have too many of them, so they should be substantial and relatively coarse-grained (a handful + per team). -- Use a component-unique namespace, eg `future_details` or `duration_detail`, to - give names to pseudo-"private" details in headers. It's important to include - the component name here. Using `mongo::detail` or `mongo::internal` doesn't - mitigate the problem of name collisions between components. +- Use a component-unique namespace, eg `future_details` or `duration_detail`, to give names to + pseudo-"private" details in headers. It's important to include the component name here. Using + `mongo::detail` or `mongo::internal` doesn't mitigate the problem of name collisions between + components. -- As a matter of namespace etiquette and modularity, avoid using anything in a - component's `detail` or `internal` -suffixed namespaces from outside the - component. If you need to use such a private name, that should ideally involve - a conversation with the code owners about promoting it out of the detail - namespace. 
+- As a matter of namespace etiquette and modularity, avoid using anything in a component's `detail` + or `internal` -suffixed namespaces from outside the component. If you need to use such a private + name, that should ideally involve a conversation with the code owners about promoting it out of + the detail namespace. - Combine immediately-nested namespace blocks where possible: @@ -574,29 +531,30 @@ Status withEarlyReturns() { #### Range-Based `for` Loops -[Range-based for loops](https://en.cppreference.com/w/cpp/language/range-for) can have subtle issues. -The usual practice is to use a forwarding reference (`auto&&`) as the item variable. Applying this -pattern as a default practice prevents subtle copies and conversions of the range elements. +[Range-based for loops](https://en.cppreference.com/w/cpp/language/range-for) can have subtle +issues. The usual practice is to use a forwarding reference (`auto&&`) as the item variable. +Applying this pattern as a default practice prevents subtle copies and conversions of the range +elements. ```c++ for (auto&& item : someRange) ``` -For ranges that have pair or tuple elements, particularly maps, it's common to -use structured bindings to give names to the parts of the item: +For ranges that have pair or tuple elements, particularly maps, it's common to use structured +bindings to give names to the parts of the item: ```c++ for (auto&& [key, value]: someMap) ``` -It's worth a note of caution about the dangers of the range expression in a -range-based for loop, as this is a common and subtle source of bugs. +It's worth a note of caution about the dangers of the range expression in a range-based for loop, as +this is a common and subtle source of bugs. -The range expression is bound to an implicit range variable, and its lifetime -will be extended if it's a temporary, as usual with C++ initializers. 
+The range expression is bound to an implicit range variable, and its lifetime will be extended if +it's a temporary, as usual with C++ initializers. -But other temporaries created in the initializer expression will die after the -initializer. They are not extended to the lifetime of the for loop. +But other temporaries created in the initializer expression will die after the initializer. They are +not extended to the lifetime of the for loop. ```c++ // ok: temporary is bound to implicit range variable. @@ -606,49 +564,49 @@ initializer. They are not extended to the lifetime of the for loop. for (auto&& item: obj().view()) ``` -The rules here change in C++23, such that all temporaries in the range initializer are extended. -The fix is a theoretically a breaking change for some code. But the risk tradeoff -overwhelmingly favored making this change anyway. +The rules here change in C++23, such that all temporaries in the range initializer are extended. The +fix is a theoretically a breaking change for some code. But the risk tradeoff overwhelmingly favored +making this change anyway. -> [!WARNING] -> The compilers we are using have not all implemented this feature yet, even on the v5 toolchain. So -> we still need to be extremely careful with range expressions that rely on +> [!WARNING] The compilers we are using have not all implemented this feature yet, even on the v5 +> toolchain. So we still need to be extremely careful with range expressions that rely on > intermediate temporaries. -It would be helpful to read the [CppReference](https://en.cppreference.com/w/cpp/language/range-for#Temporary_range_initializer) on this topic. -Some good [bug examples](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2644r0.pdf) -are listed in the single-page ISO C++ proposal to fix the problem. +It would be helpful to read the +[CppReference](https://en.cppreference.com/w/cpp/language/range-for#Temporary_range_initializer) on +this topic. 
Some good +[bug examples](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2644r0.pdf) are listed in +the single-page ISO C++ proposal to fix the problem. ### Assertions -This is a large topic. See the [Exception Architecture](https://github.com/mongodb/mongo/blob/master/docs/exception_architecture.md) architecture guide. +This is a large topic. See the +[Exception Architecture](https://github.com/mongodb/mongo/blob/master/docs/exception_architecture.md) +architecture guide. ### Logging and Output We use a custom logging system, documented in the -[Logging](https://github.com/mongodb/mongo/blob/master/docs/logging.md) -architecture guide. Direct output to `stdout` or `stderr` streams is only done -by special server code. +[Logging](https://github.com/mongodb/mongo/blob/master/docs/logging.md) architecture guide. Direct +output to `stdout` or `stderr` streams is only done by special server code. ### Numeric Constants Large, round numeric constants should be written in a user-friendly way. -- If a number is derived from a simple numeric expression, expressing it as an - expression can help a reader verify and maintain it. For example, prefer - `50 * 1024 * 1024` to `52'428'800`. +- If a number is derived from a simple numeric expression, expressing it as an expression can help a + reader verify and maintain it. For example, prefer `50 * 1024 * 1024` to `52'428'800`. -- Use digit separators `'` for large numeric constants. 3-digit groups for - decimal. Conventionally, use 4-digit or 8-digit groups for hexadecimal or - binary. +- Use digit separators `'` for large numeric constants. 3-digit groups for decimal. Conventionally, + use 4-digit or 8-digit groups for hexadecimal or binary. - Use a bit-shifted form for power-of-two exponentiation. eg, `1<<13` to express 2<sup>13</sup>. - Make sure the "1" is wide enough for the shift if it's large (e.g. `uint64_t{1} << 52`). - A `* 1024` sequence is also acceptable, as it's a recognizable idiom for kiB and MiB expressions. 
+ Make sure the "1" is wide enough for the shift if it's large (e.g. `uint64_t{1} << 52`). A + `* 1024` sequence is also acceptable, as it's a recognizable idiom for kiB and MiB expressions. -- Do not assume suffixes like `ULL` will produce specifically typed quantities like `uint64`. - Use a numeric literal and the compiler will give it a wide-enough type. - Where the exact type matters, use an explicitly typed expression. +- Do not assume suffixes like `ULL` will produce specifically typed quantities like `uint64`. Use a + numeric literal and the compiler will give it a wide-enough type. Where the exact type matters, + use an explicitly typed expression. ```c++ const int tenMillion = 10'000'000; @@ -661,102 +619,94 @@ arrayBuilder.append(uint64_t{1234}); // Force argument type. ### Casting -- Do not use C-style cast syntax (parentheses around the preceding type) ever. - See [this CGL rule](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es49-if-you-must-use-a-cast-use-a-named-cast) +- Do not use C-style cast syntax (parentheses around the preceding type) ever. See + [this CGL rule](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es49-if-you-must-use-a-cast-use-a-named-cast) and [this Google rule](https://google.github.io/styleguide/cppguide.html#Casting) for discussion. - Use `static_cast` as needed. Use `const_cast` when necessary. -- Be aware that `dynamic_cast`, unlike other casts, is done at runtime. You - should always check for `dynamic_cast` returning null pointer. +- Be aware that `dynamic_cast`, unlike other casts, is done at runtime. You should always check for + `dynamic_cast` returning null pointer. -- `reinterpret_cast` should be used sparingly. It is typically done for - low-level layout conversions and accessing objects in ways that may break the - protections of the type system and exhibit undefined behavior if misapplied. +- `reinterpret_cast` should be used sparingly. 
It is typically done for low-level layout conversions + and accessing objects in ways that may break the protections of the type system and exhibit + undefined behavior if misapplied. -- When down-casting from a base type where the program logic guarantees that - the runtime type is correct, consider using `checked_cast` from - `mongo/base/checked_cast.h`. It is equivalent to `static_cast` in release builds, - but adds an invariant to debug builds that ensures the cast is valid. +- When down-casting from a base type where the program logic guarantees that the runtime type is + correct, consider using `checked_cast` from `mongo/base/checked_cast.h`. It is equivalent to + `static_cast` in release builds, but adds an invariant to debug builds that ensures the cast is + valid. ### RAII and Smart Pointers -- Embrace RAII (Resource Acquisition Is Initialization). This means that resources - should generally be managed by objects that automatically release them when - going out of scope. +- Embrace RAII (Resource Acquisition Is Initialization). This means that resources should generally + be managed by objects that automatically release them when going out of scope. -- By default, the assumption in our codebase is that raw pointers are - views/borrows and never owning. Document exceptions to that rule, and try to - avoid having owning raw pointers as part of your API. +- By default, the assumption in our codebase is that raw pointers are views/borrows and never + owning. Document exceptions to that rule, and try to avoid having owning raw pointers as part of + your API. -- Make heavy use of smart pointers such as `std::unique_ptr` and `std::shared_ptr`. - For some types we use `boost::intrusive_ptr` instead. +- Make heavy use of smart pointers such as `std::unique_ptr` and `std::shared_ptr`. For some types + we use `boost::intrusive_ptr` instead. 
-- Generally, bare calls to `new`/`delete` and `malloc`/`free` outside of the - implementation of an RAII type should be red flags and draw extra scrutiny in - review. Prefer factory functions like `std::make_unique` and - `std::make_shared`. +- Generally, bare calls to `new`/`delete` and `malloc`/`free` outside of the implementation of an + RAII type should be red flags and draw extra scrutiny in review. Prefer factory functions like + `std::make_unique` and `std::make_shared`. -- Use `ScopeGuard` or `ON_BLOCK_EXIT` to protect other resources that must be - released (e.g. `fopen`/`fclose` pairs), or perform some other action when - leaving scope. It is often a good idea to put "undo X" logic right after the - "do X" logic rather than at the bottom of the function to ensure that the - logic stays correct if someone adds an early return or throws. Or, write an - object to do this for you via its constructor and destructor. +- Use `ScopeGuard` or `ON_BLOCK_EXIT` to protect other resources that must be released (e.g. + `fopen`/`fclose` pairs), or perform some other action when leaving scope. It is often a good idea + to put "undo X" logic right after the "do X" logic rather than at the bottom of the function to + ensure that the logic stays correct if someone adds an early return or throws. Or, write an object + to do this for you via its constructor and destructor. ### The `WithLock` Convention -It is common practice in our codebase for a larger "business logic" class to -have an obvious primary mutex member. These tend to have some private functions -that require that this mutex be held. These functions often take a -`WithLock` as the first parameter to document the contract and provide some -checking of the callers. The parameter should usually be unnamed. This is a -technical check that forces callers to present a lock-holding resource handle -(e.g. `unique_lock`) to call the function. See -[with_lock.h](../src/mongo/util/concurrency/with_lock.h). 
+It is common practice in our codebase for a larger "business logic" class to have an obvious primary +mutex member. These tend to have some private functions that require that this mutex be held. These +functions often take a `WithLock` as the first parameter to document the contract and provide some +checking of the callers. The parameter should usually be unnamed. This is a technical check that +forces callers to present a lock-holding resource handle (e.g. `unique_lock`) to call the function. +See [with_lock.h](../src/mongo/util/concurrency/with_lock.h). ## Files (Physical Design) ### Components -A component is a grouping of classes, entities, and functions that is built as a -single packaged unit. There are 1 or more components in a library. A component -should represent a grouping of functionality and interrelated classes and -functions that work together. +A component is a grouping of classes, entities, and functions that is built as a single packaged +unit. There are 1 or more components in a library. A component should represent a grouping of +functionality and interrelated classes and functions that work together. -A component normally consists of a `.h`, a `.cpp`, and a `_test.cpp` file. -Source filenames use lowercase words separated by underscores (i.e. snake_case). +A component normally consists of a `.h`, a `.cpp`, and a `_test.cpp` file. Source filenames use +lowercase words separated by underscores (i.e. snake_case). -In uncommon cases, there are other files in the component for technical or -internal organizational reasons. These might be a `foo_internal.h` auxiliary -header, or a `foo_test_part4.cpp` test fragment, but these extra files are not -meant to serve as its main interface or present its main idea. They're helper -details and they should have the component name as a prefix of their file names. +In uncommon cases, there are other files in the component for technical or internal organizational +reasons. 
These might be a `foo_internal.h` auxiliary header, or a `foo_test_part4.cpp` test +fragment, but these extra files are not meant to serve as its main interface or present its main +idea. They're helper details and they should have the component name as a prefix of their file +names. -A component will commonly be dominated by a single dominant class, and for -discoverability, it should therefore use that class name, in snake_case, as its -filename. That said, we have no rule limiting the number of declarations in a -file, and it is useful to define related classes together in a single component. +A component will commonly be dominated by a single dominant class, and for discoverability, it +should therefore use that class name, in snake_case, as its filename. That said, we have no rule +limiting the number of declarations in a file, and it is useful to define related classes together +in a single component. ### Using `#include` -- To make a declaration available, we require inclusion of a header file that - provides it. There should not be any implicit reliance on transitive includes, - even if the code compiles. As an exception to this general rule, `foo.cpp` and - `foo_test.cpp` do not need to duplicate the includes from `foo.h`. +- To make a declaration available, we require inclusion of a header file that provides it. There + should not be any implicit reliance on transitive includes, even if the code compiles. As an + exception to this general rule, `foo.cpp` and `foo_test.cpp` do not need to duplicate the includes + from `foo.h`. -- Do not make forward declarations to avoid an inclusion. It may be tempting to - do this as an optimization, but we don't do it, as there are correctness and - modularity risks. +- Do not make forward declarations to avoid an inclusion. It may be tempting to do this as an + optimization, but we don't do it, as there are correctness and modularity risks. -- Do not include headers that are not needed. 
Do not blindly copy large blocks - of include statements. +- Do not include headers that are not needed. Do not blindly copy large blocks of include + statements. -- An "umbrella" interface header may provide several related transitive - includes, but these umbrella headers should be documented as such, and they - should be provided by the library maintainer. Use IWYU (include what you use) - pragma comments to prevent tools and editors from incorrectly auto-suggesting - the private headers. +- An "umbrella" interface header may provide several related transitive includes, but these umbrella + headers should be documented as such, and they should be provided by the library maintainer. Use + IWYU (include what you use) pragma comments to prevent tools and editors from incorrectly + auto-suggesting the private headers. In the public header (e.g. `unittest/unittest.h`): @@ -771,14 +721,14 @@ file, and it is useful to define related classes together in a single component. // IWYU pragma: friend "mongo/unittest/.*" ``` -- A header should also be "self-contained", and include everything it needs. It - must not rely on other headers having been included above it by its users. +- A header should also be "self-contained", and include everything it needs. It must not rely on + other headers having been included above it by its users. -- Use "double quotes" to include headers under `mongo/`, and \<angle brackets\> - for headers under `third_party/`, or for system libraries. +- Use "double quotes" to include headers under `mongo/`, and \<angle brackets\> for headers under + `third_party/`, or for system libraries. -- Always use the forward relative path from `mongo/src/`. "Forward" means to not - refer to the parent directory `../`. +- Always use the forward relative path from `mongo/src/`. "Forward" means to not refer to the parent + directory `../`. - Don't use `third_party/` as part of include paths. Use `<>` and omit it. 
@@ -793,20 +743,18 @@ file, and it is useful to define related classes together in a single component. ### Ordering and Grouping of C++ `#include` Directives -We have a standard order for the include directives at the top of a C++ file. -It is automatically applied by our configuration of clang-format. -The purpose of this ordering is to keep the list organized to aid in visual -scanning, and to catch headers that are missing includes. +We have a standard order for the include directives at the top of a C++ file. It is automatically +applied by our configuration of clang-format. The purpose of this ordering is to keep the list +organized to aid in visual scanning, and to catch headers that are missing includes. -The include directives are organized into several blocks. -Within each block, the include directives are sorted alphabetically. -Follow each block with a blank line. +The include directives are organized into several blocks. Within each block, the include directives +are sorted alphabetically. Follow each block with a blank line. - Main header - For the `.cpp` and `_test.cpp` files of a component, include the component's - `.h` file if applicable as the first include. This is a safety practice that - helps us ensure that a `.h` file doesn't rely on any preceding inclusions. + For the `.cpp` and `_test.cpp` files of a component, include the component's `.h` file if + applicable as the first include. This is a safety practice that helps us ensure that a `.h` file + doesn't rely on any preceding inclusions. - First-party headers @@ -822,8 +770,8 @@ Follow each block with a blank line. - Unnamespaced headers - Include directives using `<>`, with no `/` in path. - Typically these are system C headers ending in `.h` + Include directives using `<>`, with no `/` in path. Typically these are system C headers ending in + `.h` E.g. ``. @@ -833,8 +781,8 @@ Follow each block with a blank line. E.g. ``, ``. 
-To summarize, a typical .cpp file "classy.cpp" might have up to 5 sorted blocks of -include directives: +To summarize, a typical .cpp file "classy.cpp" might have up to 5 sorted blocks of include +directives: ```c++ /** (Copyright notice would appear at the top, then...) */ @@ -853,13 +801,12 @@ include directives: #include ``` -Any headers that are conditionally included under the control of `#if` -directives (if technically possible) will appear after these blocks. +Any headers that are conditionally included under the control of `#if` directives (if technically +possible) will appear after these blocks. -Clang-format will not reorder includes across anything other than a blank line -or other includes. In the rare case where some header must be included before -or after all other headers, you can use a comment line to separate it from -other includes like: +Clang-format will not reorder includes across anything other than a blank line or other includes. In +the rare case where some header must be included before or after all other headers, you can use a +comment line to separate it from other includes like: ```cpp #include @@ -868,13 +815,13 @@ other includes like: #include ``` -If you see a comment line in old code that is unintentionally preventing proper -header ordering, you are encouraged to clean that up when adding or removing -includes. +If you see a comment line in old code that is unintentionally preventing proper header ordering, you +are encouraged to clean that up when adding or removing includes. 
### For `js` Files (JavaScript only) -- Disable formatting for [template literals](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals) +- Disable formatting for + [template literals](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals) ```js // clang-format off @@ -884,10 +831,9 @@ newCode = `load("${overridesFile}"); (${jsCode})();`; ### Copyright Notices -- All new C++ files added to the MongoDB code base that will be upstreamed for - public consumption (such as anything upstreamed to `mongodb/mongo`) should - use the following copyright notice and SSPL license language, substituting - the current year for `YYYY` as appropriate: +- All new C++ files added to the MongoDB code base that will be upstreamed for public consumption + (such as anything upstreamed to `mongodb/mongo`) should use the following copyright notice and + SSPL license language, substituting the current year for `YYYY` as appropriate: ```c++ /** @@ -930,8 +876,8 @@ newCode = `load("${overridesFile}"); (${jsCode})();`; ## Basic Formatting Conventions in C++ Code -There are several matters of file formatting expected in source files, and we -enforce these when we can. If you use our recommended +There are several matters of file formatting expected in source files, and we enforce these when we +can. If you use our recommended [config](https://github.com/mongodb/mongo/blob/master/.vscode_defaults/linux-virtual-workstation.code-workspace) for VSCode, much of this will be handled automatically for you. @@ -943,20 +889,19 @@ for VSCode, much of this will be handled automatically for you. - Limit lines to 100 columns. -- Use Posix text format for source files. - All lines (including the final line) end with a LF (ASCII "line feed" aka `\n`) character. - We don't use the Windows CRLF (`\r\n`) line endings in source files. +- Use Posix text format for source files. 
All lines (including the final line) end with a LF (ASCII + "line feed" aka `\n`) character. We don't use the Windows CRLF (`\r\n`) line endings in source + files. - In VS Code, `files.eol` should be set to "\n", and `files.insertFinalNewline` - set to true to help with this. A Git config option on Windows can convert - line endings automatically (`core.autocrlf`). + In VS Code, `files.eol` should be set to "\n", and `files.insertFinalNewline` set to true to help + with this. A Git config option on Windows can convert line endings automatically + (`core.autocrlf`). ### Braces -Our braces style is that the opening brace appears at the end of the line. We -do not open a new line just for the opening brace that is part of a control flow -structure (`if`, `while`, etc). -Braces are optional for sufficiently simple statements. +Our braces style is that the opening brace appears at the end of the line. We do not open a new line +just for the opening brace that is part of a control flow structure (`if`, `while`, etc). Braces are +optional for sufficiently simple statements. ```c++ if (condition) @@ -982,17 +927,16 @@ Braces are optional for sufficiently simple statements. All JS files must be linted by ESLint before they are formatted by clang-format. -We use [ESLint](http://eslint.org/) to lint JS code. ESLint is a JS -linting tool that uses the config file located at `.eslintrc.yml`, in the root -of the mongo repository, to control the linting of the JS code. +We use [ESLint](http://eslint.org/) to lint JS code. ESLint is a JS linting tool that uses the +config file located at `.eslintrc.yml`, in the root of the mongo repository, to control the linting +of the JS code. -[Plugins](http://eslint.org/docs/user-guide/integrations) are available for most -editors that will automatically run ESLint on file save. It is recommended to -use one of these plugins. 
+[Plugins](http://eslint.org/docs/user-guide/integrations) are available for most editors that will +automatically run ESLint on file save. It is recommended to use one of these plugins. -Use the wrapper script `buildscripts/eslint.py` to check that the JS code is -linted correctly as well as to fix linting errors in the code. This wrapper -selects the appropriate version of eslint to be used. +Use the wrapper script `buildscripts/eslint.py` to check that the JS code is linted correctly as +well as to fix linting errors in the code. This wrapper selects the appropriate version of eslint to +be used. ```sh python buildscripts/eslint.py lint # lint js code @@ -1001,25 +945,22 @@ python buildscripts/eslint.py fix # auto-fix js code ### Clang-Format -All code changes must be formatted by -[clang-format](http://clang.llvm.org/docs/ClangFormat.html) before they are -checked in. Use `bazel run format` to reformat C++ and JS code. -Clang-format is a C/C++ & JS code formatting tool that uses the config files -located at `src/mongo/.clang-format` and `jstests/.clang-format` to control the -format of the code. The version and configuration of clang-format is selected by -`bazel run format`. +All code changes must be formatted by [clang-format](http://clang.llvm.org/docs/ClangFormat.html) +before they are checked in. Use `bazel run format` to reformat C++ and JS code. Clang-format is a +C/C++ & JS code formatting tool that uses the config files located at `src/mongo/.clang-format` and +`jstests/.clang-format` to control the format of the code. The version and configuration of +clang-format is selected by `bazel run format`. -Plugins are available for most editors that will automatically run clang-format -on file save. +Plugins are available for most editors that will automatically run clang-format on file save. -Clang-format is essential, but we should not let it create unreadable code. 
-There are some ways to keep it from producing a mess: +Clang-format is essential, but we should not let it create unreadable code. There are some ways to +keep it from producing a mess: - It will not join a line that ends in a (potentially empty) `//` comment. - It also recognizes comma-terminated lists as significant hints. -- As a last resort, it honors `clang-format off` and `clang-format on` in comments. - This should only be used where it is really important, since it may result in indentation - drift with the surrounding code as we upgrade clang-format or change settings. +- As a last resort, it honors `clang-format off` and `clang-format on` in comments. This should only + be used where it is really important, since it may result in indentation drift with the + surrounding code as we upgrade clang-format or change settings. ```c++ void clangFormatExamples() { @@ -1067,11 +1008,9 @@ void clangFormatExamples() { - CppCon "Back to Basics" track playlist. [link](https://www.youtube.com/playlist?list=PLHTh1InhhwT4TJaHBVWzvBOYhp27UO7mI) -- "A Tour of C++", Stroustrup. - ISBN: 9780133549003 +- "A Tour of C++", Stroustrup. ISBN: 9780133549003 -- "Large-Scale C++: Process and Architecture, Volume 1", Lakos. - ISBN 9780133927665 +- "Large-Scale C++: Process and Architecture, Volume 1", Lakos. ISBN 9780133927665 - All of Herb Sutter's "Exceptional" series of books. @@ -1084,13 +1023,14 @@ void clangFormatExamples() { - [MongoDB C++ Style Guide Proposals](https://docs.google.com/document/d/1nvmEnjw-5DNFIoXPa7WzM1PbOOl1fN19jl1sz9cpzAg) Roadmap and suggestion box for this document. -- [Server Code Style](https://github.com/mongodb/mongo/wiki/Server-Code-Style) on mongo github wiki to be replaced by this document. +- [Server Code Style](https://github.com/mongodb/mongo/wiki/Server-Code-Style) on mongo github wiki + to be replaced by this document. 
-- [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) We used to default - to this for all things not explicitly covered by our own guide, but that is no longer the case. +- [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) We used to default to + this for all things not explicitly covered by our own guide, but that is no longer the case. -- [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines) Interesting reading. - Diverges significantly at times from our style. +- [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines) Interesting + reading. Diverges significantly at times from our style. - [cppreference.com](https://cppreference.com) The best C++ reference site @@ -1099,6 +1039,5 @@ void clangFormatExamples() { - [Compiler Explorer](https://goldbolt.org) Great for demonstrating C++ ideas on multiple compilers. - [VSCode workspace file](https://github.com/mongodb/mongo/blob/master/.vscode_defaults/linux-virtual-workstation.code-workspace) - A default configuration for server engineers who use VSCode. It's configured - to handle editor configuration and formatting issues in accordance with this - guide. + A default configuration for server engineers who use VSCode. It's configured to handle editor + configuration and formatting issues in accordance with this guide. 
diff --git a/docs/devcontainer-setup.md b/docs/devcontainer-setup.md index 685f36bc574..f5f4e7788cb 100644 --- a/docs/devcontainer-setup.md +++ b/docs/devcontainer-setup.md @@ -4,8 +4,10 @@ **👉 Please visit the new [Dev Container Documentation](./devcontainer/README.md) for:** -- 📖 [**Getting Started Guide**](./devcontainer/getting-started.md) - Step-by-step setup instructions -- 🏗️ [**Architecture & Technical Details**](./devcontainer/architecture.md) - How everything works under the hood +- 📖 [**Getting Started Guide**](./devcontainer/getting-started.md) - Step-by-step setup + instructions +- 🏗️ [**Architecture & Technical Details**](./devcontainer/architecture.md) - How everything works + under the hood - 🔧 [**Troubleshooting Guide**](./devcontainer/troubleshooting.md) - Solutions to common issues - 💡 [**Advanced Usage**](./devcontainer/advanced.md) - Customization and power user features - ❓ [**FAQ**](./devcontainer/faq.md) - Frequently asked questions diff --git a/docs/devcontainer/README.md b/docs/devcontainer/README.md index f4b07709063..309b7b5c763 100644 --- a/docs/devcontainer/README.md +++ b/docs/devcontainer/README.md @@ -1,10 +1,12 @@ # MongoDB Development with Dev Containers -**⚠️ BETA:** The devcontainer setup is currently in Beta stage. Please report issues and feedback to the team. +**⚠️ BETA:** The devcontainer setup is currently in Beta stage. Please report issues and feedback to +the team. ## 📚 Documentation Index -This is the comprehensive guide for developing MongoDB using Dev Containers. Choose the guide that best fits your needs: +This is the comprehensive guide for developing MongoDB using Dev Containers. Choose the guide that +best fits your needs: ### 🚀 [Getting Started](./getting-started.md) @@ -80,7 +82,8 @@ This is the comprehensive guide for developing MongoDB using Dev Containers. Cho ## What are Dev Containers? -Dev Containers provide a consistent, reproducible development environment using Docker containers. 
This ensures: +Dev Containers provide a consistent, reproducible development environment using Docker containers. +This ensures: - ✅ **Consistency**: Everyone works with identical tooling and dependencies - ✅ **Isolation**: Your host system stays clean diff --git a/docs/devcontainer/advanced.md b/docs/devcontainer/advanced.md index 8a99c8f9e0d..c3ffe433113 100644 --- a/docs/devcontainer/advanced.md +++ b/docs/devcontainer/advanced.md @@ -1,8 +1,10 @@ # Advanced Dev Container Usage -This guide covers advanced workflows and power user features for managing multiple containers, backups, and complex development scenarios. +This guide covers advanced workflows and power user features for managing multiple containers, +backups, and complex development scenarios. -**Looking to customize your devcontainer?** See the [Customization Guide](./customization.md) for dotfiles, VS Code settings, extensions, and performance tuning. +**Looking to customize your devcontainer?** See the [Customization Guide](./customization.md) for +dotfiles, VS Code settings, extensions, and performance tuning. ## Table of Contents diff --git a/docs/devcontainer/architecture.md b/docs/devcontainer/architecture.md index bd15101909e..f0a7fb6bcb7 100644 --- a/docs/devcontainer/architecture.md +++ b/docs/devcontainer/architecture.md @@ -1,6 +1,7 @@ # Dev Container Architecture -This document provides a deep dive into how the MongoDB devcontainer is structured and how all the pieces work together. +This document provides a deep dive into how the MongoDB devcontainer is structured and how all the +pieces work together. ## Table of Contents @@ -201,7 +202,8 @@ MongoDB requires specific compiler versions. 
The toolchain installation process ### Toolchain Configuration -The `toolchain_config.env` file contains architecture-specific toolchain definitions for both ARM64 and AMD64: +The `toolchain_config.env` file contains architecture-specific toolchain definitions for both ARM64 +and AMD64: ```bash # Generated by toolchain.py @@ -289,7 +291,8 @@ The MongoDB toolchain includes: ### Toolchain Updates -The toolchain is managed by the MongoDB team. When updates are available, you'll get them automatically when you: +The toolchain is managed by the MongoDB team. When updates are available, you'll get them +automatically when you: - Pull the latest changes from the repository - Rebuild your devcontainer diff --git a/docs/devcontainer/customization.md b/docs/devcontainer/customization.md index 7c3c2677890..8afb66a2c10 100644 --- a/docs/devcontainer/customization.md +++ b/docs/devcontainer/customization.md @@ -1,10 +1,14 @@ # Customizing Your Dev Container -This guide covers personal customizations you can make to your MongoDB devcontainer **without modifying the repository's devcontainer configuration**. These are user-level settings that only affect your development environment. +This guide covers personal customizations you can make to your MongoDB devcontainer **without +modifying the repository's devcontainer configuration**. These are user-level settings that only +affect your development environment. -**Want to modify the devcontainer setup for everyone?** See [Contributing Customizations](#contributing-customizations) at the bottom. +**Want to modify the devcontainer setup for everyone?** See +[Contributing Customizations](#contributing-customizations) at the bottom. -**For general VS Code settings** (themes, fonts, keybindings), see the [VS Code documentation](https://code.visualstudio.com/docs/getstarted/settings). +**For general VS Code settings** (themes, fonts, keybindings), see the +[VS Code documentation](https://code.visualstudio.com/docs/getstarted/settings). 
## Table of Contents @@ -76,7 +80,9 @@ This applies to all devcontainers you work with, not just MongoDB. ## Contributing Customizations -The customizations above are all user-level and don't require changes to the repository. If you want to modify the devcontainer setup itself to benefit all MongoDB developers, you'll need to submit a PR. +The customizations above are all user-level and don't require changes to the repository. If you want +to modify the devcontainer setup itself to benefit all MongoDB developers, you'll need to submit a +PR. **Examples of repository-level customizations:** @@ -108,4 +114,5 @@ The customizations above are all user-level and don't require changes to the rep - [Architecture](./architecture.md) - How devcontainers work - [Advanced Usage](./advanced.md) - Multiple containers, backups, workflows - [Troubleshooting](./troubleshooting.md) - Fix issues -- [VS Code Dev Containers Documentation](https://code.visualstudio.com/docs/devcontainers/containers) - General VS Code features +- [VS Code Dev Containers Documentation](https://code.visualstudio.com/docs/devcontainers/containers) - + General VS Code features diff --git a/docs/devcontainer/faq.md b/docs/devcontainer/faq.md index 2d6c655e134..df0d465fd64 100644 --- a/docs/devcontainer/faq.md +++ b/docs/devcontainer/faq.md @@ -6,14 +6,16 @@ Frequently asked questions about MongoDB development with dev containers. ### What is a dev container? -A dev container (development container) is a Docker container configured specifically for development. It includes: +A dev container (development container) is a Docker container configured specifically for +development. It includes: - All build tools and dependencies - IDE configuration and extensions - Persistent storage for caches and settings - Consistent environment across all developers -Think of it as a portable, reproducible development environment that runs on any machine with Docker. 
+Think of it as a portable, reproducible development environment that runs on any machine with +Docker. [Learn more about dev containers →](https://containers.dev/) @@ -43,11 +45,14 @@ Report issues to help improve it for everyone! - Pros: Works without SSH keys, simpler for read-only access - Cons: May require password/token for push operations -See the [Getting Started guide SSH setup section](./getting-started.md#4-configure-ssh-keys-recommended) for details. +See the +[Getting Started guide SSH setup section](./getting-started.md#4-configure-ssh-keys-recommended) for +details. ### How do SSH keys work with devcontainers? -VS Code automatically forwards your SSH agent to the container, so you don't need to copy keys into the container. +VS Code automatically forwards your SSH agent to the container, so you don't need to copy keys into +the container. **Requirements:** @@ -65,7 +70,8 @@ ssh-add -l ssh -T git@github.com ``` -**Inside the container**, Git commands will automatically use your host's SSH keys through agent forwarding. +**Inside the container**, Git commands will automatically use your host's SSH keys through agent +forwarding. [Learn more about SSH agent forwarding →](https://code.visualstudio.com/remote/advancedcontainers/sharing-git-credentials) @@ -126,7 +132,8 @@ First-time setup includes: - WSL2 installed and configured - Docker Desktop with WSL2 integration enabled -**Important:** Clone repository in WSL2 filesystem (not `/mnt/c/`), not Windows filesystem, for best performance. +**Important:** Clone repository in WSL2 filesystem (not `/mnt/c/`), not Windows filesystem, for best +performance. ### Can I use this on Apple Silicon (M1/M2/M3)? @@ -161,7 +168,8 @@ docker cp :/workspaces/mongo/file.txt ~/Downloads/ **Option 3: Use bind mount** (sacrifices performance) -Open your existing local repository in VS Code and use "Dev Containers: Reopen in Container". 
This uses a bind mount which allows direct host filesystem access but is slower, especially on macOS. +Open your existing local repository in VS Code and use "Dev Containers: Reopen in Container". This +uses a bind mount which allows direct host filesystem access but is slower, especially on macOS. ### Can I use my existing local clone? @@ -369,8 +377,7 @@ gcc --version # Should show the MongoDB toolchain GCC version ls -la ~/.config/engflow_auth/ ``` -**Re-authenticate:** -Contact MongoDB team for authentication flow. +**Re-authenticate:** Contact MongoDB team for authentication flow. **Build locally instead:** @@ -406,13 +413,15 @@ Allocate as much disk space as you can comfortably spare. We recommend at least **Allocate as much as possible** while leaving enough for your host OS to function (~4-8 GB). -More RAM = faster builds with more parallel jobs. MongoDB builds are resource-intensive and benefit greatly from additional memory. +More RAM = faster builds with more parallel jobs. MongoDB builds are resource-intensive and benefit +greatly from additional memory. ### How many CPU cores should I allocate? **Allocate as many cores as possible** while leaving a couple for your host OS (1-2 cores). -Bazel parallelizes well; more cores = significantly faster builds. If you have 8+ cores available, MongoDB builds will complete much faster. +Bazel parallelizes well; more cores = significantly faster builds. If you have 8+ cores available, +MongoDB builds will complete much faster. ### Can I reduce resource usage? @@ -437,7 +446,8 @@ bazel clean # Clear build outputs bazel clean --expunge # Clear everything (reclaim disk space) ``` -> **Note:** Reducing resources will make builds slower. If possible, it's better to allocate more resources to Docker instead. +> **Note:** Reducing resources will make builds slower. If possible, it's better to allocate more +> resources to Docker instead. ### How do I monitor resource usage? 
@@ -492,7 +502,8 @@ But you lose VS Code integration, extensions, and convenience features. - **Architecture Details**: [architecture.md](./architecture.md) - **Troubleshooting**: [troubleshooting.md](./troubleshooting.md) - **Advanced Topics**: [advanced.md](./advanced.md) -- **VS Code Docs**: [code.visualstudio.com/docs/devcontainers](https://code.visualstudio.com/docs/devcontainers/containers) +- **VS Code Docs**: + [code.visualstudio.com/docs/devcontainers](https://code.visualstudio.com/docs/devcontainers/containers) ### Who do I contact for help? diff --git a/docs/devcontainer/getting-started.md b/docs/devcontainer/getting-started.md index 8045f1a70ad..513031023bc 100644 --- a/docs/devcontainer/getting-started.md +++ b/docs/devcontainer/getting-started.md @@ -1,16 +1,19 @@ # Getting Started with MongoDB Dev Containers -This guide will walk you through setting up your MongoDB development environment using Dev Containers. +This guide will walk you through setting up your MongoDB development environment using Dev +Containers. ## Prerequisites ### 1. Install Docker -Dev Containers require Docker to be installed and running on your system. Choose one of the following Docker providers: +Dev Containers require Docker to be installed and running on your system. Choose one of the +following Docker providers: #### Option A: Rancher Desktop (Recommended) -[Rancher Desktop](https://rancherdesktop.io/) is our recommended Docker provider for devcontainer development. +[Rancher Desktop](https://rancherdesktop.io/) is our recommended Docker provider for devcontainer +development. **Installation:** @@ -20,28 +23,34 @@ Dev Containers require Docker to be installed and running on your system. 
Choose - **Container Engine**: Select `dockerd (moby)` ⚠️ **Important!** - **Configure Path**: Select "Automatic" -**Recommended Settings:** -After installation, increase resources for better build performance: +**Recommended Settings:** After installation, increase resources for better build performance: 1. Open Rancher Desktop → Preferences → Virtual Machine 2. **Memory**: Allocate as much as your system allows (leave ~4-8 GB for your host OS) 3. **CPUs**: Allocate as many cores as possible (leave 1-2 for your host OS) -4. **Disk**: Rancher Desktop doesn't have a UI for disk size. To increase it, see [Troubleshooting - Increase Docker disk allocation](./troubleshooting.md#build-fails-with-no-space-left-on-device) for instructions. +4. **Disk**: Rancher Desktop doesn't have a UI for disk size. To increase it, see + [Troubleshooting - Increase Docker disk allocation](./troubleshooting.md#build-fails-with-no-space-left-on-device) + for instructions. 5. Apply changes and restart Rancher Desktop -> **Tip:** More resources = faster builds. MongoDB builds benefit significantly from additional CPU cores and memory. +> **Tip:** More resources = faster builds. MongoDB builds benefit significantly from additional CPU +> cores and memory. -**IMPORTANT!**: If you already have VSCode open when you install Rancher Desktop, make sure to restart VSCode otherwise it may not find the Docker socket and VSCode will prompt you to install Docker Desktop instead. +**IMPORTANT!**: If you already have VSCode open when you install Rancher Desktop, make sure to +restart VSCode otherwise it may not find the Docker socket and VSCode will prompt you to install +Docker Desktop instead. #### Option B: Docker Desktop [Docker Desktop](https://www.docker.com/products/docker-desktop/) is a popular alternative. -> **Note on Licensing**: Docker Desktop may require a paid license for commercial use. Please review the licensing terms to ensure compliance with your use case. 
+> **Note on Licensing**: Docker Desktop may require a paid license for commercial use. Please review +> the licensing terms to ensure compliance with your use case. **Installation:** -1. Download from [docker.com/products/docker-desktop](https://www.docker.com/products/docker-desktop/) +1. Download from + [docker.com/products/docker-desktop](https://www.docker.com/products/docker-desktop/) 2. Install and start Docker Desktop 3. Go to Settings → Resources and allocate generously: - **Memory**: Allocate as much as possible (leave ~4-8 GB for your host OS) @@ -52,7 +61,8 @@ After installation, increase resources for better build performance: [OrbStack](https://orbstack.dev/) is a lightweight, fast Docker alternative for macOS. -> **Note on Licensing**: OrbStack may require a paid license for commercial use. Please review the licensing terms to ensure compliance with your use case. +> **Note on Licensing**: OrbStack may require a paid license for commercial use. Please review the +> licensing terms to ensure compliance with your use case. **Installation:** @@ -64,12 +74,14 @@ After installation, increase resources for better build performance: For Linux users, you can use Docker Engine directly. -**Installation:** -Follow the official guide: [docs.docker.com/engine/install](https://docs.docker.com/engine/install/) +**Installation:** Follow the official guide: +[docs.docker.com/engine/install](https://docs.docker.com/engine/install/) ### 2. Create SSH Directory (Required) -> **⚠️ Critical:** You **must** have a `~/.ssh` directory on your host machine before building the devcontainer. The devcontainer requires this directory to exist, regardless of whether you use SSH or HTTPS to clone the repository. +> **⚠️ Critical:** You **must** have a `~/.ssh` directory on your host machine before building the +> devcontainer. The devcontainer requires this directory to exist, regardless of whether you use SSH +> or HTTPS to clone the repository. 
```bash # On your HOST machine (not inside the container) @@ -87,13 +99,17 @@ Download and install VS Code from [code.visualstudio.com](https://code.visualstu 1. Open VS Code 2. Go to Extensions (⌘/Ctrl+Shift+X) 3. Search for "Dev Containers" -4. Install the [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension by Microsoft +4. Install the + [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) + extension by Microsoft ### 5. Configure SSH Keys (Recommended) -To clone the repository using SSH (recommended for contributors), you'll need SSH keys configured with GitHub. +To clone the repository using SSH (recommended for contributors), you'll need SSH keys configured +with GitHub. -> **⚠️ Important:** Run all commands in this section on your **host machine** (not inside the container). SSH keys need to be set up before cloning the repository into the container. +> **⚠️ Important:** Run all commands in this section on your **host machine** (not inside the +> container). SSH keys need to be set up before cloning the repository into the container. #### Check if you have SSH keys @@ -183,7 +199,8 @@ Get-Service ssh-agent | Set-Service -StartupType Automatic Start-Service ssh-agent ``` -> **Note:** VS Code automatically forwards your SSH agent to the container, so your keys will be available inside the devcontainer. +> **Note:** VS Code automatically forwards your SSH agent to the container, so your keys will be +> available inside the devcontainer. [Learn more about using SSH keys with GitHub →](https://docs.github.com/en/authentication/connecting-to-github-with-ssh) @@ -191,7 +208,8 @@ Start-Service ssh-agent ### Step 1: Clone Repository in Named Container Volume -For **optimal performance**, especially on macOS, clone the repository directly into a Docker volume rather than your local filesystem. This is crucial for Bazel performance. 
+For **optimal performance**, especially on macOS, clone the repository directly into a Docker volume +rather than your local filesystem. This is crucial for Bazel performance. #### Why Named Volumes? @@ -397,7 +415,8 @@ ssh-add ~/.ssh/id_ed25519 # Command Palette → "Dev Containers: Rebuild Container" ``` -**VS Code SSH Agent Forwarding**: The Dev Containers extension automatically forwards your SSH agent, but this requires: +**VS Code SSH Agent Forwarding**: The Dev Containers extension automatically forwards your SSH +agent, but this requires: - SSH agent running on host with keys loaded - SSH key files in default location (`~/.ssh/`) diff --git a/docs/devcontainer/troubleshooting.md b/docs/devcontainer/troubleshooting.md index ada6ee55c0e..47b947b53be 100644 --- a/docs/devcontainer/troubleshooting.md +++ b/docs/devcontainer/troubleshooting.md @@ -28,7 +28,8 @@ Docker version or later is required **Solution** -Restart VSCode. If you install Rancher Desktop while you already have VSCode open, it doesn't properly detect the Docker socket and prompts you to install Docker Desktop by mistake. +Restart VSCode. If you install Rancher Desktop while you already have VSCode open, it doesn't +properly detect the Docker socket and prompts you to install Docker Desktop by mistake. ## Container Build Issues @@ -48,7 +49,9 @@ Error response from daemon: invalid mount config for type "bind": bind source pa **Root Cause:** -The devcontainer configuration mounts your `~/.ssh` directory to enable Git operations over SSH. If this directory doesn't exist on your host machine, the container fails to start. **This directory is required even if you plan to use HTTPS instead of SSH for cloning.** +The devcontainer configuration mounts your `~/.ssh` directory to enable Git operations over SSH. If +this directory doesn't exist on your host machine, the container fails to start. 
**This directory is +required even if you plan to use HTTPS instead of SSH for cloning.** **Solutions:** @@ -73,7 +76,8 @@ SSH agent forwarding behavior varies by Docker provider on macOS: - With dockerd runtime: Automatic agent forwarding - With containerd runtime: Agent forwarding requires additional setup -To use SSH agent forwarding, ensure your SSH keys are added to your host's SSH agent before starting the container: +To use SSH agent forwarding, ensure your SSH keys are added to your host's SSH agent before starting +the container: ```bash ssh-add ~/.ssh/id_ed25519 # or your key name @@ -117,7 +121,8 @@ Error: failed to solve: write /var/lib/docker/...: no space left on device disk: 100GB ``` 4. Start Rancher Desktop - 5. If Rancher Desktop was previously initialized, you may need to perform a factory reset (Preferences → Troubleshooting → Reset Kubernetes) for the disk size change to take effect. + 5. If Rancher Desktop was previously initialized, you may need to perform a factory reset + (Preferences → Troubleshooting → Reset Kubernetes) for the disk size change to take effect. **On Windows (WSL2):** @@ -125,7 +130,8 @@ Error: failed to solve: write /var/lib/docker/...: no space left on device 1. Stop Rancher Desktop 2. Run: `wsl --shutdown` - 3. Follow Microsoft's guide to increase WSL2 disk size: https://learn.microsoft.com/en-us/windows/wsl/disk-space + 3. Follow Microsoft's guide to increase WSL2 disk size: + https://learn.microsoft.com/en-us/windows/wsl/disk-space **Docker Desktop:** @@ -174,7 +180,8 @@ Error: Failed to download toolchain curl -I "$(grep TOOLCHAIN_URL .devcontainer/toolchain_config.env | cut -d'"' -f2)" ``` -3. **If toolchain URL is broken**, report it to the MongoDB team. This is a devcontainer configuration issue that needs to be fixed upstream. +3. **If toolchain URL is broken**, report it to the MongoDB team. This is a devcontainer + configuration issue that needs to be fixed upstream. 
### Build Fails with Checksum Mismatch @@ -203,7 +210,8 @@ Got: def456... # Command Palette → "Dev Containers: Rebuild Container Without Cache" ``` -3. **If problem persists**, this is likely a devcontainer configuration issue - report it to the MongoDB team. +3. **If problem persists**, this is likely a devcontainer configuration issue - report it to the + MongoDB team. ### Container Fails to Start @@ -288,11 +296,9 @@ Got: def456... - File save is delayed - Terminal autocomplete is slow -**Root Cause:** -Bind mounts on macOS use osxfs which has high latency for filesystem operations. +**Root Cause:** Bind mounts on macOS use osxfs which has high latency for filesystem operations. -**Solution:** -✅ **Use named volumes instead of bind mounts** (see Getting Started guide) +**Solution:** ✅ **Use named volumes instead of bind mounts** (see Getting Started guide) ### High CPU Usage @@ -517,7 +523,8 @@ fatal: Could not read from remote repository. ssh-add ~/.ssh/id_ed25519 # or id_rsa ``` -See [Getting Started - SSH Setup](./getting-started.md#4-configure-ssh-keys-recommended) for detailed instructions. +See [Getting Started - SSH Setup](./getting-started.md#4-configure-ssh-keys-recommended) for +detailed instructions. ### SSH Works on Host But Not in Container @@ -527,8 +534,7 @@ See [Getting Started - SSH Setup](./getting-started.md#4-configure-ssh-keys-reco - Same operations fail inside devcontainer - "Permission denied" or asks for password -**Root Cause:** -SSH agent forwarding isn't working properly. +**Root Cause:** SSH agent forwarding isn't working properly. **Solutions:** @@ -633,8 +639,7 @@ git config --global credential.helper store # Next time you enter credentials, they'll be saved ``` -**Option 3: Fix SSH agent forwarding**: -See "SSH Works on Host But Not in Container" section above. +**Option 3: Fix SSH agent forwarding**: See "SSH Works on Host But Not in Container" section above. 
### Multiple SSH Keys (Personal + Work) @@ -868,8 +873,7 @@ ModuleNotFoundError: No module named 'pymongo' - History cleared - Python venv empty -**Root Cause:** -Volumes not mounting correctly +**Root Cause:** Volumes not mounting correctly **Solutions:** @@ -917,8 +921,8 @@ docker cp :/workspaces/mongo/file.txt ~/Downloads/ # Right-click file → Download... ``` -**To edit with external tools:** -Use bind mounts instead of named volumes (but sacrifices performance). +**To edit with external tools:** Use bind mounts instead of named volumes (but sacrifices +performance). ### Volume Fills Up Disk @@ -1070,8 +1074,7 @@ permission denied while trying to connect to Docker daemon - Slow builds - Out of memory errors -**Solution:** -Go to Docker Desktop → Settings → Resources and allocate generously: +**Solution:** Go to Docker Desktop → Settings → Resources and allocate generously: - **CPUs**: Allocate as many as possible (leave 1-2 for host OS) - **Memory**: Allocate as much as possible (leave ~4-8 GB for host OS) @@ -1087,8 +1090,7 @@ Go to Docker Desktop → Settings → Resources and allocate generously: - Docker-outside-of-docker doesn't work - Volume mounts fail -**Solution:** -OrbStack has some limitations with devcontainer features. Try: +**Solution:** OrbStack has some limitations with devcontainer features. Try: 1. Update to latest OrbStack version 2. Check OrbStack documentation for devcontainer compatibility @@ -1177,7 +1179,8 @@ cd mongo If your issue isn't covered here: -1. **Check VS Code Docs**: [code.visualstudio.com/docs/devcontainers](https://code.visualstudio.com/docs/devcontainers/containers) +1. **Check VS Code Docs**: + [code.visualstudio.com/docs/devcontainers](https://code.visualstudio.com/docs/devcontainers/containers) 2. **Search Issues**: MongoDB GitHub repository issues 3. **Ask the Team**: MongoDB developers Slack/chat 4. 
**File a Bug**: Include: diff --git a/docs/egress_networking.md b/docs/egress_networking.md index 0aab2a9bb61..ddda16b875b 100644 --- a/docs/egress_networking.md +++ b/docs/egress_networking.md @@ -1,26 +1,95 @@ # Egress Networking -Egress networking entails outbound communication (i.e. requests) from a client process to a server process (e.g. _mongod_), as well as inbound communication (i.e. responses) from such a server process back to a client process. +Egress networking entails outbound communication (i.e. requests) from a client process to a server +process (e.g. _mongod_), as well as inbound communication (i.e. responses) from such a server +process back to a client process. ## Remote Commands -A remote command represents an exchange of data between a client and a server. A remote command consists of two steps: a request, which the clients sends to the server, and a response, which the client receives from the server. These elements are represented by the [request][remote_command_request_h] and [response][remote_command_response_h] objects; each wraps the BSON that represents the on-wire transacted data and metadata that describes the context of the command, such as the host that the command targets. Each object also contains metadata that corresponds to its half of the command lifecycle. For example, the request object notes the timeout of the command and the operation's unique identifier, among other fields, and the response object notes the final disposition of the command's data exchange as a `Status` object (which takes no position on the success of the command's semantics at the remote) and the time that the command actually took to execute, among other fields. In the case of an exhaust command, there may be multiple responses for a single request. +A remote command represents an exchange of data between a client and a server. 
A remote command +consists of two steps: a request, which the client sends to the server, and a response, which the +client receives from the server. These elements are represented by the +[request][remote_command_request_h] and [response][remote_command_response_h] objects; each wraps +the BSON that represents the on-wire transacted data and metadata that describes the context of the +command, such as the host that the command targets. Each object also contains metadata that +corresponds to its half of the command lifecycle. For example, the request object notes the timeout +of the command and the operation's unique identifier, among other fields, and the response object +notes the final disposition of the command's data exchange as a `Status` object (which takes no +position on the success of the command's semantics at the remote) and the time that the command +actually took to execute, among other fields. In the case of an exhaust command, there may be +multiple responses for a single request. ## Connection Pooling -The [executor::ConnectionPool][connection_pool_h] class is responsible for pooling connections to any number of hosts. It contains zero or more `ConnectionPool::SpecificPool` objects, each of which pools connections for a unique host, and exactly one `ConnectionPool::ControllerInterface` object, which is responsible for the addition, removal, and updating of `SpecificPool`s to, from, and in its owning `ConnectionPool`. When a caller requests a connection to a host from the `ConnectionPool`, the `ConnectionPool` creates a new `SpecificPool` to pool connections for that host if one does not exist already, and then the `ConnectionPool` forwards the request to the `SpecificPool`. A `SpecificPool` expires when its `hostTimeout` has passed without any connection requests, after which time it becomes unusable; further requests for connections to that host will trigger the creation of a fresh `SpecificPool`.
+The [executor::ConnectionPool][connection_pool_h] class is responsible for pooling connections to +any number of hosts. It contains zero or more `ConnectionPool::SpecificPool` objects, each of which +pools connections for a unique host, and exactly one `ConnectionPool::ControllerInterface` object, +which is responsible for the addition, removal, and updating of `SpecificPool`s to, from, and in its +owning `ConnectionPool`. When a caller requests a connection to a host from the `ConnectionPool`, +the `ConnectionPool` creates a new `SpecificPool` to pool connections for that host if one does not +exist already, and then the `ConnectionPool` forwards the request to the `SpecificPool`. A +`SpecificPool` expires when its `hostTimeout` has passed without any connection requests, after +which time it becomes unusable; further requests for connections to that host will trigger the +creation of a fresh `SpecificPool`. -The final result of a successful connection request made through `ConnectionPool::getConnection` is a `ConnectionPool::ConnectionInterface`, which represents a connection ready for use. Externally, the `ConnectionInterface` is primarily used by the caller to exchange data with its remote host. Callers return `ConnectionInterface`s to the pool by allowing them to destruct and callers must signal to the pool the final disposition of the connection beforehand through the `indicate*` family of methods. `ConnectionInterface`s also support setting timers to schedule future activities. Internally, the `ConnectionInterface` is used to prepare the connection for data exchange before transferring ownership to the caller and refreshing the health of a connection when the caller returns the connection to the pool. `ConnectionInterface` also maintains a notion of generation, which is implemented as a monotonically-incrementing counter. 
When a caller returns a `ConnectionInterface` to a `ConnectionPool` from a generation prior to the current generation of the corresponding `SpecificPool`, the connection is dropped. The current generation of a `SpecificPool` is incremented when the pool experiences certain failures (e.g., when to establish a new connection). `ConnectionPool` also drops a connection if the caller called `indicateFailure` on the connection before returning it. `ConnectionPool` uses a global mutex for access to `SpecificPool`s as well as generation counters. +The final result of a successful connection request made through `ConnectionPool::getConnection` is +a `ConnectionPool::ConnectionInterface`, which represents a connection ready for use. Externally, +the `ConnectionInterface` is primarily used by the caller to exchange data with its remote host. +Callers return `ConnectionInterface`s to the pool by allowing them to destruct and callers must +signal to the pool the final disposition of the connection beforehand through the `indicate*` family +of methods. `ConnectionInterface`s also support setting timers to schedule future activities. +Internally, the `ConnectionInterface` is used to prepare the connection for data exchange before +transferring ownership to the caller and refreshing the health of a connection when the caller +returns the connection to the pool. `ConnectionInterface` also maintains a notion of generation, +which is implemented as a monotonically-incrementing counter. When a caller returns a +`ConnectionInterface` to a `ConnectionPool` from a generation prior to the current generation of the +corresponding `SpecificPool`, the connection is dropped. The current generation of a `SpecificPool` +is incremented when the pool experiences certain failures (e.g., when failing to establish a new +connection). `ConnectionPool` also drops a connection if the caller called `indicateFailure` on the +connection before returning it.
`ConnectionPool` uses a global mutex for access to `SpecificPool`s +as well as generation counters. -`ConnectionPool` uses its single instance of `EgressConnectionCloserManager` to determine when hosts should be dropped. The manager consists of multiple `EgressConnectionClosers`, which are used to determine whether hosts should be dropped. In the context of the ConnectionPool, the manager's purpose is to drop _connections_ to hosts based on whether they have been marked as keep open or not. +`ConnectionPool` uses its single instance of `EgressConnectionCloserManager` to determine when hosts +should be dropped. The manager consists of multiple `EgressConnectionClosers`, which are used to +determine whether hosts should be dropped. In the context of the ConnectionPool, the manager's +purpose is to drop _connections_ to hosts based on whether they have been marked as keep open or +not. ## Internal Network Clients -Client-side outbound communication in egress networking is primarily handled by the [AsyncDBClient class][async_client_h]. The async client is responsible for initializing a connection to a particular host as well as initializing the [wire protocol][wire_protocol] for client-server communication, after which remote requests can be sent by the client and corresponding remote responses from a database can subsequently be received. In setting up the wire protocol, the async client sends an [isMaster][is_master] request to the server and parses the server's isMaster response to ensure that the status of the connection is OK. An initial isMaster request is constructed in the legacy OP_QUERY protocol, so that clients can still communicate with servers that may not support other protocols. The async client also supports client authentication functionality (i.e. authenticating a user's credentials, client host, remote host, etc.). +Client-side outbound communication in egress networking is primarily handled by the [AsyncDBClient +class][async_client_h]. 
The async client is responsible for initializing a connection to a +particular host as well as initializing the [wire protocol][wire_protocol] for client-server +communication, after which remote requests can be sent by the client and corresponding remote +responses from a database can subsequently be received. In setting up the wire protocol, the async +client sends an [isMaster][is_master] request to the server and parses the server's isMaster +response to ensure that the status of the connection is OK. An initial isMaster request is +constructed in the legacy OP_QUERY protocol, so that clients can still communicate with servers that +may not support other protocols. The async client also supports client authentication functionality +(i.e. authenticating a user's credentials, client host, remote host, etc.). -The scheduling of requests is managed by the [task executor][task_executor_h], which maintains the notion of **events** and **callbacks**. Callbacks represent work (e.g. remote requests) that is to be executed by the executor, and are scheduled by client threads as well as other callbacks. There are several variations of work scheduling methods, which include: immediate scheduling, scheduling no earlier than a specified time, and scheduling iff a specified event has been signalled. These methods return a handle that can be used while the executor is still in scope for either waiting on or cancelling the scheduled callback in question. If a scheduled callback is cancelled, it remains on the work queue and is technically still run, but is labeled as having been 'cancelled' beforehand. Once a given callback/request is scheduled, the task executor is then able to execute such requests via a [network interface][network_interface_h]. The network interface, connected to a particular host/server, begins the asynchronous execution of commands specified via a request bundled in the aforementioned callback handle. 
The interface is capable of blocking threads until its associated task executor has work that needs to be performed, and is likewise able to return from an idle state when it receives a signal that the executor has new work to process. +The scheduling of requests is managed by the [task executor][task_executor_h], which maintains the +notion of **events** and **callbacks**. Callbacks represent work (e.g. remote requests) that is to +be executed by the executor, and are scheduled by client threads as well as other callbacks. There +are several variations of work scheduling methods, which include: immediate scheduling, scheduling +no earlier than a specified time, and scheduling iff a specified event has been signalled. These +methods return a handle that can be used while the executor is still in scope for either waiting on +or cancelling the scheduled callback in question. If a scheduled callback is cancelled, it remains +on the work queue and is technically still run, but is labeled as having been 'cancelled' +beforehand. Once a given callback/request is scheduled, the task executor is then able to execute +such requests via a [network interface][network_interface_h]. The network interface, connected to a +particular host/server, begins the asynchronous execution of commands specified via a request +bundled in the aforementioned callback handle. The interface is capable of blocking threads until +its associated task executor has work that needs to be performed, and is likewise able to return +from an idle state when it receives a signal that the executor has new work to process. -Client-side legacy networking draws upon the `DBClientBase` class, of which there are multiple subclasses residing in the `src/mongo/client` folder. 
The [replica set DBClient][dbclient_rs_h] discerns which one of multiple servers in a replica set is the primary at construction time, and establishes a connection (using the `DBClientConnection` wrapper class, also extended from `DBClientBase`) with the replica set via the primary. In cases where the primary server is unresponsive within a specified time range, the RS DBClient will automatically attempt to establish a secondary server as the new primary (see [automatic failover][automatic_failover]). +Client-side legacy networking draws upon the `DBClientBase` class, of which there are multiple +subclasses residing in the `src/mongo/client` folder. The [replica set DBClient][dbclient_rs_h] +discerns which one of multiple servers in a replica set is the primary at construction time, and +establishes a connection (using the `DBClientConnection` wrapper class, also extended from +`DBClientBase`) with the replica set via the primary. In cases where the primary server is +unresponsive within a specified time range, the RS DBClient will automatically attempt to establish +a secondary server as the new primary (see [automatic failover][automatic_failover]). ## See Also diff --git a/docs/evergreen-testing/burn_in_tags.md b/docs/evergreen-testing/burn_in_tags.md index 3e7af0ad24b..acaba058e51 100644 --- a/docs/evergreen-testing/burn_in_tags.md +++ b/docs/evergreen-testing/burn_in_tags.md @@ -3,26 +3,26 @@ ## What it is Similar to [burn_in_tests](burn_in_tests.md), `burn_in_tags` also detects the javascript tests -(under the [jstests directory](https://github.com/mongodb/mongo/tree/master/jstests)) -that are new or have changed since the last git command and then runs those tests in repeated -mode to validate their stability. But instead of running the tests on their original build -variants, `burn_in_tags` runs them on the burn_in build variants that are generated separately. 
+(under the [jstests directory](https://github.com/mongodb/mongo/tree/master/jstests)) that are new +or have changed since the last git command and then runs those tests in repeated mode to validate +their stability. But instead of running the tests on their original build variants, `burn_in_tags` +runs them on the burn_in build variants that are generated separately. ## How to use it -You can use `burn_in_tags` on evergreen by selecting the `burn_in_tags_gen` task when creating a patch. -The burn_in build variants, i.e., `enterprise-rhel-8-64-bit-inmem` and `enterprise-rhel-8-64-bit-multiversion` -will be generated, each of which will have a `burn_in_tests` task generated by the -[mongo-task-generator](https://github.com/mongodb/mongo-task-generator). `burn_in_tests` task, a -[generated task](task_generation.md), may have multiple sub-tasks which run the test suites only for the -new or changed javascript tests (note that a javascript test can be included in multiple test suites). Each of -those tests will be run 2 times minimum, and 1000 times maximum or for 10 minutes, whichever is reached first. +You can use `burn_in_tags` on evergreen by selecting the `burn_in_tags_gen` task when creating a +patch. The burn_in build variants, i.e., `enterprise-rhel-8-64-bit-inmem` and +`enterprise-rhel-8-64-bit-multiversion` will be generated, each of which will have a `burn_in_tests` +task generated by the [mongo-task-generator](https://github.com/mongodb/mongo-task-generator). +`burn_in_tests` task, a [generated task](task_generation.md), may have multiple sub-tasks which run +the test suites only for the new or changed javascript tests (note that a javascript test can be +included in multiple test suites). Each of those tests will be run 2 times minimum, and 1000 times +maximum or for 10 minutes, whichever is reached first. ## ! Run All Affected JStests -The `! Run All Affected JStests` variant has a single `burn_in_tags_gen` task. 
This task will create & -activate [`burn_in_tests`](burn_in_tests.md) tasks for all required and suggested -variants. The end result is that any jstests that have been modified in the patch will -run on all required and suggested variants. This should give users a clear signal on -whether their jstests changes have introduced a failure that could potentially lead -to a revert or follow-up bug fix commit. +The `! Run All Affected JStests` variant has a single `burn_in_tags_gen` task. This task will create +& activate [`burn_in_tests`](burn_in_tests.md) tasks for all required and suggested variants. The +end result is that any jstests that have been modified in the patch will run on all required and +suggested variants. This should give users a clear signal on whether their jstests changes have +introduced a failure that could potentially lead to a revert or follow-up bug fix commit. diff --git a/docs/evergreen-testing/burn_in_tests.md b/docs/evergreen-testing/burn_in_tests.md index d81e1477651..c0286b43012 100644 --- a/docs/evergreen-testing/burn_in_tests.md +++ b/docs/evergreen-testing/burn_in_tests.md @@ -3,19 +3,21 @@ ## What it is `burn_in_tests` detects the javascript tests (under the -[jstests directory](https://github.com/mongodb/mongo/tree/master/jstests)) that are new or have changed -since the last git command and then runs those tests in repeated mode to validate their stability. +[jstests directory](https://github.com/mongodb/mongo/tree/master/jstests)) that are new or have +changed since the last git command and then runs those tests in repeated mode to validate their +stability. ## How to use it -You can use `burn_in_tests` on evergreen by selecting the `burn_in_tests_gen` task when creating a patch, -since `burn_in_tests` task is a [generated task](task_generation.md) generated by the -[mongo-task-generator](https://github.com/mongodb/mongo-task-generator). 
-`burn_in_tests` task will be generated on each of the applicable build variants, and -may have multiple sub-tasks which run the test suites only for the new or changed javascript tests (note -that a javascript test can be included in multiple test suites). Each of those tests will be run 2 times -minimum, and 1000 times maximum or for 10 minutes, whichever is reached first. +You can use `burn_in_tests` on evergreen by selecting the `burn_in_tests_gen` task when creating a +patch, since `burn_in_tests` task is a [generated task](task_generation.md) generated by the +[mongo-task-generator](https://github.com/mongodb/mongo-task-generator). `burn_in_tests` task will +be generated on each of the applicable build variants, and may have multiple sub-tasks which run the +test suites only for the new or changed javascript tests (note that a javascript test can be +included in multiple test suites). Each of those tests will be run 2 times minimum, and 1000 times +maximum or for 10 minutes, whichever is reached first. -You can also use `burn_in_tests` locally from within the [mongo repo](https://github.com/mongodb/mongo) -by running the script `python buildscripts/burn_in_tests.py`. For more information about this usage, you can -run `python buildscripts/burn_in_tests.py --help`. +You can also use `burn_in_tests` locally from within the +[mongo repo](https://github.com/mongodb/mongo) by running the script +`python buildscripts/burn_in_tests.py`. For more information about this usage, you can run +`python buildscripts/burn_in_tests.py --help`. diff --git a/docs/evergreen-testing/multiversion.md b/docs/evergreen-testing/multiversion.md index 2870a49b9d3..5ee5888e3f8 100644 --- a/docs/evergreen-testing/multiversion.md +++ b/docs/evergreen-testing/multiversion.md @@ -34,37 +34,37 @@ For some of the versions we are using such generic names as `latest`, `last-lts` - `latest` - the current version. In Evergreen, the version that was compiled in the current build. 
- `last-lts` - the latest LTS (Long Term Support) Major release version. In Evergreen, the version - that was downloaded from the last LTS release branch project. It resolves to an entry - in `longTermSupportReleases` of [releases.yml](../../src/mongo/util/version/releases.yml). + that was downloaded from the last LTS release branch project. It resolves to an entry in + `longTermSupportReleases` of [releases.yml](../../src/mongo/util/version/releases.yml). - `last-continuous` - the latest Rapid release version. In Evergreen, the version that was downloaded from the Rapid release branch project. It resolves to the entry in - `featureCompatibilityVersions` of [releases.yml](../../src/mongo/util/version/releases.yml) - that looks older than the output of `git describe`. Will not be tested against if it is listed in + `featureCompatibilityVersions` of [releases.yml](../../src/mongo/util/version/releases.yml) that + looks older than the output of `git describe`. Will not be tested against if it is listed in `eolVersions` as being end of life. -Note: The latest release.yml file from master is always used, even fetched remotely when on another branch. +Note: The latest release.yml file from master is always used, even fetched remotely when on another +branch. ### Old vs new Many multiversion tasks are running tests against `latest`/`last-lts` or `latest`/`last-continuous` -versions. In such context we refer to `last-lts` and `last-continuous` versions as the `old` -version and to `latest` as a `new` version. +versions. In such context we refer to `last-lts` and `last-continuous` versions as the `old` version +and to `latest` as a `new` version. A `new` version is compiled in the same way as for non-multiversion tasks. The `old` versions of compiled binaries are downloaded from the old branch projects with -[`db-contrib-tool`](https://github.com/10gen/db-contrib-tool). 
-`db-contrib-tool` searches for the latest available compiled binaries on the old branch projects in -Evergreen. +[`db-contrib-tool`](https://github.com/10gen/db-contrib-tool). `db-contrib-tool` searches for the +latest available compiled binaries on the old branch projects in Evergreen. ### Explicit and Implicit multiversion suites Multiversion suites can be explicit and implicit. -- Explicit - JS tests are aware of the binary versions they are running, - e.g. [multiversion.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/multiversion.yml). - The version of binaries is explicitly set in JS tests, - e.g. [jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js](https://github.com/mongodb/mongo/blob/397c8da541940b3fbe6257243f97a342fe7e0d3b/jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js#L33-L44): +- Explicit - JS tests are aware of the binary versions they are running, e.g. + [multiversion.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/multiversion.yml). + The version of binaries is explicitly set in JS tests, e.g. + [jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js](https://github.com/mongodb/mongo/blob/397c8da541940b3fbe6257243f97a342fe7e0d3b/jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js#L33-L44): ```js const versions = [ @@ -101,8 +101,8 @@ const versions = [ ]; ``` -- Implicit - JS tests know nothing about the binary versions they are running, - e.g. [retryable_writes_downgrade.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/retryable_writes_downgrade.yml). +- Implicit - JS tests know nothing about the binary versions they are running, e.g. 
+ [retryable_writes_downgrade.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/retryable_writes_downgrade.yml). Most of the implicit multiversion suites are using matrix suites, e.g. `replica_sets_last_lts`: ```bash @@ -134,7 +134,8 @@ test_kind: js_test In implicit multiversion suites the version of binaries is defined on the resmoke fixture level. -The [example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L5-L8) +The +[example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L5-L8) of replica set fixture configuration override: ```yaml @@ -144,7 +145,8 @@ fixture: mixed_bin_versions: new_new_old ``` -The [example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L53-L57) +The +[example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L53-L57) of sharded cluster fixture configuration override: ```yaml @@ -155,7 +157,8 @@ fixture: mixed_bin_versions: new_old_old_new ``` -The [example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L139-L145) +The +[example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L139-L145) of shell fixture configuration override: ```yaml @@ -171,20 +174,25 @@ value: ### Version combinations In implicit multiversion suites the same set of tests may run in similar suites that are using -various mixed version combinations. 
Those version combinations depend on the type of resmoke -fixture the suite is running with. These are the recommended version combinations to test against based on the suite fixtures: +various mixed version combinations. Those version combinations depend on the type of resmoke fixture +the suite is running with. These are the recommended version combinations to test against based on +the suite fixtures: - Replica set fixture combinations: - `last-lts new-new-old` (i.e. suite runs the replica set fixture that spins up the `latest` and - the `last-lts` versions in a 3-node replica set where the 1st node is the `latest`, 2nd - `latest`, - 3rd - `last-lts`, etc.) + the `last-lts` versions in a 3-node replica set where the 1st node is the `latest`, 2nd - + `latest`, 3rd - `last-lts`, etc.) - `last-lts new-old-new` - `last-lts old-new-new` - `last-continuous new-new-old` - `last-continuous new-old-new` - `last-continuous old-new-new` - - Ex: [change_streams](https://github.com/mongodb/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml) uses a [`ReplicaSetFixture`](https://github.com/mongodb/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml#L50) so the corresponding multiversion suites are + - Ex: + [change_streams](https://github.com/mongodb/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml) + uses a + [`ReplicaSetFixture`](https://github.com/mongodb/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml#L50) + so the corresponding multiversion suites are - [`change_streams_last_continuous_new_new_old`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_new_new_old.yml) - 
[`change_streams_last_continuous_new_old_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_new_old_new.yml) - [`change_streams_last_continuous_old_new_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_old_new_new.yml) @@ -199,7 +207,11 @@ fixture the suite is running with. These are the recommended version combination replica sets per shard where the 1st node of the 1st shard is the `latest`, 2nd node of 1st shard - `last-lts`, 1st node of 2nd shard - `last-lts`, 2nd node of 2nd shard - `latest`, etc.) - `last-continuous new-old-old-new` - - Ex: [change_streams_downgrade](https://github.com/mongodb/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml) uses a [`ShardedClusterFixture`](https://github.com/mongodb/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml#L408) so the corresponding multiversion suites are + - Ex: + [change_streams_downgrade](https://github.com/mongodb/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml) + uses a + [`ShardedClusterFixture`](https://github.com/mongodb/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml#L408) + so the corresponding multiversion suites are - [`change_streams_downgrade_last_continuous_new_old_old_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_downgrade_last_continuous_new_old_old_new.yml) - 
[`change_streams_downgrade_last_lts_new_old_old_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_downgrade_last_lts_new_old_old_new.yml) @@ -207,18 +219,21 @@ fixture the suite is running with. These are the recommended version combination - `last-lts` (i.e. suite runs the shell fixture that spins up `last-lts` as the `old` versions, etc.) - `last-continuous` - - Ex: [initial_sync_fuzzer](https://github.com/mongodb/mongo/blob/908625ffdec050a71aa2ce47c35788739f629c60/buildscripts/resmokeconfig/suites/initial_sync_fuzzer.yml) uses a Shell Fixture, so the corresponding multiversion suites are + - Ex: + [initial_sync_fuzzer](https://github.com/mongodb/mongo/blob/908625ffdec050a71aa2ce47c35788739f629c60/buildscripts/resmokeconfig/suites/initial_sync_fuzzer.yml) + uses a Shell Fixture, so the corresponding multiversion suites are - [`initial_sync_fuzzer_last_lts`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/initial_sync_fuzzer_last_lts.yml) - [`initial_sync_fuzzer_last_continuous`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/initial_sync_fuzzer_last_continuous.yml) -If `last-lts` and `last-continuous` versions happen to be the same, or last-continuous is EOL, we skip `last-continuous` -and run multiversion suites with only `last-lts` combinations in Evergreen. +If `last-lts` and `last-continuous` versions happen to be the same, or last-continuous is EOL, we +skip `last-continuous` and run multiversion suites with only `last-lts` combinations in Evergreen. 
## Working with multiversion tasks in Evergreen ### Multiversion task generation -Please refer to mongo-task-generator [documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#multiversion-testing) +Please refer to mongo-task-generator +[documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#multiversion-testing) for generating multiversion tasks in Evergreen. ### Exclude tests from multiversion testing @@ -240,20 +255,21 @@ multiversion where `XX` is the version number, e.g. `requires_fcv_70` stands for ``` Tests with `requires_fcv_XX` tags are excluded from multiversion tasks that may run the versions -below the specified FCV version, e.g. when the `latest` version is `6.2`, `last-continuous` is -`6.1` and `last-lts` is `6.0`, tests tagged with `requires_fcv_61` will NOT run in multiversion -tasks that run `latest` with `last-lts`, but will run in multiversion tasks that run `lastest` with +below the specified FCV version, e.g. when the `latest` version is `6.2`, `last-continuous` is `6.1` +and `last-lts` is `6.0`, tests tagged with `requires_fcv_61` will NOT run in multiversion tasks that +run `latest` with `last-lts`, but will run in multiversion tasks that run `lastest` with `last-continuous`. -In addition to disabling multiversion tests based on FCV, there is no need to run in-development `featureFlagXYZ` tests -(featureFlags that have `default: false`) because these tests will most likely fail on older versions that -have not implemented this feature. For multiversion tasks, we pass the `--runNoFeatureFlagTests` flag to avoid these -failures on `all feature flag` variants. +In addition to disabling multiversion tests based on FCV, there is no need to run in-development +`featureFlagXYZ` tests (featureFlags that have `default: false`) because these tests will most +likely fail on older versions that have not implemented this feature. 
For multiversion tasks, we +pass the `--runNoFeatureFlagTests` flag to avoid these failures on `all feature flag` variants. -For more info on FCV, take a look at [FCV_AND_FEATURE_FLAG_README.md](https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/FCV_AND_FEATURE_FLAG_README.md). +For more info on FCV, take a look at +[FCV_AND_FEATURE_FLAG_README.md](https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/FCV_AND_FEATURE_FLAG_README.md). -Another common case could be that the changes on master branch are breaking multiversion tests, -but with those changes backported to the older branches the multiversion tests should work. -In order to temporarily disable the test from running in multiversion it can be added to the +Another common case could be that the changes on master branch are breaking multiversion tests, but +with those changes backported to the older branches the multiversion tests should work. In order to +temporarily disable the test from running in multiversion it can be added to the [etc/backports_required_for_multiversion_tests.yml](https://github.com/mongodb/mongo/blob/fcdfe29cee066278b94ea2749456fc433cc398c6/etc/backports_required_for_multiversion_tests.yml#L1-L19). Please follow the instructions described in the file. diff --git a/docs/evergreen-testing/task_generation.md b/docs/evergreen-testing/task_generation.md index 568f6ce68c5..3770a1a99f3 100644 --- a/docs/evergreen-testing/task_generation.md +++ b/docs/evergreen-testing/task_generation.md @@ -7,21 +7,22 @@ evergreen command. Task generation allow us to do things like dynamically split a task into sub-tasks that can be run in parallel, or generate sub-tasks to run against different mongodb versions. -Task generation is typically done with the [mongo-task-generator](https://github.com/mongodb/mongo-task-generator) -tool. 
Refer to its [documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md) +Task generation is typically done with the +[mongo-task-generator](https://github.com/mongodb/mongo-task-generator) tool. Refer to its +[documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md) for details on how it works. ## Configuring a task to be generated -In order to generate a task, we typically create a placeholder task. By convention the name of -these tasks should end in "\_gen". Most of the time, generated tasks should inherit the +In order to generate a task, we typically create a placeholder task. By convention the name of these +tasks should end in "\_gen". Most of the time, generated tasks should inherit the [gen_task_template](https://github.com/mongodb/mongo/blob/31864e3866ce9cc54c08463019846ded2ad9e6e5/etc/evergreen_yml_components/definitions.yml#L99-L107) which configures the required dependencies. The placeholder tasks needs to have the "generate resmoke tasks" function as one of its `commands`. -This is how the `mongo-task-generator` knows that the task needs to be generated. You can also -add `vars` to the function call to configure how the task will generated. You can refer to -the [mongo-task-generator](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#use-cases) +This is how the `mongo-task-generator` knows that the task needs to be generated. You can also add +`vars` to the function call to configure how the task will generated. You can refer to the +[mongo-task-generator](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#use-cases) documentation for details on what options are available. Once a placeholder task in defined, you can reference it just like a normal task. @@ -40,15 +41,15 @@ Task generation is performed as a 2-step process. additional tasks in the future, they will exist to be run. 
This step will also hide all the placeholder tasks into a display task called `generator_tasks` - in each build variant. Once task generation is completed, the user should perform actions on - the generated tasks instead of the placeholder tasks, we encourage this by hiding the - placeholder tasks from view. + in each build variant. Once task generation is completed, the user should perform actions on the + generated tasks instead of the placeholder tasks, we encourage this by hiding the placeholder + tasks from view. 2. After the tasks have been generated, the placeholder tasks are free to run. The placeholder tasks - simply find the task generated for them and mark it activated. Since generated tasks are - created in the "inactive" state, this will activate any generated tasks whose placeholder task - runs. This enables users to select tasks to run on the initial task selection page even though - the tasks have not yet been generated. + simply find the task generated for them and mark it activated. Since generated tasks are created + in the "inactive" state, this will activate any generated tasks whose placeholder task runs. This + enables users to select tasks to run on the initial task selection page even though the tasks + have not yet been generated. **Note**: While this 2-step process allows a similar user experience to working with normal tasks, it does create a few UI quirks. 
For example, evergreen will hide "inactive" tasks in the UI, as a diff --git a/docs/evergreen-testing/task_timeouts.md b/docs/evergreen-testing/task_timeouts.md index 1d18d889452..2bd5dbdbfb0 100644 --- a/docs/evergreen-testing/task_timeouts.md +++ b/docs/evergreen-testing/task_timeouts.md @@ -2,10 +2,15 @@ ## Types of timeouts -There are two types of timeouts that [Evergreen supports](https://github.com/evergreen-ci/evergreen/wiki/Project-Commands#timeoutupdate): +There are two types of timeouts that +[Evergreen supports](https://github.com/evergreen-ci/evergreen/wiki/Project-Commands#timeoutupdate): -- **Exec Timeout**: The _exec timeout_ is the overall timeout for a task. Once the total runtime for a test exceeds this value, the timeout logic will be triggered. This value is specified by `exec_timeout_secs` in the Evergreen configuration. -- **Idle Timeout**: The _idle timeout_ is the amount of time Evergreen will wait for output to be generated before considering the task hung and triggering the timeout logic. This value is specified by `timeout_secs` in the Evergreen configuration. +- **Exec Timeout**: The _exec timeout_ is the overall timeout for a task. Once the total runtime for + a test exceeds this value, the timeout logic will be triggered. This value is specified by + `exec_timeout_secs` in the Evergreen configuration. +- **Idle Timeout**: The _idle timeout_ is the amount of time Evergreen will wait for output to be + generated before considering the task hung and triggering the timeout logic. This value is + specified by `timeout_secs` in the Evergreen configuration. **Note**: In most cases, the **exec timeout** is the more useful of the two timeouts. @@ -15,15 +20,27 @@ There are several ways to set the timeout for a task running in Evergreen. ### Specifying timeouts in the Evergreen YAML configuration -Timeouts can be specified directly in the `evergreen.yml` (and related) files, both for tasks and build variants. 
This approach is useful for setting default timeout values but is limited because different build variants often have varying runtime characteristics. This means it is not possible to set timeouts for a specific task running on a specific build variant using only this method. +Timeouts can be specified directly in the `evergreen.yml` (and related) files, both for tasks and +build variants. This approach is useful for setting default timeout values but is limited because +different build variants often have varying runtime characteristics. This means it is not possible +to set timeouts for a specific task running on a specific build variant using only this method. ### Overrides: [etc/evergreen_timeouts.yml](../../etc/evergreen_timeouts.yml) -The `etc/evergreen_timeouts.yml` file allows overriding timeouts for specific tasks on specific build variants. This workaround helps address the limitations of directly specifying timeouts in `evergreen.yml`. To use this method, the task must include the `determine task timeout` and `update task timeout expansions` functions at the beginning of its Evergreen definition. Many Resmoke tasks already incorporate these functions. +The `etc/evergreen_timeouts.yml` file allows overriding timeouts for specific tasks on specific +build variants. This workaround helps address the limitations of directly specifying timeouts in +`evergreen.yml`. To use this method, the task must include the `determine task timeout` and +`update task timeout expansions` functions at the beginning of its Evergreen definition. Many +Resmoke tasks already incorporate these functions. ### Resmoke tasks: [buildscripts/evergreen_task_timeout.py](../../buildscripts/evergreen_task_timeout.py) -This script reads the `etc/evergreen_timeouts.yml` file to calculate the appropriate timeout settings. Additionally, it checks historical test results for the task being run to determine if enough information is available to calculate timeouts based on past data. 
The script also supports more advanced methods of determining timeouts, such as applying aggressive timeout measures for tasks executed in the commit queue or on required build variants. In cases of conflict, the commit queue and required build variant limits take precedence over the previous two methods. +This script reads the `etc/evergreen_timeouts.yml` file to calculate the appropriate timeout +settings. Additionally, it checks historical test results for the task being run to determine if +enough information is available to calculate timeouts based on past data. The script also supports +more advanced methods of determining timeouts, such as applying aggressive timeout measures for +tasks executed in the commit queue or on required build variants. In cases of conflict, the commit +queue and required build variant limits take precedence over the previous two methods. The timeout that was calculated by the script can be retrieved from the logs: @@ -38,4 +55,8 @@ The timeout that was calculated by the script can be retrieved from the logs: ### Compile tasks: [evergreen/generate_override_timeout.py](../../evergreen/generate_override_timeout.py) -This script is used for compile tasks defined in files such as `etc/evergreen_yml_components/tasks/compile_tasks.yml` and `etc/evergreen_yml_components/tasks/compile_tasks_shared.yml`. The script reads the `etc/evergreen_timeouts.yml` file and calculates appropriate timeouts. The Evergreen function `override task timeout` then runs this script to update the timeouts accordingly. +This script is used for compile tasks defined in files such as +`etc/evergreen_yml_components/tasks/compile_tasks.yml` and +`etc/evergreen_yml_components/tasks/compile_tasks_shared.yml`. The script reads the +`etc/evergreen_timeouts.yml` file and calculates appropriate timeouts. The Evergreen function +`override task timeout` then runs this script to update the timeouts accordingly. 
diff --git a/docs/evergreen-testing/yaml_configuration/buildvariants.md b/docs/evergreen-testing/yaml_configuration/buildvariants.md index 68c4e80a8cc..7b672da14f9 100644 --- a/docs/evergreen-testing/yaml_configuration/buildvariants.md +++ b/docs/evergreen-testing/yaml_configuration/buildvariants.md @@ -1,37 +1,47 @@ # Build Variants -This document describes build variants (a.k.a. variants, or builds, or buildvariants) that are used in `mongodb-mongo-*` projects. -To know more about build variants, please refer to the [Build Variants](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#build-variants) section of the Evergreen wiki. +This document describes build variants (a.k.a. variants, or builds, or buildvariants) that are used +in `mongodb-mongo-*` projects. To know more about build variants, please refer to the +[Build Variants](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#build-variants) +section of the Evergreen wiki. ## YAML files structure -Build variant configuration files are in `etc/evergreen_yml_components/variants` directory. -They are merged into `etc/evergreen.yml` and `etc/evergreen_nightly.yml` with Evergreen's [include](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#include) feature. +Build variant configuration files are in `etc/evergreen_yml_components/variants` directory. They are +merged into `etc/evergreen.yml` and `etc/evergreen_nightly.yml` with Evergreen's +[include](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#include) +feature. -Inside `etc/evergreen_yml_components/variants` directory there are more directories, -which are in most cases platform names (e.g. amazon, rhel etc.) or build variant group names (e.g. sanitizer etc.). 
+Inside `etc/evergreen_yml_components/variants` directory there are more directories, which are in +most cases platform names (e.g. amazon, rhel etc.) or build variant group names (e.g. sanitizer +etc.). -Be aware that some of these files could be also used or re-used to be merged into `etc/system_perf.yml` which is used for `sys-perf` project. +Be aware that some of these files could be also used or re-used to be merged into +`etc/system_perf.yml` which is used for `sys-perf` project. ## Build Variants in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` -`mongodb-mongo-master` evergreen project uses `etc/evergreen.yml` and contains all build variants for development, including all feature-specific, patch build required, and suggested variants. +`mongodb-mongo-master` evergreen project uses `etc/evergreen.yml` and contains all build variants +for development, including all feature-specific, patch build required, and suggested variants. -`mongodb-mongo-master-nightly` evergreen project uses `etc/evergreen_nightly.yml` and contains build variants for public nightly builds. +`mongodb-mongo-master-nightly` evergreen project uses `etc/evergreen_nightly.yml` and contains build +variants for public nightly builds. ## Required and Suggested Build Variants -"Required" build variants are defined as any build variant with a `!` at the front of its display name in Evergreen. -These build variants also have `required` tag. +"Required" build variants are defined as any build variant with a `!` at the front of its display +name in Evergreen. These build variants also have `required` tag. [Required Patch Builds Policy](https://wiki.corp.mongodb.com/display/KERNEL/Required+Patch+Builds+Policy) -"Suggested" build variants are defined as any build variant with a `*` at the front of its display name in Evergreen. -These build variants also have `suggested` tag. 
+"Suggested" build variants are defined as any build variant with a `*` at the front of its display +name in Evergreen. These build variants also have `suggested` tag. ## Build Variants with forbid_tasks_tagged_with_experimental -Build variants with the `forbid_tasks_tagged_with_experimental` tag indicate that they do not allow tasks tagged as `experimental` to run. This tag is used in conjunction with the `forbid-tasks-with-tag-on-variants` evergreen lint rule to enforce this restriction. +Build variants with the `forbid_tasks_tagged_with_experimental` tag indicate that they do not allow +tasks tagged as `experimental` to run. This tag is used in conjunction with the +`forbid-tasks-with-tag-on-variants` evergreen lint rule to enforce this restriction. ## Build Variants after branching @@ -39,34 +49,48 @@ In each of platform or build variant group directory there can be these files: - `test_dev.yml` - - these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project on master branch - - after branching on all new branches these files are merged into `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project + - these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project + on master branch + - after branching on all new branches these files are merged into `etc/evergreen_nightly.yml` + which is used for a new branch `mongodb-mongo-vX.Y` project - `test_dev_master_and_lts_branches_only.yml` - - these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project on master branch - - after branching for LTS release (v7.0, v8.0 etc.) on a new branch these files are merged into `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project - - **important**: all tests that are running on these build variants will NOT run on a new Rapid release (v7.1, v7.2, v7.3, v8.1, v8.2, v8.3 etc.) 
branch projects + - these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project + on master branch + - after branching for LTS release (v7.0, v8.0 etc.) on a new branch these files are merged into + `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project + - **important**: all tests that are running on these build variants will NOT run on a new Rapid + release (v7.1, v7.2, v7.3, v8.1, v8.2, v8.3 etc.) branch projects - `test_dev_master_branch_only.yml` - - these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project on master branch + - these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project + on master branch - after branching on all new branches these files are NOT used - - **important**: all tests that are running on these build variants will NOT run on a new branch `mongodb-mongo-vX.Y` project + - **important**: all tests that are running on these build variants will NOT run on a new branch + `mongodb-mongo-vX.Y` project - `test_release.yml` - - these files are merged into `etc/evergreen_nightly.yml` which is used for `mongodb-mongo-master-nightly` project on master branch - - after branching on all new branches these files are merged into `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project + - these files are merged into `etc/evergreen_nightly.yml` which is used for + `mongodb-mongo-master-nightly` project on master branch + - after branching on all new branches these files are merged into `etc/evergreen_nightly.yml` + which is used for a new branch `mongodb-mongo-vX.Y` project - `test_release_master_and_lts_branches_only.yml` - - these files are merged into `etc/evergreen_nightly.yml` which is used for `mongodb-mongo-master-nightly` project on master branch - - after branching for LTS release (v7.0, v8.0 etc.) 
on a new branch these files are merged into `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project - - **important**: all tests that are running on these build variants will NOT run on a new Rapid release (v7.1, v7.2, v7.3, v8.1, v8.2, v8.3 etc.) branch projects + - these files are merged into `etc/evergreen_nightly.yml` which is used for + `mongodb-mongo-master-nightly` project on master branch + - after branching for LTS release (v7.0, v8.0 etc.) on a new branch these files are merged into + `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project + - **important**: all tests that are running on these build variants will NOT run on a new Rapid + release (v7.1, v7.2, v7.3, v8.1, v8.2, v8.3 etc.) branch projects - `test_release_master_branch_only.yml` - - these files are merged into `etc/evergreen_nightly.yml` which is used for `mongodb-mongo-master-nightly` project on master branch + - these files are merged into `etc/evergreen_nightly.yml` which is used for + `mongodb-mongo-master-nightly` project on master branch - after branching on all new branches these files are NOT used - - **important**: all tests that are running on these build variants will NOT run on a new branch `mongodb-mongo-vX.Y` project + - **important**: all tests that are running on these build variants will NOT run on a new branch + `mongodb-mongo-vX.Y` project diff --git a/docs/evergreen-testing/yaml_configuration/configuration.md b/docs/evergreen-testing/yaml_configuration/configuration.md index 708b3c5cfa6..36cd1c34aec 100644 --- a/docs/evergreen-testing/yaml_configuration/configuration.md +++ b/docs/evergreen-testing/yaml_configuration/configuration.md @@ -11,14 +11,14 @@ section of the Evergreen wiki. ### `mongodb-mongo-master` -The main project for testing MongoDB's dev environments with a number build variants, -each one corresponding to a particular compile or testing environment to support development. 
-Each build variant runs a set of tasks; each task ususally runs one or more tests. +The main project for testing MongoDB's dev environments with a number build variants, each one +corresponding to a particular compile or testing environment to support development. Each build +variant runs a set of tasks; each task ususally runs one or more tests. ### `mongodb-mongo-master-nightly` -Tracks the same branch as `mongodb-mongo-master`, each build variant corresponds to a -(version, OS, architecure) triplet for a supported MongoDB nightly release. +Tracks the same branch as `mongodb-mongo-master`, each build variant corresponds to a (version, OS, +architecure) triplet for a supported MongoDB nightly release. ### `sys_perf` @@ -28,22 +28,23 @@ The system performance project. The above Evergreen projects are defined in the following files: -- `etc/evergreen_yml_components/**.yml`. YAML files containing definitions for tasks, functions, buildvariants, etc. - They are copied from the existing evergreen.yml file. +- `etc/evergreen_yml_components/**.yml`. YAML files containing definitions for tasks, functions, + buildvariants, etc. They are copied from the existing evergreen.yml file. -- `etc/evergreen.yml`. Imports components from above and serves as the project config for mongodb-mongo-master, - containing all build variants for development, including all feature-specific, patch build required, and suggested - variants. +- `etc/evergreen.yml`. Imports components from above and serves as the project config for + mongodb-mongo-master, containing all build variants for development, including all + feature-specific, patch build required, and suggested variants. -- `etc/evergreen_nightly.yml`. The project configuration for mongodb-mongo-master-nightly, containing only build - variants for public nightly builds, imports similar components as evergreen.yml to ensure consistency. +- `etc/evergreen_nightly.yml`. 
The project configuration for mongodb-mongo-master-nightly, + containing only build variants for public nightly builds, imports similar components as + evergreen.yml to ensure consistency. - `etc/sys_perf.yml`. Configuration file for the system performance project. ## Release Branching Process -Only the `mongodb-mongo-master-nightly` project will be branched with required and other -necessary variants (e.g. sanitizers) added back in. Most variants in `mongodb-mongo-master` -would be dropped by default but can be re-introduced to the release branches manually on an -as-needed basis. For Rapid releases, all but the variants relevant to Atlas in -`mongodb-mongo-master-nightly` may be dropped as well. +Only the `mongodb-mongo-master-nightly` project will be branched with required and other necessary +variants (e.g. sanitizers) added back in. Most variants in `mongodb-mongo-master` would be dropped +by default but can be re-introduced to the release branches manually on an as-needed basis. For +Rapid releases, all but the variants relevant to Atlas in `mongodb-mongo-master-nightly` may be +dropped as well. diff --git a/docs/evergreen-testing/yaml_configuration/task_ownership_tags.md b/docs/evergreen-testing/yaml_configuration/task_ownership_tags.md index 6662413784d..0a529d73052 100644 --- a/docs/evergreen-testing/yaml_configuration/task_ownership_tags.md +++ b/docs/evergreen-testing/yaml_configuration/task_ownership_tags.md @@ -1,11 +1,15 @@ # Task ownership tags -This document describes task ownership tags that are used in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` projects. +This document describes task ownership tags that are used in `mongodb-mongo-master` and +`mongodb-mongo-master-nightly` projects. -Every task in in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` projects should be tag with exactly one `assigned_to_jira_team_.+` tag. 
-Team names (the part after `assigned_to_jira_team_`) should match `evergreen_tag_name` from team configurations in [mothra](https://github.com/10gen/mothra/tree/main/mothra/teams). +Every task in in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` projects should be tag +with exactly one `assigned_to_jira_team_.+` tag. Team names (the part after +`assigned_to_jira_team_`) should match `evergreen_tag_name` from team configurations in +[mothra](https://github.com/10gen/mothra/tree/main/mothra/teams). -This is enforced by linter. YAML linter configuration could be found [here](../../../etc/evergreen_lint.yml). +This is enforced by linter. YAML linter configuration could be found +[here](../../../etc/evergreen_lint.yml). If the linter configuration is missing your team: @@ -13,4 +17,7 @@ If the linter configuration is missing your team: 2. Make sure that your team configuration in mothra has `evergreen_tag_name` 3. Update the tag list with `assigned_to_jira_team_{evergreen_tag_name}` tag for your team -Dynamically generated tasks for resmoke suites (i.e. the ones named like `//buildscripts/resmokeconfig:core`) will set the ownership tag based on a best effort lookup from the codeowner of the test's definition to a team name from mothra, picking the first encountered in case of multiple possible assignments. +Dynamically generated tasks for resmoke suites (i.e. the ones named like +`//buildscripts/resmokeconfig:core`) will set the ownership tag based on a best effort lookup from +the codeowner of the test's definition to a team name from mothra, picking the first encountered in +case of multiple possible assignments. 
diff --git a/docs/evergreen-testing/yaml_configuration/task_selection_tags.md b/docs/evergreen-testing/yaml_configuration/task_selection_tags.md index f202a026b67..85097a3f64c 100644 --- a/docs/evergreen-testing/yaml_configuration/task_selection_tags.md +++ b/docs/evergreen-testing/yaml_configuration/task_selection_tags.md @@ -1,49 +1,58 @@ # Task selection tags -This document describes task selection tags that are used in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` projects. -To know more about task tags, please refer to the [Task and Variant Tags](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#task-and-variant-tags) section of the Evergreen wiki. +This document describes task selection tags that are used in `mongodb-mongo-master` and +`mongodb-mongo-master-nightly` projects. To know more about task tags, please refer to the +[Task and Variant Tags](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#task-and-variant-tags) +section of the Evergreen wiki. -The majority of variants in `mongodb-mongo-master-nightly` project and the most significat variants in `mongodb-mongo-master` project are using required and optional groups of task selection tags. -In order to add tasks to those variants, please use them as described in the following sections. +The majority of variants in `mongodb-mongo-master-nightly` project and the most significat variants +in `mongodb-mongo-master` project are using required and optional groups of task selection tags. In +order to add tasks to those variants, please use them as described in the following sections. ## Required task selection tags -Every task in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` project must be tagged with exactly one required selection tag. -This is enforced by linter. YAML linter configuration could be found [here](../../../etc/evergreen_lint.yml). 
+Every task in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` project must be tagged with +exactly one required selection tag. This is enforced by linter. YAML linter configuration could be +found [here](../../../etc/evergreen_lint.yml). -- `development_critical` - these tasks should be green prior to the merge and will block merging if failing, e.g. jsCore. - We run these tasks on all variants and in the commit-queue. +- `development_critical` - these tasks should be green prior to the merge and will block merging if + failing, e.g. jsCore. We run these tasks on all variants and in the commit-queue. -- `development_critical_single_variant` - the same as `development_critical` but these tasks do not require to run on multiple variants, e.g. clang-tidy, formatters, linters etc. - We run these tasks on the required variant and in the commit-queue. +- `development_critical_single_variant` - the same as `development_critical` but these tasks do not + require to run on multiple variants, e.g. clang-tidy, formatters, linters etc. We run these tasks + on the required variant and in the commit-queue. -- `no_commit_queue` - add this to tasks in development_critical that you do not want in the commit-queue +- `no_commit_queue` - add this to tasks in development_critical that you do not want in the + commit-queue -- `release_critical` - these tasks should be green prior to the release. - We run these tasks on all release and development (required and suggested) variants. - It should be uncommon to add tasks to this tag but if your task needs to run on many different OSes and it is extremely broad in coverage then you can add it to this tag. +- `release_critical` - these tasks should be green prior to the release. We run these tasks on all + release and development (required and suggested) variants. 
It should be uncommon to add tasks to + this tag but if your task needs to run on many different OSes and it is extremely broad in + coverage then you can add it to this tag. -- `default` - these tasks are running as part of a required patch build. - We run these tasks on the most significant development variants (required patches, tsan, aubsan, etc.). - Use this tag if you are not sure which tag to use for your new task. +- `default` - these tasks are running as part of a required patch build. We run these tasks on the + most significant development variants (required patches, tsan, aubsan, etc.). Use this tag if you + are not sure which tag to use for your new task. -- `non_deterministic` - these tasks depend significantly on randomization and we expect to see some unique failures, e.g. fuzzers etc. - We run these tasks on non-required development variants. +- `non_deterministic` - these tasks depend significantly on randomization and we expect to see some + unique failures, e.g. fuzzers etc. We run these tasks on non-required development variants. -- `experimental` - these tasks are not running anywhere regularly. - We do not use this tag for selecting tasks to run on variants. - This tag could be used for tasks that you would like to run on your own custom variants. +- `experimental` - these tasks are not running anywhere regularly. We do not use this tag for + selecting tasks to run on variants. This tag could be used for tasks that you would like to run on + your own custom variants. -- `auxiliary` - these are various setup, helper, etc. tasks and should be mostly owned by infrastructure team. - You should almost never use this tag. - Please reach out to [#ask-devprod-build](https://mongodb.enterprise.slack.com/archives/CR8SNBY0N) before adding tasks with this tag. +- `auxiliary` - these are various setup, helper, etc. tasks and should be mostly owned by + infrastructure team. You should almost never use this tag. 
Please reach out to + [#ask-devprod-build](https://mongodb.enterprise.slack.com/archives/CR8SNBY0N) before adding tasks + with this tag. -**Important**: Do not change anything in this list without talking to [#ask-devprod-build](https://mongodb.enterprise.slack.com/archives/CR8SNBY0N). +**Important**: Do not change anything in this list without talking to +[#ask-devprod-build](https://mongodb.enterprise.slack.com/archives/CR8SNBY0N). ## Optional task selection tags -In addition to the required task selection tags there is a list of optional selection tags. -Every task could be tagged with any number of the following tags: +In addition to the required task selection tags there is a list of optional selection tags. Every +task could be tagged with any number of the following tags: - `incompatible_community` - the task should be excluded from the community variants. - `incompatible_windows` - the task should be excluded from Windows variants. @@ -55,16 +64,20 @@ Every task could be tagged with any number of the following tags: - `incompatible_aubsan` - the task should be excluded from {A,UB}SAN variants. - `incompatible_tsan` - the task should be excluded from TSAN variants. - `incompatible_debug_mode` - the task should be excluded from Debug Mode variants. -- `incompatible_system_allocator` - the task should be excluded from variants that use the system allocator. +- `incompatible_system_allocator` - the task should be excluded from variants that use the system + allocator. - `incompatible_all_feature_flags` - the task should be excluded from all-feature-flags variants. - `incompatible_development_variant` - the task should be excluded from the development variants. - `incompatible_oscrypto` - the task should be excluded from variants unsupported by oscrypto. -- `requires_compile_variant` - the task can (or should) only run on variants that have compile-related expansions.
+- `requires_compile_variant` - the task can (or should) only run on variants that have + compile-related expansions. - `requires_large_host` - the task requires a large host to run. - `requires_large_host_aubsan` - the task requires a large host to run on {A,UB}SAN variants. - `requires_large_host_tsan` - the task requires a large host to run on TSAN variants. - `requires_large_host_debug_mode` - the task requires a large host to run on Debug Mode variants. - `requires_large_host_commit_queue` - the task requires a large host to run in the commit-queue. -- `requires_all_feature_flags` - the task can only run on variants that have all-feature-flags configuration. -- `requires_execution_on_windows_patch_build` - the task should be run on the required Windows build variant on each patch - build. See [SERVER-79037](https://jira.mongodb.org/browse/SERVER-79037) for how this was calculated. +- `requires_all_feature_flags` - the task can only run on variants that have all-feature-flags + configuration. +- `requires_execution_on_windows_patch_build` - the task should be run on the required Windows build + variant on each patch build. See [SERVER-79037](https://jira.mongodb.org/browse/SERVER-79037) for + how this was calculated. diff --git a/docs/exception_architecture.md b/docs/exception_architecture.md index ccd8a8ded06..ef8b35abf39 100644 --- a/docs/exception_architecture.md +++ b/docs/exception_architecture.md @@ -5,16 +5,16 @@ MongoDB code uses the following types of assertions that are available for use: - `uassert` and `iassert` - Checks for per-operation user errors. Operation-fatal. - `tassert` - - Like uassert in that it checks for per-operation user errors, but inhibits clean shutdown - in tests. Operation-fatal, but process-fatal in testing environments during shutdown. + - Like uassert in that it checks for per-operation user errors, but inhibits clean shutdown in + tests. Operation-fatal, but process-fatal in testing environments during shutdown.
- `massert` - Checks per-operation invariants. Operation-fatal. - `fassert` - - Checks fatal process invariants. Process-fatal. Use to detect unexpected situations (such - as a system function returning an unexpected error status). + - Checks fatal process invariants. Process-fatal. Use to detect unexpected situations (such as a + system function returning an unexpected error status). - `invariant` - - Checks process invariant. Process-fatal. Use to detect code logic errors ("pointer should - never be null", "we should always be locked"). + - Checks process invariant. Process-fatal. Use to detect code logic errors ("pointer should never + be null", "we should always be locked"). **Note**: Calling C function `assert` is not allowed. Use one of the above instead. @@ -50,8 +50,8 @@ Some assertions will increment an assertion counter. The `serverStatus` command - `tripwire` - Incremented by `tassert`. - `rollovers` - - When any counter reaches a value of `1 << 30`, all of the counters are reset and - the "rollovers" counter is incremented. + - When any counter reaches a value of `1 << 30`, all of the counters are reset and the "rollovers" + counter is incremented. ## Considerations @@ -61,52 +61,53 @@ terminate the current operation, not the whole process. Be careful not to corrup mistakenly using these assertions midway through mutating process state. `fassert` failures will terminate the entire process; this is used for low-level checks where -continuing might lead to corrupt data or loss of data on disk. Additionally, `fassert` will log -a generic assertion message with fatal severity and add a breakpoint before terminating. +continuing might lead to corrupt data or loss of data on disk. Additionally, `fassert` will log a +generic assertion message with fatal severity and add a breakpoint before terminating. -To log a custom assertion message and terminate the server, use `LOGV2_FATAL`. 
-To avoid printing a stacktrace on failure use `fassertNoTrace` or `LOGV2_FATAL_NO_TRACE`. -Consider using them if there is only one way to reach this fatal point in code. +To log a custom assertion message and terminate the server, use `LOGV2_FATAL`. To avoid printing a +stacktrace on failure use `fassertNoTrace` or `LOGV2_FATAL_NO_TRACE`. Consider using them if there +is only one way to reach this fatal point in code. `tassert` will fail the operation like `uassert`, but also triggers a "deferred-fatality tripwire -flag". In testing environments, if the tripwire flag is set during shutdown, the process will -invoke the tripwire fatal assertion. In non-testing environments, there will only be a warning -during shutdown that tripwire assertions have failed. +flag". In testing environments, if the tripwire flag is set during shutdown, the process will invoke +the tripwire fatal assertion. In non-testing environments, there will only be a warning during +shutdown that tripwire assertions have failed. `tassert` presents more diagnostics than `uassert`. `tassert` will log the assertion as an error, log scoped debug info (for more info, see ScopedDebugInfoStack defined in -[mongo/util/assert_util.h][assert_util_h]), print the stack trace, and add a breakpoint. -The purpose of `tassert` is to ensure that operation failures will cause a test suite to fail -without resorting to different behavior during testing. `tassert` should only be used to check -for unexpected values produced by defined behavior. +[mongo/util/assert_util.h][assert_util_h]), print the stack trace, and add a breakpoint. The purpose +of `tassert` is to ensure that operation failures will cause a test suite to fail without resorting +to different behavior during testing. `tassert` should only be used to check for unexpected values +produced by defined behavior. Both `massert` and `uassert` take error codes, so that all assertions have codes associated with -them. 
Currently, programmers are free to provide the error code by either [using a unique location -number](#choosing-a-unique-location-number) or choosing a named code from `ErrorCodes`. Unique location -numbers have no meaning other than a way to associate a log message with a line of code. +them. Currently, programmers are free to provide the error code by either +[using a unique location number](#choosing-a-unique-location-number) or choosing a named code from +`ErrorCodes`. Unique location numbers have no meaning other than a way to associate a log message +with a line of code. `massert` will log the assertion message as an error, while `uassert` will log the message with debug level of 1 (for more info about log debug level, see [docs/logging.md][logging_md]). -`iassert` provides similar functionality to `uassert`, but it logs at a debug level of 3 and -does not increment user assertion counters. We should always choose `iassert` over `uassert` -when we expect a failure, a failure might be recoverable, or failure accounting is not interesting. +`iassert` provides similar functionality to `uassert`, but it logs at a debug level of 3 and does +not increment user assertion counters. We should always choose `iassert` over `uassert` when we +expect a failure, a failure might be recoverable, or failure accounting is not interesting. ### Choosing a unique location number -The current convention for choosing a unique location number is to use the 5 or 6 digit SERVER ticket number -for the ticket being addressed when the assertion is added, followed by a two digit counter to distinguish -between codes added as part of the same ticket. For example, if you're working on SERVER-12345, the first -error code would be 1234500, the second would be 1234501, etc. This convention can also be used for LOGV2 -logging id numbers. 
+The current convention for choosing a unique location number is to use the 5 or 6 digit SERVER +ticket number for the ticket being addressed when the assertion is added, followed by a two digit +counter to distinguish between codes added as part of the same ticket. For example, if you're +working on SERVER-12345, the first error code would be 1234500, the second would be 1234501, etc. +This convention can also be used for LOGV2 logging id numbers. -The only real constraint for unique location numbers is that they must be unique across the codebase. This is -verified at compile time with a [python script][errorcodes_py]. +The only real constraint for unique location numbers is that they must be unique across the +codebase. This is verified at compile time with a [python script][errorcodes_py]. ## Exception -A failed operation-fatal assertion throws an `AssertionException` or a child of that. -The inheritance hierarchy resembles: +A failed operation-fatal assertion throws an `AssertionException` or a child of that. The +inheritance hierarchy resembles: - `std::exception` - `mongo::DBException` @@ -123,14 +124,14 @@ upwards harmlessly. The code should also expect, and properly handle, `UserExcep ## ErrorCodes and Status -MongoDB uses `ErrorCodes` both internally and externally: a subset of error codes (e.g., -`BadValue`) are used externally to pass errors over the wire and to clients. These error codes are -the means for MongoDB processes (e.g., _mongod_ and _mongo_) to communicate errors, and are visible -to client applications. Other error codes are used internally to indicate the underlying reason for -a failed operation. For instance, `PeriodicJobIsStopped` is an internal error code that is passed -to callback functions running inside a [`PeriodicRunner`][periodic_runner_h] once the runner is -stopped. The internal error codes are for internal use only and must never be returned to clients -(i.e., in a network response). 
+MongoDB uses `ErrorCodes` both internally and externally: a subset of error codes (e.g., `BadValue`) +are used externally to pass errors over the wire and to clients. These error codes are the means for +MongoDB processes (e.g., _mongod_ and _mongo_) to communicate errors, and are visible to client +applications. Other error codes are used internally to indicate the underlying reason for a failed +operation. For instance, `PeriodicJobIsStopped` is an internal error code that is passed to callback +functions running inside a [`PeriodicRunner`][periodic_runner_h] once the runner is stopped. The +internal error codes are for internal use only and must never be returned to clients (i.e., in a +network response). Zero or more error categories can be assigned to `ErrorCodes`, which allows a single handler to serve a group of `ErrorCodes`. `RetriableError`, for instance, is an `ErrorCategory` that includes @@ -140,10 +141,10 @@ operation that fails with any error code in this category can be safely retried. we can use `ErrorCodes::is${category}(${error})` to check error categories. Both methods provide similar functionality. -To represent the status of an executed operation (e.g., a command or a function invocation), we -use `Status` objects, which represent an error state or the absence thereof. A `Status` uses the -standardized `ErrorCodes` to determine the underlying cause of an error. It also allows assigning -a textual description, as well as code-specific extra info, to the error code for further +To represent the status of an executed operation (e.g., a command or a function invocation), we use +`Status` objects, which represent an error state or the absence thereof. A `Status` uses the +standardized `ErrorCodes` to determine the underlying cause of an error. It also allows assigning a +textual description, as well as code-specific extra info, to the error code for further clarification. 
The extra info is a subclass of `ErrorExtraInfo` and specific to `ErrorCodes`. Look for `extra` in [here][error_codes_yml] for reference. @@ -153,28 +154,26 @@ functions with multiple out parameters. We can either pass an error code or an a `StatusWith` object, indicating failure or success of the operation. For examples of the proper usage of `StatusWith`, see [mongo/base/status_with.h][status_with_h] and [mongo/base/status_with_test.cpp][status_with_test_cpp]. It is highly recommended to use `uassert` -or `iassert` over `StatusWith`, and catch exceptions instead of checking `Status` objects -returned from functions. Using `StatusWith` to indicate exceptions, instead of throwing via -`uassert` and `iassert`, makes it very difficult to identify that an error has occurred, and -could lead to the wrong error being propagated. +or `iassert` over `StatusWith`, and catch exceptions instead of checking `Status` objects returned +from functions. Using `StatusWith` to indicate exceptions, instead of throwing via `uassert` and +`iassert`, makes it very difficult to identify that an error has occurred, and could lead to the +wrong error being propagated. ## Using noexcept -Server code should generally be written to be exception safe. Historically, -we've had bugs due to code being overzealously marked `noexcept`. In such -contexts, throwing an exception crashes the server, which can compromise -availability. However, _just_ removing `noexcept` from such code is not a viable -solution \- exception unsafe code may _need_ to crash in order to avoid causing -an even worse failure. We want to work towards ensuring that functions that -ought to be are in fact exception safe, and remove `noexcept` usage where it's -not warranted. Here, we outline guidelines for doing so. +Server code should generally be written to be exception safe. Historically, we've had bugs due to +code being overzealously marked `noexcept`. 
In such contexts, throwing an exception crashes the +server, which can compromise availability. However, _just_ removing `noexcept` from such code is not +a viable solution \- exception unsafe code may _need_ to crash in order to avoid causing an even +worse failure. We want to work towards ensuring that functions that ought to be are in fact +exception safe, and remove `noexcept` usage where it's not warranted. Here, we outline guidelines +for doing so. -Noexcept is a runtime check that terminates the process rather than allowing -the function to exit because of a throw. Noexcept may be used when it can be -thought of as a bug for any uncaught exception to be thrown. There is no -compile-time check that exceptions will not be thrown within a `noexcept` -function. Instead, putting `noexcept` on a function may be thought of as similar -to using invariant in the following way: +Noexcept is a runtime check that terminates the process rather than allowing the function to exit +because of a throw. Noexcept may be used when it can be thought of as a bug for any uncaught +exception to be thrown. There is no compile-time check that exceptions will not be thrown within a +`noexcept` function. Instead, putting `noexcept` on a function may be thought of as similar to using +invariant in the following way: ```c // Example noexcept code. @@ -190,92 +189,80 @@ void func() try { } ``` -**As with invariant, be very careful when putting `noexcept` on a function that -interacts with untrusted input.** This has been the root cause of serious past -bugs. +**As with invariant, be very careful when putting `noexcept` on a function that interacts with +untrusted input.** This has been the root cause of serious past bugs. ### Adding or Removing noexcept -When considering removing `noexcept` from a function, the author of that change -must ensure that the function’s implementation and its callsites are not -relying on the function not throwing for correctness. 
Because of this, **be -careful putting `noexcept` on a function** if there’s a chance it may need to be -removed later. `noexcept` generally **should not be used** solely for reasons of -performance optimization. Aside from the cases listed in the next section, it -should not be assumed to improve performance without solid evidence. +When considering removing `noexcept` from a function, the author of that change must ensure that the +function’s implementation and its callsites are not relying on the function not throwing for +correctness. Because of this, **be careful putting `noexcept` on a function** if there’s a chance it +may need to be removed later. `noexcept` generally **should not be used** solely for reasons of +performance optimization. Aside from the cases listed in the next section, it should not be assumed +to improve performance without solid evidence. -If a part of the implementation would benefit from relying on not throwing, but -`noexcept` is not meant to be a part of the function’s contract, it is acceptable -to use a try/catch/invariant construction similar to the example above or an -internal `noexcept` helper function. +If a part of the implementation would benefit from relying on not throwing, but `noexcept` is not +meant to be a part of the function’s contract, it is acceptable to use a try/catch/invariant +construction similar to the example above or an internal `noexcept` helper function. -When adding or removing `noexcept`, also consider what types of exceptions are -possible in that context and in our codebase. Refer to the “Where Exceptions -are Possible” section for more details. +When adding or removing `noexcept`, also consider what types of exceptions are possible in that +context and in our codebase. Refer to the “Where Exceptions are Possible” section for more details. -If you are uncertain about adding or removing `noexcept` in a given situation, -reach out to \#server-programmability on slack. 
+If you are uncertain about adding or removing `noexcept` in a given situation, reach out to +\#server-programmability on slack. ### Cases Where noexcept is Encouraged -This list is not exhaustive and there are cases not enumerated here that are -valid uses of `noexcept`. +This list is not exhaustive and there are cases not enumerated here that are valid uses of +`noexcept`. #### Move operations -Using `noexcept` with move operations allows operations to skip generating -exception handling code. If a type’s move operation will not throw exceptions, -it is strictly worse not to use `noexcept`. For instance, std::vector\<T\> can -use optimized versions of certain operations when T has `noexcept` move -operations. In these cases, **`noexcept` can be considered a requirement**. Of -course, if a move operation genuinely needs to throw exceptions, then don’t -mark it `noexcept`. This should be very rare – moves should be non-throwing in -almost all cases. +Using `noexcept` with move operations allows operations to skip generating exception handling code. +If a type’s move operation will not throw exceptions, it is strictly worse not to use `noexcept`. +For instance, std::vector\<T\> can use optimized versions of certain operations when T has +`noexcept` move operations. In these cases, **`noexcept` can be considered a requirement**. Of +course, if a move operation genuinely needs to throw exceptions, then don’t mark it `noexcept`. This +should be very rare – moves should be non-throwing in almost all cases. #### Swap operations -Allows callers to optimize for an exception-free pathway. **Swap operations -should follow the same `noexcept` guidelines as move operations**. +Allows callers to optimize for an exception-free pathway. **Swap operations should follow the same +`noexcept` guidelines as move operations**. #### Hash functions -Allows some hashing library types to optimize for an exception-free pathway.
-This can affect the behavior, performance, and even layout of certain -container types (such as libstdc++’s -[unordered_map](https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html)). -**Hash functions should follow the same `noexcept` guidelines as move operations.** +Allows some hashing library types to optimize for an exception-free pathway. This can affect +the behavior, performance, and even layout of certain container types (such as libstdc++’s +[unordered_map](https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html)). **Hash +functions should follow the same `noexcept` guidelines as move operations.** #### Destructors and “Destructor-Safe” Functions -Destructors are generally implicitly `noexcept`, and are encouraged to remain -implicitly `noexcept` \- that is, by not marking them with `noexcept(false)`. -Functions where “destructor safety” is a core part of their functionality **may -be marked `noexcept`**. This is not a requirement – destructors are allowed to -call potentially-throwing functions. It is also not a blanket recommendation to -consider `noexcept` for all functions called from destructors. When calling a -potentially-throwing function from a destructor, think about whether or not it -can indeed throw in that context, and if exceptions need to be handled. If it -can indeed throw in that context, exceptions almost certainly need to be -handled \- otherwise the server will crash. +Destructors are generally implicitly `noexcept`, and are encouraged to remain implicitly `noexcept` +\- that is, by not marking them with `noexcept(false)`. Functions where “destructor safety” is a +core part of their functionality **may be marked `noexcept`**. This is not a requirement – +destructors are allowed to call potentially-throwing functions. It is also not a blanket +recommendation to consider `noexcept` for all functions called from destructors.
When calling a +potentially-throwing function from a destructor, think about whether or not it can indeed throw in +that context, and if exceptions need to be handled. If it can indeed throw in that context, +exceptions almost certainly need to be handled \- otherwise the server will crash. -The lambda passed to `ON_BLOCK_EXIT()` and `ScopeGuard()` should be treated -similarly to destructors: it is executed in a `noexcept` context (a destructor) -and marking it as such is discouraged as being noisy. But code intended to be -called from them can be. +The lambda passed to `ON_BLOCK_EXIT()` and `ScopeGuard()` should be treated similarly to +destructors: it is executed in a `noexcept` context (a destructor) and marking it as such is +discouraged as being noisy. But code intended to be called from them can be. ### Where Exceptions are Possible -In our codebase, generally DBException is the only type of exception that -should be crossing API boundaries. If an exception other than a DBException -does cross an API boundary, it should be considered a bug. Whichever component -throws the exception should handle it locally, even if only by translating it -to a DBException. Generally any caller you would consider to be an external -caller should be able to rely on DBException being the only exception type your -function will throw. +In our codebase, generally DBException is the only type of exception that should be crossing API +boundaries. If an exception other than a DBException does cross an API boundary, it should be +considered a bug. Whichever component throws the exception should handle it locally, even if only by +translating it to a DBException. Generally any caller you would consider to be an external caller +should be able to rely on DBException being the only exception type your function will throw. 
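The boundary rule in the paragraph above can be illustrated with a minimal, self-contained sketch. `SketchDBException`, `parsePort`, and the error code `2` are hypothetical names invented for this illustration; they are not the real `mongo::DBException` API, and real server code would normally throw through the assertion helpers described earlier in this document. The sketch only shows the shape of the rule: a component catches a standard-library exception locally and rethrows it as the single exception type that external callers are told to expect.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for mongo::DBException (assumption, NOT the real class).
class SketchDBException : public std::exception {
public:
    SketchDBException(int code, std::string reason)
        : _code(code), _reason(std::move(reason)) {}

    int code() const noexcept { return _code; }
    const char* what() const noexcept override { return _reason.c_str(); }

private:
    int _code;
    std::string _reason;
};

// Hypothetical API-boundary function. std::stoi can throw std::invalid_argument
// or std::out_of_range, both of which derive from std::logic_error. Neither is
// allowed to escape: each is translated locally into SketchDBException.
int parsePort(const std::string& text) {
    try {
        return std::stoi(text);
    } catch (const std::logic_error& ex) {
        // The code value 2 is an arbitrary placeholder for this sketch.
        throw SketchDBException(2, std::string("invalid port: ") + ex.what());
    }
}
```

A caller of `parsePort` then needs only one handler type, which is the property the paragraph asks external callers to be able to rely on.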
-Allocations using the global new allocator or std::allocator in our codebase do -not throw, instead terminating the process directly when OOM conditions are -encountered. As such, there is no need to handle exceptions from these sources. +Allocations using the global new allocator or std::allocator in our codebase do not throw, instead +terminating the process directly when OOM conditions are encountered. As such, there is no need to +handle exceptions from these sources. ## Gotchas @@ -284,10 +271,10 @@ Gotchas to watch out for: - Generally, do not throw an `AssertionException` directly. Functions like `uasserted()` do work beyond just that. In particular, it makes sure that the `getLastError` structures are set up properly. -- Think about the location of your asserts in constructors, as the destructor would not be - called. But at a minimum, use `wassert` a lot therein, we want to know if something is wrong. -- Do **not** throw in destructors or allow exceptions to leak out (if you call a function that - may throw). +- Think about the location of your asserts in constructors, as the destructor would not be called. + But at a minimum, use `wassert` a lot therein, we want to know if something is wrong. +- Do **not** throw in destructors or allow exceptions to leak out (if you call a function that may + throw). [raii]: https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization [error_codes_yml]: ../src/mongo/base/error_codes.yml diff --git a/docs/fail_points.md b/docs/fail_points.md index 6226167901f..47327f6dabc 100644 --- a/docs/fail_points.md +++ b/docs/fail_points.md @@ -6,18 +6,17 @@ branches, enhance diagnostics, or achieve any number of other aims. Fail points configured, and disabled via command request to a remote process or via an API within the same process. -For more on what test-only means and how to enable the `configureFailPoint` command, see [test_commands][test_only]. 
+For more on what test-only means and how to enable the `configureFailPoint` command, see +[test_commands][test_only]. ## Using Fail Points -A fail point must first be defined using `MONGO_FAIL_POINT_DEFINE(myFailPoint)`. This statement -adds the fail point to a registry and allows it to be evaluated in code. There are three common -patterns for evaluating a fail point: +A fail point must first be defined using `MONGO_FAIL_POINT_DEFINE(myFailPoint)`. This statement adds +the fail point to a registry and allows it to be evaluated in code. There are three common patterns +for evaluating a fail point: -- Exercise a rarely used branch: - `if (whenPigsFly || myFailPoint.shouldFail()) { ... }` -- Block until the fail point is unset: - `myFailPoint.pauseWhileSet();` +- Exercise a rarely used branch: `if (whenPigsFly || myFailPoint.shouldFail()) { ... }` +- Block until the fail point is unset: `myFailPoint.pauseWhileSet();` - Use the fail point's payload to perform custom behavior: `myFailPoint.execute([](const BSONObj& data) { useMyPayload(data); });` @@ -30,9 +29,9 @@ Fail point configuration involves choosing a "mode" for activation (e.g., "alway providing additional data in the form of a BSON object. For the vast majority of cases, this is done by issuing a `configureFailPoint` command request. This is made easier in JavaScript using the `configureFailPoint` helper from [fail_point_util.js][fail_point_util]. Fail points can also be -useful in C++ unit tests and integration tests. To configure fail points on the local process, use -a `FailPointEnableBlock` to enable and configure the fail point for a given block scope. Finally, -a fail point can also be set via setParameter by its name prefixed with "failpoint." (e.g., +useful in C++ unit tests and integration tests. To configure fail points on the local process, use a +`FailPointEnableBlock` to enable and configure the fail point for a given block scope.
Finally, a +fail point can also be set via setParameter by its name prefixed with "failpoint." (e.g., "failpoint.myFailPoint"). Users can also wait until a fail point has been evaluated a certain number of times **_over its @@ -50,8 +49,8 @@ command implementations, see [here][fail_point_commands]. The `failCommand` fail point is a special fail point used to mock arbitrary response behaviors to requests filtered by command, appName, etc. It is most often used to simulate specific conditions -between nodes like invalid replica set configurations. For examples of use, see the -[failCommand JavaScript tests][fail_command_javascript_test]. +between nodes like invalid replica set configurations. For examples of use, see the [failCommand +JavaScript tests][fail_command_javascript_test]. [fail_point]: ../src/mongo/util/fail_point.h [fail_point_test]: ../src/mongo/util/fail_point_test.cpp diff --git a/docs/futures_and_promises.md b/docs/futures_and_promises.md index 98e55abfb93..01deb830176 100644 --- a/docs/futures_and_promises.md +++ b/docs/futures_and_promises.md @@ -68,11 +68,11 @@ Future call(Message& toSend) { First, notice that our calls to `TransportSession::sourceMessage` and `TransportSession::sinkMessage` have been replaced with calls to asynchronous versions of those functions. These asynchronous versions are future-returning; they don't block, but also don't return -a result right away. Instead, they return a future that we can chain continuations onto; `then, -onError` and `onCompletion` are all member functions of `Future` that take a callable as argument -and invoke that callable when the chained-to future is ready. Unsurprisingly, continuations chained -with `.then` are run when the future is readied successfully with a `T`, and therefore callables -chained with `.then` should take a `T` as argument. Mirroring this behavior, `.onError` +a result right away. 
Instead, they return a future that we can chain continuations onto; +`then, onError` and `onCompletion` are all member functions of `Future` that take a callable as +argument and invoke that callable when the chained-to future is ready. Unsurprisingly, continuations +chained with `.then` are run when the future is readied successfully with a `T`, and therefore +callables chained with `.then` should take a `T` as argument. Mirroring this behavior, `.onError` continuations are run only when the future is readied with an error, and continuations chained this way take a `Status` as argument which they can inspect to discover the error explaining why a `T` could not be delivered. Continuations chained with `.onCompletion` are run when the future resolves, @@ -107,18 +107,17 @@ associated Futures exactly one time, and must do so before being destroyed (othe will be set with the `ErrorCodes::BrokenPromise` error, which is considered a programmer error and may crash debug builds of the server in the future). -To create a `Promise` that has a Future, you may use the [`PromiseAndFuture`][pf] -utility type. Upon construction, it contains a created `Promise` and its -corresponding `Future`. The perhaps-familiar `makePromiseFuture` factory -function now simply returns `PromiseAndFuture{}`. +To create a `Promise` that has a Future, you may use the [`PromiseAndFuture`][pf] utility type. +Upon construction, it contains a created `Promise` and its corresponding `Future`. The +perhaps-familiar `makePromiseFuture` factory function now simply returns `PromiseAndFuture{}`. -As was previously alluded to, it's -also possible to make a "ready future" - one that has no associated promise and is already filled -with a value or error. These might be useful in cases where the code that produces values in a way -that's normally asynchronous happens to have one available already when a request comes in, and -would like to return it right away. 
To create such a ready future, use `Future::makeReady()`, or -the helper function [makeReadyFutureWith(Func&& func)][mrfw] which will call the specified `func` -and create a ready `Future` from its returned value. +As was previously alluded to, it's also possible to make a "ready future" - one that has no +associated promise and is already filled with a value or error. These might be useful in cases where +the code that produces values in a way that's normally asynchronous happens to have one available +already when a request comes in, and would like to return it right away. To create such a ready +future, use `Future::makeReady()`, or the helper function [makeReadyFutureWith(Func&& +func)][mrfw] which will call the specified `func` and create a ready `Future` from its returned +value. Lastly, there might be occasions when multiple futures should be fulfilled with the same value, at the same time. This use case is best served by `SharedPromise` and the associated `SharedSemiFuture` @@ -144,8 +143,8 @@ calling threads, and return `Future`s to those threads that will be readied o available. The service may have its own internal threads it uses to produce `T`s, and doesn't want to lend out its internal threads to do the work chained via continuations to the `Future`s it's given to calling threads. Instead, it needs to insist that continuations are not chained onto the -futures it gives out, or that the caller receiving the future -arranges for some _other_ thread to run continuations. +futures it gives out, or that the caller receiving the future arranges for some _other_ thread to +run continuations. Fortunately, the service can enforce these guarantees using two types closely related to `Future`: the types `SemiFuture` and `ExecutorFuture`. @@ -270,33 +269,32 @@ will traverse the remaining continuation chain, and find the continuation chaine is run. 
Note that all of the continuation-chaining functions we've discussed, like `.then()`, return future- -like types themselves (i.e. `Future`, `SemiFuture`, and the like). When we chain -continuations in the manner we've been discussing here, subsequent continuations run when the future -returned by the previous continuation is ready, and the future-like type is "unwrapped" such that -the type wrapped by the future (or, in the case of failure, the error) is passed directly to the -subsequent continuation. For more detail on this topic, see the block comment above the -continuation-chaining member functions in [future.h][future], starting above the definition for -`then()`. +like types themselves (i.e. `Future`, `SemiFuture`, and the like). When we chain continuations +in the manner we've been discussing here, subsequent continuations run when the future returned by +the previous continuation is ready, and the future-like type is "unwrapped" such that the type +wrapped by the future (or, in the case of failure, the error) is passed directly to the subsequent +continuation. For more detail on this topic, see the block comment above the continuation-chaining +member functions in [future.h][future], starting above the definition for `then()`. At some point, we may have no more continuations to add to a future chain, and will want to either synchronously extract the value or error held in the last future of the chain, or add a callback to asynchronously consume this value. The `.get()` and `.getAsync()` members of future-like types -provide these facilities for terminating a future chain by extracting or asynchronously -consuming the result of the chain. 
The `.getAsync()` function works much like `.onCompletion()`, -taking a `Status` or `StatusWith` and running regardless of whether or not the previous link in -the chain resolved with error or success, and running asynchronously when the previous results are -ready (to determine what thread `.getAsync()` will run on, follow the rules laid out in the previous -"Where Do Continuations Run?" section.) Conversely, `.get()` takes no arguments, and blocks when it -is called until the entirety of the continuation chain is resolved, with the final result given back -to the blocking caller. Note that if the final result of the chain was an error that can be -converted to a MongoDB `Status` type (i.e. either a `Status`-family type or `DBException`), it will -be re-thrown as a `DBException` at the site where `.get()` is called when it is available. If the -code calling `.get()` is not capable of handling an exception, use `.getNoThrow()` instead to -extract the same error in the form of a `Status`. In the case of `.getAsync()`, all errors are -converted to `Status`, and crucially, callables chained as continuations via `.getAsync()` cannot -throw any exceptions, as there is no appropriate context with which to handle an asynchronous -exception. If an exception is thrown from a continuation chained via `.getAsync()`, the entire -process will be terminated (i.e. the program will crash). +provide these facilities for terminating a future chain by extracting or asynchronously consuming +the result of the chain. The `.getAsync()` function works much like `.onCompletion()`, taking a +`Status` or `StatusWith` and running regardless of whether or not the previous link in the chain +resolved with error or success, and running asynchronously when the previous results are ready (to +determine what thread `.getAsync()` will run on, follow the rules laid out in the previous "Where Do +Continuations Run?" section.) 
Conversely, `.get()` takes no arguments, and blocks when it is called +until the entirety of the continuation chain is resolved, with the final result given back to the +blocking caller. Note that if the final result of the chain was an error that can be converted to a +MongoDB `Status` type (i.e. either a `Status`-family type or `DBException`), it will be re-thrown as +a `DBException` at the site where `.get()` is called when it is available. If the code calling +`.get()` is not capable of handling an exception, use `.getNoThrow()` instead to extract the same +error in the form of a `Status`. In the case of `.getAsync()`, all errors are converted to `Status`, +and crucially, callables chained as continuations via `.getAsync()` cannot throw any exceptions, as +there is no appropriate context with which to handle an asynchronous exception. If an exception is +thrown from a continuation chained via `.getAsync()`, the entire process will be terminated (i.e. +the program will crash). ## Notes and Links diff --git a/docs/fuzztest.md b/docs/fuzztest.md index 76a0391df39..e7aae667a4c 100644 --- a/docs/fuzztest.md +++ b/docs/fuzztest.md @@ -2,31 +2,27 @@ title: FuzzTest --- -FuzzTest is a coverage-guided fuzzing framework for C++ that integrates -directly with GoogleTest. FuzzTest lets you write _property-based tests_: you -describe the shape of your inputs using typed _domains_, and the framework -generates and mutates values that satisfy those constraints. FuzzTest -uses Centipede as its fuzzing engine and AUBSAN to surface undefined -behavior. +FuzzTest is a coverage-guided fuzzing framework for C++ that integrates directly with GoogleTest. +FuzzTest lets you write _property-based tests_: you describe the shape of your inputs using typed +_domains_, and the framework generates and mutates values that satisfy those constraints. FuzzTest +uses Centipede as its fuzzing engine and AUBSAN to surface undefined behavior. 
# When to use FuzzTest -- Your function under test accepts structured inputs (integers, strings, - custom types, BSON objects, etc.) rather than an opaque byte blob. -- You want to express correctness properties beyond "does not crash", such - as API invariants, differential equivalence, or roundtrip symmetry. -- You want a fuzz test that also runs cleanly as a unit test in normal CI, - without needing a special fuzzer build variant. +- Your function under test accepts structured inputs (integers, strings, custom types, BSON objects, + etc.) rather than an opaque byte blob. +- You want to express correctness properties beyond "does not crash", such as API invariants, + differential equivalence, or roundtrip symmetry. +- You want a fuzz test that also runs cleanly as a unit test in normal CI, without needing a special + fuzzer build variant. # How to use FuzzTest ## The property function and FUZZ_TEST macro -A FuzzTest consists of a _property function_ and a registration macro. -The property function is a plain C++ function whose parameters define the -inputs to fuzz. The framework calls it repeatedly with generated values, -looking for any call that triggers an assertion failure or sanitizer -error. +A FuzzTest consists of a _property function_ and a registration macro. The property function is a +plain C++ function whose parameters define the inputs to fuzz. The framework calls it repeatedly +with generated values, looking for any call that triggers an assertion failure or sanitizer error. ```cpp #include "fuzztest/fuzztest.h" @@ -38,14 +34,16 @@ void MyFunctionFuzzer(const std::string& input) { FUZZ_TEST(MyTestSuite, MyFunctionFuzzer); ``` -When no `.WithDomains()` clause is provided, each parameter defaults to -`fuzztest::Arbitrary()`, which covers most standard library types. +When no `.WithDomains()` clause is provided, each parameter defaults to `fuzztest::Arbitrary()`, +which covers most standard library types. 
## Specifying input domains Use `.WithDomains()` to constrain the generated inputs: -> ⚠️ **Warning:** Never initialize input domains with global objects initialized in other compilation units. For more information see [Fuzz_Test Macro](https://github.com/google/fuzztest/blob/main/doc/fuzz-test-macro.md) +> ⚠️ **Warning:** Never initialize input domains with global objects initialized in other +> compilation units. For more information see +> [Fuzz_Test Macro](https://github.com/google/fuzztest/blob/main/doc/fuzz-test-macro.md) ```cpp void ProcessRequestFuzzer(int opcode, const std::string& payload) { @@ -56,14 +54,18 @@ FUZZ_TEST(MyTestSuite, ProcessRequestFuzzer) /*payload=*/fuzztest::Arbitrary()); ``` -FuzzTest ships with a rich set of built-in domains. A complete list of default types implemented in fuzztest can be found in the [Fuzztest Domain Reference](https://github.com/google/fuzztest/blob/main/doc/domains-reference.md). Also see [BSON Fuzzing](#fuzzing-bson). +FuzzTest ships with a rich set of built-in domains. A complete list of default types implemented in +fuzztest can be found in the +[Fuzztest Domain Reference](https://github.com/google/fuzztest/blob/main/doc/domains-reference.md). +Also see [BSON Fuzzing](#fuzzing-bson). ## Providing seeds -Seed values give the fuzzer a head start by providing known-interesting -inputs to mutate: +Seed values give the fuzzer a head start by providing known-interesting inputs to mutate: -> ⚠️ **Warning:** Never initialize seeds with global objects initialized in other compilation units. For more information see [Fuzz_Test Macro](https://github.com/google/fuzztest/blob/main/doc/fuzz-test-macro.md) +> ⚠️ **Warning:** Never initialize seeds with global objects initialized in other compilation units. 
+> For more information see +> [Fuzz_Test Macro](https://github.com/google/fuzztest/blob/main/doc/fuzz-test-macro.md) ```cpp FUZZ_TEST(MyTestSuite, ProcessRequestFuzzer) @@ -82,11 +84,9 @@ FUZZ_TEST(MyTestSuite, ProcessRequestFuzzer) ## Common correctness patterns -Beyond "does not crash", FuzzTest makes it easy to assert higher-level -properties. +Beyond "does not crash", FuzzTest makes it easy to assert higher-level properties. -**Roundtrip**: verify that encode→decode (or serialize→parse) is the -identity: +**Roundtrip**: verify that encode→decode (or serialize→parse) is the identity: ```cpp void SerializeRoundtrips(const MyMessage& msg) { @@ -97,8 +97,7 @@ void SerializeRoundtrips(const MyMessage& msg) { FUZZ_TEST(MyTestSuite, SerializeRoundtrips); ``` -**Differential fuzzing**: compare two implementations of the same -operation: +**Differential fuzzing**: compare two implementations of the same operation: ```cpp void ImplementationsAgree(const std::string& input) { @@ -109,10 +108,11 @@ FUZZ_TEST(MyTestSuite, ImplementationsAgree); ## Using fixtures -If your test requires expensive one-time setup (e.g. starting a service), -use a fixture with `FUZZ_TEST_F`. Any default-constructible class can be -a fixture; the constructor and destructor run once for the whole fuzz test, -not once per iteration. When using fixtures, care should be taken to ensure that only the initial fixture state is retained. Program state created during a test _**must**_ not affect or be affected by subsequent iterations. +If your test requires expensive one-time setup (e.g. starting a service), use a fixture with +`FUZZ_TEST_F`. Any default-constructible class can be a fixture; the constructor and destructor run +once for the whole fuzz test, not once per iteration. When using fixtures, care should be taken to +ensure that only the initial fixture state is retained. Program state created during a test +_**must**_ not affect or be affected by subsequent iterations. 
```cpp class MyServiceFuzzTest { @@ -132,10 +132,10 @@ FUZZ_TEST_F(MyServiceFuzzTest, RequestFuzzer); ## Fuzzing BSON -MongoDB provides a custom FuzzTest domain for generating valid BSON -objects: `mongo::bson_mutator::BSONObjImpl`. It is registered as the -`Arbitrary` specialization, so any fuzz test that -accepts a `ConstSharedBuffer` will automatically receive well-formed BSON. +MongoDB provides a custom FuzzTest domain for generating valid BSON objects: +`mongo::bson_mutator::BSONObjImpl`. It is registered as the `Arbitrary` +specialization, so any fuzz test that accepts a `ConstSharedBuffer` will automatically receive +well-formed BSON. ```cpp #include "mongo/bson/bson_mutator/bson_mutator.h" @@ -147,8 +147,7 @@ void MyCommandFuzzer(ConstSharedBuffer input) { FUZZ_TEST(MyCommandFuzzTest, MyCommandFuzzer); ``` -To constrain which fields are present and their types, use the -`.With()` builders: +To constrain which fields are present and their types, use the `.With()` builders: ```cpp FUZZ_TEST(MyCommandFuzzTest, MyCommandFuzzer) @@ -158,8 +157,8 @@ FUZZ_TEST(MyCommandFuzzTest, MyCommandFuzzer) .WithLong("limit", fuzztest::InRange(0LL, 1000LL))); ``` -Fields added via `.With()` are not guaranteed to appear in every -generated object, which exercises missing-field error handling as well. +Fields added via `.With()` are not guaranteed to appear in every generated object, which +exercises missing-field error handling as well. Use `.WithVariant()` when a field may legally hold more than one type: @@ -171,8 +170,7 @@ fuzztest::Arbitrary() }); ``` -Use `.WithAny()` when a key should be present but its type is -unconstrained: +Use `.WithAny()` when a key should be present but its type is unconstrained: ```cpp fuzztest::Arbitrary().WithAny("filter"); @@ -180,8 +178,8 @@ fuzztest::Arbitrary().WithAny("filter"); ## Bazel target -Use `mongo_cc_fuzztest` (from `//bazel:mongo_src_rules.bzl`) to declare a -fuzz test target. 
It links in FuzzTest and GoogleTest automatically: +Use `mongo_cc_fuzztest` (from `//bazel:mongo_src_rules.bzl`) to declare a fuzz test target. It links +in FuzzTest and GoogleTest automatically: ```python mongo_cc_fuzztest( @@ -198,8 +196,8 @@ mongo_cc_fuzztest( ## Unit test mode -Every `FUZZ_TEST` is also a regular GoogleTest test. In unit test mode, -the property function is called a small number of times with minimal inputs. This lets fuzz tests run in ordinary CI +Every `FUZZ_TEST` is also a regular GoogleTest test. In unit test mode, the property function is +called a small number of times with minimal inputs. This lets fuzz tests run in ordinary CI alongside unit tests: ``` @@ -208,10 +206,9 @@ bazel test --compiler_type=clang --config=fuzztest --fsan --opt=debug --allocato ## Fuzzing mode -Fuzzing mode enables sanitizer and coverage instrumentation and runs the -test indefinitely (or until a crash is found). It requires the `fsan` -build configuration. Check our Evergreen configuration for the current -bazel arguments, or run: +Fuzzing mode enables sanitizer and coverage instrumentation and runs the test indefinitely (or until +a crash is found). It requires the `fsan` build configuration. Check our Evergreen configuration for +the current bazel arguments, or run: ``` bazel run --compiler_type=clang --config=fuzztest --fsan --opt=debug --allocator=system +my_command_fuzztest -- \ @@ -226,7 +223,9 @@ bazel run --compiler_type=clang --config=fuzztest --fsan --opt=debug --allocator ## Evergreen -Fuzz tests defined in bazel using `mongo_cc_fuzztest` will periodically run on the master branch in evergreen. The compiled tests and their associated corpus are saved to S3 and can be downloaded for debugging issues. The corpus is reused between evergreen runs in order to increase fuzzing coverage. +Fuzz tests defined in bazel using `mongo_cc_fuzztest` will periodically run on the master branch in +evergreen. 
The compiled tests and their associated corpus are saved to S3 and can be downloaded for +debugging issues. The corpus is reused between evergreen runs in order to increase fuzzing coverage. ## Useful flags diff --git a/docs/golden_data_test_framework.md b/docs/golden_data_test_framework.md index f15e1ee7e01..7aa2b96b7ad 100644 --- a/docs/golden_data_test_framework.md +++ b/docs/golden_data_test_framework.md @@ -33,24 +33,24 @@ outputs. code changes. - Multiple test variations MAY be bundled into a single test. Recommended when testing same feature - with different inputs. This helps reviewing the outputs by grouping similar tests together, and also - reduces the number of output files. + with different inputs. This helps reviewing the outputs by grouping similar tests together, and + also reduces the number of output files. - Changes to test fixture or test code that affect non-trivial amount test outputs MUST BE done in separate pull request from production code changes: - Pull request for test code only changes can be easily reviewed, even if large number of test - outputs are modified. While such changes can still introduce merge conflicts, they don't introduce - risk of regression (if outputs were valid + outputs are modified. While such changes can still introduce merge conflicts, they don't + introduce risk of regression (if outputs were valid - Pull requests with mixed production - Tests in the same suite SHOULD share the fixtures when appropriate. This reduces cost of adding - new tests to the suite. Changes to the fixture may only affect expected outputs from that fixtures, - and those output can be updated in bulk. + new tests to the suite. Changes to the fixture may only affect expected outputs from that + fixtures, and those output can be updated in bulk. - Tests in different suites SHOULD NOT reuse/share fixtures. Changes to the fixture can affect large - number of expected outputs. 
- There are exceptions to that rule, and tests in different suites MAY reuse/share fixtures if: + number of expected outputs. There are exceptions to that rule, and tests in different suites MAY + reuse/share fixtures if: - Test fixture is considered stable and changes rarely. - Tests suites are related, either by sharing tests, or testing similar components. @@ -59,9 +59,8 @@ outputs. - Tests SHOULD print both inputs and outputs of the tested code. This makes it easy for reviewers to verify of the expected outputs are indeed correct by having both input and output next to each - other. - Otherwise finding the input used to produce the new output may not be practical, and might not even - be included in the diff. + other. Otherwise finding the input used to produce the new output may not be practical, and might + not even be included in the diff. - When resolving merge conflicts on the expected output files, one of the approaches below SHOULD be used: @@ -71,8 +70,8 @@ outputs. hanges done by local branch. - "Accept yours", rerun the tests and verify the new outputs. This approach requires knowledge of production/test code changes in "theirs" branch. However, if such changes resulted in - straightforward and repetitive output changes, like due to printing code change or fixture change, - it may be easier to verify than reinspecting local changes. + straightforward and repetitive output changes, like due to printing code change or fixture + change, it may be easier to verify than reinspecting local changes. - Expected test outputs SHOULD be reused across tightly-coupled test suites. The suites are tightly-coupled if: @@ -92,8 +91,8 @@ outputs. - Versioned tests, where expected behavior is the same for majority of test inputs/scenarios. - AVOID manually modifying expected output files. Those files are considered to be auto generated. - Instead, run the tests and then copy the generated output as a new expected output file. 
See "How to - diff and accept new test outputs" section for instructions. + Instead, run the tests and then copy the generated output as a new expected output file. See "How + to diff and accept new test outputs" section for instructions. # How to use write Golden Data tests? @@ -121,9 +120,10 @@ outputs. Verifies the output with the expected output that is in the source repo See: [golden_test.h](../src/mongo/unittest/golden_test.h) -Before running `bazel test`, set up the golden test framework as described in the `Setup` section below. -This will ensure that the C++ test outputs are written to a location where `buildscripts/golden_test.py` -can find them so that the `diff` and `accept` functions work as expected. +Before running `bazel test`, set up the golden test framework as described in the `Setup` section +below. This will ensure that the C++ test outputs are written to a location where +`buildscripts/golden_test.py` can find them so that the `diff` and `accept` functions work as +expected. **Example:** @@ -160,8 +160,7 @@ TEST_F(MySuiteFixture, MyFeatureBTest) { } ``` -Also see self-test: -[golden_test_test.cpp](../src/mongo/unittest/golden_test_test.cpp) +Also see self-test: [golden_test_test.cpp](../src/mongo/unittest/golden_test_test.cpp) # How to diff and accept new test outputs on a workstation @@ -177,13 +176,15 @@ buildscripts/golden_test.py requires a one-time workstation setup. Note: this setup is only required to use buildscripts/golden_test.py itself. It is NOT required to just run the Golden Data tests when not using buildscripts/golden_test.py. -1. Create a yaml config file, as described by [Appendix - Config file reference](#appendix---config-file-reference). +1. Create a yaml config file, as described by + [Appendix - Config file reference](#appendix---config-file-reference). 2. Set GOLDEN_TEST_CONFIG_PATH environment variable to config file location, so that is available when running tests and when running buildscripts/golden_test.py tool. 
### Automatic Setup -Use buildscripts/golden_test.py builtin setup to initialize default config for your current platform. +Use buildscripts/golden_test.py builtin setup to initialize default config for your current +platform. **Instructions for Linux** @@ -195,8 +196,8 @@ buildscripts/golden_test.py setup **Instructions for Windows** -Run buildscripts/golden_test.py setup utility. -You may be asked for a password, when not running in "Run as administrator" shell. +Run buildscripts/golden_test.py setup utility. You may be asked for a password, when not running in +"Run as administrator" shell. ```cmd c:\python\python310\python.exe buildscripts/golden_test.py setup @@ -295,7 +296,8 @@ $> buildscripts/golden_test.py --help ### Update multiple expected files at once -Some tests will run in multiple passthroughs or build variants, so they have multiple expected files. +Some tests will run in multiple passthroughs or build variants, so they have multiple expected +files. Whenever the test is updated, all the expected files should be updated together as well. @@ -306,8 +308,8 @@ buildscripts/golden_test.py --verbose clean-run-accept jstests/query_golden/NAME This option uses `resmoke.py find-suites` to determine the passthrough suites a test belongs to and runs them. -If the test is found to only belong to the `query_golden_classic` passthrough, it is assumed that -it can have multiple expected results due to being run under multiple build variants with a different +If the test is found to only belong to the `query_golden_classic` passthrough, it is assumed that it +can have multiple expected results due to being run under multiple build variants with a different `internalQueryFrameworkControl` settings. So the test will be run with various values for `internalQueryFrameworkControl`. 
@@ -348,22 +350,21 @@ outputRootPattern: type: String optional: true description: - Root path patten that will be used to write expected and actual test outputs for all tests - in the test run. - If not specified a temporary folder location will be used. - Path pattern string may use '%' characters in the last part of the path. '%' characters in - the last part of the path will be replaced with random lowercase hexadecimal digits. - examples: /var/tmp/test_output/out-%%%%-%%%%-%%%%-%%%% - /var/tmp/test_output + Root path patten that will be used to write expected and actual test outputs for all tests in + the test run. If not specified a temporary folder location will be used. Path pattern string may + use '%' characters in the last part of the path. '%' characters in the last part of the path + will be replaced with random lowercase hexadecimal digits. + examples: /var/tmp/test_output/out-%%%%-%%%%-%%%%-%%%% /var/tmp/test_output diffCmd: type: String optional: true - description: Shell command to diff a single golden test run output. - {{expected}} and {{actual}} variables should be used and will be replaced with expected and - actual output folder paths respectively. - This property is not used to decide whether the test passes or fails; it is only used to - display differences once we've decided that a test failed. - examples: git diff --no-index "{{expected}}" "{{actual}}" - diff -ruN --unidirectional-new-file --color=always "{{expected}}" "{{actual}}" + description: + Shell command to diff a single golden test run output. {{expected}} and {{actual}} variables + should be used and will be replaced with expected and actual output folder paths respectively. + This property is not used to decide whether the test passes or fails; it is only used to display + differences once we've decided that a test failed. 
+ examples: + git diff --no-index "{{expected}}" "{{actual}}" diff -ruN --unidirectional-new-file + --color=always "{{expected}}" "{{actual}}" ``` diff --git a/docs/idl.md b/docs/idl.md index e4b85032033..ebe486a0991 100644 --- a/docs/idl.md +++ b/docs/idl.md @@ -142,8 +142,8 @@ mongo_idl_library( ``` Bazel knows how to invoke the IDL compiler and generate files in the build directory with the C++ -code. This code can also be generated by `--build_tag_filters=gen_source` tag in bazel which is useful for -code navigation. +code. This code can also be generated by `--build_tag_filters=gen_source` tag in bazel which is +useful for code navigation. The generated IDL code looks something like the simplified code below. @@ -206,17 +206,17 @@ fields on the `commands` object. The special features/requirements of commands: -1. First element must match the name of the command, and the parsing rules of this element - can be customized via the `namespace` field. +1. First element must match the name of the command, and the parsing rules of this element can be + customized via the `namespace` field. 2. In `OP_MSG`, `$db` must be present or defaults to `admin` 3. Commands may have a `struct` as a reply 4. Commands may be a part of API Version 1 -5. Any structs marked with `is_generic_cmd_list: "arg"` that are in imported IDL files - will automatically be chained to all commands. The IDL compiler imports - [`generic_argument.idl`](generic_argument.idl) by default, so any generic argument struct - defined in that file will be chained to all commands by default. -6. Command replies ignore the generic arguments fields like `$clusterTime`, `ok`, etc - during parsing. The list of these fields is in [`generic_argument.idl`](generic_argument.idl). +5. Any structs marked with `is_generic_cmd_list: "arg"` that are in imported IDL files will + automatically be chained to all commands. 
The IDL compiler imports + [`generic_argument.idl`](generic_argument.idl) by default, so any generic argument struct defined + in that file will be chained to all commands by default. +6. Command replies ignore the generic arguments fields like `$clusterTime`, `ok`, etc during + parsing. The list of these fields is in [`generic_argument.idl`](generic_argument.idl). Example Command: @@ -388,7 +388,8 @@ void idlDeserialize(StringEnumEnum& en, ::mongo::StringData value, const IDLPars constexpr ::mongo::StringData idlGetDefaultParserFieldName(StringEnumEnum) { return "StringEnumEnum"; } ``` -These ADL hooks are not intended to be used directly by user code. See [Serialization/Deserialization API](#serializationdeserialization-api). +These ADL hooks are not intended to be used directly by user code. See +[Serialization/Deserialization API](#serializationdeserialization-api). ### Integer Enums @@ -420,7 +421,8 @@ std::int32_t idlSerialize(IntEnum value); constexpr ::mongo::StringData idlGetDefaultParserFieldName(IntEnum) { return "IntEnum"; } ``` -These ADL hooks are not intended to be used directly by user code. See [Serialization/Deserialization API](#serializationdeserialization-api). +These ADL hooks are not intended to be used directly by user code. See +[Serialization/Deserialization API](#serializationdeserialization-api). ### Serialization/Deserialization API @@ -432,9 +434,9 @@ The public API to serialize and deserialize IDL-generated enums is defined in auto parsedEnum = idl::deserialize(value); ``` -The definitions of `idl::serialize()` and `idl::deserialize()` rely on the autogenerated ADL hooks to -find the serializer/deserializer implementations for each enum. User code should use this public API -and not the ADL hooks directly. +The definitions of `idl::serialize()` and `idl::deserialize()` rely on the autogenerated ADL hooks +to find the serializer/deserializer implementations for each enum. 
User code should use this public +API and not the ADL hooks directly. ### Reference @@ -482,8 +484,8 @@ types allow users to customize IDL parsing for their own unique needs. A field in a struct or command can be defined as a type but a field can also be an array, enum, struct or variant. Declaring a field as something other then a type preferred to using types since -it allows more type information to be represented in IDL over C++. See `type` in the [field -reference](#struct-fields-attribute-reference) for more information. +it allows more type information to be represented in IDL over C++. See `type` in the +[field reference](#struct-fields-attribute-reference) for more information. Type supports builtin BSON types like int32, int64, and string. These are types built into `BSONElement`/`BSONObjBuilder`. It also supports custom types to give the code full control of @@ -529,11 +531,11 @@ The five key things to note in this example: `BSONElement` as a parameter. The IDL generator has custom rules for `BSONElement`. - `serializer` - omitted in this example because `BSONObjBuilder` has builtin support for `std::string` -- `is_view` - indicates whether the type is a view or not. If the type is a view, then it's - possible that objects of the type will not own all of its members. If the type is not a view, - then objects of the type are guaranteed to own all of its members. This field is optional and - defaults to True. To reduce the size of the C++ representation of structs including this type, - you can specify this field as False if the type is not a view type. +- `is_view` - indicates whether the type is a view or not. If the type is a view, then it's possible + that objects of the type will not own all of its members. If the type is not a view, then objects + of the type are guaranteed to own all of its members. This field is optional and defaults to True. 
+ To reduce the size of the C++ representation of structs including this type, you can specify this + field as False if the type is not a view type. ### Custom Types @@ -590,22 +592,29 @@ IDLAnyType: - `std::vector<_>` - When using `std::vector<->`, the getters/setters using `mongo::ConstDataRange` instead - `deserializer` - string - a method name to all deserialize the type. Typically this is a function - that takes `BSONElement` as a parameter. The IDL generator has custom rules for `BSONElement`. - By default, IDL assumes it is a instance methods of `cpp_type`. - If prefixed with `::`, assumes the function is a global static function - By default, the deserializer's function signature is `()`. - For `object` types, the deserializer's function signature is `(const BSONObj& -obj)` - For `any` types, the deserializer's function signature is `(BSONElement -element)`. -- `serializer` - string -a method name to all serialize the type. - By default, IDL assumes it is a instance methods of `cpp_type`. - If prefixed with `::`, assumes the function is a global static function - By default, the deserializer's function signature is ` (const -&)` where `type_append` is a type `BSONObjBuilder` understands. - For `object` types, the deserializer's function signature is `(const BSONObj& -obj)` - For `any` types that are not in an array, the serializer's function signature is - `(StringData fieldName, BSONObjBuilder* builder)`. - For `any` types that are in an array, the serializer's function signature is + that takes `BSONElement` as a parameter. The IDL generator has custom rules for `BSONElement`. - + By default, IDL assumes it is a instance methods of `cpp_type`. - If prefixed with `::`, assumes + the function is a global static function - By default, the deserializer's function signature is + `()`. - For `object` types, the deserializer's function signature is + `(const BSONObj& obj)` - For `any` types, the deserializer's function signature is + `(BSONElement element)`. 
+- `serializer` - string -a method name to all serialize the type. - By default, IDL assumes it is a + instance methods of `cpp_type`. - If prefixed with `::`, assumes the function is a global static + function - By default, the deserializer's function signature is + ` (const &)` where `type_append` is a type `BSONObjBuilder` + understands. - For `object` types, the deserializer's function signature is + `(const BSONObj& obj)` - For `any` types that are not in an array, the serializer's + function signature is `(StringData fieldName, BSONObjBuilder* builder)`. - For + `any` types that are in an array, the serializer's function signature is `(BSONArrayBuilder* builder)`. - `deserialize_with_tenant` - bool - if set, adds `TenantId` as the first parameter to `deserializer` - `internal_only` - bool - undocumented, DO NOT USE - `default` - string - default value for a type. A field in a struct inherits this value if a field does not set a default. See struct's `default` rules for more information. -- `is_view` - indicates whether the type is a view or not. If the type is a view, then it's - possible that objects of the type will not own all of its members. If the type is not a view, - then objects of the type are guaranteed to own all of its members. +- `is_view` - indicates whether the type is a view or not. If the type is a view, then it's possible + that objects of the type will not own all of its members. If the type is not a view, then objects + of the type are guaranteed to own all of its members. ## Structs @@ -638,9 +647,8 @@ exampleStruct: optional: true defaultedField: description: >- - Most callers should rely on 42 - as it is the answer to the question - of life the universe and everything. + Most callers should rely on 42 as it is the answer to the question of life the universe and + everything. type: long validator: gt: 0 @@ -762,8 +770,8 @@ multi level chained structs. - `is_command_reply` - bool - if true, marks the struct as a command reply. 
A struct marked a `is_command_reply` generates a parser that ignores known generic or common fields across all replies when parsing replies (i.e. `ok`, `errmsg`, etc) -- `is_generic_cmd_list` - string - choice [`arg`, `reply`], if set, generates functions `bool -hasField(StringData)` and `bool shouldForwardToShards(StringData)` for each field in the +- `is_generic_cmd_list` - string - choice [`arg`, `reply`], if set, generates functions + `bool hasField(StringData)` and `bool shouldForwardToShards(StringData)` for each field in the struct. If set to `arg`, the struct will automatically be chained to every `command`. - `query_shape_component` - bool - true indicates this special serialization code will be generated to serialize as a query shape @@ -784,10 +792,10 @@ hasField(StringData)` and `bool shouldForwardToShards(StringData)` for each fiel have a variant of strings and structs. - Variant string support differentiates the type to choose based on the BSON type. - Variant struct support differentiates the type to choose based on the _first_ field of the - struct. The first field must be unique in each struct across the structs. When parsing a - BSON object as a variant of multiple structs, the parser assumes that the first field - declared in the IDL struct is always the first field in its BSON representation. - See `bulkWrite` for an example. + struct. The first field must be unique in each struct across the structs. When parsing a BSON + object as a variant of multiple structs, the parser assumes that the first field declared in + the IDL struct is always the first field in its BSON representation. See `bulkWrite` for an + example. - `ignore` - bool - true means field generates no code but is ignored by the generated deserializer. Used to deprecate fields that no longer have an affect but allow strict parsers to ignore them. - `optional` - bool - true means the field is optional. 
Generated C++ type is @@ -819,8 +827,9 @@ Comparisons are generated with C++ operators for these comparisons - `lt` - string - Validates field is less than or equal to `string` - `gte` - string - Validates field is greater than `string` - `lte` - string - Validates field is less than or equal to `string` -- `callback` - string - A static function to call of the shape `Status (const - value)`. For non-simple types, `value` is passed by const-reference. +- `callback` - string - A static function to call of the shape + `Status (const value)`. For non-simple types, `value` is passed by + const-reference. ## Commands @@ -830,24 +839,24 @@ the `command` object when compared to `struct`. The special features: -1. First element must match the name of the command, and the parsing rules of this element - can be customized via the `namespace` field. +1. First element must match the name of the command, and the parsing rules of this element can be + customized via the `namespace` field. 2. In `OP_MSG`, `$db` must be present or defaults to `admin` 3. Commands may have a `struct` as a reply 4. Commands may be a part of API Version 1 -5. Any structs marked with `is_generic_cmd_list: "arg"` that are in imported IDL files - will automatically be chained to all commands. The IDL compiler imports - [`generic_argument.idl`](generic_argument.idl) by default, so any generic argument struct - defined in that file will be chained to all commands by default. -6. Command replies ignore the generic arguments fields like `$clusterTime`, `ok`, etc - during parsing. The list of these fields is in [`generic_argument.idl`](generic_argument.idl). +5. Any structs marked with `is_generic_cmd_list: "arg"` that are in imported IDL files will + automatically be chained to all commands. The IDL compiler imports + [`generic_argument.idl`](generic_argument.idl) by default, so any generic argument struct defined + in that file will be chained to all commands by default. +6. 
Command replies ignore the generic arguments fields like `$clusterTime`, `ok`, etc during + parsing. The list of these fields is in [`generic_argument.idl`](generic_argument.idl). The `namespace` field is the field that describes one kind of parameter a command takes. -1. `concatenate_with_db` - takes a collection name. Generates a method `const NamespaceString -getNamespace()`. Examples: `insert`, `update`, `delete` -2. `concatenate_with_db_or_uuid` - takes a collection name. Generates a method `const -NamespaceStringOrUUID& getNamespaceOrUUID()`. Examples: `find`, `count` +1. `concatenate_with_db` - takes a collection name. Generates a method + `const NamespaceString getNamespace()`. Examples: `insert`, `update`, `delete` +2. `concatenate_with_db_or_uuid` - takes a collection name. Generates a method + `const NamespaceStringOrUUID& getNamespaceOrUUID()`. Examples: `find`, `count` 3. `ignored` - ignores the first argument entirely. Examples: `hello`, `setParameter`, `ping` 4. `type` - takes a struct as the first argument. Examples: `getLog`, `clearLog`, `renameCollection` @@ -866,15 +875,16 @@ Commands can also specify their replies that they return. Replies are regular `s - `immutable` - [see structs](#struct-reference) - `non_const_getter` - [see structs](#struct-reference) - `namespace` - string - choice of a string [`concatenate_with_db`, `concatenate_with_db_or_uuid`, - `ignored`, `type`]. Instructs how the value of command field should be parsed - `concatenate_with_db` - Indicates the command field is a string and should be treated as a - collection name. Typically used by commands that deal with collections. Automatically - concatenated with `$db` by the IDL parser. Adds a method `const NamespaceString getNamespace()` - to the generated class. - `concatenate_with_db_or_uuid` - Indicates the command field is a string or uuid, and should be - treated as a collection name. Typically used by commands that deal with collections. 
- Automatically concatenated with `$db` by the IDL parser. Adds a method `const -NamespaceStringOrUUID& getNamespaceOrUUID()` to the generated class. - `ignored` - Ignores the value of the command field. Used by commands that ignore their command - argument entirely - `type` - Indicates the command takes a custom type for the first field. `type` field must be - set. + `ignored`, `type`]. Instructs how the value of command field should be parsed - + `concatenate_with_db` - Indicates the command field is a string and should be treated as a + collection name. Typically used by commands that deal with collections. Automatically concatenated + with `$db` by the IDL parser. Adds a method `const NamespaceString getNamespace()` to the + generated class. - `concatenate_with_db_or_uuid` - Indicates the command field is a string or + uuid, and should be treated as a collection name. Typically used by commands that deal with + collections. Automatically concatenated with `$db` by the IDL parser. Adds a method + `const NamespaceStringOrUUID& getNamespaceOrUUID()` to the generated class. - `ignored` - Ignores + the value of the command field. Used by commands that ignore their command argument entirely - + `type` - Indicates the command takes a custom type for the first field. `type` field must be set. - `type` - string - name of IDL type or struct to parse the command field as - `command_name` - string - IDL generated parser expects the command to be named the name of YAML map. This can be overwritten with `command_name`. Commands should be `camelCase` @@ -893,8 +903,8 @@ NamespaceStringOrUUID& getNamespaceOrUUID()` to the generated class. - `ignored` ### Access Check Reference -A list of privileges the command checks. Only applicable for commands that are a part of -API Version 1. Checked at runtime when test commands are enabled. +A list of privileges the command checks. Only applicable for commands that are a part of API +Version 1. 
Checked at runtime when test commands are enabled. - `none` - bool - No privileges required - `simple` - mapping - single [check or privilege](#check-or-privilege) @@ -1002,28 +1012,29 @@ unit tests exercise all features and combinations IDL can handle. #### BSONObj Anchor The parsing method a struct is initialized with indicates what type of ownership the constructed -object has on the `BSONObj` parameter. An internal `BSONObj` anchor ensures that the lifetime of -the `BSONObj` matches the lifetime of the object in the cases that the `BSONObj` parameter is -owned or shared. +object has on the `BSONObj` parameter. An internal `BSONObj` anchor ensures that the lifetime of the +`BSONObj` matches the lifetime of the object in the cases that the `BSONObj` parameter is owned or +shared. #### View Types If the struct is a view, then it's possible that objects of the type will not own all of its members. If the struct is not a view, then objects of the type are guaranteed to own all of its -members. This is determined by recursively checking the fields of a struct. This info is used -during generation to determine whether or not a struct will need a `BSONObj` anchor. +members. This is determined by recursively checking the fields of a struct. This info is used during +generation to determine whether or not a struct will need a `BSONObj` anchor. ## Best Practices IDL has been in use since 2017. In that time, here are a few best practices: 1. strict or non-strict parsers - Structs that are persisted to disk should set `strict: false`. - It's better for upgrade/downgrade. Commands should set `strict: true` or omit it as `strict: -true` is the default. 1. For persistance: For upgrade/downgrade, if a persisted document with a strict parser has a - field added in new version N+1 and then the user downgrades to old version N, the strict - parser will throw an exception and reject the document. 
If this document was part of the - storage catalog for instance, the server would fail to start. 2. For commands: By using strict parsers, it gives the server the ability to add fields without - the risk of clients accidentally sending fields with the same name that had been ignored. + It's better for upgrade/downgrade. Commands should set `strict: true` or omit it as + `strict: true` is the default. 1. For persistance: For upgrade/downgrade, if a persisted document + with a strict parser has a field added in new version N+1 and then the user downgrades to old + version N, the strict parser will throw an exception and reject the document. If this document + was part of the storage catalog for instance, the server would fail to start. 2. For commands: By + using strict parsers, it gives the server the ability to add fields without the risk of clients + accidentally sending fields with the same name that had been ignored. 2. Extending existing structs/commands - all new fields in a struct/command must be marked optional to support backwards compatibility. For new structs/commands, there should be some required fields. It does not matter if the struct is not persisted, non-optional fields break backwards diff --git a/docs/libfuzzer.md b/docs/libfuzzer.md index 44250efed6d..b734953099d 100644 --- a/docs/libfuzzer.md +++ b/docs/libfuzzer.md @@ -2,28 +2,26 @@ title: LibFuzzer --- -> **!!NOTE!!**: LibFuzzer is deprecated and should not be used for new fuzz tests. See [FuzzTest](fuzztest.md) for new fuzzing implementations +> **!!NOTE!!**: LibFuzzer is deprecated and should not be used for new fuzz tests. See +> [FuzzTest](fuzztest.md) for new fuzzing implementations -LibFuzzer is a tool for performing coverage guided fuzzing of C/C++ -code. LibFuzzer will try to trigger AUBSAN failures in a function you -provide, by repeatedly calling it with a carefully crafted byte array as -input. Each input will be assigned a "score". 
Byte arrays which exercise -new or more regions of code will score better. LibFuzzer will merge and -mutate high scoring inputs in order to gradually cover more and more -possible behavior. +LibFuzzer is a tool for performing coverage guided fuzzing of C/C++ code. LibFuzzer will try to +trigger AUBSAN failures in a function you provide, by repeatedly calling it with a carefully crafted +byte array as input. Each input will be assigned a "score". Byte arrays which exercise new or more +regions of code will score better. LibFuzzer will merge and mutate high scoring inputs in order to +gradually cover more and more possible behavior. # When to use LibFuzzer -> **!!NOTE!!**: LibFuzzer is deprecated and should not be used for new fuzz tests. See [FuzzTest](fuzztest.md) for new fuzzing implementations +> **!!NOTE!!**: LibFuzzer is deprecated and should not be used for new fuzz tests. See +> [FuzzTest](fuzztest.md) for new fuzzing implementations -LibFuzzer is great for testing functions which accept a opaque blob of -untrusted user-provided data. +LibFuzzer is great for testing functions which accept a opaque blob of untrusted user-provided data. # How to use LibFuzzer -LibFuzzer implements `int main`, and expects to be linked with an object -file which provides the function under test. You will achieve this by -writing a cpp file which implements +LibFuzzer implements `int main`, and expects to be linked with an object file which provides the +function under test. You will achieve this by writing a cpp file which implements ```cpp extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) { @@ -31,26 +29,22 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) { } ``` -`LLVMFuzzerTestOneInput` will be called repeatedly, with fuzzer -generated bytes in `Data`. `Size` will always truthfully tell your -implementation how many bytes are in `Data`. 
If your function crashes or -induces an AUBSAN fault, LibFuzzer will consider that to be a finding -worth reporting. +`LLVMFuzzerTestOneInput` will be called repeatedly, with fuzzer generated bytes in `Data`. `Size` +will always truthfully tell your implementation how many bytes are in `Data`. If your function +crashes or induces an AUBSAN fault, LibFuzzer will consider that to be a finding worth reporting. -Keep in mind that your function will often "just" be adapting `Data` to -whatever format our internal C++ functions requires. However, you have a -lot of freedom in exactly what you choose to do. Just make sure your -function crashes or produces an invariant when something interesting -happens! As just a few ideas: +Keep in mind that your function will often "just" be adapting `Data` to whatever format our internal +C++ functions requires. However, you have a lot of freedom in exactly what you choose to do. Just +make sure your function crashes or produces an invariant when something interesting happens! As just +a few ideas: -- You might choose to call multiple implementations of a single - operation, and validate that they produce the same output when - presented the same input. -- You could tease out individual bytes from `Data` and provide them as - different arguments to the function under test. +- You might choose to call multiple implementations of a single operation, and validate that they + produce the same output when presented the same input. +- You could tease out individual bytes from `Data` and provide them as different arguments to the + function under test. -Finally, your cpp file will need a bazel target. There is a method which -defines fuzzer targets, much like how we define unittests. For example: +Finally, your cpp file will need a bazel target. There is a method which defines fuzzer targets, +much like how we define unittests. 
For example: ```python mongo_cc_fuzzer_test( @@ -70,25 +64,21 @@ defines fuzzer targets, much like how we define unittests. For example: # Running LibFuzzer -Your test's object file and **all** of its dependencies must be compiled -with the "fuzzer" sanitizer, plus a set of sanitizers which might -produce interesting runtime errors like AUBSAN. Evergreen has a build -variant, whose name will include the string "FUZZER", which will compile -and run all of the fuzzer tests. +Your test's object file and **all** of its dependencies must be compiled with the "fuzzer" +sanitizer, plus a set of sanitizers which might produce interesting runtime errors like AUBSAN. +Evergreen has a build variant, whose name will include the string "FUZZER", which will compile and +run all of the fuzzer tests. -The fuzzers can be built locally, for development and debugging. Check -our Evergreen configuration for the current bazel arguments. +The fuzzers can be built locally, for development and debugging. Check our Evergreen configuration +for the current bazel arguments. -LibFuzzer binaries will accept a path to a directory containing its -"corpus". A corpus is a list of examples known to produce interesting -outputs. LibFuzzer will start producing interesting results more quickly -if starts off with a set of inputs which it can begin mutating. When its -done, it will write down any new inputs it discovered into its corpus. -Re-using a corpus across executions is a good way to make LibFuzzer -return more results in less time. Our Evergreen tasks will try to -acquire and re-use a corpus from an earlier commit, if it can. +LibFuzzer binaries will accept a path to a directory containing its "corpus". A corpus is a list of +examples known to produce interesting outputs. LibFuzzer will start producing interesting results +more quickly if starts off with a set of inputs which it can begin mutating. When its done, it will +write down any new inputs it discovered into its corpus. 
Re-using a corpus across executions is a +good way to make LibFuzzer return more results in less time. Our Evergreen tasks will try to acquire +and re-use a corpus from an earlier commit, if it can. # References -- [LibFuzzer's official - documentation](https://llvm.org/docs/LibFuzzer.html) +- [LibFuzzer's official documentation](https://llvm.org/docs/LibFuzzer.html) diff --git a/docs/linting.md b/docs/linting.md index aaed63ff4c6..c7c2db8416e 100644 --- a/docs/linting.md +++ b/docs/linting.md @@ -60,9 +60,8 @@ Ex: `bash buildscripts/yamllinters.sh` ## Python Linters -The `bazel run lint` command runs all Python linters as well as several other linters in our code base. You can -run auto-remediations via: -`bazel run lint --fix`. +The `bazel run lint` command runs all Python linters as well as several other linters in our code +base. You can run auto-remediations via: `bazel run lint --fix`. Ex: `bazel run lint` diff --git a/docs/load_balancer_support.md b/docs/load_balancer_support.md index ec15c079534..084ac613c2e 100644 --- a/docs/load_balancer_support.md +++ b/docs/load_balancer_support.md @@ -1,18 +1,18 @@ # Proxy protocol support -`mongod` and `mongos` have built-in support for connections made via L4 load balancers using -the [proxy protocol][proxy-protocol-url] header. Placing `mongos` or `mongod` behind load balancers +`mongod` and `mongos` have built-in support for connections made via L4 load balancers using the +[proxy protocol][proxy-protocol-url] header. Placing `mongos` or `mongod` behind load balancers requires proper configuration of the load balancers, `mongos`, and `mongod`. # Configuring mongod To use `mongod` with a L4 load balancer (or reverse proxy) it _must_ be configured with the -`proxyPort` config option whose value can be specified at program start in any of the ways -mentioned in the server config documentation. This config option opens a new port to which the -L4 load balancer _must_ connect. 
+`proxyPort` config option whose value can be specified at program start in any of the ways mentioned +in the server config documentation. This config option opens a new port to which the L4 load +balancer _must_ connect. -The L4 load balancer (or reverse proxy) _must_ emit a [proxy protocol][proxy-protocol-url] header -at the start of its connection stream. `mongod` supports both version 1 and version 2 of the proxy +The L4 load balancer (or reverse proxy) _must_ emit a [proxy protocol][proxy-protocol-url] header at +the start of its connection stream. `mongod` supports both version 1 and version 2 of the proxy standard. # Reverse proxy vs load balancer @@ -20,8 +20,8 @@ standard. Sharded clusters might be configured to work with either a L4 load balancer or a reverse proxy. In both cases the proxy or load balancer _must_ connect to the `mongos`'s load-balancer port. -Placing `mongos` behind a reverse proxy does not hide the list of `mongos`. The driver will choose -a specific `mongos` to connect to via the reverse proxy. +Placing `mongos` behind a reverse proxy does not hide the list of `mongos`. The driver will choose a +specific `mongos` to connect to via the reverse proxy. Placing `mongos` behind an L4 load balancer hides the list of `mongos`. The driver only sees the load balancer and, the connections it makes are routed by the load balancer to a `mongos`. There is @@ -33,11 +33,18 @@ that connections from a driver are distributed among multiple `mongos`. When a sharded cluster is deployed with a reverse proxy, there are two conditions that must be fulfilled : -- `mongos` must be configured with the [MongoDB Server Parameter](https://docs.mongodb.com/manual/reference/parameters/) `loadBalancerPort` whose value can be specified at program start in any of the ways mentioned in the server parameter documentation. - This option causes `mongos` to open a second port. 
All connections made from reverse proxy _must_ be made over this port, and no regular connections (without HAProxy protocol header) may be made over this port. -- The reverse proxy _must_ be configured to emit a [proxy protocol][proxy-protocol-url] header - at the [start of its connection stream](https://github.com/mongodb/mongo/commit/3a18d295d22b377cc7bc4c97bd3b6884d065bb85). `mongos` [supports](https://github.com/mongodb/mongo/commit/786482da93c3e5e58b1c690cb060f00c60864f69) both version 1 and version 2 of the proxy - protocol standard. +- `mongos` must be configured with the + [MongoDB Server Parameter](https://docs.mongodb.com/manual/reference/parameters/) + `loadBalancerPort` whose value can be specified at program start in any of the ways mentioned in + the server parameter documentation. This option causes `mongos` to open a second port. All + connections made from reverse proxy _must_ be made over this port, and no regular connections + (without HAProxy protocol header) may be made over this port. +- The reverse proxy _must_ be configured to emit a [proxy protocol][proxy-protocol-url] header at + the + [start of its connection stream](https://github.com/mongodb/mongo/commit/3a18d295d22b377cc7bc4c97bd3b6884d065bb85). + `mongos` + [supports](https://github.com/mongodb/mongo/commit/786482da93c3e5e58b1c690cb060f00c60864f69) both + version 1 and version 2 of the proxy protocol standard. The driver does not require any configuration change compared to a cluster without a reverse proxy. @@ -46,22 +53,32 @@ The driver does not require any configuration change compared to a cluster witho When a sharded cluster is deployed with an L4 load balancer there are three conditions that must be fulfilled : -- `mongos` must be configured with the [MongoDB Server Parameter](https://docs.mongodb.com/manual/reference/parameters/) `loadBalancerPort` whose value can be specified at program start in any of the ways mentioned in the server parameter documentation. 
- This option causes `mongos` to open a second port. All connections made from load - balancers _must_ be made over this port, and no regular connections (without HAProxy protocol header) may be made over this port. -- The L4 load balancer _must_ be configured to emit a [proxy protocol][proxy-protocol-url] header - at the [start of its connection stream](https://github.com/mongodb/mongo/commit/3a18d295d22b377cc7bc4c97bd3b6884d065bb85). `mongos` [supports](https://github.com/mongodb/mongo/commit/786482da93c3e5e58b1c690cb060f00c60864f69) both version 1 and version 2 of the proxy - protocol standard. -- Clients (drivers or shells) connecting to a `mongos` through the load balancer must set the `loadBalanced` option, - e.g., when connecting to a local `mongos` instance through the load balancer, if the `loadBalancerPort` server parameter was set to 20100, the - connection string must be of the form `"mongodb://localhost:20100/?loadBalanced=true"`. +- `mongos` must be configured with the + [MongoDB Server Parameter](https://docs.mongodb.com/manual/reference/parameters/) + `loadBalancerPort` whose value can be specified at program start in any of the ways mentioned in + the server parameter documentation. This option causes `mongos` to open a second port. All + connections made from load balancers _must_ be made over this port, and no regular connections + (without HAProxy protocol header) may be made over this port. +- The L4 load balancer _must_ be configured to emit a [proxy protocol][proxy-protocol-url] header at + the + [start of its connection stream](https://github.com/mongodb/mongo/commit/3a18d295d22b377cc7bc4c97bd3b6884d065bb85). + `mongos` + [supports](https://github.com/mongodb/mongo/commit/786482da93c3e5e58b1c690cb060f00c60864f69) both + version 1 and version 2 of the proxy protocol standard. 
+- Clients (drivers or shells) connecting to a `mongos` through the load balancer must set the + `loadBalanced` option, e.g., when connecting to a local `mongos` instance through the load + balancer, if the `loadBalancerPort` server parameter was set to 20100, the connection string must + be of the form `"mongodb://localhost:20100/?loadBalanced=true"`. -There are some subtle behavioral differences that the load balancer options enable, chief of -which is how `mongos` deals with open cursors on client disconnection. Over a normal connection, -`mongos` will keep open cursors alive for a short while after client disconnection in case the -client reconnects and continues to request more from the given cursor. Since client reconnections -aren't expected behind a load balancer (as the load balancer will likely redirect a given client -to a different `mongos` instance upon reconnection), we eagerly [close cursors](https://github.com/mongodb/mongo/commit/b429d5dda98bbe18ab0851ffd1729d3b57fc8a4e) on load balanced -client disconnects. We also [abort any in-progress transactions](https://github.com/mongodb/mongo/commit/74628ed4e314dfe0fd69d3fbae1411981a869f6b) that were initiated by the load balanced client. +There are some subtle behavioral differences that the load balancer options enable, chief of which +is how `mongos` deals with open cursors on client disconnection. Over a normal connection, `mongos` +will keep open cursors alive for a short while after client disconnection in case the client +reconnects and continues to request more from the given cursor. Since client reconnections aren't +expected behind a load balancer (as the load balancer will likely redirect a given client to a +different `mongos` instance upon reconnection), we eagerly +[close cursors](https://github.com/mongodb/mongo/commit/b429d5dda98bbe18ab0851ffd1729d3b57fc8a4e) on +load balanced client disconnects. 
We also +[abort any in-progress transactions](https://github.com/mongodb/mongo/commit/74628ed4e314dfe0fd69d3fbae1411981a869f6b) +that were initiated by the load balanced client. [proxy-protocol-url]: https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt diff --git a/docs/logging.md b/docs/logging.md index fa5c410ad1b..e73d3730573 100644 --- a/docs/logging.md +++ b/docs/logging.md @@ -1,9 +1,9 @@ # Log System Overview -The new log system adds capability to produce structured logs in the [Relaxed -Extended JSON 2.0.0][relaxed_json_2] format. The new API requires names to be -given to variables, forming field names for the variables in structured JSON -logs. Named variables are called attributes in the log system. +The new log system adds capability to produce structured logs in the [Relaxed Extended JSON +2.0.0][relaxed_json_2] format. The new API requires names to be given to variables, forming field +names for the variables in structured JSON logs. Named variables are called attributes in the log +system. # Style guide @@ -13,43 +13,38 @@ Log lines are composed primarily of a message (`msg`) and attributes (`attr` fie ## Philosophy -As you write log messages, keep the following in mind: A big thing that makes -JSON and BSON useful as data formats is the ability to provide rich field names. +As you write log messages, keep the following in mind: A big thing that makes JSON and BSON useful +as data formats is the ability to provide rich field names. -What makes logv2 machine readable is that we write an intact Extended BSON -format. +What makes logv2 machine readable is that we write an intact Extended BSON format. -But, what makes these lines human readable is that the `msg` provides a simple, -clear context for interpreting well-formed field names and values in the `attr` -subdocument. +But, what makes these lines human readable is that the `msg` provides a simple, clear context for +interpreting well-formed field names and values in the `attr` subdocument. 
## Specific Guidance -For maximum readability, a log message additionally has the least amount of -repetition possible, and shares attribute names with other related log lines. +For maximum readability, a log message additionally has the least amount of repetition possible, and +shares attribute names with other related log lines. ### Message (the msg field) -The `msg` field predicates a reader's interpretation of the log line. It should -be crafted with care and attention. +The `msg` field predicates a reader's interpretation of the log line. It should be crafted with care +and attention. -- Concisely describe what the log line is reporting, providing enough - context necessary for interpreting attribute field names and values +- Concisely describe what the log line is reporting, providing enough context necessary for + interpreting attribute field names and values - Capitalize the first letter, as in a sentence -- Avoid unnecessary punctuation, but punctuate between sentences if using - multiple sentences +- Avoid unnecessary punctuation, but punctuate between sentences if using multiple sentences - Do not conclude with punctuation -- You may occasionally encounter `msg` strings containing fmt-style - `{expr}` braces. These are legacy artifacts and should be rephrased - according to these guidelines. +- You may occasionally encounter `msg` strings containing fmt-style `{expr}` braces. These are + legacy artifacts and should be rephrased according to these guidelines. ### Attributes (fields in the attr subdocument) -The `attr` subdocument includes important metrics/statistics about the logged -event for the purposes of debugging or performance analysis. These variables -should be named very well, as though intended for a very human-readable portion -of the codebase (like config variable declaration, abstract class definitions, -etc.) 
+The `attr` subdocument includes important metrics/statistics about the logged event for the purposes +of debugging or performance analysis. These variables should be named very well, as though intended +for a very human-readable portion of the codebase (like config variable declaration, abstract class +definitions, etc.) For `attr` field names, do the following: @@ -57,40 +52,38 @@ For `attr` field names, do the following: The bar for understanding should be: -- Someone with reasonable understanding of mongod behavior should understand - immediately what is being logged -- Someone with reasonable troubleshooting skill should be able to extract doc- - or code-searchable phrases to learn about what is being logged +- Someone with reasonable understanding of mongod behavior should understand immediately what is + being logged +- Someone with reasonable troubleshooting skill should be able to extract doc- or code-searchable + phrases to learn about what is being logged #### Precisely describe values and units -Exception: Do not add a unit suffix when logging a Duration type. The system -automatically adds this unit. +Exception: Do not add a unit suffix when logging a Duration type. The system automatically adds this +unit. #### When providing an execution time attribute, ensure it is named "durationMillis" -To describe the execution time of an operation using our preferred method: -Specify an `attr` name of “duration” and provide a value using the Milliseconds -Duration type. The log system will automatically append "Millis" to the -attribute name. +To describe the execution time of an operation using our preferred method: Specify an `attr` name of +“duration” and provide a value using the Milliseconds Duration type. The log system will +automatically append "Millis" to the attribute name. -Alternatively, specify an `attr` name of “durationMillis” and provide the -number of milliseconds as an integer type. 
+Alternatively, specify an `attr` name of “durationMillis” and provide the number of milliseconds as +an integer type. -**Importantly**: downstream analysis tools will rely on this convention, as a -replacement for the "[0-9]+ms$" format of prior logs. +**Importantly**: downstream analysis tools will rely on this convention, as a replacement for the +"[0-9]+ms$" format of prior logs. #### Use certain specific terms whenever possible When logging the below information, do so with these specific terms: -- **namespace** - when logging a value of the form - "\.\". Do not use "collection" or abbreviate to "ns" +- **namespace** - when logging a value of the form "\.\". Do not use + "collection" or abbreviate to "ns" - **db** - instead of "database" -- **error** - when an error occurs, instead of "status". Use this for objects - of type Status and DBException -- **reason** - to provide rationale for an event/action when "error" isn't - appropriate +- **error** - when an error occurs, instead of "status". Use this for objects of type Status and + DBException +- **reason** - to provide rationale for an event/action when "error" isn't appropriate ### Examples @@ -122,11 +115,10 @@ The log system is made available with the following header: #include "mongo/logv2/log.h" -The macro `MONGO_LOGV2_DEFAULT_COMPONENT` is expanded by all logging macros. -This configuration macro must expand at their point of use to a `LogComponent` -expression, which is implicitly attached to the emitted message. It is -conventionally defined near the top of a `.cpp` file after headers are included, -and before any logging macros are invoked. Example: +The macro `MONGO_LOGV2_DEFAULT_COMPONENT` is expanded by all logging macros. This configuration +macro must expand at their point of use to a `LogComponent` expression, which is implicitly attached +to the emitted message. It is conventionally defined near the top of a `.cpp` file after headers are +included, and before any logging macros are invoked. 
Example: #define MONGO_LOGV2_DEFAULT_COMPONENT ::mongo::logv2::LogComponent::kDefault @@ -138,22 +130,19 @@ Logging is performed using function style macros: ..., "nameN"_attr = varN); -The ID is a signed 32bit integer in the same number space as the error code -numbers. It is used to uniquely identify a log statement. If changing existing -code, using a new ID is strongly advised to avoid any parsing ambiguity. When -selecting ID during work on JIRA ticket `SERVER-ABCDE` you can use the JIRA -ticket number to avoid ID collisions with other engineers by taking ID from the -range `ABCDE00` - `ABCDE99`. +The ID is a signed 32bit integer in the same number space as the error code numbers. It is used to +uniquely identify a log statement. If changing existing code, using a new ID is strongly advised to +avoid any parsing ambiguity. When selecting ID during work on JIRA ticket `SERVER-ABCDE` you can use +the JIRA ticket number to avoid ID collisions with other engineers by taking ID from the range +`ABCDE00` - `ABCDE99`. -Attributes are created with the `_attr` user-defined literal. The intermediate -object that gets instantiated provides the assignment operator `=` for -assigning a value to the attribute. +Attributes are created with the `_attr` user-defined literal. The intermediate object that gets +instantiated provides the assignment operator `=` for assigning a value to the attribute. -The message string must be a compile time constant. -This is to avoid dynamic attribute names in the log output and to be able to -add compile time verification of log statements in the future. If the string -needs to be shared with anything else (like constructing a Status object) you -can use this pattern: +The message string must be a compile time constant. This is to avoid dynamic attribute names in the +log output and to be able to add compile time verification of log statements in the future. 
If the +string needs to be shared with anything else (like constructing a Status object) you can use this +pattern: static constexpr char str[] = "the string"; @@ -172,13 +161,12 @@ can use this pattern: ### Log Component -To override the default component, a separate logging API can be used that -takes a `LogOptions` structure: +To override the default component, a separate logging API can be used that takes a `LogOptions` +structure: LOGV2_OPTIONS(options, message-string, attr0, ...); -`LogOptions` can be constructed with a `LogComponent` to avoid verbosity in the -log statement. +`LogOptions` can be constructed with a `LogComponent` to avoid verbosity in the log statement. ##### Example @@ -186,9 +174,8 @@ log statement. ### Log Severity -`LOGV2` is the logging macro for the default informational (0) severity. To log -to different severities there are separate logging macros to be used, they all -take paramaters like `LOGV2`: +`LOGV2` is the logging macro for the default informational (0) severity. To log to different +severities there are separate logging macros to be used, they all take paramaters like `LOGV2`: - `LOGV2_WARNING` - `LOGV2_ERROR` @@ -202,18 +189,17 @@ There is also variations that take `LogOptions` if needed: - `LOGV2_ERROR_OPTIONS` - `LOGV2_FATAL_OPTIONS` -Fatal level log statements using `LOGV2_FATAL` perform `fassert` after logging, -using the provided ID as assert id. `LOGV2_FATAL_NOTRACE` perform -`fassertNoTrace` and `LOGV2_FATAL_CONTINUE` does not `fassert` allowing for -continued execution. `LOGV2_FATAL_CONTINUE` is meant to be used when a fatal -error has occurred but a different way of halting execution is desired such as -`std::terminate` or `fassertFailedWithStatus`. +Fatal level log statements using `LOGV2_FATAL` perform `fassert` after logging, using the provided +ID as assert id. `LOGV2_FATAL_NOTRACE` perform `fassertNoTrace` and `LOGV2_FATAL_CONTINUE` does not +`fassert` allowing for continued execution. 
`LOGV2_FATAL_CONTINUE` is meant to be used when a fatal +error has occurred but a different way of halting execution is desired such as `std::terminate` or +`fassertFailedWithStatus`. -`LOGV2_FATAL_OPTIONS` performs `fassert` by default like `LOGV2_FATAL` but this -can be changed by setting the `FatalMode` on the `LogOptions`. +`LOGV2_FATAL_OPTIONS` performs `fassert` by default like `LOGV2_FATAL` but this can be changed by +setting the `FatalMode` on the `LogOptions`. -Debug-level logging is slightly different where an additional parameter (as -integer) required to indicate the desired debug level: +Debug-level logging is slightly different where an additional parameter (as integer) required to +indicate the desired debug level: LOGV2_DEBUG(ID, debug-level, message-string, attr0, ...); @@ -224,17 +210,15 @@ integer) required to indicate the desired debug level: message-string, attr0, ...); -`LOGV2_PROD_ONLY` logs like a default `LOGV2` log in production, but debug-1 log -in internal testing. It accepts the same arguments as `LOGV2`. This log level is -for log lines that may be spammy in testing but are more rare in production. As -such, they may be useful in investigations. This level also preserves backwards -compatibility for logs that are no longer as useful as when they were introduced. -To determine whether to log, this macro uses the `LogSeverity::ProdOnly()` -level, which returns level `LogSeverity::Debug(1)` when in a testing environment -and `LogSeverity::Log()` otherwise. Whether the server is in a testing -environment is determined using the `enableTestCommands` server parameter. -It is preferred to use other macros over this one as it introduces a difference -between testing and production. There is also the `LOGV2_PROD_ONLY_OPTIONS` +`LOGV2_PROD_ONLY` logs like a default `LOGV2` log in production, but debug-1 log in internal +testing. It accepts the same arguments as `LOGV2`. 
This log level is for log lines that may be +spammy in testing but are more rare in production. As such, they may be useful in investigations. +This level also preserves backwards compatibility for logs that are no longer as useful as when they +were introduced. To determine whether to log, this macro uses the `LogSeverity::ProdOnly()` level, +which returns level `LogSeverity::Debug(1)` when in a testing environment and `LogSeverity::Log()` +otherwise. Whether the server is in a testing environment is determined using the +`enableTestCommands` server parameter. It is preferred to use other macros over this one as it +introduces a difference between testing and production. There is also the `LOGV2_PROD_ONLY_OPTIONS` variation that takes `LogOptions`. ##### Example @@ -248,15 +232,13 @@ variation that takes `LogOptions`. ### Log Tags -Log tags are replacing the Tee from the old log system as the way to indicate -that the log should also be written to a `RamLog` (accessible with the `getLog` -command). +Log tags are replacing the Tee from the old log system as the way to indicate that the log should +also be written to a `RamLog` (accessible with the `getLog` command). -Tags are added to a log statement with the options API similarly to how -non-default components are specified by constructing a `LogOptions`. +Tags are added to a log statement with the options API similarly to how non-default components are +specified by constructing a `LogOptions`. -Multiple tags can be attached to a log statement using the bitwise or operator -`|`. +Multiple tags can be attached to a log statement using the bitwise or operator `|`. ##### Example @@ -267,19 +249,18 @@ Multiple tags can be attached to a log statement using the bitwise or operator ### Dynamic attributes -Sometimes there is a need to add attributes depending on runtime conditionals. -To support this there is the `DynamicAttributes` class that has an `add` method -to add named attributes one by one. 
This class is meant to be used when you -have this specific requirement and is not the general logging API. +Sometimes there is a need to add attributes depending on runtime conditionals. To support this there +is the `DynamicAttributes` class that has an `add` method to add named attributes one by one. This +class is meant to be used when you have this specific requirement and is not the general logging +API. -When finished, it is logged using the regular logging API but the -`DynamicAttributes` instance is passed as the first attribute parameter. Mixing -`_attr` literals with the `DynamicAttributes` is not supported. +When finished, it is logged using the regular logging API but the `DynamicAttributes` instance is +passed as the first attribute parameter. Mixing `_attr` literals with the `DynamicAttributes` is not +supported. -When using the `DynamicAttributes` you need to be careful about parameter -lifetimes. The `DynamicAttributes` binds attributes _by reference_ and the -reference must be valid when passing the `DynamicAttributes` to the log -statement. +When using the `DynamicAttributes` you need to be careful about parameter lifetimes. The +`DynamicAttributes` binds attributes _by reference_ and the reference must be valid when passing the +`DynamicAttributes` to the log statement. ##### Example @@ -321,11 +302,11 @@ Many basic types have built in support: ### User-defined types -To make a user-defined type loggable it needs a serialization member function -that the log system can bind to. +To make a user-defined type loggable it needs a serialization member function that the log system +can bind to. 
-The system binds and uses serialization functions by looking for functions in -the following priority order: +The system binds and uses serialization functions by looking for functions in the following priority +order: - Structured serialization functions - `void x.serialize(BSONObjBuilder*) const` (member) @@ -338,19 +319,18 @@ the following priority order: - `x.toString() ` (member) - `toString(x)` (non-member) -Enums cannot have member functions, but they will still try to bind to the -`toStringForLogging(e)` or `toString(e)` non-members. If neither is available, -the enum value will be logged as its underlying integral type. +Enums cannot have member functions, but they will still try to bind to the `toStringForLogging(e)` +or `toString(e)` non-members. If neither is available, the enum value will be logged as its +underlying integral type. -In order to offer structured serialization and output, a type would need to -supply a structured serialization function. Otherwise, if only stringification -is provided, the output will be an escaped string. +In order to offer structured serialization and output, a type would need to supply a structured +serialization function. Otherwise, if only stringification is provided, the output will be an +escaped string. -The `toStringForLogging` non-member is an ADL customization hook used to -override `toString` for very rare cases where `toString` is inappropriate for -logging perhaps because it's needed for other non-logging formatting. Usually a -`toString` (member or nonmember) is a sufficient customization point and should -be preferred as a canonical stringification of the object. +The `toStringForLogging` non-member is an ADL customization hook used to override `toString` for +very rare cases where `toString` is inappropriate for logging perhaps because it's needed for other +non-logging formatting. 
Usually a `toString` (member or nonmember) is a sufficient customization +point and should be preferred as a canonical stringification of the object. _NOTE: No `operator<<` overload is used even if available_ @@ -370,20 +350,19 @@ _NOTE: No `operator<<` overload is used even if available_ ### Container support -STL containers and data structures that have STL like interfaces are loggable -as long as they contain loggable elements (built-in, user-defined or other -containers). +STL containers and data structures that have STL like interfaces are loggable as long as they +contain loggable elements (built-in, user-defined or other containers). #### Sequential containers -Sequential containers like `std::vector`, `std::deque` and `std::list` are -loggable and the elements get formatted as JSON array in structured output. +Sequential containers like `std::vector`, `std::deque` and `std::list` are loggable and the elements +get formatted as JSON array in structured output. #### Associative containers -Associative containers such as `std::map` and `stdx::unordered_map` loggable -with the requirement that they key is of a string type. The structured format -is a JSON object where the field names are the key. +Associative containers such as `std::map` and `stdx::unordered_map` loggable with the requirement +that they key is of a string type. The structured format is a JSON object where the field names are +the key. #### Ranges @@ -392,11 +371,10 @@ Ranges is loggable via helpers to indicate what type of range it is - `seqLog(begin, end)` - `mapLog(begin, end)` -seqLog indicates that it is a sequential range where the iterators point to -loggable value directly. +seqLog indicates that it is a sequential range where the iterators point to loggable value directly. -mapLog indicates that it is a range coming from an associative container where -the iterators point to a key-value pair. 
+mapLog indicates that it is a range coming from an associative container where the iterators point +to a key-value pair. ##### Examples @@ -425,10 +403,9 @@ the iterators point to a key-value pair. #### Containers and `uint64_t` -Logging of containers uses `BSONObj` as an internal representation and -`uint64_t` is not a supported type with `BSONObjBuilder::append()`. As a user -you can use `boost::transform_iterator` to cast the `uint64_t` to a supported -type. +Logging of containers uses `BSONObj` as an internal representation and `uint64_t` is not a supported +type with `BSONObjBuilder::append()`. As a user you can use `boost::transform_iterator` to cast the +`uint64_t` to a supported type. ##### Example @@ -448,17 +425,14 @@ type. ### Duration types -Duration types have special formatting to match existing practices in the -server code base. Their resulting format depends on the context they are -logged. +Duration types have special formatting to match existing practices in the server code base. Their +resulting format depends on the context they are logged. -When durations are formatted as JSON or BSON a unit suffix is added to the -attribute name when building the field name. The value will be count of the -duration as a number. +When durations are formatted as JSON or BSON a unit suffix is added to the attribute name when +building the field name. The value will be count of the duration as a number. -When logging containers with durations there is no attribute per duration -instance that can have the suffix added. In this case durations are instead -formatted as a BSON object. +When logging containers with durations there is no attribute per duration instance that can have the +suffix added. In this case durations are instead formatted as a BSON object. ##### Examples @@ -485,9 +459,9 @@ formatted as a BSON object. # Attribute naming abstraction -The style guide contains recommendations for attribute naming in certain cases. 
-To make abstraction of attribute naming possible a `logAttrs` function can be -implemented as a friend function in a class with the following signature: +The style guide contains recommendations for attribute naming in certain cases. To make abstraction +of attribute naming possible a `logAttrs` function can be implemented as a friend function in a +class with the following signature: class AnyUserType { public: @@ -505,15 +479,13 @@ implemented as a friend function in a class with the following signature: ## Multiple attributes -In some cases a loggable type might be composed as a hierarchy in the C++ type -system which would lead to a very verbose structured log output as every level -in the hierarcy needs a name when outputted as JSON. The attribute naming -abstraction system can also be used to collapse such hierarchies. Instead of -making a type loggable it can instead return one or more attributes from its +In some cases a loggable type might be composed as a hierarchy in the C++ type system which would +lead to a very verbose structured log output as every level in the hierarcy needs a name when +outputted as JSON. The attribute naming abstraction system can also be used to collapse such +hierarchies. Instead of making a type loggable it can instead return one or more attributes from its members by using `multipleAttrs` in `logAttrs` functions. -`multipleAttrs(...)` accepts attributes or instances of types with `logAttrs` -functions implemented. +`multipleAttrs(...)` accepts attributes or instances of types with `logAttrs` functions implemented. ##### Examples @@ -535,12 +507,11 @@ functions implemented. ## Handling temporary lifetime with multiple attributes -To avoid lifetime issues (log attributes bind their values by reference) it is -recommended to **not** create attributes when using `multipleAttrs` unless -attributes are created for members directly. 
If `logAttrs` or `""_attr=` is -used inside a `logAttrs` function on the return of a function returning by -value it will result in a dangling reference. The following example illustrates -the problem: +To avoid lifetime issues (log attributes bind their values by reference) it is recommended to +**not** create attributes when using `multipleAttrs` unless attributes are created for members +directly. If `logAttrs` or `""_attr=` is used inside a `logAttrs` function on the return of a +function returning by value it will result in a dangling reference. The following example +illustrates the problem: class SomeSubType { public: @@ -566,10 +537,9 @@ the problem: std::string name_; }; -The better implementation would be to let the log system control the -lifetime by passing the instance to `multipleAttrs` without creating the -attribute. The log system will detect that it is not an attribute and will -attempt to create attributes by calling `logAttrs`: +The better implementation would be to let the log system control the lifetime by passing the +instance to `multipleAttrs` without creating the attribute. The log system will detect that it is +not an attribute and will attempt to create attributes by calling `logAttrs`: friend auto logAttrs(const SomeType& type) { return logv2::multipleAttrs("name"_attr=type.name(), type.sub()); @@ -579,11 +549,10 @@ attempt to create attributes by calling `logAttrs`: ## Combining uassert with log statement -Code that emits a high severity log statement may also need to emit a `uassert` -after the log. There is the `UserAssertAfterLog` logging option that allows you -to re-use the log statement to do the formatting required for the `uassert`. -The assertion id can be either the logging ID by passing `UserAssertAfterLog` -with no arguments or the assertion id can set by constructing +Code that emits a high severity log statement may also need to emit a `uassert` after the log. 
There +is the `UserAssertAfterLog` logging option that allows you to re-use the log statement to do the +formatting required for the `uassert`. The assertion id can be either the logging ID by passing +`UserAssertAfterLog` with no arguments or the assertion id can set by constructing `UserAssertAfterLog` with an `ErrorCodes::Error`. The assertion reason string will be a plain text log and can be provided with additional attribute @@ -614,26 +583,23 @@ Would emit a `uassert` after performing the log that is equivalent to: ## Unstructured logging for local development -To make it easier to use the log system for tracing in local development, there -is a special API that does not use IDs or attribute names: +To make it easier to use the log system for tracing in local development, there is a special API +that does not use IDs or attribute names: logd(format-string, value0, ..., valueN); It formats the string using libfmt similarly to what -`fmt::format(format-string, value0, ..., valueN)` would produce but using the -regular log system type support on how types are made loggable. The formatted -string is logged as the `msg` field in the JSON output, with no `attr` -subobject. +`fmt::format(format-string, value0, ..., valueN)` would produce but using the regular log system +type support on how types are made loggable. The formatted string is logged as the `msg` field in +the JSON output, with no `attr` subobject. -When using `logd` the log will emitted with standard severity and the default -component. +When using `logd` the log will emitted with standard severity and the default component. -A difference from regular logging, `logd` is allowed to be used in header files -by including `logv2/log_debug.h`. +A difference from regular logging, `logd` is allowed to be used in header files by including +`logv2/log_debug.h`. -Unstructured logging is not allowed to be used in code committed to master, -there is a lint check to validate this. 
It is however allowed to be used in -Evergreen patch builds. +Unstructured logging is not allowed to be used in code committed to master, there is a lint check to +validate this. It is however allowed to be used in Evergreen patch builds. ##### Examples @@ -642,8 +608,8 @@ Evergreen patch builds. ## Rate limiting -Rate limiting logs is useful to reduce the impact of logging on database throughput. At high -rate and concurrency, logging can be expensive and reduce performance. Attention should be paid +Rate limiting logs is useful to reduce the impact of logging on database throughput. At high rate +and concurrency, logging can be expensive and reduce performance. Attention should be paid specifically to logs that can occur on every operation, whether they fail or succeed. The rate limiting feature is implemented by `SeveritySuppressor` (see @@ -653,8 +619,8 @@ severity; subsequent logs within that interval are emitted at a "quiet" severity level). This ensures logs are not always written unless the logging level is increased for the component. -`SeveritySuppressor` is typically used with `StaticImmortal` for static storage. The interval can -be configured with a server parameter when constructing SeveritySuppressor. +`SeveritySuppressor` is typically used with `StaticImmortal` for static storage. The interval can be +configured with a server parameter when constructing SeveritySuppressor. ##### Example @@ -666,18 +632,17 @@ be configured with a server parameter when constructing SeveritySuppressor. "Slow network response send time", "elapsed"_attr = bob.obj()); -In this example, the first log within each gSlowNetworkLogRate-second window is emitted at Info level; -subsequent logs within that window are emitted at Debug(2), which requires increasing the component's -log level to be visible. 
+In this example, the first log within each gSlowNetworkLogRate-second window is emitted at Info +level; subsequent logs within that window are emitted at Debug(2), which requires increasing the +component's log level to be visible. For per-key rate limiting (e.g., one log per key per interval), use `KeyedSeveritySuppressor` instead. # JSON output format -Produces structured logs of the [Relaxed Extended JSON 2.0.0][relaxed_json_2] -format. Below is an example of a log statement in C++ and a pretty-printed JSON -output: +Produces structured logs of the [Relaxed Extended JSON 2.0.0][relaxed_json_2] format. Below is an +example of a log statement in C++ and a pretty-printed JSON output: C++ statement: @@ -717,5 +682,7 @@ Output: --- [relaxed_json_2]: https://github.com/mongodb/specifications/blob/master/source/extended-json.rst -[_lastOplogEntryFetcherCallbackForStopTimestamp]: https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/initial_syncer.cpp#L1500-L1512 -[_summarizeRollback]: https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/rollback_impl.cpp#L1263-L1305 +[_lastOplogEntryFetcherCallbackForStopTimestamp]: + https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/initial_syncer.cpp#L1500-L1512 +[_summarizeRollback]: + https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/rollback_impl.cpp#L1263-L1305 diff --git a/docs/memory_management.md b/docs/memory_management.md index 7b7e70bd6f3..30440ff1d7c 100644 --- a/docs/memory_management.md +++ b/docs/memory_management.md @@ -2,5 +2,5 @@ - Avoid using bare pointers for dynamically allocated objects. Prefer `std::unique_ptr`, `std::shared_ptr`, or another RAII class such as `BSONObj`. 
-- If you assign the output of `new/malloc()` directly to a bare pointer you should document where - it gets deleted/freed, who owns it along the way, and how exception safety is ensured. +- If you assign the output of `new/malloc()` directly to a bare pointer you should document where it + gets deleted/freed, who owns it along the way, and how exception safety is ensured. diff --git a/docs/modularity.md b/docs/modularity.md index c04a976b327..b83c4399ec9 100644 --- a/docs/modularity.md +++ b/docs/modularity.md @@ -15,86 +15,87 @@ TODO ## Why are we doing this? Having a clear delineation between public and private APIs for each module will improve the -maintainability and velocity of our codebase. Teams will have more freedom to evolve their -internal implementation details without affecting consumers. Consumers will benefit from -knowing what APIs are intended for their consumption. +maintainability and velocity of our codebase. Teams will have more freedom to evolve their internal +implementation details without affecting consumers. Consumers will benefit from knowing what APIs +are intended for their consumption. ## Assigning files to modules -The file `modules_poc/modules.yaml` contains a list of modules, each containing -a list of files. Each file must be contained in only one module. Note that -module assignment is not required to map neatly to team ownership. +The file `modules_poc/modules.yaml` contains a list of modules, each containing a list of files. +Each file must be contained in only one module. Note that module assignment is not required to map +neatly to team ownership. -In cases where multiple globs match a file, the current rule is that the -longest glob wins. This is used as a simpler-to-implement version of -most-specific glob wins, which we may switch to in the future. +In cases where multiple globs match a file, the current rule is that the longest glob wins. 
This is +used as a simpler-to-implement version of most-specific glob wins, which we may switch to in the +future. ## How do I mark API visibility? -This section will just describe the basic process. Later sections will cover the tooling -available to help, along with caveats to be aware of. +This section will just describe the basic process. Later sections will cover the tooling available +to help, along with caveats to be aware of. -First read the documentation in [src/mongo/util/modules.h](https://github.com/mongodb/mongo/blob/master/src/mongo/util/modules.h) -for the canonical list and description of visibility levels. As a brief overview of the main -levels from least to most restrictive: +First read the documentation in +[src/mongo/util/modules.h](https://github.com/mongodb/mongo/blob/master/src/mongo/util/modules.h) for +the canonical list and description of visibility levels. As a brief overview of the main levels from +least to most restrictive: - `OPEN`: This is available for usage _and inheritance_ from anywhere in the codebase - `PUBLIC`: This is available for usage from anywhere in the codebase. For types, subclasses may only be defined in the same module. -- `NEEDS_REPLACEMENT` and `USE_REPLACEMENT(...)`: These are collectively considered - "unfortunately public" and are available for use, but should be avoided +- `NEEDS_REPLACEMENT` and `USE_REPLACEMENT(...)`: These are collectively considered "unfortunately + public" and are available for use, but should be avoided - `PARENT_PRIVATE`: This is similar to `PRIVATE`, but allows usage from any file in the parent module, including other submodules - `PRIVATE`: This may only be used from the current module or one of its submodules -- `FILE_PRIVATE`: This may only be used from the current "file family" (roughly, header \+ cpp - \+ tests). It may not be used by other files, even from the same module. +- `FILE_PRIVATE`: This may only be used from the current "file family" (roughly, header \+ cpp \+ + tests). 
It may not be used by other files, even from the same module. You can think of public vs private similarly to how you would the sections of a `class`: they indicate whether something is intended to be part of the API or an implementation detail. The difference is that they apply at a wider granularity of code than a single class, with -implementation details available to either the full module (and its submodules) for `PRIVATE` -or the file family for `FILE_PRIVATE`. +implementation details available to either the full module (and its submodules) for `PRIVATE` or the +file family for `FILE_PRIVATE`. -The macros in that header file are attached to declarations and set the visibility level for -that declaration and all of its "semantic children"[^1]. The macros are C++ attributes which -means that they need to go in specific places that differ based on what is being marked (for -templates, the location does not change and is always somewhere after the `template <...>` part): +The macros in that header file are attached to declarations and set the visibility level for that +declaration and all of its "semantic children"[^1]. 
The macros are C++ attributes which means that +they need to go in specific places that differ based on what is being marked (for templates, the +location does not change and is always somewhere after the `template <...>` part): -- `MONGO_MOD_PUBLIC;` by itself as the first line after includes in a header sets the default - for that header (only `PUBLIC`, `PARENT_PRIVATE`, and `FILE_PRIVATE` are allowed here) -- `namespace MONGO_MOD mongo {` (this does not work with nested namespaces in a single - declaration like `namespace mongo::repl`) +- `MONGO_MOD_PUBLIC;` by itself as the first line after includes in a header sets the default for + that header (only `PUBLIC`, `PARENT_PRIVATE`, and `FILE_PRIVATE` are allowed here) +- `namespace MONGO_MOD mongo {` (this does not work with nested namespaces in a single declaration + like `namespace mongo::repl`) - `class MONGO_MOD Foo {` (Ditto for `enum`, `struct`, and `union`) - `MONGO_MOD void func(...);` - `MONGO_MOD int var;` - `concept isFooable MONGO_MOD {` For the cases where it goes at the beginning of the line, if clang-format chooses an unfortunate -place to break the line, it usually helps to undo the formatting then put the macro on its own -line above the declaration. +place to break the line, it usually helps to undo the formatting then put the macro on its own line +above the declaration. -APIs are marked one header at a time, by including `"mongo/util/modules.h"` in the header. -This causes the header to be treated as "modularized" which has the following effects: +APIs are marked one header at a time, by including `"mongo/util/modules.h"` in the header. This +causes the header to be treated as "modularized" which has the following effects: -- All declarations in that header (not transitive includes) default to `PRIVATE`, meaning that - the public API is what must be marked. -- Members in `private:` sections in classes default to `PRIVATE`, regardless of the visibility - of the class. 
The only way the language would allow them to be used from outside of the module - is if you have cross-module friendships, which should generally be avoided. If needed - temporarily, favor `NEEDS_REPLACEMENT` over `PUBLIC` for these declarations. -- Declarations ending in `_forTest` default to `FILE_PRIVATE` to support the common case where - they are only intended for testing that class. If they are actually intended to support testing - of consumers, not just the type they are defined on, they can be explicitly given `PUBLIC` or +- All declarations in that header (not transitive includes) default to `PRIVATE`, meaning that the + public API is what must be marked. +- Members in `private:` sections in classes default to `PRIVATE`, regardless of the visibility of + the class. The only way the language would allow them to be used from outside of the module is if + you have cross-module friendships, which should generally be avoided. If needed temporarily, favor + `NEEDS_REPLACEMENT` over `PUBLIC` for these declarations. +- Declarations ending in `_forTest` default to `FILE_PRIVATE` to support the common case where they + are only intended for testing that class. If they are actually intended to support testing of + consumers, not just the type they are defined on, they can be explicitly given `PUBLIC` or `PRIVATE` visibility. -- Internal and detail namespaces default to `PRIVATE` and cannot be made less restricted, but - can still be marked as `FILE_PRIVATE`. Individual declarations within the namespace can be - exposed as necessary, but they cannot be exposed in bulk without changing the name of the - namespace to something that doesn't imply private. +- Internal and detail namespaces default to `PRIVATE` and cannot be made less restricted, but can + still be marked as `FILE_PRIVATE`. 
Individual declarations within the namespace can be exposed as + necessary, but they cannot be exposed in bulk without changing the name of the namespace to + something that doesn't imply private. For internal headers of a module which do not contribute to its public API, simply including -`modules.h` is sufficient. There is a [tool](#the-private-header-marker) to automate this -process. You may additionally want to consider whether any APIs should be marked `FILE_PRIVATE`, -but that is optional. +`modules.h` is sufficient. There is a [tool](#the-private-header-marker) to automate this process. +You may additionally want to consider whether any APIs should be marked `FILE_PRIVATE`, but that is +optional. For IDL files, you mark visibility of whole types (`struct`, `enum`, and `command`) with the `mod_visibility` option. The value should be the same as one of the `MONGO_MOD` macros, but @@ -105,17 +106,17 @@ compelling use case for this. ## What tooling exists to help me? -Note that all tooling should be run from within a properly set-up python virtual environment. -This includes running `buildscripts/poetry_sync.sh` to ensure you have the correct dependencies. +Note that all tooling should be run from within a properly set-up python virtual environment. This +includes running `buildscripts/poetry_sync.sh` to ensure you have the correct dependencies. ### The scanner and merger -The merger generates a cross reference of all first-party usages of first-party code and stores -it in `merged_decls.json`, which is used by the rest of our tooling. It is also where we validate -that there are no disallowed accesses. It will be invoked for you by the browser when you ask it -to rescan, or you can also manually run it as `modules_poc/merge_decls.py`. If you are interested -in analyzing that file, [`jq`](https://jqlang.org/) is a powerful tool, or you can just write -some python. 
+The merger generates a cross reference of all first-party usages of first-party code and stores it +in `merged_decls.json`, which is used by the rest of our tooling. It is also where we validate that +there are no disallowed accesses. It will be invoked for you by the browser when you ask it to +rescan, or you can also manually run it as `modules_poc/merge_decls.py`. If you are interested in +analyzing that file, [`jq`](https://jqlang.org/) is a powerful tool, or you can just write some +python. As a rather extreme example of what you can do with `jq`, here is how the progress reports are generated: @@ -129,43 +130,43 @@ generated: jq 'map(., .mod = "TOTAL") | group_by(.mod)[] | group_by(.loc | split(":")[0]) | {mod: .[0].[0].mod, total: length, marked: map(select(any(.visibility == "UNKNOWN") | not)) | length} | .done = (1000 * .marked / .total | round) / 10 | "\(.mod): \(" " * (.mod | 40-length)) \(.done)% (\(.marked) / \(.total))"' -r merged_decls.json ``` -Internally, the merger will internally invoke `bazel build --config=mod-scanner //src/mongo/...` -to run the scanner over the whole codebase (or the parts that have changed since the last scan), -taking advantage of bazel remote execution to achieve very high levels of parallelism. +Internally, the merger will internally invoke `bazel build --config=mod-scanner //src/mongo/...` to +run the scanner over the whole codebase (or the parts that have changed since the last scan), taking +advantage of bazel remote execution to achieve very high levels of parallelism. ### The browser The main piece of tooling to run is the browser, which is launched by running -`modules_poc/browse.py`. If you haven't scanned the codebase recently, it will offer to run it -for you which will take a few minutes. After modifying the source code, you can rescan at any -time by pressing `r`. It will only rescan files that have been modified or that transitively -include modified headers. +`modules_poc/browse.py`. 
If you haven't scanned the codebase recently, it will offer to run it for +you which will take a few minutes. After modifying the source code, you can rescan at any time by +pressing `r`. It will only rescan files that have been modified or that transitively include +modified headers. -The browser is primarily intended to assist in labeling public APIs, so the files are sorted -with the most number of unlabeled declarations ("unknowns") first. You can search for a file -by pressing `f` or press `m` to filter the files by module. +The browser is primarily intended to assist in labeling public APIs, so the files are sorted with +the most number of unlabeled declarations ("unknowns") first. You can search for a file by pressing +`f` or press `m` to filter the files by module. -The list of available key bindings is shown on the right. You can toggle that by pressing `?`. -Other keybinding of note are that you can press `g` to go to the currently highlighted -declaration or location in your editor (only when running in the vscode or nvim terminal), -and `p` to toggle an inline preview of the location within the browser. You can press `Tab ↹` -to toggle between the tree and the code preview. The mouse is fully supported for scrolling -and expanding rows in the tree, and there are aliases for some basic vim keybinds (`hjkl/`). +The list of available key bindings is shown on the right. You can toggle that by pressing `?`. Other +keybinding of note are that you can press `g` to go to the currently highlighted declaration or +location in your editor (only when running in the vscode or nvim terminal), and `p` to toggle an +inline preview of the location within the browser. You can press `Tab ↹` to toggle between the tree +and the code preview. The mouse is fully supported for scrolling and expanding rows in the tree, and +there are aliases for some basic vim keybinds (`hjkl/`). 
### The private header marker Once you have scanned the codebase and produced a `merged_decls.json`, -`modules_poc/private_headers.py` can be used to find all header and IDL files where there are -no currently detected external usages and automatically mark them as fully private to the -module. This does not necessarily mean that all automatically marked headers are intended to -be private. A human should review to ensure that the marked headers match intent. You can pass -flags to filter on any/all of module, owning team, or path glob. For headers matching the filter, -the script will also warn of usages of `_forTest` external to the file family that may need to -be marked `PRIVATE` to make them available to the whole module since they default to only being -available to the file family for marked headers. +`modules_poc/private_headers.py` can be used to find all header and IDL files where there are no +currently detected external usages and automatically mark them as fully private to the module. This +does not necessarily mean that all automatically marked headers are intended to be private. A human +should review to ensure that the marked headers match intent. You can pass flags to filter on +any/all of module, owning team, or path glob. For headers matching the filter, the script will also +warn of usages of `_forTest` external to the file family that may need to be marked `PRIVATE` to +make them available to the whole module since they default to only being available to the file +family for marked headers. -Make sure to run `buildscripts/clang_format.py format-my` or `bazel run format` after using it -to modify any C++ files. +Make sure to run `buildscripts/clang_format.py format-my` or `bazel run format` after using it to +modify any C++ files. 
Example usage: @@ -178,13 +179,12 @@ Example usage: ### The PR comment generator You can run `modules_poc/mod_diff.py` to output a brief summary of all of the API (including -visibility levels and usages counts) for each file modified in your branch. When putting up a PR -to mark API visibility, you should add a comment with its output to the PR as an aide to -reviewers. The output is intended to be close enough to C++ that you should put it in a -` ```cpp ` block when making your PR comment to make it more readable. You can also -pipe it through `bat -lcpp` to make it colorful locally. Note that it will use the last -scan output, so if you've modified any headers, you should run a rescan prior to running this -tool. +visibility levels and usage counts) for each file modified in your branch. When putting up a PR to +mark API visibility, you should add a comment with its output to the PR as an aid to reviewers. The +output is intended to be close enough to C++ that you should put it in a ` ```cpp ` block when +making your PR comment to make it more readable. You can also pipe it through `bat -lcpp` to make it +colorful locally. Note that it will use the last scan output, so if you've modified any headers, you +should run a rescan prior to running this tool. ## Workflow @@ -198,24 +198,23 @@ The general workflow for each PR will generally be the same: 5. Run [the pr comment generator](#the-pr-comment-generator) to show the APIs that you have marked - Look through this to ensure that everything is as you expect. 6. Put up a PR and include the generated comment in a ` ```cpp ` block - I suggest keeping PRs small (say, no more than 10 files at a time) so that they are - manageable by reviewers. As an exception it seems reasonable to auto-mark many headers as - private in a single PR, as long as those PRs are separate from those containing any manual - marking.
+ - I suggest keeping PRs small (say, no more than 10 files at a time) so that they are manageable + by reviewers. As an exception it seems reasonable to auto-mark many headers as private in a + single PR, as long as those PRs are separate from those containing any manual marking. -When first starting to mark a module, I suggest running the [`modules_poc/private_headers.py`](#the-private-header-marker) -script with `--dry-run` (or `-n`) and `--module=YOUR_MODULE`. For larger modules (in particular, -the `query` mega module) you may want to pass a `--glob` so that you can focus on a smaller -subset of the code initially. That will give you an overview of the files that are used from -outside your module (which contain defacto public APIs today) and those that do not (which can -automatically be marked as private implementation details). +When first starting to mark a module, I suggest running the +[`modules_poc/private_headers.py`](#the-private-header-marker) script with `--dry-run` (or `-n`) and +`--module=YOUR_MODULE`. For larger modules (in particular, the `query` mega module) you may want to +pass a `--glob` so that you can focus on a smaller subset of the code initially. That will give you +an overview of the files that are used from outside your module (which contain de facto public APIs +today) and those that do not (which can automatically be marked as private implementation details). -If all of the defacto private headers seem like they should be private, you can remove the -dry-run flag to have it automatically mark them as private. Be sure to validate that their -contents are actually intended to be private. Remember that the point of having a human doing -the marking is to ensure that we correctly capture intent. You can optionally mark implementation -details within each header as `FILE_PRIVATE`, if you would like to prevent them from being used -elsewhere even within the module.
+If all of the de facto private headers seem like they should be private, you can remove the dry-run +flag to have it automatically mark them as private. Be sure to validate that their contents are +actually intended to be private. Remember that the point of having a human doing the marking is to +ensure that we correctly capture intent. You can optionally mark implementation details within each +header as `FILE_PRIVATE`, if you would like to prevent them from being used elsewhere even within +the module. You can then open [the browser](#the-browser) (`modules_poc/browse.py`) to look at the remaining headers. It will show you what is used and from where. It will be particularly useful for things @@ -229,137 +228,136 @@ that seem like they should be private, but are being used externally. `modules_poc/modules.yaml` to move them. 2. If there is already a public API that callers should use instead, mark it as `USE_REPLACEMENT(better_api)`. The argument accepts any C++ tokens, but the intent is where - possible to use the name of the replacement. This will generate a ticket for all teams using - that code. + possible to use the name of the replacement. This will generate a ticket for all teams using that + code. 1. If there are very few users, consider just cleaning them up. -3. Reconsider making this API public if other modules need its functionality, and this is - the only way to get it. -4. Otherwise, if there is no public API that fulfills the needs of the callers, but you - don't want the current API to remain public long-term, use `NEEDS_REPLACEMENT`. This will - generate a ticket for the team that owns that code. - 1. If the API was "obviously" intended to be private (eg it is in a `details` namespace) - and callers would be reasonably able to implement the functionality themselves, possibly - by writing their own version, it seems acceptable to use +3.
Reconsider making this API public if other modules need its functionality, and this is the only + way to get it. +4. Otherwise, if there is no public API that fulfills the needs of the callers, but you don't want + the current API to remain public long-term, use `NEEDS_REPLACEMENT`. This will generate a ticket + for the team that owns that code. + 1. If the API was "obviously" intended to be private (eg it is in a `details` namespace) and + callers would be reasonably able to implement the functionality themselves, possibly by + writing their own version, it seems acceptable to use `USE_REPLACEMENT(do not use internal details)` ## Caveats and Limitations -**OVERARCHING GUIDELINE**: Always try to mark declarations correctly according to intent, -even if it will not be enforced by the current tooling. This is both to provide the correct -information to human readers, as well as to avoid issues if we improve the tooling in the -future to eliminate these limitations +**OVERARCHING GUIDELINE**: Always try to mark declarations correctly according to intent, even if it +will not be enforced by the current tooling. This is both to provide the correct information to +human readers, as well as to avoid issues if we improve the tooling in the future to eliminate these +limitations -The rest of this section is fairly technical and probably not necessary for most readers unless -they notice something "weird" going on and want to dive into why. Most of these limitations are -more likely to affect the core modules since most of the rest of our code does not expose APIs -via macros and templates or have APIs only consumed by templates, and those are where most of -these issues come up. +The rest of this section is fairly technical and probably not necessary for most readers unless they +notice something "weird" going on and want to dive into why. 
Most of these limitations are more +likely to affect the core modules since most of the rest of our code does not expose APIs via macros +and templates or have APIs only consumed by templates, and those are where most of these issues come +up. -- We do not track usages of namespaces at all, only the declarations within namespaces. When - a namespace is marked with a visibility, it does not affect the visibility of the namespace - itself (since it doesn't have one), it sets the default visibility for all declarations within - **that namespace block**. Each time a namespace is reopened it is a separate block and the - visibility markers on other blocks of the same namespace do not apply. -- The scanner only knows about declarations that it sees being used. For implementation reasons, - it only discovers declarations by seeing what every usage is using. This can either cause or be +- We do not track usages of namespaces at all, only the declarations within namespaces. When a + namespace is marked with a visibility, it does not affect the visibility of the namespace itself + (since it doesn't have one), it sets the default visibility for all declarations within **that + namespace block**. Each time a namespace is reopened it is a separate block and the visibility + markers on other blocks of the same namespace do not apply. +- The scanner only knows about declarations that it sees being used. For implementation reasons, it + only discovers declarations by seeing what every usage is using. This can either cause or be caused by other limitations. -- Usages in templates may not be seen. This is especially the case for "dependent types and - values" which are things that are not known by the compiler before the template is instantiated. - - This is a problem for functions where any arguments are dependent if it can't figure out - which overload will be selected. 
It is even worse for free-functions called unqualified - (`f(blah)` rather than `ns::f(blah)` or `x.f(blah)`) since due to ADL, overload resolution - is _always_ delayed for them. -- Everything that results from a macro expansion is treated as-if it was written at the point - of expansion. This applies to both declarations and usages. If you have an API that should - only be used via the defined macros, mark it as `MOD_PUBLIC_FOR_TECHNICAL_REASONS` to signal - to readers that they should avoid direct usage, even if the tooling won't prevent it. We may - improve this in the future. -- Template variables are completely ignored due to some unfortunate clang bugs. Still, try - to mark them correctly since we may change this in the future. +- Usages in templates may not be seen. This is especially the case for "dependent types and values" + which are things that are not known by the compiler before the template is instantiated. + - This is a problem for functions where any arguments are dependent if it can't figure out which + overload will be selected. It is even worse for free-functions called unqualified (`f(blah)` + rather than `ns::f(blah)` or `x.f(blah)`) since due to ADL, overload resolution is _always_ + delayed for them. +- Everything that results from a macro expansion is treated as-if it was written at the point of + expansion. This applies to both declarations and usages. If you have an API that should only be + used via the defined macros, mark it as `MOD_PUBLIC_FOR_TECHNICAL_REASONS` to signal to readers + that they should avoid direct usage, even if the tooling won't prevent it. We may improve this in + the future. +- Template variables are completely ignored due to some unfortunate clang bugs. Still, try to mark + them correctly since we may change this in the future. - Method calls are assigned to the static type at the call site. 
This has two important effects: - - A subclass's overridden method may seem unused if it is only used via calls through a base - class pointer/reference - - Calls through a base class pointer/reference count as calls of that class's method, not of - the interface's -- Defaulted members (methods, ctors, dtors) are treated as usages of the class itself, - regardless of whether they implicitly or explicitly defaulted. This is because clang does not - provide an API to distinguish between those cases. -- Template normalization woes: we try really hard to report declarations as the template - `foo` rather than separate instantiations like `foo`, `foo`, etc, **unless** - they are explicitly specialized, meaning that the instantiation has its own definition different - from the main template. Unfortunately, clang does a bad job at this and we have a number of - kludgy workarounds. The most important effects: - - Explicit specializations of function and variable templates are ignored and always converted - to the primary template. + - A subclass's overridden method may seem unused if it is only used via calls through a base class + pointer/reference + - Calls through a base class pointer/reference count as calls of that class's method, not of the + interface's +- Defaulted members (methods, ctors, dtors) are treated as usages of the class itself, regardless of + whether they implicitly or explicitly defaulted. This is because clang does not provide an API to + distinguish between those cases. +- Template normalization woes: we try really hard to report declarations as the template `foo` + rather than separate instantiations like `foo`, `foo`, etc, **unless** they are + explicitly specialized, meaning that the instantiation has its own definition different from the + main template. Unfortunately, clang does a bad job at this and we have a number of kludgy + workarounds. 
The most important effects: + - Explicit specializations of function and variable templates are ignored and always converted to + the primary template. - We do treat explicit specializations of types as separate (using the heuristic of having a separate location than the main template), because they can have a different shape and API than the main template. In general they should probably have the same visibility though, unless the instantiation is using a private type which should be unavailable to consumers anyway. - - Clang assigns many locations to the site of explicit template instantiations and extern - template declarations, even when there is a better location that it can see. Luckily these - are fairly rare. + - Clang assigns many locations to the site of explicit template instantiations and extern template + declarations, even when there is a better location that it can see. Luckily these are fairly + rare. - Sometimes clang reports the resolved destination of `using` declarations and type alias, but - usually it reports the `using` declaration itself. A few notable cases (these are trends and - may not be absolute\!) + usually it reports the `using` declaration itself. A few notable cases (these are trends and may + not be absolute\!) - `using Base::foo;` to expose a member of a base class is resolved as a usage of `Base::foo` - rather than `Derived::foo`. This is especially notable when the `Base` class is intended to be - a private implementation detail. You will need to mark all exposed methods as public. - - `using Base::Base;` to pull in the base constructors is the opposite and is recorded as a - usage of `Derived::Base(args)`, which is odd because such a declaration doesn't actually exist. + rather than `Derived::foo`. This is especially notable when the `Base` class is intended to be a + private implementation detail. You will need to mark all exposed methods as public. 
+ - `using Base::Base;` to pull in the base constructors is the opposite and is recorded as a usage + of `Derived::Base(args)`, which is odd because such a declaration doesn't actually exist. - Internal/details namespaces (currently defined as matching the regex `(detail|internal)s?$`) implicitly have implicit default visibility of private if `modules.h` is included. It is not possible to give the namespace a public visibility, but you can restrict it further with - `FILE_PRIVATE`. If you want declarations inside it to be usable from outside your module you - must mark children of the namespace explicitly, or rename it to not use a name that implies - that it is for internal usage only. A somewhat common case will be marking internal declarations - that are only intended to be used via macros with `PUBLIC_FOR_TECHNICAL_REASONS`. -- Be very careful with forward declarations. Try to avoid them wherever possible (unless there - is a significant benefit). Especially avoid forward declaring anything from another module\! - Where forward declarations must be used, make sure that they have the same visibility as the - real definition. As an exception, if every TU that sees the forward declaration will also see - the definition it is OK to omit marking the forward definition. This may happen when they are - both in the same header, or the forward declaration is in a private implementation detail header - which is included by the defining header. Be aware of the implicit visibility marking which also - applies to forward declaration, if they are the only declaration seen in the TU. - - Never forward declare functions to avoid including a header. They are much more problematic - than types, both in general in C++ and specifically for this tooling. -- We try to use the definition location for types defined in headers, but the "canonical" - location (clang's term for the first declaration seen in the current TU) for everything else. 
- If the type is defined in a .cpp, we use the canonical location. + `FILE_PRIVATE`. If you want declarations inside it to be usable from outside your module you must + mark children of the namespace explicitly, or rename it to not use a name that implies that it is + for internal usage only. A somewhat common case will be marking internal declarations that are + only intended to be used via macros with `PUBLIC_FOR_TECHNICAL_REASONS`. +- Be very careful with forward declarations. Try to avoid them wherever possible (unless there is a + significant benefit). Especially avoid forward declaring anything from another module\! Where + forward declarations must be used, make sure that they have the same visibility as the real + definition. As an exception, if every TU that sees the forward declaration will also see the + definition it is OK to omit marking the forward definition. This may happen when they are both in + the same header, or the forward declaration is in a private implementation detail header which is + included by the defining header. Be aware of the implicit visibility marking which also applies to + forward declaration, if they are the only declaration seen in the TU. + - Never forward declare functions to avoid including a header. They are much more problematic than + types, both in general in C++ and specifically for this tooling. +- We try to use the definition location for types defined in headers, but the "canonical" location + (clang's term for the first declaration seen in the current TU) for everything else. If the type + is defined in a .cpp, we use the canonical location. - We only consider declarations in headers, never in .cpp files. - Be mindful of `_forTest` functions. They default to `FILE_PRIVATE` since they are typically - intended only for use when testing the type they are defined on, not when testing consumers. 
- In the cases where they _are_ intended as part of the API for testing consumers, you can - explicitly mark them `PUBLIC` or `PRIVATE` depending on whether they should be usable from - outside your module or not. -- Things used implicitly (eg implicit conversion operators) are still counted as usages even - if they are not specifically named at the call site -- When merging information from multiple TUs, definitions always replace the metadata gathered - from TUs that only saw a declaration. - - Note that we aren't guaranteed to see every definition, in particular for functions that - are not called from the TU that they are defined in. So this cannot be used to find places - where we deleted the definition but forgot to delete the declaration (we wouldn't see them - anyway, since we only track things that are used, and undefined things can't really be used, - except trivially, without breaking the build). -- `private` members of classes are implicitly `PRIVATE`, and must be explicitly marked otherwise - if desired. They should probably never be made `PUBLIC` since that implies cross-module - friendship. In the few places where we have that today, they have been made one of the flavors - of unfortunately public: `NEEDS_REPLACEMENT` or `USE_INSTEAD`. - - `public` members of `private` types do not inherit the implicit `PRIVATE` and follow the - normal rule of looking for their nearest semantic parent with an explicit marker. That means - that they may be `PUBLIC`. However, the language rules still apply and as long as an - instance of the type is never handed to consumers they will have no way of accessing those - members. + intended only for use when testing the type they are defined on, not when testing consumers. In + the cases where they _are_ intended as part of the API for testing consumers, you can explicitly + mark them `PUBLIC` or `PRIVATE` depending on whether they should be usable from outside your + module or not. 
+- Things used implicitly (eg implicit conversion operators) are still counted as usages even if they + are not specifically named at the call site +- When merging information from multiple TUs, definitions always replace the metadata gathered from + TUs that only saw a declaration. + - Note that we aren't guaranteed to see every definition, in particular for functions that are not + called from the TU that they are defined in. So this cannot be used to find places where we + deleted the definition but forgot to delete the declaration (we wouldn't see them anyway, since + we only track things that are used, and undefined things can't really be used, except trivially, + without breaking the build). +- `private` members of classes are implicitly `PRIVATE`, and must be explicitly marked otherwise if + desired. They should probably never be made `PUBLIC` since that implies cross-module friendship. + In the few places where we have that today, they have been made one of the flavors of + unfortunately public: `NEEDS_REPLACEMENT` or `USE_INSTEAD`. + - `public` members of `private` types do not inherit the implicit `PRIVATE` and follow the normal + rule of looking for their nearest semantic parent with an explicit marker. That means that they + may be `PUBLIC`. However, the language rules still apply and as long as an instance of the type + is never handed to consumers they will have no way of accessing those members. - `protected` members do not default to `PRIVATE`, but because we only allow subclassing from `OPEN` classes, the language visibility rules will disallow access from outside the module - unless you choose to allow it by use `OPEN` classes or `friend`s. Note that making any - subclass `OPEN` exposes all `protected` members of parents unless they are marked `PRIVATE`. -- `friend` declarations are mostly ignored, except when they are a definition. 
So the - definitions using the "hidden friend" pattern are tracked, but we ignore it if the definition - is in a cpp file. + unless you choose to allow it by use `OPEN` classes or `friend`s. Note that making any subclass + `OPEN` exposes all `protected` members of parents unless they are marked `PRIVATE`. +- `friend` declarations are mostly ignored, except when they are a definition. So the definitions + using the "hidden friend" pattern are tracked, but we ignore it if the definition is in a cpp + file. [^1]: - Clang distinguishes between "semantic" and "lexical" parents. The primary differences - are that members of classes (including member types) are semantic children of the class even - when defined out of line, and conversely `friend` declarations are not, and instead are - considered semantic children of the nearest namespace. + Clang distinguishes between "semantic" and "lexical" parents. The primary differences are that + members of classes (including member types) are semantic children of the class even when defined + out of line, and conversely `friend` declarations are not, and instead are considered semantic + children of the nearest namespace. diff --git a/docs/owners/allowed_unowned_files_format.md b/docs/owners/allowed_unowned_files_format.md index 045588666bf..8b2e4ffaca2 100644 --- a/docs/owners/allowed_unowned_files_format.md +++ b/docs/owners/allowed_unowned_files_format.md @@ -2,15 +2,20 @@ ## ALLOWED_UNOWNED_FILES.yml File Format -This file is for repos that require all files be owned. Some files may be listed here as an exception and will be added to the end of the CODEOWNERS. +This file is for repos that require all files be owned. Some files may be listed here as an +exception and will be added to the end of the CODEOWNERS. -`version` is the current version of the `ALLOWED_UNOWNED_FILES.yml` file format. The only version is `1.0.0`. +`version` is the current version of the `ALLOWED_UNOWNED_FILES.yml` file format. 
The only version is +`1.0.0`. `filters` are a list of filters that each have a `filter` and `justification` field. -`filter` is a file path. This file path must start with a `/` and is relative to the root repo directory. Directories or globs are not supported at the moment to ensure careful selection of files allowed to be unowned. This can be reconsidered if proper use cases appear. +`filter` is a file path. This file path must start with a `/` and is relative to the root repo +directory. Directories or globs are not supported at the moment to ensure careful selection of files +allowed to be unowned. This can be reconsidered if proper use cases appear. -`justification` is the reason why this file should be unowned. A common case is that this is a generated file that has checks in CI to ensure it is in the correct format. +`justification` is the reason why this file should be unowned. A common case is that this is a +generated file that has checks in CI to ensure it is in the correct format. ### Example file @@ -23,7 +28,8 @@ filters: # List of all filters ### Configuration -This can be configured in any repo with `bazel_rules_mongo` by putting the following lines in your `.bazelrc` file: +This can be configured in any repo with `bazel_rules_mongo` by putting the following lines in your +`.bazelrc` file: ``` common --define codeowners_have_allowed_unowned_files=True diff --git a/docs/owners/banned_codeowners_format.md b/docs/owners/banned_codeowners_format.md index d6a826f48b2..3263d09bf95 100644 --- a/docs/owners/banned_codeowners_format.md +++ b/docs/owners/banned_codeowners_format.md @@ -15,7 +15,8 @@ Banned owners should be separated by newlines.
Empty lines and lines starting wi ### Configuration -This can be configured in any repo with `bazel_rules_mongo` by putting the following lines in your `.bazelrc` file: +This can be configured in any repo with `bazel_rules_mongo` by putting the following lines in your +`.bazelrc` file: ``` common --define codeowners_have_banned_codeowners=True diff --git a/docs/owners/owners_format.md b/docs/owners/owners_format.md index 362d97a8d06..a7cdfd95889 100644 --- a/docs/owners/owners_format.md +++ b/docs/owners/owners_format.md @@ -1,23 +1,40 @@ # Code Owners -After modifying any OWNERS files, the overall ownership database (`.github/CODEOWNERS`) must be rebuilt. -This is done by running `bazel run codeowners`. +After modifying any OWNERS files, the overall ownership database (`.github/CODEOWNERS`) must be +rebuilt. This is done by running `bazel run codeowners`. ## OWNERS.yml File Format -This is loosely based on [kubernetes](https://www.kubernetes.dev/docs/guide/owners/) and [chromium](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/code_reviews.md) OWNERS files. +This is loosely based on [kubernetes](https://www.kubernetes.dev/docs/guide/owners/) and +[chromium](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/code_reviews.md) OWNERS files. -`version` is the current version of the `OWNERS.yml` file format. The latest version is `2.0.0`. For previous versions, see the [changelog](#owners-changelog). +`version` is the current version of the `OWNERS.yml` file format. The latest version is `2.0.0`. For +previous versions, see the [changelog](#owners-changelog). `aliases` point to yaml files that list aliases that can be used in this OWNERS.yml file. -`filters` are a list of globs that match [gitignore syntax](https://git-scm.com/docs/gitignore#_pattern_format). The filter must match at least one file and be unique to the file. Each filter must have a list of `approvers`. An approval from any single approver will allow the code to be merged.
`NOOWNER` can be specified to mark a filter as unowned. Each filter can optionally have a `metadata` tag. Inside that tag a user can put whatever tags they want. We have reserved two meaningful tags `emeritus_approvers` and `owning_team`. This is not an exhaustive list and more documented and undocumented options can be added later. There is no linting done on the metadata tag. +`filters` are a list of globs that match +[gitignore syntax](https://git-scm.com/docs/gitignore#_pattern_format). The filter must match at +least one file and be unique to the file. Each filter must have a list of `approvers`. An approval +from any single approver will allow the code to be merged. `NOOWNER` can be specified to mark a +filter as unowned. Each filter can optionally have a `metadata` tag. Inside that tag a user can put +whatever tags they want. We have reserved two meaningful tags `emeritus_approvers` and +`owning_team`. This is not an exhaustive list and more documented and undocumented options can be +added later. There is no linting done on the metadata tag. -`emeritus_approvers` are folks that used to be approvers that no longer have approver privileges. This allows us to keep track of folks who built up a knowledge base of this code that might need to be consulted in a critical situation. Both `approvers` and `emeritus_approvers` should be either github usernames, emails, or aliases. +`emeritus_approvers` are folks that used to be approvers that no longer have approver privileges. +This allows us to keep track of folks who built up a knowledge base of this code that might need to +be consulted in a critical situation. Both `approvers` and `emeritus_approvers` should be either +github usernames, emails, or aliases. -`owning_team` is a team that owns the files, however this team does not have approval privileges. Instead this team should be looked to for asking questions.
This metadata can also be used programmatically to, for example, generate a report of all the files owned by a particular team, even though that team has nominated specific engineers as approvers. +`owning_team` is a team that owns the files, however this team does not have approval privileges. +Instead this team should be looked to for asking questions. This metadata can also be used +programmatically to, for example, generate a report of all the files owned by a particular team, +even though that team has nominated specific engineers as approvers. -`options` are not required and are various options about how to use this OWNERS.yml file. Currently there is only a single option `no_parent_owners` which is defaulted to false. If this option is set to true it will stop upwards OWNERS resolution. +`options` are not required and are various options about how to use this OWNERS.yml file. Currently +there is only a single option `no_parent_owners` which is defaulted to false. If this option is set +to true it will stop upwards OWNERS resolution. ### Example file @@ -70,7 +87,8 @@ options: # All options for this file `version` is the current version of the aliases file format. This should always be `1.0.0`. -`aliases` are a list of group names. Each group name must have one or more reviewers. Reviewers should be github usernames. +`aliases` are a list of group names. Each group name must have one or more reviewers. Reviewers +should be github usernames. ## Example File @@ -133,18 +151,26 @@ filters: ### Example 1 -If someone changes `a/b/c/file.py` the owner resolution will select teamC since the first file searched is `a/b/c/OWNERS.yml` First we compare if `file.py` matches `*.md`. It does not so we now check if `file.py` matches `*`. It does match so teamC is selected for review. +If someone changes `a/b/c/file.py` the owner resolution will select teamC since the first file +searched is `a/b/c/OWNERS.yml` First we compare if `file.py` matches `*.md`. 
It does not so we now +check if `file.py` matches `*`. It does match so teamC is selected for review. ### Example 2 -If someone changes `a/b/c/file.yaml` the owner resolution will not find a team. The first file searched is `a/b/c/OWNERS.yml`. No filters match file.yaml. Next we search in `a/b/OWNERS.yml`. No filters match there either. We stop searching up because `no_parent_owners` is set to true. +If someone changes `a/b/c/file.yaml` the owner resolution will not find a team. The first file +searched is `a/b/c/OWNERS.yml`. No filters match file.yaml. Next we search in `a/b/OWNERS.yml`. No +filters match there either. We stop searching up because `no_parent_owners` is set to true. ## OWNERS Changelog ### v2.0.0 -See the [previous version](https://github.com/mongodb/mongo/blob/79590effe86c471cc15d91c6785599ec2085d7c0/docs/owners/owners_format.md) of this documentation for details on v1.0.0. +See the +[previous version](https://github.com/mongodb/mongo/blob/79590effe86c471cc15d91c6785599ec2085d7c0/docs/owners/owners_format.md) +of this documentation for details on v1.0.0. -Patterns without a slash are no longer prepended with `**/` to make them apply recursively. If you want your pattern to apply recursively you must add the `**/` yourself now. +Patterns without a slash are no longer prepended with `**/` to make them apply recursively. If you +want your pattern to apply recursively you must add the `**/` yourself now. -The `*` pattern is now resolved as the directory name to ensure it applies recursively by default. You can use the `/*` pattern to only match inside the current directory. +The `*` pattern is now resolved as the directory name to ensure it applies recursively by default. +You can use the `/*` pattern to only match inside the current directory.
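The upward search described in Example 1 and Example 2 (nearest `OWNERS.yml` first, filters checked in order, stopping when `no_parent_owners` is true) can be sketched roughly as follows. This is only an illustration and not the real `codeowners` tooling: the `OWNERS` dictionary, the team names, and the `resolve_approvers` helper are all hypothetical, the tree is not the example tree from this document, and `fnmatchcase` on the file name stands in for full gitignore pattern matching.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical in-memory stand-in for parsed OWNERS.yml files, keyed by directory.
# Each entry holds an ordered list of (pattern, approvers) filters and the
# no_parent_owners option. None of this data comes from a real repository.
OWNERS = {
    "a": {"filters": [("*", ["teamA"])], "no_parent_owners": False},
    "a/b": {"filters": [("*.md", ["teamB"])], "no_parent_owners": True},
    "a/b/c": {
        "filters": [("*.md", ["teamC-docs"]), ("*.py", ["teamC"])],
        "no_parent_owners": False,
    },
}


def resolve_approvers(path, owners=OWNERS):
    """Return the approvers of the first matching filter, or None if unowned."""
    file_path = PurePosixPath(path)
    # parents yields the containing directory first, then each ancestor in turn.
    for directory in file_path.parents:
        entry = owners.get(str(directory))
        if entry is None:
            continue  # no OWNERS.yml in this directory, keep walking up
        for pattern, approvers in entry["filters"]:
            # Simplification: match the file name only, not the full gitignore rules.
            if fnmatchcase(file_path.name, pattern):
                return approvers
        if entry["no_parent_owners"]:
            return None  # upward resolution stops here
    return None
```

With this hypothetical data, `resolve_approvers("a/b/c/file.py")` finds a match in `a/b/c`, while `resolve_approvers("a/b/c/file.yaml")` falls through `a/b/c`, finds no match in `a/b`, and stops there because `no_parent_owners` is set, so `a` is never consulted.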
diff --git a/docs/parsing_stack_traces.md b/docs/parsing_stack_traces.md index babe17023c7..a31178818ee 100644 --- a/docs/parsing_stack_traces.md +++ b/docs/parsing_stack_traces.md @@ -12,16 +12,16 @@ To find the correct binary for a specific log you need to: curl -O http://s3.amazonaws.com/downloads.mongodb.org/linux/mongodb-linux-x86_64-debugsymbols-1.x.x.tgz ``` -You can also get the debugsymbols archive for official builds through [the Downloads page][1]. In the -Archived Releases section, click on the appropriate platform link to view the available archives. -Select the appropriate debug symbols archive. +You can also get the debugsymbols archive for official builds through [the Downloads page][1]. In +the Archived Releases section, click on the appropriate platform link to view the available +archives. Select the appropriate debug symbols archive. ## Using mongosymb.py to get file and line numbers -Stacktraces are logged on a line with `msg` `BACKTRACE`. The full backtrace contents are available in -an attribute named `bt`. To convert this into a list of source locations with file and line numbers, -copy the contents of the `bt` JSON blob into a file, then direct the contents of that file into -the standard input of `buildscripts/mongosymb.py`: +Stacktraces are logged on a line with `msg` `BACKTRACE`. The full backtrace contents are available +in an attribute named `bt`. 
To convert this into a list of source locations with file and line +numbers, copy the contents of the `bt` JSON blob into a file, then direct the contents of that file +into the standard input of `buildscripts/mongosymb.py`: ``` cat bt | buildscripts/mongosymb.py --debug-file-resolver=path path/to/debug/symbols/file @@ -55,8 +55,8 @@ $ cat bt | buildscripts/mongosymb.py --debug-file-resolver=path bazel-bin/instal ## Stack Trace Schema -Stack traces are typically logged as log message 31380, having a `bt` attribute -that holds a JSON object value: +Stack traces are typically logged as log message 31380, having a `bt` attribute that holds a JSON +object value: ```json "bt": { @@ -86,10 +86,9 @@ that holds a JSON object value: } ``` -The "processInfo" subobject has other information about the process, but -the most important thing for the stack trace is the "somap", which is an -array of all dynamically linked ELF files, including the main executable, -and where in memory they were loaded. +The "processInfo" subobject has other information about the process, but the most important thing +for the stack trace is the "somap", which is an array of all dynamically linked ELF files, including +the main executable, and where in memory they were loaded. Partial example showing a few typical frames: diff --git a/docs/poetry_execution.md b/docs/poetry_execution.md index 7b524b808d6..ab18a9ec8c0 100644 --- a/docs/poetry_execution.md +++ b/docs/poetry_execution.md @@ -2,27 +2,55 @@ ## Project Impetus -We frequently encounter Python errors that are caused by a python dependency author updating their package that is backward breaking. 
The following tickets are a few examples of this happening: -[SERVER-79126](https://jira.mongodb.org/browse/SERVER-79126), [SERVER-79798](https://jira.mongodb.org/browse/SERVER-79798), [SERVER-53348](https://jira.mongodb.org/browse/SERVER-53348), [SERVER-57036](https://jira.mongodb.org/browse/SERVER-57036), [SERVER-44579](https://jira.mongodb.org/browse/SERVER-44579), [SERVER-70845](https://jira.mongodb.org/browse/SERVER-70845), [SERVER-63974](https://jira.mongodb.org/browse/SERVER-63974), [SERVER-61791](https://jira.mongodb.org/browse/SERVER-61791), and [SERVER-60950](https://jira.mongodb.org/browse/SERVER-60950). We have always known this was a problem and have known there was a way to fix it. We finally had the bandwidth to tackle this problem. +We frequently encounter Python errors that are caused by a python dependency author updating their +package that is backward breaking. The following tickets are a few examples of this happening: +[SERVER-79126](https://jira.mongodb.org/browse/SERVER-79126), +[SERVER-79798](https://jira.mongodb.org/browse/SERVER-79798), +[SERVER-53348](https://jira.mongodb.org/browse/SERVER-53348), +[SERVER-57036](https://jira.mongodb.org/browse/SERVER-57036), +[SERVER-44579](https://jira.mongodb.org/browse/SERVER-44579), +[SERVER-70845](https://jira.mongodb.org/browse/SERVER-70845), +[SERVER-63974](https://jira.mongodb.org/browse/SERVER-63974), +[SERVER-61791](https://jira.mongodb.org/browse/SERVER-61791), and +[SERVER-60950](https://jira.mongodb.org/browse/SERVER-60950). We have always known this was a +problem and have known there was a way to fix it. We finally had the bandwidth to tackle this +problem. ## Project Prework -First, we wanted to test out using poetry so we converted mongo-container project to use poetry [SERVER-76974](https://jira.mongodb.org/browse/SERVER-76974). This showed promise and we considered this a green light to move forward on converting the server python to use poetry. 
+First, we wanted to test out using poetry so we converted mongo-container project to use poetry +[SERVER-76974](https://jira.mongodb.org/browse/SERVER-76974). This showed promise and we considered +this a green light to move forward on converting the server python to use poetry. -Before we could start the project we had to upgrade python to a version that was not EoL. This work is captured in [SERVER-72262](https://jira.mongodb.org/browse/SERVER-72262). We upgraded python to 3.10 on every system except windows. Windows could not be upgraded due to a test problem relating to some cipher suites [SERVER-79172](https://jira.mongodb.org/browse/SERVER-79172). +Before we could start the project we had to upgrade python to a version that was not EoL. This work +is captured in [SERVER-72262](https://jira.mongodb.org/browse/SERVER-72262). We upgraded python to +3.10 on every system except windows. Windows could not be upgraded due to a test problem relating to +some cipher suites [SERVER-79172](https://jira.mongodb.org/browse/SERVER-79172). ## Conversion to Poetry -After the prework was done we wrote, tested, and merged [SERVER-76751](https://jira.mongodb.org/browse/SERVER-76751) which is converting the mongo python dependencies to poetry. This ticket had an absurd amount of dependencies and required a significant amount of patch builds. The total number of changes was pretty small but it affected a lot of different projects. +After the prework was done we wrote, tested, and merged +[SERVER-76751](https://jira.mongodb.org/browse/SERVER-76751) which is converting the mongo python +dependencies to poetry. This ticket had an absurd amount of dependencies and required a significant +amount of patch builds. The total number of changes was pretty small but it affected a lot of +different projects. -Knowing there was a lot this touched we expected to see some bugs and were quick to try to fix them. Some of these were caught before merge and some were caught after. 
+Knowing there was a lot this touched we expected to see some bugs and were quick to try to fix them. +Some of these were caught before merge and some were caught after. -[BUILD-17860](https://jira.mongodb.org/browse/BUILD-17860) required the build team to rebuild python on macosx arm. This was caught before merging. +[BUILD-17860](https://jira.mongodb.org/browse/BUILD-17860) required the build team to rebuild python +on macosx arm. This was caught before merging. -[SERVER-81122](https://jira.mongodb.org/browse/SERVER-81122) found that poetry broke the spawnhost script. This was caught after merge. +[SERVER-81122](https://jira.mongodb.org/browse/SERVER-81122) found that poetry broke the spawnhost +script. This was caught after merge. -[SERVER-81061](https://jira.mongodb.org/browse/SERVER-81061) and [BF-29909](https://jira.mongodb.org/browse/BF-29909) were found by sys-perf since they run their own build and do not use the standard build process. Therefore it was very hard to test for this one. This was caught post merge. +[SERVER-81061](https://jira.mongodb.org/browse/SERVER-81061) and +[BF-29909](https://jira.mongodb.org/browse/BF-29909) were found by sys-perf since they run their own +build and do not use the standard build process. Therefore it was very hard to test for this one. +This was caught post merge. -[SERVER-80799](https://jira.mongodb.org/browse/SERVER-80799) found that poetry broke mongo tooling metrics collection (not OTel). This was only found since an engineer on the team saw this bug in the code. This was caught post merge. +[SERVER-80799](https://jira.mongodb.org/browse/SERVER-80799) found that poetry broke mongo tooling +metrics collection (not OTel). This was only found since an engineer on the team saw this bug in the +code. This was caught post merge. Overall, when changing something so foundational it is inevitable that some things will break. 
diff --git a/docs/primary_only_service.md b/docs/primary_only_service.md index 4140e8e2bad..fdc7f282743 100644 --- a/docs/primary_only_service.md +++ b/docs/primary_only_service.md @@ -1,10 +1,10 @@ # PrimaryOnlyService The PrimaryOnlyService machinery provides a way to register tasks that should run only when current -node is Primary, and should be driven to completion across replica set failovers on the new -Primary. It is intended to be used by tasks that can be modeled as a state machine with a single -MongoDB document containing the current state, which newly-elected Primaries can use to rebuild the -state of the task after failover and pick up where the old Primary left off. +node is Primary, and should be driven to completion across replica set failovers on the new Primary. +It is intended to be used by tasks that can be modeled as a state machine with a single MongoDB +document containing the current state, which newly-elected Primaries can use to rebuild the state of +the task after failover and pick up where the old Primary left off. ## Classes @@ -62,16 +62,17 @@ what state it is in and thus what work still needs to be performed, and what wor completed by the previous Primary. To see an example bare-bones PrimaryOnlyService implementation to use as a reference, check out the -TestService defined in this unit test: https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/primary_only_service_test.cpp +TestService defined in this unit test: +https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/primary_only_service_test.cpp ## Behavior during state transitions At stepUp, each PrimaryOnlyService queries its state document collection, and for each document -found, creates and launches a PrimaryOnlyService::Instance initialized off of the state -document. 
This happens asynchronously relative to the core replication stepUp process - there is no -guarantee that when stepUp completes and the RSTL lock is dropped that the PrimaryOnlyServices have -finished rebuilding all their Instances. At stepDown all Instances are interrupted, but the threads -running their work are not joined, and the Instance objects containing their in-memory state are not +found, creates and launches a PrimaryOnlyService::Instance initialized off of the state document. +This happens asynchronously relative to the core replication stepUp process - there is no guarantee +that when stepUp completes and the RSTL lock is dropped that the PrimaryOnlyServices have finished +rebuilding all their Instances. At stepDown all Instances are interrupted, but the threads running +their work are not joined, and the Instance objects containing their in-memory state are not released, until the next stepUp. This is done to reduce the likelihood of blocking within the state transition process and delaying it for the entire node. This behavior does, however, guarantee that there will never be two Instances of the same PrimaryOnlyService with the same InstanceID running at diff --git a/docs/priority_port.md b/docs/priority_port.md index ab915bb4ebd..c5ab721302b 100644 --- a/docs/priority_port.md +++ b/docs/priority_port.md @@ -1,11 +1,14 @@ # Priority port support -`mongod` and `mongos` support a dedicated **priority port** intended for **internal, high-priority operations** such as automation monitoring, MongoTune, and critical intra-cluster replication traffic. +`mongod` and `mongos` support a dedicated **priority port** intended for **internal, high-priority +operations** such as automation monitoring, MongoTune, and critical intra-cluster replication +traffic. With a priority port configured: - The database listens on a second TCP port in addition to the main port. 
-- Connections accepted on the priority port are exempt from connection limits, connection establishment rate limiting, and ingress request rate limiting. +- Connections accepted on the priority port are exempt from connection limits, connection + establishment rate limiting, and ingress request rate limiting. - gRPC is not supported. The feature is **disabled by default**. @@ -35,7 +38,8 @@ net: When the transport layer starts: - A **separate listener thread** is created for the priority port in the ASIO transport layer. -- Sessions created from the priority port are tagged so downstream code can distinguish them from main-port sessions (similar to the load balancer port implementation). +- Sessions created from the priority port are tagged so downstream code can distinguish them from + main-port sessions (similar to the load balancer port implementation). --- @@ -47,27 +51,33 @@ Priority-port connections differ from normal connections in several ways. When a new connection is accepted: -- Connections from the priority port are treated as **limit-exempt** in the session manager, reusing the existing exemption machinery used for CIDR-based exemptions. +- Connections from the priority port are treated as **limit-exempt** in the session manager, reusing + the existing exemption machinery used for CIDR-based exemptions. - These connections can continue to be created even when the normal connection limit is reached. Metrics: - `serverStatus.connections.priority` counts current connections on the priority port only. -- These connections are also included in `connections.limitExempt` (along with CIDR-based exemptions). +- These connections are also included in `connections.limitExempt` (along with CIDR-based + exemptions). 
## Rate limiters Two ingress-side rate limiters recognize priority-port exemptions: -- [**SessionEstablishmentRateLimiter**](../src/mongo/db/admission/README.md#session-establishment-rate-limiter) (connection establishment) -- [**IngressRequestRateLimiter**](../src/mongo/db/admission/README.md#ingress-request-rate-limiting) (request rate limiting) +- [**SessionEstablishmentRateLimiter**](../src/mongo/db/admission/README.md#session-establishment-rate-limiter) + (connection establishment) +- [**IngressRequestRateLimiter**](../src/mongo/db/admission/README.md#ingress-request-rate-limiting) + (request rate limiting) ## Logging and profiling -For observability and debugging, the server records whether an operation came through the priority port: +For observability and debugging, the server records whether an operation came through the priority +port: - `CurOp` / currentOp output includes a flag indicating the connection is from the priority port. -- Slow query log and profiler entries include whether the operation was executed via a priority-port connection. +- Slow query log and profiler entries include whether the operation was executed via a priority-port + connection. - Client summary reports also distinguish clients on the main vs priority port. --- @@ -79,7 +89,8 @@ For observability and debugging, the server records whether an operation came th To connect to a replica set via the priority port, a user must: - Use a connection string that points directly at a specific host and priority port. -- Set `directConnection=true` to disable SDAM and prevent the driver from using hello-based host discovery, which currently does not advertise the priority port. +- Set `directConnection=true` to disable SDAM and prevent the driver from using hello-based host + discovery, which currently does not advertise the priority port. Example: @@ -92,11 +103,14 @@ mongodb://hostA:27018/?directConnection=true For `mongos`: - You may connect directly to the `mongos` priority port. 
-- `directConnection=true` is **not required** for `mongos` connections, since SDAM is not used in the same way. +- `directConnection=true` is **not required** for `mongos` connections, since SDAM is not used in + the same way. Important limitation: - **Priority does not automatically propagate**: - - If a client connects to a `mongos` via the priority port and `mongos` forwards a command to shards, those shard-side connections still use the main ports and do **not** inherit priority-port behavior in the current implementation. + - If a client connects to a `mongos` via the priority port and `mongos` forwards a command to + shards, those shard-side connections still use the main ports and do **not** inherit + priority-port behavior in the current implementation. --- diff --git a/docs/server_parameters.md b/docs/server_parameters.md index 2a6005a6262..30af14fa161 100644 --- a/docs/server_parameters.md +++ b/docs/server_parameters.md @@ -37,9 +37,9 @@ Users can set or modify a server parameter at startup and/or runtime, depending specified for `set_at`. For instance, `logLevel` may be set at both startup and runtime, as indicated by `set_at` (see the above code snippet). -At startup, server parameters may be set using the `--setParameter` command line option. -At runtime, the `setParameter` command may be used to modify server parameters. -See the [`setParameter` documentation][set-parameter] for details. +At startup, server parameters may be set using the `--setParameter` command line option. At runtime, +the `setParameter` command may be used to modify server parameters. See the [`setParameter` +documentation][set-parameter] for details. ## How to get the value provided for a parameter @@ -99,27 +99,28 @@ must be unique across the server instance. More information on the specific fiel - `set_at` (required): Must contain the value `startup`, `runtime`, [`startup`, `runtime`], or `cluster`. 
If `runtime` is specified along with `cpp_varname`, then `decltype(cpp_varname)` must - refer to a thread-safe storage type, specifically: `Atomic`, `std::atomic`, - or `boost::synchronized`. Parameters declared as `cluster` can only be set at runtime and exhibit + refer to a thread-safe storage type, specifically: `Atomic`, `std::atomic`, or + `boost::synchronized`. Parameters declared as `cluster` can only be set at runtime and exhibit numerous differences. See [Cluster Server Parameters](cluster-server-parameters) below. -- `description` (required): Free-form text field currently used only for commenting the generated C++ - code. Future uses may preserve this value for a possible `{listSetParameters:1}` command or other - programmatic and potentially user-facing purposes. +- `description` (required): Free-form text field currently used only for commenting the generated + C++ code. Future uses may preserve this value for a possible `{listSetParameters:1}` command or + other programmatic and potentially user-facing purposes. - `cpp_vartype`: Declares the full storage type. If `cpp_vartype` is not defined, it may be inferred from the C++ variable referenced by `cpp_varname`. -- `cpp_varname`: Declares the underlying variable or C++ `struct` member to use when setting or reading the - server parameter. If defined together with `cpp_vartype`, the storage will be declared as a global - variable, and externed in the generated header file. If defined alone, a variable of this name will - assume to have been declared and defined by the implementer, and its type will be automatically - inferred at compile time. If `cpp_varname` is not defined, then `cpp_class` must be specified. +- `cpp_varname`: Declares the underlying variable or C++ `struct` member to use when setting or + reading the server parameter. If defined together with `cpp_vartype`, the storage will be declared + as a global variable, and externed in the generated header file. 
If defined alone, a variable of + this name will assume to have been declared and defined by the implementer, and its type will be + automatically inferred at compile time. If `cpp_varname` is not defined, then `cpp_class` must be + specified. - `cpp_class`: Declares a custom `ServerParameter` class in the generated header using the provided string, or the name field in the associated map. The declared class will require an implementation - of `setFromString()`, and optionally `set()`, `append()`, and a constructor. - See [Specialized Server Parameters](#specialized-server-parameters) below. + of `setFromString()`, and optionally `set()`, `append()`, and a constructor. See + [Specialized Server Parameters](#specialized-server-parameters) below. - `default`: String or expression map representation of the initial value. @@ -127,10 +128,10 @@ must be unique across the server instance. More information on the specific fiel This is a required field and must be explicitly set to `false` to disable redaction. - `omit_in_ftdc`: Only applies to cluster parameters. If set to `true`, then the cluster parameter - will be omitted when `getClusterParameter` is invoked with `omitInFTDC: true`. - In practice, FTDC runs `getClusterParameter` with this option periodically to - collect configuration metadata about the server and setting this flag to true - for a cluster parameter ensures that its value(s) will not be exposed in FTDC. + will be omitted when `getClusterParameter` is invoked with `omitInFTDC: true`. In practice, FTDC + runs `getClusterParameter` with this option periodically to collect configuration metadata about + the server and setting this flag to true for a cluster parameter ensures that its value(s) will + not be exposed in FTDC. - `test_only`: Set to `true` to disable this set parameter if `enableTestCommands` is not specified. @@ -141,26 +142,27 @@ must be unique across the server instance. More information on the specific fiel new value has been stored. 
Prototype: `Status(const cpp_vartype&);` - `condition`: Up to five conditional rules for deciding whether or not to apply this server - parameter. `preprocessor` will be evaluated first, followed by `constexpr`, then finally `expr`. If - no provided setting evaluates to `false`, the server parameter will be registered. `feature_flag` and - `min_fcv` are evaluated after the parameter is registered, and instead affect whether the parameter - is enabled. `min_fcv` is a string of the form `X.Y`, representing the minimum FCV version for which - this parameter should be enabled. `feature_flag` is the name of a feature flag variable upon which - this server parameter depends -- if the feature flag is disabled, this parameter will be disabled. - `feature_flag` should be removed when all other instances of that feature flag are deleted, which - typically is done after the next LTS version of the server is branched. `min_fcv` should be removed - after it is no longer possible to downgrade to a FCV lower than that version - this occurs when the - next LTS version of the server is branched. + parameter. `preprocessor` will be evaluated first, followed by `constexpr`, then finally `expr`. + If no provided setting evaluates to `false`, the server parameter will be registered. + `feature_flag` and `min_fcv` are evaluated after the parameter is registered, and instead affect + whether the parameter is enabled. `min_fcv` is a string of the form `X.Y`, representing the + minimum FCV version for which this parameter should be enabled. `feature_flag` is the name of a + feature flag variable upon which this server parameter depends -- if the feature flag is disabled, + this parameter will be disabled. `feature_flag` should be removed when all other instances of that + feature flag are deleted, which typically is done after the next LTS version of the server is + branched. 
`min_fcv` should be removed after it is no longer possible to downgrade to a FCV lower + than that version - this occurs when the next LTS version of the server is branched. - `validator`: Zero or many validation rules to impose on the setting. All specified rules must pass - to consider the new setting valid. `lt`, `gt`, `lte`, `gte` fields provide for simple numeric limits - or expression maps which evaluate to numeric values. For all other validation cases, specify - callback as a C++ function or static method. Note that validation rules (including callback) may run - in any order. To perform an action after all validation rules have completed, `on_update` should be - preferred instead. Callback prototype: `Status(const cpp_vartype&, const boost::optional&);` + to consider the new setting valid. `lt`, `gt`, `lte`, `gte` fields provide for simple numeric + limits or expression maps which evaluate to numeric values. For all other validation cases, + specify callback as a C++ function or static method. Note that validation rules (including + callback) may run in any order. To perform an action after all validation rules have completed, + `on_update` should be preferred instead. Callback prototype: + `Status(const cpp_vartype&, const boost::optional&);` -- `is_deprecated`: Mark the server parameter as deprecated. Warns users if the server parameter - is ever used. Defaults to false. +- `is_deprecated`: Mark the server parameter as deprecated. Warns users if the server parameter is + ever used. Defaults to false. Any symbols such as global variables or callbacks used by a server parameter must be imported using the usual IDL machinery via `globals.cpp_includes`. Similarly, all generated code will be nested @@ -240,9 +242,8 @@ to any other work, this custom constructor must invoke its parent's constructor. Status {name}::set(const BSONElement& val, const boost::optional& tenantId); ``` -Otherwise the base class implementation `ServerParameter::set` is used. 
It -invokes `setFromString` using a string representation of `val`, if the `val` is -holding one of the supported types. +Otherwise the base class implementation `ServerParameter::set` is used. It invokes `setFromString` +using a string representation of `val`, if the `val` is holding one of the supported types. `override_validate`: If `true`, the implementer must provide a `validate` member function as: @@ -261,8 +262,8 @@ must be provided with the following signature: Status {name}::append(OperationContext*, BSONObjBuilder*, StringData, const boost::optional& tenantId); ``` -`override_warn_if_deprecated`: If `true`, allows a custom warnIfDeprecated() method to be defined, defaults -to `false`. +`override_warn_if_deprecated`: If `true`, allows a custom warnIfDeprecated() method to be defined, +defaults to `false`. Lastly, a `setFromString` method must always be provided with the following signature: @@ -318,17 +319,17 @@ preferred to implementing custom parameter propagation whenever possible. `setClusterParameter` persists the new value of the indicated cluster server parameter onto a majority of nodes on non-sharded replica sets. On sharded clusters, it majority-writes the new value -onto every shard and the config server. This ensures that every **mongod** in the cluster will be able -to recover the most recently written value for all cluster server parameters on restart. +onto every shard and the config server. This ensures that every **mongod** in the cluster will be +able to recover the most recently written value for all cluster server parameters on restart. Additionally, `setClusterParameter` blocks until the majority write succeeds in a replica set -deployment, which guarantees that the parameter value will not be rolled back after being set. -In a sharded cluster deployment, the new value has to be majority-committed on the config shard and +deployment, which guarantees that the parameter value will not be rolled back after being set. 
In a +sharded cluster deployment, the new value has to be majority-committed on the config shard and locally-committed on all other shards. The cluster parameters are persisted in the `config.clusterParameters` collections and cached in -memory on every **mongod**. The cache updates are done by the `ClusterServerParameterOpObserver` class. -Every **mongos** also maintains an in-memory cache by polling the config server for updated cluster -server parameter values every `clusterServerParameterRefreshIntervalSecs` using the +memory on every **mongod**. The cache updates are done by the `ClusterServerParameterOpObserver` +class. Every **mongos** also maintains an in-memory cache by polling the config server for updated +cluster server parameter values every `clusterServerParameterRefreshIntervalSecs` using the `ClusterParameterRefresher` periodic job. `getClusterParameter` returns the cached value of the requested cluster server parameter on the node @@ -347,10 +348,10 @@ following members to the resulting type: was updated; used by runtime audit configuration, and to prevent concurrent and redundant cluster parameter updates. -It is highly recommended to specify validation rules or a callback function via the `param.validator` -field. These validators are called before the new value of the cluster server parameter is written -to disk during `setClusterParameter`. -See [server_parameter_with_storage_test.idl][cluster-server-param-with-storage-test] and +It is highly recommended to specify validation rules or a callback function via the +`param.validator` field. These validators are called before the new value of the cluster server +parameter is written to disk during `setClusterParameter`. See +[server_parameter_with_storage_test.idl][cluster-server-param-with-storage-test] and [server_parameter_with_storage_test_structs.idl][cluster-server-param-with-storage-test-structs] for examples. 
@@ -394,21 +395,21 @@ Tue `reset()` method must be implemented and should update the cluster server pa default value. All cluster server parameters are tenant-aware, meaning that on serverless clusters, each tenant has -an isolated set of parameters. The `setClusterParameter` and `getClusterParameter` commands will pass -the `tenantId` on the command request to the `ServerParameter`'s methods. On dedicated +an isolated set of parameters. The `setClusterParameter` and `getClusterParameter` commands will +pass the `tenantId` on the command request to the `ServerParameter`'s methods. On dedicated (non-serverless) clusters, `boost::none` will be passed. IDL-defined cluster server parameters will handle the passed-in `tenantId` automatically and store separate parameter values per-tenant. -Specialized server parameters will have to take care to correctly handle the passed-in `tenantId` and -to enforce tenant isolation. +Specialized server parameters will have to take care to correctly handle the passed-in `tenantId` +and to enforce tenant isolation. Like normal server parameters, cluster server parameters can be defined to be dependent on a minimum -FCV version or a specific feature flag using the `condition.min_fcv` and `condition.feature_flag` syntax discussed -above. During FCV downgrade, the cluster parameter value stored on disk will be deleted if either: -(1) The downgraded FCV is lower than the cluster parameter's `min_fcv`, or (2) The cluster -parameter's `feature_flag` is disabled on the downgraded FCV. While a cluster server parameter is -disabled due to either of these conditions, `setClusterParameter` on it will always fail, and -`getClusterParameter` will fail on **mongod**, and return the default value on **mongos** -- this -difference in behavior is due to **mongos** being unaware of the current FCV. +FCV version or a specific feature flag using the `condition.min_fcv` and `condition.feature_flag` +syntax discussed above. 
During FCV downgrade, the cluster parameter value stored on disk will be +deleted if either: (1) The downgraded FCV is lower than the cluster parameter's `min_fcv`, or (2) +The cluster parameter's `feature_flag` is disabled on the downgraded FCV. While a cluster server +parameter is disabled due to either of these conditions, `setClusterParameter` on it will always +fail, and `getClusterParameter` will fail on **mongod**, and return the default value on **mongos** +-- this difference in behavior is due to **mongos** being unaware of the current FCV. See [server_parameter_specialized_test.idl][specialized-cluster-server-param-test-idl] and [server_parameter_specialized_test.h][specialized-cluster-server-param-test-data] for examples. @@ -582,9 +583,11 @@ classDiagram [parameters.idl]: ../src/mongo/db/commands/parameters.idl [set-parameter]: https://docs.mongodb.com/manual/reference/parameters/#synopsis [get-parameter]: https://docs.mongodb.com/manual/reference/command/getParameter/#getparameter -[quiet-param]: https://github.com/mongodb/mongo/search?q=serverGlobalParams+quiet+extension:idl&type=code +[quiet-param]: + https://github.com/mongodb/mongo/search?q=serverGlobalParams+quiet+extension:idl&type=code [ftdc-file-size-param]: ../src/mongo/db/ftdc/ftdc_server.idl [cluster-server-param-with-storage-test]: ../src/mongo/idl/server_parameter_with_storage_test.idl -[cluster-server-param-with-storage-test-structs]: ../src/mongo/idl/server_parameter_with_storage_test_structs.idl +[cluster-server-param-with-storage-test-structs]: + ../src/mongo/idl/server_parameter_with_storage_test_structs.idl [specialized-cluster-server-param-test-idl]: ../src/mongo/idl/server_parameter_specialized_test.idl [specialized-cluster-server-param-test-data]: ../src/mongo/idl/server_parameter_specialized_test.h diff --git a/docs/test_commands.md b/docs/test_commands.md index 72720cc6e5f..d56a07dbbe9 100644 --- a/docs/test_commands.md +++ b/docs/test_commands.md @@ -1,7 +1,7 @@ # Test Commands 
-All test commands are denoted with the `.testOnly()` modifier to the `MONGO_REGISTER_COMMAND` invocation. -For example: +All test commands are denoted with the `.testOnly()` modifier to the `MONGO_REGISTER_COMMAND` +invocation. For example: ```c++ MONGO_REGISTER_COMMAND(EchoCommand).testOnly(); @@ -9,9 +9,9 @@ MONGO_REGISTER_COMMAND(EchoCommand).testOnly(); ## How to enable -To be able to run these commands, the server must be started with the `enableTestCommands=1` -server parameter (e.g. `--setParameter enableTestCommands=1`). Resmoke.py often sets this server -parameter for testing. +To be able to run these commands, the server must be started with the `enableTestCommands=1` server +parameter (e.g. `--setParameter enableTestCommands=1`). Resmoke.py often sets this server parameter +for testing. ## Examples diff --git a/docs/testing/README.md b/docs/testing/README.md index 745a997274a..5696a5d5ec1 100644 --- a/docs/testing/README.md +++ b/docs/testing/README.md @@ -1,7 +1,7 @@ # Testing -Most tests for MongoDB are run through resmoke, our test runner and orchestration tool. -The entry point for resmoke can be found at `buildscripts/resmoke.py` +Most tests for MongoDB are run through resmoke, our test runner and orchestration tool. 
The entry +point for resmoke can be found at `buildscripts/resmoke.py` ## Concepts @@ -9,9 +9,12 @@ Learn more about related topics using their own targeted documentation: - [resmoke](../../buildscripts/resmokelib/README.md), the test runner - [suites](../../buildscripts/resmokeconfig/suites/README.md), how tests are grouped and configured -- [fixtures](../../buildscripts/resmokelib/testing/fixtures/README.md), specify the server topology that tests run against -- [hooks](../../buildscripts/resmokelib/testing/hooks/README.md), logic to run before, after and/or between individual tests -- [testcases](../../buildscripts/resmokelib/testing/testcases/README.md), Python-based unittest interfaces that resmoke can run as different "kinds" of tests. +- [fixtures](../../buildscripts/resmokelib/testing/fixtures/README.md), specify the server topology + that tests run against +- [hooks](../../buildscripts/resmokelib/testing/hooks/README.md), logic to run before, after and/or + between individual tests +- [testcases](../../buildscripts/resmokelib/testing/testcases/README.md), Python-based unittest + interfaces that resmoke can run as different "kinds" of tests. ## Basic Example @@ -35,4 +38,7 @@ Now, **run the test content** from one test file: buildscripts/resmoke.py run --suites=no_passthrough jstests/noPassthrough/shell/js/string.js ``` -The suite defined in [buildscripts/resmokeconfig/suites/no_passthrough.yml](../../buildscripts/resmokeconfig/suites/no_passthrough.yml) includes that `string.js` file via glob selections, specifies no fixtures, no hooks, and a minimal config for the executor. +The suite defined in +[buildscripts/resmokeconfig/suites/no_passthrough.yml](../../buildscripts/resmokeconfig/suites/no_passthrough.yml) +includes that `string.js` file via glob selections, specifies no fixtures, no hooks, and a minimal +config for the executor. 
diff --git a/docs/testing/fsm_concurrency_testing_framework.md b/docs/testing/fsm_concurrency_testing_framework.md index 7d99a0daffe..53169fc3363 100644 --- a/docs/testing/fsm_concurrency_testing_framework.md +++ b/docs/testing/fsm_concurrency_testing_framework.md @@ -2,80 +2,69 @@ ## Overview -The FSM tests are meant to exercise concurrency within MongoDB. The suite -consists of workloads, which define discrete units of work as states in a FSM, -and runners, which define which tests to run and how they should be run. Each -workload defines states, which are JS functions that perform some meaningful -series of tasks and assertions, and transitions, which define how to move -between those states. A single workload begins by executing its setup function, -which is called once during the runner's thread of execution. Next, the runner -generates the number of threads specified by the workload, and each spawned -thread executes the start state (typically named "init") defined by the -workload. From this point on, each worker thread executes its own independent -copy of the FSM, and will randomly move between states (after executing the -function) based on the probabilities defined in the workload's transition table. -Each worker thread continues doing so until the number of transitions it makes -has reached the number of iterations defined by the workload. Once all the -worker threads have finished, the runner executes the workload's teardown -function. +The FSM tests are meant to exercise concurrency within MongoDB. The suite consists of workloads, +which define discrete units of work as states in a FSM, and runners, which define which tests to run +and how they should be run. Each workload defines states, which are JS functions that perform some +meaningful series of tasks and assertions, and transitions, which define how to move between those +states. 
A single workload begins by executing its setup function, which is called once during the +runner's thread of execution. Next, the runner generates the number of threads specified by the +workload, and each spawned thread executes the start state (typically named "init") defined by the +workload. From this point on, each worker thread executes its own independent copy of the FSM, and +will randomly move between states (after executing the function) based on the probabilities defined +in the workload's transition table. Each worker thread continues doing so until the number of +transitions it makes has reached the number of iterations defined by the workload. Once all the +worker threads have finished, the runner executes the workload's teardown function. ![fsm.png](../images/testing/fsm.png) -The runner provides two modes of execution for workloads: serial and parallel. -Serial mode runs the provided workloads one after the other, -waiting for all threads of a workload to complete before moving on to the next -workload. Parallel mode runs subsets of the provided workloads in separate +The runner provides two modes of execution for workloads: serial and parallel. Serial mode runs the +provided workloads one after the other, waiting for all threads of a workload to complete before +moving on to the next workload. Parallel mode runs subsets of the provided workloads in separate threads simultaneously. -New methods were added to allow for finer-grained assertions under different -situations. For example, a test that inserts a document into a collection, and -wants to assert its existence will fail if another test removes that document. -One option would have been to disable all assertions when running a mixture of -different workloads together, but doing so would make the system incapable of -detecting anything other than server crashes. Another option would have been to -design the workloads to be conflict-free (e.g. 
writing to separate collections, -using commutative operators), but this would leave large gaps in the achievable -test coverage. Neither of those options were found to be very appealing. -Instead, we chose to introduce the concept of an "assertion level" that acts as -a precondition for when an assertion is evaluated. This allows us to still make -some assertions, even when running a mixture of different workloads together. -There are three assertion levels: `ALWAYS`, `OWN_COLL`, and `OWN_DB`. They can -be thought of as follows: +New methods were added to allow for finer-grained assertions under different situations. For +example, a test that inserts a document into a collection, and wants to assert its existence will +fail if another test removes that document. One option would have been to disable all assertions +when running a mixture of different workloads together, but doing so would make the system incapable +of detecting anything other than server crashes. Another option would have been to design the +workloads to be conflict-free (e.g. writing to separate collections, using commutative operators), +but this would leave large gaps in the achievable test coverage. Neither of those options were found +to be very appealing. Instead, we chose to introduce the concept of an "assertion level" that acts +as a precondition for when an assertion is evaluated. This allows us to still make some assertions, +even when running a mixture of different workloads together. There are three assertion levels: +`ALWAYS`, `OWN_COLL`, and `OWN_DB`. They can be thought of as follows: -- `ALWAYS`: A statement that remains unequivocally true, regardless of what - another workload might be doing to the collection I was given (hint: think - defensively). Examples include "1 = 1" or inserting a document into a - collection (disregarding any unique indices). 
+- `ALWAYS`: A statement that remains unequivocally true, regardless of what another workload might + be doing to the collection I was given (hint: think defensively). Examples include "1 = 1" or + inserting a document into a collection (disregarding any unique indices). -- `OWN_COLL`: A statement that is true only if I am the only workload operating - on the collection I was given. Examples include counting the number of - documents in a collection or updating a previously inserted document. +- `OWN_COLL`: A statement that is true only if I am the only workload operating on the collection I + was given. Examples include counting the number of documents in a collection or updating a + previously inserted document. -- `OWN_DB`: A statement that is true only if I am the only workload operating on - the database I was given. Examples include renaming a collection or verifying - that a collection is capped. The workload typically relies on the use of - another collection aside from the one given. +- `OWN_DB`: A statement that is true only if I am the only workload operating on the database I was + given. Examples include renaming a collection or verifying that a collection is capped. The + workload typically relies on the use of another collection aside from the one given. ## Creating your own workload -All workloads are stored in `jstests/concurrency/fsm_workloads` and as specific -examples you can refer to +All workloads are stored in `jstests/concurrency/fsm_workloads` and as specific examples you can +refer to 1. `jstests/concurrency/fsm_example.js` 1. `jstests/concurrency/fsm_example_inheritance.js` -for writing new workloads. Every workload is loaded in as inline JavaScript -using the "load" function, which is a lot more like a `#include` than -`require.js`. This means that whatever variables are declared in the global -scope of the file will become part of the scope where load is called. 
The runner -will be looking for a variable called `$config` which will store the +for writing new workloads. Every workload is loaded in as inline JavaScript using the "load" +function, which is a lot more like a `#include` than `require.js`. This means that whatever +variables are declared in the global scope of the file will become part of the scope where load is +called. The runner will be looking for a variable called `$config` which will store the configuration of your workload. ### The $config object -There should be exactly one `$config` per workload. For style consistency as -well as safety, be sure to wrap the value of `$config` in an anonymous function. -This will create a JS closure and a new scope: +There should be exactly one `$config` per workload. For style consistency as well as safety, be sure +to wrap the value of `$config` in an anonymous function. This will create a JS closure and a new +scope: ```javascript $config = (function() { @@ -93,19 +82,17 @@ $config = (function() { )(); ``` -When finished executing, `$config` must return an object containing the properties -above (some of which are optional, see below). +When finished executing, `$config` must return an object containing the properties above (some of +which are optional, see below). ### Defining states -It's best to also declare states within its own closure so as not to interfere -with the scope of $config. Each state takes two arguments, the db object and the -collection name. For later, note that this db and collection are the only one -that you can be guaranteed to "own" when asserting. Try to make each state a -discrete unit of work that can stand alone without the other states. -Additionally, try to define each function that makes up a state -with a name as opposed to anonymously - this makes easier to read backtraces -when things go wrong. +It's best to also declare states within its own closure so as not to interfere with the scope of +$config. 
Each state takes two arguments, the db object and the collection name. For later, note that +this db and collection are the only one that you can be guaranteed to "own" when asserting. Try to +make each state a discrete unit of work that can stand alone without the other states. Additionally, +try to define each function that makes up a state with a name as opposed to anonymously - this makes +easier to read backtraces when things go wrong. ```javascript $config = (function () { @@ -146,14 +133,12 @@ $config = (function () { ### Defining transitions -The transitions object defines the probabilities of moving from one state to a -different state. When a state's function is finished executing, the FSM randomly -chooses the next state using the probabilities provided in the transitions -object. The probabilities of the transitions object do not necessarily need to -sum to 1.0, since the mechanism for choosing the next state uses normalized -random values. Here it is not necessary to use a separate closure. In the -example below, we're denoting an equal probability of moving to either of the -scan states from the init state: +The transitions object defines the probabilities of moving from one state to a different state. When +a state's function is finished executing, the FSM randomly chooses the next state using the +probabilities provided in the transitions object. The probabilities of the transitions object do not +necessarily need to sum to 1.0, since the mechanism for choosing the next state uses normalized +random values. Here it is not necessary to use a separate closure. In the example below, we're +denoting an equal probability of moving to either of the scan states from the init state: ```javascript $config = (function () { @@ -174,15 +159,13 @@ $config = (function () { ### Setup and teardown functions -The setup and teardown functions are special in that they'll only be executed in -one thread. 
See the Runners section for more information about when they're run -relative to other workloads in various modes. The setup and teardown functions -take three arguments: db, coll, and cluster. The setup function (and -corresponding teardown) should perform most of the initialization your workload -needs, for example setting parameters on the server, adding seed data, or -setting up indexes. Note that rather than executing adminCommands (and others) -against the provided `db` you should use the provided -`cluster.executeOnMongodNodes` and `cluster.executeOnMongosNodes` functionality. +The setup and teardown functions are special in that they'll only be executed in one thread. See the +Runners section for more information about when they're run relative to other workloads in various +modes. The setup and teardown functions take three arguments: db, coll, and cluster. The setup +function (and corresponding teardown) should perform most of the initialization your workload needs, +for example setting parameters on the server, adding seed data, or setting up indexes. Note that +rather than executing adminCommands (and others) against the provided `db` you should use the +provided `cluster.executeOnMongodNodes` and `cluster.executeOnMongosNodes` functionality. ```javascript $config = (function () { @@ -224,18 +207,16 @@ $config = (function () { ### The `data` object -The `data` object preserves information between different states of an FSM within -an individual thread. Within a single state, the data object becomes the 'this' -context in which the state executes. Additionally, a tid attribute is added to -data by the runner to allow each thread to access a unique ID. Data is usually -defined above states inside the config, but left below it in the returned -object. Data is also available as the 'this' context in setup and teardown -functions. 
Note that once the FSM begins, the context data that was passed to -the setup function is copied into each thread - meaning each thread has its own -copy of the data and modifications to data will not be passed back to the -teardown function outside of what was changed in setup. Additionally, in -composition, each workload has its own data, meaning you don't have to worry -about properties being overridden by workloads other than the current one. +The `data` object preserves information between different states of an FSM within an individual +thread. Within a single state, the data object becomes the 'this' context in which the state +executes. Additionally, a tid attribute is added to data by the runner to allow each thread to +access a unique ID. Data is usually defined above states inside the config, but left below it in the +returned object. Data is also available as the 'this' context in setup and teardown functions. Note +that once the FSM begins, the context data that was passed to the setup function is copied into each +thread - meaning each thread has its own copy of the data and modifications to data will not be +passed back to the teardown function outside of what was changed in setup. Additionally, in +composition, each workload has its own data, meaning you don't have to worry about properties being +overridden by workloads other than the current one. ```javascript $config = (function () { @@ -255,57 +236,50 @@ $config = (function () { #### `threadCount` -threadCount is the number of threads that will be used to run your workload in -Serial and Parallel modes. In both modes, the number of threads you provide will -execute the FSM simultaneously, cycling through different states of the -workload. Note that in serial mode, no other threads will be running outside of -those pertaining to this workload, and in parallel mode, other workloads will -also be given threads to execute their FSM. 
In some cases in parallel mode, this -number will be scaled down to make sure that all workloads can fit within the -number of threads available due to system or performance constraints. +threadCount is the number of threads that will be used to run your workload in Serial and Parallel +modes. In both modes, the number of threads you provide will execute the FSM simultaneously, cycling +through different states of the workload. Note that in serial mode, no other threads will be running +outside of those pertaining to this workload, and in parallel mode, other workloads will also be +given threads to execute their FSM. In some cases in parallel mode, this number will be scaled down +to make sure that all workloads can fit within the number of threads available due to system or +performance constraints. #### `iterations` -This is just the number of states the FSM will go through before exiting. NOTE: -it is _not_ the number of times each state will be executed. +This is just the number of states the FSM will go through before exiting. NOTE: it is _not_ the +number of times each state will be executed. #### `startState` (optional) -Default value is 'init'. If your workload does not have an init state than you -must specify in which state to begin. +Default value is 'init'. If your workload does not have an init state than you must specify in which +state to begin. ### Workload helpers -`jstests/concurrency/fsm_workload_helpers` contains a few files that you can -include using 'load' at the top of a workload. These provide auxiliary -functionality that might be necessary for some workloads. The most important of -which is probably server_types.js +`jstests/concurrency/fsm_workload_helpers` contains a few files that you can include using 'load' at +the top of a workload. These provide auxiliary functionality that might be necessary for some +workloads. 
The most important of which is probably server_types.js #### server_types.js -This helper file contains four functions: isMongos, isMongod, isMMAPv1, and -isWiredTiger. These can be used to restrict operations on different -functionality available in sharded environments, as well as based on storage -engine, and work as you would expect. One thing to note is that before calling -either isMMAPv1 or isWiredTiger, first verify isMongod. When special casing -functionality for sharded environments or storage engines, try to special case a -test for the exceptionality while still leaving in place assertions for either -case. +This helper file contains four functions: isMongos, isMongod, isMMAPv1, and isWiredTiger. These can +be used to restrict operations on different functionality available in sharded environments, as well +as based on storage engine, and work as you would expect. One thing to note is that before calling +either isMMAPv1 or isWiredTiger, first verify isMongod. When special casing functionality for +sharded environments or storage engines, try to special case a test for the exceptionality while +still leaving in place assertions for either case. #### indexed_noindex.js -This helper can be used along with inheritance, to create a workload that is -exactly the same as an existing workload, but with the index created during -setup removed. In order to use this replace the function you provide to the -extendWorkload function with indexedNoindex. Additionally, ensure that the -workload you are extending has a function in its data object called -"getIndexSpec" that returns the spec for the index to be removed. +This helper can be used along with inheritance, to create a workload that is exactly the same as an +existing workload, but with the index created during setup removed. In order to use this replace the +function you provide to the extendWorkload function with indexedNoindex. 
Additionally, ensure that +the workload you are extending has a function in its data object called "getIndexSpec" that returns +the spec for the index to be removed. ```javascript import {extendWorkload} from "jstests/concurrency/fsm_libs/extend_workload.js"; -load( - "jstests/concurrency/fsm_workload_modifiers/collection_write_path/indexed_noindex.js", -); // for indexedNoindex +load("jstests/concurrency/fsm_workload_modifiers/collection_write_path/indexed_noindex.js"); // for indexedNoindex import {$config as $baseConfig} from "jstests/concurrency/fsm_workloads/workload_with_index.js"; export const $config = extendWorkload($baseConfig, indexedNoIndex); @@ -313,90 +287,80 @@ export const $config = extendWorkload($baseConfig, indexedNoIndex); #### drop_utils.js -These helpers provide safe methods for dropping collections, databases, roles, -and users created during a workload's execution. The methods take a regular -expression that the collection, database, role, or user name must match for it -to be dropped. Prefixing the items in any of these categories you create with a -prefix defined by your workload name is a good idea since the workload file name -can be assumed unique and will allow you to only affect your workload in these -cases. +These helpers provide safe methods for dropping collections, databases, roles, and users created +during a workload's execution. The methods take a regular expression that the collection, database, +role, or user name must match for it to be dropped. Prefixing the items in any of these categories +you create with a prefix defined by your workload name is a good idea since the workload file name +can be assumed unique and will allow you to only affect your workload in these cases. ## Test runners -By default, all runners below are allowed to open a maximum of -`maxAllowedConnections` (= 100 by default) explicit connections. 
In replicated -and sharded environments, implicit connections are created to the original -mongod provided to the mongo shell executing the runner (one for each thread). -This behavior cannot be controlled, but it highlights the importance of always -using the db object provided in the FSM states rather than the global db which -will always correspond to the mongod the mongo shell initially connected to. +By default, all runners below are allowed to open a maximum of `maxAllowedConnections` (= 100 by +default) explicit connections. In replicated and sharded environments, implicit connections are +created to the original mongod provided to the mongo shell executing the runner (one for each +thread). This behavior cannot be controlled, but it highlights the importance of always using the db +object provided in the FSM states rather than the global db which will always correspond to the +mongod the mongo shell initially connected to. ### Execution modes #### Serial -Serial is the simplest of all three modes and basically works as explained -above. Setup is run single threaded, data is copied into multiple threads where -the states are executed, and once all the threads have finished a teardown -function is run and the runner moves onto the next workload. +Serial is the simplest of all three modes and basically works as explained above. Setup is run +single threaded, data is copied into multiple threads where the states are executed, and once all +the threads have finished a teardown function is run and the runner moves onto the next workload. ![fsm_serial_example.png](../images/testing/fsm_serial_example.png) #### Parallel (Simultaneous) -In parallel or simultaneous mode (the naming convention has been slightly -inconsistent), the ordering becomes a little different. All workloads have their -setup functions run, then threads are spawned for each workload, and once they -all complete, all threads have their teardown function run. 
+In parallel or simultaneous mode (the naming convention has been slightly inconsistent), the +ordering becomes a little different. All workloads have their setup functions run, then threads are +spawned for each workload, and once they all complete, all threads have their teardown function run. ![fsm_simultaneous_example.png](../images/testing/fsm_simultaneous_example.png) ### Existing runners -The existing runners all use `jstests/concurrency/fsm_libs/runner.js` to -actually execute the workloads. Most information about arguments and available -runWorkloads methods can be found by inspecting the source. Below you can find -the existing runners explained. The first argument to the three runWorkloads -methods (each corresponding to a different run mode), is an array of workload -files to run. clusterOptions, the second argument to the runWorkloads functions, -is explained in the other components section below. Execution options for -runWorkloads functions, the third argument, can contain the following options -(some depend on the run mode): +The existing runners all use `jstests/concurrency/fsm_libs/runner.js` to actually execute the +workloads. Most information about arguments and available runWorkloads methods can be found by +inspecting the source. Below you can find the existing runners explained. The first argument to the +three runWorkloads methods (each corresponding to a different run mode), is an array of workload +files to run. clusterOptions, the second argument to the runWorkloads functions, is explained in the +other components section below. 
Execution options for runWorkloads functions, the third argument, +can contain the following options (some depend on the run mode): -- `numSubsets` - Not available in serial mode, determines how many subsets of - workloads to execute in parallel mode -- `subsetSize` - Not available in serial mode, determines how large each subset of - workloads executed is +- `numSubsets` - Not available in serial mode, determines how many subsets of workloads to execute + in parallel mode +- `subsetSize` - Not available in serial mode, determines how large each subset of workloads + executed is #### fsm_all.js -Runs all workloads serially. For each workload, `$config.threadCount` threads -are spawned and each thread runs for exactly `$config.iterations` steps starting -at `$config.startState` and transitioning to other states based on the -transition probabilities defined in $config.transitions. +Runs all workloads serially. For each workload, `$config.threadCount` threads are spawned and each +thread runs for exactly `$config.iterations` steps starting at `$config.startState` and +transitioning to other states based on the transition probabilities defined in $config.transitions. #### fsm_all_simultaneous.js options: numSubsets, subsetSize -Runs numSubsets subsets of size subsetSize of all workloads. The workloads in -each subset are started in parallel and each workload is run according to -settings in `$config`. +Runs numSubsets subsets of size subsetSize of all workloads. The workloads in each subset are +started in parallel and each workload is run according to settings in `$config`. #### fsm_all_replication.js -Sets up a replica set (with 3 mongods by default) and runs workloads serially or -in parallel. For example, +Sets up a replica set (with 3 mongods by default) and runs workloads serially or in parallel. For +example, `runWorkloadsSerially([, , ...], { replication: true } )` -creates a replica set with 3 members and runs some workloads serially on the -primary. 
+creates a replica set with 3 members and runs some workloads serially on the primary. #### fsm_all_sharded.js -Sets up a sharded cluster (with 2 shards and 1 mongos by default) and runs -workloads serially or in parallel. For example, +Sets up a sharded cluster (with 2 shards and 1 mongos by default) and runs workloads serially or in +parallel. For example, `runWorkloadsInParallel([, , ...], { sharded: true } )` @@ -404,36 +368,33 @@ creates a sharded cluster and runs workloads in parallel. #### fsm_all_sharded_replication.js -Sets up a sharded cluster (with 2 shards, each having 3 replica set members, and -1 mongos by default) and runs workloads serially or in parallel. +Sets up a sharded cluster (with 2 shards, each having 3 replica set members, and 1 mongos by +default) and runs workloads serially or in parallel. ### Excluding a workload -If any workloads fail because of known bugs in MongoDB, persistent MCI failures -or timeouts, the troublesome workload can be excluded from running by placing it -in the exclusion array in the corresponding runner. Please remember to place a -comment next to the excluded workload name identifying the reason a workload is -being excluded. For example, +If any workloads fail because of known bugs in MongoDB, persistent MCI failures or timeouts, the +troublesome workload can be excluded from running by placing it in the exclusion array in the +corresponding runner. Please remember to place a comment next to the excluded workload name +identifying the reason a workload is being excluded. For example, `'agg_sort_external.js', // SERVER-16700 Deadlock on WiredTiger LSM` -Each file should also have two predefined sections - one for known bugs and one -for restrictions. The one above would be considered a known bug. However, -excluding a compact workload from sharded runners would be a restriction because -compact can only be run against individual mongods. 
+Each file should also have two predefined sections - one for known bugs and one for restrictions. +The one above would be considered a known bug. However, excluding a compact workload from sharded +runners would be a restriction because compact can only be run against individual mongods. ## Other components of the FSM library -Most of these components live in jstests/concurrency/fsm_libs and provide the -functionality used by the runner. +Most of these components live in jstests/concurrency/fsm_libs and provide the functionality used by +the runner. ### ThreadManager -Responsible for spawning and joining worker threads. Each spawned thread is -wrapped in a try/finally block to ensure that the database connection implicitly -created during the thread's execution is eventually closed explicitly. The -ThreadManager sets a random seed `([0, randInt(1e13))` which is the range of -`new Date().getTime())` before executing each workload. +Responsible for spawning and joining worker threads. Each spawned thread is wrapped in a try/finally +block to ensure that the database connection implicitly created during the thread's execution is +eventually closed explicitly. The ThreadManager sets a random seed `([0, randInt(1e13))` which is +the range of `new Date().getTime())` before executing each workload. ### Worker Thread @@ -441,36 +402,30 @@ Thread spawned by ThreadManager and used to run a Finite State Machine. ### Cluster -cluster.js is responsible for providing the cluster object that is passed to -setup and teardown functions, and the initial connection to a db to be used by -runner to pass to the workloads. For anything except for standalone, it makes -use of the shell's built-in cluster test helpers like `ShardingTest` and -`ReplSetTest`. clusterOptions are passed to cluster.js for initialization. 
+cluster.js is responsible for providing the cluster object that is passed to setup and teardown +functions, and the initial connection to a db to be used by runner to pass to the workloads. For +anything except for standalone, it makes use of the shell's built-in cluster test helpers like +`ShardingTest` and `ReplSetTest`. clusterOptions are passed to cluster.js for initialization. clusterOptions include: - `replication`: boolean, whether or not to use replication in the cluster -- `sameCollection`: boolean, whether or not all workloads are passed the same - collection +- `sameCollection`: boolean, whether or not all workloads are passed the same collection - `sameDB`: boolean, whether or not all workloads are passed the same DB -- `setupFunctions`: object, containing at most two functions under the keys - 'mongod' and 'mongos'. This allows you to run a function against all mongod or - mongos nodes in the cluster as part of the cluster initialization. Each - function takes a single argument, the db object against which configuration - can be run (will be set for each mongod/mongos) +- `setupFunctions`: object, containing at most two functions under the keys 'mongod' and 'mongos'. + This allows you to run a function against all mongod or mongos nodes in the cluster as part of the + cluster initialization. Each function takes a single argument, the db object against which + configuration can be run (will be set for each mongod/mongos) - `sharded`: boolean, whether or not to use sharding in the cluster -Note that sameCollection and sameDB can increase contention for a resource, but -will also decrease the strength of the assertions by ruling out the use of OwnDB -and OwnColl assertions. +Note that sameCollection and sameDB can increase contention for a resource, but will also decrease +the strength of the assertions by ruling out the use of OwnDB and OwnColl assertions. 
### Miscellaneous Execution Notes -- A `CountDownLatch` (exposed through the v8-based mongo shell, as of MongoDB 3.0) - is used as a synchronization primitive by the ThreadManager to wait until all - spawned threads have finished being spawned before starting workload - execution. -- If more than 20% of the threads fail while spawning, we abort the test. If - fewer than 20% of the threads fail while spawning we allow the non-failed - threads to continue with the test. The 20% threshold is somewhat arbitrary; - the goal is to abort if "mostly all" of the threads failed but to tolerate "a - few" threads failing. +- A `CountDownLatch` (exposed through the v8-based mongo shell, as of MongoDB 3.0) is used as a + synchronization primitive by the ThreadManager to wait until all spawned threads have finished + being spawned before starting workload execution. +- If more than 20% of the threads fail while spawning, we abort the test. If fewer than 20% of the + threads fail while spawning we allow the non-failed threads to continue with the test. The 20% + threshold is somewhat arbitrary; the goal is to abort if "mostly all" of the threads failed but to + tolerate "a few" threads failing. diff --git a/docs/testing/hang_analyzer.md b/docs/testing/hang_analyzer.md index e41b4929cb1..93e2d6b513f 100644 --- a/docs/testing/hang_analyzer.md +++ b/docs/testing/hang_analyzer.md @@ -1,37 +1,34 @@ # Hang Analyzer -The hang analyzer is a tool to collect cores and other information from processes -that are suspected to have hung. Any task which exceeds its timeout in Evergreen -will automatically be hang-analyzed, with information being written compressed -and uploaded to S3. +The hang analyzer is a tool to collect cores and other information from processes that are suspected +to have hung. Any task which exceeds its timeout in Evergreen will automatically be hang-analyzed, +with information being written compressed and uploaded to S3. 
-The hang analyzer can also be invoked locally at any time. For all non-Jepsen -tasks, the invocation is `buildscripts/resmoke.py hang-analyzer -o file -o stdout -m exact -p python`. You may need to substitute `python` with the name of the python binary -you are using, which may be one of `python`, `python3`, or on Windows: `Python`, -`Python3`. +The hang analyzer can also be invoked locally at any time. For all non-Jepsen tasks, the invocation +is `buildscripts/resmoke.py hang-analyzer -o file -o stdout -m exact -p python`. You may need to +substitute `python` with the name of the python binary you are using, which may be one of `python`, +`python3`, or on Windows: `Python`, `Python3`. -For jepsen tasks, the invocation is `buildscripts/resmoke.py hang-analyzer -o file -o stdout -p dbtest,java,mongo,mongod,mongos,python,_test`. +For jepsen tasks, the invocation is +`buildscripts/resmoke.py hang-analyzer -o file -o stdout -p dbtest,java,mongo,mongod,mongos,python,_test`. ## Interesting Processes -The hang analyzer detects and runs against processes which are considered -interesting. +The hang analyzer detects and runs against processes which are considered interesting. -Tasks whose name contains "jepsen": any process whose name exactly matches one -of `dbtest,java,mongo,mongod,mongos,python,_test`. +Tasks whose name contains "jepsen": any process whose name exactly matches one of +`dbtest,java,mongo,mongod,mongos,python,_test`. -In all other scenarios, including local use of the hang-analyzer, an interesting -process is any of: +In all other scenarios, including local use of the hang-analyzer, an interesting process is any of: - process that starts with `python` or `live-record` - one which has been spawned as a child process of resmoke. 
-The resmoke subcommand `hang-analyzer` will send SIGUSR1/use SetEvent to signal -resmoke to: +The resmoke subcommand `hang-analyzer` will send SIGUSR1/use SetEvent to signal resmoke to: - Print stack traces for all python threads -- Collect core dumps and other information for any non-python child - processes, see `Data Collection` below +- Collect core dumps and other information for any non-python child processes, see `Data Collection` + below - Re-signal any python child processes to do the same ## Data Collection @@ -41,8 +38,8 @@ Data collection occurs in the following sequence: - Pause all non-python processes - Grab debug symbols on non-Sanitizer builds - Signal python Processes -- Dump cores of as many processes as possible, until the disk quota is exceeded. - The default quota is 90% of total volume space. +- Dump cores of as many processes as possible, until the disk quota is exceeded. The default quota + is 90% of total volume space. - Collect additional, non-core data. Ideally: - Print C++ Stack traces @@ -54,13 +51,12 @@ Data collection occurs in the following sequence: - Dump java processes (Jepsen tests) with jstack - SIGABRT (Unix)/terminate (Windows) go processes -Note that the list of non-core data collected is only accurate on Linux. Other -platforms only perform a subset of these operations. +Note that the list of non-core data collected is only accurate on Linux. Other platforms only +perform a subset of these operations. -Additionally, note that the hang analyzer is subject to Evergreen post task -timeouts, and may not have enough time to collect all information before -being terminated by the Evergreen agent. When running locally there is no -timeout, and the hang analyzer may ironically hang indefinitely. +Additionally, note that the hang analyzer is subject to Evergreen post task timeouts, and may not +have enough time to collect all information before being terminated by the Evergreen agent. 
When +running locally there is no timeout, and the hang analyzer may ironically hang indefinitely. ### Implementations diff --git a/docs/testing/network_fault_injection_mongobridge.md b/docs/testing/network_fault_injection_mongobridge.md index 7a11bd9dea9..8a36bb023bf 100644 --- a/docs/testing/network_fault_injection_mongobridge.md +++ b/docs/testing/network_fault_injection_mongobridge.md @@ -2,11 +2,23 @@ ## Overview -[Mongobridge](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L1) is a network fault injection testing tool that allows test authors to intentionally simulate network issues such as connection failures, message delays, or packet loss during communication to any node in a cluster. It acts as a transparent proxy between MongoDB processes and their clients, enabling controlled network fault injection for testing distributed system behavior. +[Mongobridge](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L1) +is a network fault injection testing tool that allows test authors to intentionally simulate network +issues such as connection failures, message delays, or packet loss during communication to any node +in a cluster. It acts as a transparent proxy between MongoDB processes and their clients, enabling +controlled network fault injection for testing distributed system behavior. ## How It Works -When `ReplSetTest` or `ShardingTest` are instructed to use `mongobridge`, they will [set up a mongobridge process](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/jstests/libs/replsettest.js#L2962) for each node that [creates a ProxiedConnection](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L323-L324) between the node and any clients (including other nodes in the cluster) attempting to communicate with it. 
When test authors send a command to a node, mongobridge [intercepts the command and applies any configured actions](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L395-L430) onto the commands before forwarding the command along to the node itself. This allows simple fault injection from the test author's perspective. +When `ReplSetTest` or `ShardingTest` are instructed to use `mongobridge`, they will +[set up a mongobridge process](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/jstests/libs/replsettest.js#L2962) +for each node that +[creates a ProxiedConnection](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L323-L324) +between the node and any clients (including other nodes in the cluster) attempting to communicate +with it. When test authors send a command to a node, mongobridge +[intercepts the command and applies any configured actions](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L395-L430) +onto the commands before forwarding the command along to the node itself. This allows simple fault +injection from the test author's perspective. ## Quick Start @@ -23,7 +35,8 @@ To use mongobridge in your tests: }); ``` - - **Test commands must be enabled**: Mongobridge's `*From` commands require `enableTestCommands: true` (which is the default in test environments) + - **Test commands must be enabled**: Mongobridge's `*From` commands require + `enableTestCommands: true` (which is the default in test environments) 2. **Inject network faults** using bridge commands: @@ -38,11 +51,16 @@ To use mongobridge in your tests: st.rs0.getPrimary().acceptConnectionsFrom(st.rs0.getSecondary()); ``` -3. Operations that depend on communication between the affected nodes will fail or timeout as expected. +3. 
Operations that depend on communication between the affected nodes will fail or timeout as + expected. ## What to keep in mind -Be aware that there are consequences to injecting network faults between nodes that can cause downstream impact in (for example) heartbeats, sync source selection, and SDAM, and so after a fault has been injected the test may not be in the state you expect it to be in for future commands. It is best to keep mongobridge tests relatively short and targeted to ensure that flakiness due to these faults doesn't impact the rest of your testing. +Be aware that there are consequences to injecting network faults between nodes that can cause +downstream impact in (for example) heartbeats, sync source selection, and SDAM, and so after a fault +has been injected the test may not be in the state you expect it to be in for future commands. It is +best to keep mongobridge tests relatively short and targeted to ensure that flakiness due to these +faults doesn't impact the rest of your testing. ## Command Reference @@ -71,7 +89,8 @@ node.acceptConnectionsFrom([node1, node2, node3]); // Multiple nodes node.rejectConnectionsFrom(otherNode); ``` -**Effect**: New connections are rejected, existing connections are closed when a new request is sent over them +**Effect**: New connections are rejected, existing connections are closed when a new request is sent +over them **Use case**: Simulating complete network partitions @@ -183,7 +202,8 @@ primary.discardMessagesFrom(secondary, 0.3); ### Limitations -- **OP_QUERY exhaust**: Not supported for legacy exhaust queries (OP_MSG exhaust cursors are supported) +- **OP_QUERY exhaust**: Not supported for legacy exhaust queries (OP_MSG exhaust cursors are + supported) - **Direct connections**: Only works when connections go through the bridge proxy - **TLS support**: Mongobridge is not supported if the cluster is using TLS. 
diff --git a/docs/testing/otel_resmoke.md b/docs/testing/otel_resmoke.md index a915394035c..056391c64d2 100644 --- a/docs/testing/otel_resmoke.md +++ b/docs/testing/otel_resmoke.md @@ -11,26 +11,32 @@ Using OTel we capture the following things 3. Duration of hooks before and after test/suite 4. Resmoke archiver (when there is a failure we archive core dumps) -To see this visually navigate to the [resmoke dataset](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/home) and view a recent trace. +To see this visually navigate to the +[resmoke dataset](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/home) +and view a recent trace. ## A look at source code ### Configuration -The bulk of configuration is done in the -`_set_up_tracing(...)` method in [configure_resmoke.py#L164](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/configure_resmoke.py#L164). This method includes documentation on how it works. +The bulk of configuration is done in the `_set_up_tracing(...)` method in +[configure_resmoke.py#L164](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/configure_resmoke.py#L164). +This method includes documentation on how it works. 
## BatchedBaggageSpanProcessor -See documentation [batched_baggage_span_processor.py#L8](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/batched_baggage_span_processor.py#L8) +See documentation +[batched_baggage_span_processor.py#L8](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/batched_baggage_span_processor.py#L8) ## FileSpanExporter -See documentation [file_span_exporter.py#L16](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/file_span_exporter.py#L16) +See documentation +[file_span_exporter.py#L16](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/file_span_exporter.py#L16) ## Capturing Data -We mostly capture data by using a decorator on methods. Example taken from [job.py#L200](https://github.com/mongodb/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L200) +We mostly capture data by using a decorator on methods. Example taken from +[job.py#L200](https://github.com/mongodb/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L200) ``` TRACER = trace.get_tracer("resmoke") @@ -41,7 +47,11 @@ def func_name(...): span.set_attribute("attr1", True) ``` -This system is nice because the decorator captures exceptions and other failures and a user can never forget to close a span. On occasion we will also start a span using the `with` clause in python. However, the decorator method is preferred since the method below makes more of a readability impact on the code. 
This example is taken from [job.py#L215](https://github.com/mongodb/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L215) +This system is nice because the decorator captures exceptions and other failures and a user can +never forget to close a span. On occasion we will also start a span using the `with` clause in +python. However, the decorator method is preferred since the method below makes more of a +readability impact on the code. This example is taken from +[job.py#L215](https://github.com/mongodb/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L215) ``` with TRACER.start_as_current_span("func_name", attributes={}): @@ -51,4 +61,9 @@ with TRACER.start_as_current_span("func_name", attributes={}): ## Insights We Have Made (so far) -Using [this dashboard](https://ui.honeycomb.io/mongodb-4b/environments/production/board/3bATQLb38bh/Server-CI) and [this query](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/result/GFa2YJ6d4vU/a/7EYuMJtH8KX/Slowest-Resmoke-Tests) we can see the most expensive single js tests. We plan to make tickets for teams to fix these long running tests for cloud savings as well as developer time savings. +Using +[this dashboard](https://ui.honeycomb.io/mongodb-4b/environments/production/board/3bATQLb38bh/Server-CI) +and +[this query](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/result/GFa2YJ6d4vU/a/7EYuMJtH8KX/Slowest-Resmoke-Tests) +we can see the most expensive single js tests. We plan to make tickets for teams to fix these long +running tests for cloud savings as well as developer time savings. 
diff --git a/docs/testing/resmoke_modules.md b/docs/testing/resmoke_modules.md index d38075c4258..6f2f4c8a052 100644 --- a/docs/testing/resmoke_modules.md +++ b/docs/testing/resmoke_modules.md @@ -1,10 +1,14 @@ # Resmoke Module Configuration -This configuration allows additional modules to be added to Resmoke, providing more context about their associated directories. Modules can specify directories for fixtures, hooks, suites, and JavaScript tests, which Resmoke incorporates during its testing process. +This configuration allows additional modules to be added to Resmoke, providing more context about +their associated directories. Modules can specify directories for fixtures, hooks, suites, and +JavaScript tests, which Resmoke incorporates during its testing process. ## Adding a New Module -To add a new module to Resmoke, define the module name and specify its `fixture_dirs`, `hook_dirs`, `suite_dirs`, and `jstest_dirs` in the YAML configuration. Each field should be a list of directory paths. +To add a new module to Resmoke, define the module name and specify its `fixture_dirs`, `hook_dirs`, +`suite_dirs`, and `jstest_dirs` in the YAML configuration. Each field should be a list of directory +paths. ### Example YAML Configuration @@ -25,9 +29,12 @@ my_new_module: - **`fixture_dirs`**: Directories containing fixtures associated with the module. - **`hook_dirs`**: Directories containing hooks associated with the module. - **`suite_dirs`**: Directories containing suites with test configurations. -- **`jstest_dirs`**: Directories containing JavaScript tests specific to the module. This ensures module-specific tests are excluded from other suite configurations when the module is disabled. +- **`jstest_dirs`**: Directories containing JavaScript tests specific to the module. This ensures + module-specific tests are excluded from other suite configurations when the module is disabled. 
## Notes -- Any suite can use jstests from any directory, when the module is enabled the configured jstest dirs does nothing. Only when the module is disabled does it filter out the tests that might be configured in a suite from a different module. +- Any suite can use jstests from any directory. When the module is enabled, the configured jstest + dirs do nothing. Only when the module is disabled do they filter out the tests that might be + configured in a suite from a different module. - Fields can be omitted or empty lists diff --git a/docs/thread_pools.md b/docs/thread_pools.md index c17681f9278..db0deeeb78f 100644 --- a/docs/thread_pools.md +++ b/docs/thread_pools.md @@ -1,55 +1,48 @@ # Thread Pools -A thread pool ([Wikipedia][thread_pools_wikipedia]) accepts and executes -lightweight work items called "tasks", using a carefully managed group -of dedicated long-running worker threads. The worker threads perform -the work items in parallel without forcing each work item to assume the -burden of starting and destroying a dedicated thead. +A thread pool ([Wikipedia][thread_pools_wikipedia]) accepts and executes lightweight work items +called "tasks", using a carefully managed group of dedicated long-running worker threads. The worker +threads perform the work items in parallel without forcing each work item to assume the burden of +starting and destroying a dedicated thread. ## Classes ### `ThreadPoolInterface` -The [`ThreadPoolInterface`][thread_pool_interface.h] abstract interface is -an extension of the `OutOfLineExecutor` (see [the executors architecture -guide][executors]) abstract interface, adding `startup`, `shutdown`, and -`join` virtual member functions. It is the base class for our thread -pool classes.
+The [`ThreadPoolInterface`][thread_pool_interface.h] abstract interface is an extension of the +`OutOfLineExecutor` (see [the executors architecture guide][executors]) abstract interface, adding +`startup`, `shutdown`, and `join` virtual member functions. It is the base class for our thread pool +classes. ### `ThreadPool` -[`ThreadPool`][thread_pool.h] is the most basic concrete thread pool. The -number of worker threads is adaptive, but configurable with a min/max -range. Idle worker threads are reaped (down to the configured min), while -new worker threads can be created when needed (up to the configured max). +[`ThreadPool`][thread_pool.h] is the most basic concrete thread pool. The number of worker threads +is adaptive, but configurable with a min/max range. Idle worker threads are reaped (down to the +configured min), while new worker threads can be created when needed (up to the configured max). ### `ThreadPoolTaskExecutor` -[`ThreadPoolTaskExecutor`][thread_pool_task_executor.h] is not a thread -pool, but rather a `TaskExecutor` that uses a `ThreadPoolInterface` and -a `NetworkInterface` to execute scheduled tasks. It's configured with a -`ThreadPoolInterface` over which it _takes_ ownership, and a -`NetworkInterface`, of which it _shares_ ownership. With these resources -it implements the elaborate `TaskExecutor` interface (see [executors]). +[`ThreadPoolTaskExecutor`][thread_pool_task_executor.h] is not a thread pool, but rather a +`TaskExecutor` that uses a `ThreadPoolInterface` and a `NetworkInterface` to execute scheduled +tasks. It's configured with a `ThreadPoolInterface` over which it _takes_ ownership, and a +`NetworkInterface`, of which it _shares_ ownership. With these resources it implements the elaborate +`TaskExecutor` interface (see [executors]). ### `NetworkInterfaceThreadPool` -[`NetworkInterfaceThreadPool`][network_interface_thread_pool.h] is a -thread pool implementation that doesn't actually own any worker threads. 
-It runs its tasks on the background thread of a +[`NetworkInterfaceThreadPool`][network_interface_thread_pool.h] is a thread pool implementation that +doesn't actually own any worker threads. It runs its tasks on the background thread of a [`NetworkInterface`][network_interface.h]. -Incoming tasks that are scheduled from the `NetworkInterface`'s thread -are run immediately. Otherwise they are queued to be run by the -`NetworkInterface` thread when it is available. +Incoming tasks that are scheduled from the `NetworkInterface`'s thread are run immediately. +Otherwise they are queued to be run by the `NetworkInterface` thread when it is available. ### `ThreadPoolMock` -[`ThreadPoolMock`][thread_pool_mock.h] is a `ThreadPoolInterface`. It is not -a mock of a `ThreadPool`. It has no configurable stored responses. It has -one worker thread and a pointer to a `NetworkInterfaceMock`, and with these -resources it simulates a thread pool well enough to be used by a -`ThreadPoolTaskExecutor` in unit tests. +[`ThreadPoolMock`][thread_pool_mock.h] is a `ThreadPoolInterface`. It is not a mock of a +`ThreadPool`. It has no configurable stored responses. It has one worker thread and a pointer to a +`NetworkInterfaceMock`, and with these resources it simulates a thread pool well enough to be used +by a `ThreadPoolTaskExecutor` in unit tests. [thread_pools_wikipedia]: https://en.wikipedia.org/wiki/Thread_pool [executors]: ../src/mongo/executor/README.md diff --git a/docs/unit_test.md b/docs/unit_test.md index 95464b3434e..d962140ae33 100644 --- a/docs/unit_test.md +++ b/docs/unit_test.md @@ -1,13 +1,14 @@ -Note: this doc is being continuously updated while changes are being made to the unit test framework. +Note: this doc is being continuously updated while changes are being made to the unit test +framework. 
# Overview # Features The MongoDB unit test framework is a thin layer built atop GoogleTest, so most GoogleTest features -(see [Google Test documentation][google_test_docs]) are available for use aside from anything -listed out in [Banned Features](#banned-features). The unit testing framework also includes -enhanced reporting of test output (see +(see [Google Test documentation][google_test_docs]) are available for use aside from anything listed +out in [Banned Features](#banned-features). The unit testing framework also includes enhanced +reporting of test output (see [Enhanced Reporting of Test Output](#enhanced-reporting-of-test-output)). The core unittest features can be accessed by including the `mongo/unittest/unittest.h` header and @@ -18,8 +19,8 @@ using the `mongo_cc_unit_test` bazel rule. ### Parameterized tests Parameterized tests are a GoogleTest feature that allows the same test logic to be run with -different values or types (see GoogleTest docs on -[Value-Parameterized Tests][value_parameterized_tests] and [Typed Tests][typed_tests]). +different values or types (see GoogleTest docs on [Value-Parameterized +Tests][value_parameterized_tests] and [Typed Tests][typed_tests]). ```cpp class TestFixture : @@ -41,8 +42,8 @@ TEST_P(TestFixture, MongoTest) { ### GoogleMock GoogleMock can be used by including the `mongo/unittest/unittest.h` header. You should never -directly include ``. There are matchers for common mongo types such as `BSONObj` -in `mongo/unittest/matcher.h`. +directly include ``. There are matchers for common mongo types such as `BSONObj` in +`mongo/unittest/matcher.h`. ## Banned Features @@ -63,9 +64,9 @@ GoogleTest fatal assertions, such as no fatal assertions allowed in non-void hel ## Enhanced Reporting of Test Output -The Enhanced Reporter improves test reporting by colorizing and formatting output, maintaining -a progress indicator, printing enhanced failure information, and suppressing log output on -passing tests. 
+The Enhanced Reporter improves test reporting by colorizing and formatting output, maintaining a +progress indicator, printing enhanced failure information, and suppressing log output on passing +tests. These command line flags may be used to configure the Enhanced Reporter: @@ -74,9 +75,9 @@ These command line flags may be used to configure the Enhanced Reporter: ## Death Tests -The MongoDB unit testing framework uses `DEATH_TEST` (with `DEATH_TEST_F`, `DEATH_TEST_REGEX`, -and `DEATH_TEST_REGEX_F` variants) to test code that is expected to cause the process to -terminate. This should replace all uses of the `ASSERT_DEATH` macro from GoogleTest (see +The MongoDB unit testing framework uses `DEATH_TEST` (with `DEATH_TEST_F`, `DEATH_TEST_REGEX`, and +`DEATH_TEST_REGEX_F` variants) to test code that is expected to cause the process to terminate. This +should replace all uses of the `ASSERT_DEATH` macro from GoogleTest (see [unittest/death_test.h][death_test_h] for more details). Similar to GoogleTest, `DEATH_TEST` test suite names should be suffixed with `DeathTest`. 
For @@ -98,8 +99,10 @@ DEATH_TEST_F(FixtureNameDeathTest, TestName) { } ``` -[death_test_naming]: https://github.com/google/googletest/blob/main/docs/advanced.md#death-test-naming +[death_test_naming]: + https://github.com/google/googletest/blob/main/docs/advanced.md#death-test-naming [death_test_h]: ../src/mongo/unittest/death_test.h [google_test_docs]: https://github.com/google/googletest/blob/main/docs/primer.md -[value_parameterized_tests]: https://github.com/google/googletest/blob/main/docs/advanced.md#value-parameterized-tests +[value_parameterized_tests]: + https://github.com/google/googletest/blob/main/docs/advanced.md#value-parameterized-tests [typed_tests]: https://github.com/google/googletest/blob/main/docs/advanced.md#typed-tests diff --git a/docs/vpat.md b/docs/vpat.md index c7917417406..2ea6abc7a17 100644 --- a/docs/vpat.md +++ b/docs/vpat.md @@ -56,9 +56,10 @@ Contact for more Information: https://www.mongodb.com/contact ### Note to 1194.22 The Board interprets paragraphs (a) through (k) of this section as consistent with the following -priority 1 Checkpoints of the Web Content Accessibility Guidelines 1.0 (WCAG 1.0) (May 5 1999) published by the Web -Accessibility Initiative of the World Wide Web Consortium: Paragraph (a) - 1.1, (b) - 1.4, (c\) - 2.1, (d) - 6.1, -(e) - 1.2, (f) - 9.1, (g) - 5.1, (h) - 5.2, (i) - 12.1, (j) - 7.1, (k) - 11.4. +priority 1 Checkpoints of the Web Content Accessibility Guidelines 1.0 (WCAG 1.0) (May 5 1999) +published by the Web Accessibility Initiative of the World Wide Web Consortium: Paragraph (a) - 1.1, +(b) - 1.4, (c\) - 2.1, (d) - 6.1, (e) - 1.2, (f) - 9.1, (g) - 5.1, (h) - 5.2, (i) - 12.1, (j) - 7.1, +(k) - 11.4. ## Section 1194.23 Telecommunications Products – Detail diff --git a/jstests/README.md b/jstests/README.md index 390b49667f7..2caf52f2f1c 100644 --- a/jstests/README.md +++ b/jstests/README.md @@ -1,84 +1,160 @@ # Javascript Test Guide -At MongoDB we write integration tests in JavaScript. 
These are tests written to exercise some behavior of a running MongoDB server, replica set, or sharded cluster. This guide aims to provide some general guidelines and best practices on how to write good tests. +At MongoDB we write integration tests in JavaScript. These are tests written to exercise some +behavior of a running MongoDB server, replica set, or sharded cluster. This guide aims to provide +some general guidelines and best practices on how to write good tests. ## Principles ### Minimize the test case as much as possible while still exercising and testing the desired behavior. -- For example, if you are testing that document deletion works correctly, it may be entirely sufficient to insert just a single document and then delete that document. Inserting multiple documents would be unnecessary. A guiding principle on this is to ask yourself how easy it would be for a new person coming to this test to quickly understand it. If there are multiple documents being inserted into a collection, in a test that only tests document deletion, a newcomer might ask the question: “is it important that the test uses multiple documents, or incidental?”. It is best if you can remove these kinds of questions from a person’s mind, by keeping only the absolute essential parts of a test. -- We should always strive for unittesting when possible, so if the functionality you want to test can be covered by a unit test, we should write a unit test instead. +- For example, if you are testing that document deletion works correctly, it may be entirely + sufficient to insert just a single document and then delete that document. Inserting multiple + documents would be unnecessary. A guiding principle on this is to ask yourself how easy it would + be for a new person coming to this test to quickly understand it. 
If there are multiple documents + being inserted into a collection, in a test that only tests document deletion, a newcomer might + ask the question: “is it important that the test uses multiple documents, or incidental?”. It is + best if you can remove these kinds of questions from a person’s mind, by keeping only the absolute + essential parts of a test. +- We should always strive for unittesting when possible, so if the functionality you want to test + can be covered by a unit test, we should write a unit test instead. ### Add a block comment at the top of the JavaScript test file giving a clear and concise overview of what a test is trying to verify. -- For tests that are more complicated, a brief description of the test steps might be useful as well. +- For tests that are more complicated, a brief description of the test steps might be useful as + well. ### Keep debuggability in mind. -- Assertion error messages should contain all information relevant to debugging the test. This means the server’s response from the failed command should almost always be included in the assertion error message. It can also be helpful to include parameters that vary during the test to avoid requiring the investigator to use the logs/backtrace to determine what the test was attempting to do. -- Think about how easy it would be to debug your test if something failed and a newcomer only had the logs of the test to look at. This can help guide your decision on what log messages to include and to what level of detail. The jsTestLog function is useful for this, as it is good at visually demarcating different phases of a test. As a tip, run your test a few times and just study the log messages, imagining you are an engineer debugging the test with only these logs to look at. Think about how understandable the logs would be to a newcomer. It is easy to add log messages to a test but then forget to see how they would actually appear. -- Never insert identical documents unless necessary. 
It is very useful in debugging to be able to figure out where a given piece of data came from. -- If a test does the same thing multiple times, consider factoring it out into a library. Shorter running tests are easier to debug and code duplication is always bad. +- Assertion error messages should contain all information relevant to debugging the test. This means + the server’s response from the failed command should almost always be included in the assertion + error message. It can also be helpful to include parameters that vary during the test to avoid + requiring the investigator to use the logs/backtrace to determine what the test was attempting to + do. +- Think about how easy it would be to debug your test if something failed and a newcomer only had + the logs of the test to look at. This can help guide your decision on what log messages to include + and to what level of detail. The jsTestLog function is useful for this, as it is good at visually + demarcating different phases of a test. As a tip, run your test a few times and just study the log + messages, imagining you are an engineer debugging the test with only these logs to look at. Think + about how understandable the logs would be to a newcomer. It is easy to add log messages to a test + but then forget to see how they would actually appear. +- Never insert identical documents unless necessary. It is very useful in debugging to be able to + figure out where a given piece of data came from. +- If a test does the same thing multiple times, consider factoring it out into a library. Shorter + running tests are easier to debug and code duplication is always bad. ### Do not hardcode collection or database names, especially if they are used multiple times throughout a test. -It is best to use variable names that attempt to describe what a value is used for. For example, naming a variable that stores a collection named `collectionToDrop` is much better than just naming the variable `collName`. 
+It is best to use variable names that attempt to describe what a value is used for. For example, +naming a variable that stores a collection `collectionToDrop` is much better than just naming +the variable `collName`. ### Make every effort to make your test as deterministic as possible. -- Non-deterministic tests add noise to our build system and, in general, make it harder for yourself and other engineers to determine if the system really is working correctly or not. Flaky integration tests should be considered bugs, and we should not allow them to be committed to the server codebase. One way to make jstests more deterministic is to use failpoints to force the events happening in expected order. However, if we have to use failpoints to make this test deterministic, we should consider write a unit test instead. -- Note that our fuzzer and concurrency test suites are often an exception to this rule. In those cases we sometimes give up some level of determinism in order to trigger a wider class of rare edge cases. For targeted JavaScript integration tests, however, highly deterministic tests should be the goal. +- Non-deterministic tests add noise to our build system and, in general, make it harder for yourself + and other engineers to determine if the system really is working correctly or not. Flaky + integration tests should be considered bugs, and we should not allow them to be committed to the + server codebase. One way to make jstests more deterministic is to use failpoints to force + events to happen in the expected order. However, if we have to use failpoints to make a test + deterministic, we should consider writing a unit test instead. +- Note that our fuzzer and concurrency test suites are often an exception to this rule. In those + cases we sometimes give up some level of determinism in order to trigger a wider class of rare + edge cases. For targeted JavaScript integration tests, however, highly deterministic tests should + be the goal.
### Think hard about all the assumptions that the test relies on. -- For example, if a certain phase of the test ran much slower or much faster, would it cause your test to fail for the wrong reason? -- If your test includes hard-coded timeouts, make sure they are set appropriately. If a test is waiting for a certain condition to be true, and the test should not proceed until that condition is met, it is often correct to just wait “indefinitely”, instead of adding some arbitrary timeout value, like 30 seconds. In practice this usually means setting some reasonable upper limit, for example, 10 minutes. -- Also, for replication tests, make sure data exists on the right nodes at the right time. For example, if you a do a write and don’t explicitly wait for it to replicate, it might not reach a secondary node before you try to do the next step of the test. -- Does your test require data to be stored persistently? Remember that we have test variants that run on in-memory/ephemeral storage engines -- There are timeouts in the test suites and we aim to make all tests in the same suite finish before timeout. That says we should always make the test run quickly to keep the test short in terms of duration. +- For example, if a certain phase of the test ran much slower or much faster, would it cause your + test to fail for the wrong reason? +- If your test includes hard-coded timeouts, make sure they are set appropriately. If a test is + waiting for a certain condition to be true, and the test should not proceed until that condition + is met, it is often correct to just wait “indefinitely”, instead of adding some arbitrary timeout + value, like 30 seconds. In practice this usually means setting some reasonable upper limit, for + example, 10 minutes. +- Also, for replication tests, make sure data exists on the right nodes at the right time. 
For + example, if you do a write and don’t explicitly wait for it to replicate, it might not reach a + secondary node before you try to do the next step of the test. +- Does your test require data to be stored persistently? Remember that we have test variants that + run on in-memory/ephemeral storage engines. +- There are timeouts in the test suites and we aim to make all tests in the same suite finish before + the timeout. That means we should always keep each test's running time as short as + possible. ### Make tests fail as early as possible. -- If something goes wrong early in the test, it’s much harder to diagnose when that error becomes visible much later. -- Wrap every command in assert.commandWorked, or assert.commandFailedWithCode. There is also assert.commandFailed that won't check the return error code, but we should always try to use assert.commandFailedWithCode to make sure the test won't pass on an unexpected error. +- If something goes wrong early in the test, it’s much harder to diagnose when that error becomes + visible much later. +- Wrap every command in assert.commandWorked, or assert.commandFailedWithCode. There is also + assert.commandFailed that won't check the return error code, but we should always try to use + assert.commandFailedWithCode to make sure the test won't pass on an unexpected error. ### Be aware of all the configurations and variants that your test might run under. -- Make sure that your test still works correctly if is run in a different configuration or on a different platform than the one you might have tested on. -- Varying storage engines and suites can often affect a test’s behavior. For example, maybe your test fails unexpectedly if it runs with authentication turned on with an in-memory storage engine. You don’t have to run a new test on every possible platform before committing it, but you should be confident that your test doesn’t break in an unexpected configuration.
+- Make sure that your test still works correctly if it is run in a different configuration or on a + different platform than the one you might have tested on. +- Varying storage engines and suites can often affect a test’s behavior. For example, maybe your + test fails unexpectedly if it runs with authentication turned on with an in-memory storage engine. + You don’t have to run a new test on every possible platform before committing it, but you should + be confident that your test doesn’t break in an unexpected configuration. ### Avoid assertions that verify properties indirectly. -All assertions in a test should attempt to verify the most specific property possible. For example, if you are trying to test that a certain collection exists, it is better to assert that the collection’s exact name exists in the list of collections, as opposed to verifying that the collection count is equal to 1. The desired collection’s existence is sufficient for the collection count to be 1, but not necessary (a different collection could exist in its place). Be wary of adding these kind of indirect assertions in a test. +All assertions in a test should attempt to verify the most specific property possible. For example, +if you are trying to test that a certain collection exists, it is better to assert that the +collection’s exact name exists in the list of collections, as opposed to verifying that the +collection count is equal to 1. The desired collection’s existence is sufficient for the collection +count to be 1, but not necessary (a different collection could exist in its place). Be wary of +adding these kinds of indirect assertions to a test. ### Test Isolation -Your JS test will likely be running with many other files before and after it. It's important to start from a known state, and to restore that state (to a reasonable extent) at the end of your test content. +Your JS test will likely be running with many other files before and after it.
It's important to +start from a known state, and to restore that state (to a reasonable extent) at the end of your test +content. -- **Before**: If there are critical assumptions about the environment that your test needs, assert for it explicitly before proceeding to the real test content (instead of debugging side effects of that not being the case) - - If you have a precondition on the _environment_, use [`@tags`](./tags.md) instead of just an early-return. This will avoid the test being scheduled in the first place if the environment is not supported. -- **After**: If you are modifying the fixture, do everything possible to safely restore those changes at the end of your test content, even after a test failure. Resmokes' `--continueOnFailure` flag is used in CI, so the fixture is shared across many test files, and is only torn down at the end. - - Note, a fixture _can_ immediately "abort" after a test failure, only if [archiving](../../../../buildscripts/resmokeconfig/suites/README.md#executorarchive) is configured, but that shouldn't be assumed because that is a per-suite configuration (and your test can run in many passthrough suite combinations). - - One easy approach to restoring your state is to use the [Mocha-style](#use-mocha-style-constructs) `after` hooks in your test content. +- **Before**: If there are critical assumptions about the environment that your test needs, assert + for it explicitly before proceeding to the real test content (instead of debugging side effects of + that not being the case) + - If you have a precondition on the _environment_, use [`@tags`](./tags.md) instead of just an + early-return. This will avoid the test being scheduled in the first place if the environment is + not supported. +- **After**: If you are modifying the fixture, do everything possible to safely restore those + changes at the end of your test content, even after a test failure. 
Resmokes' + `--continueOnFailure` flag is used in CI, so the fixture is shared across many test files, and is + only torn down at the end. + - Note, a fixture _can_ immediately "abort" after a test failure, only if + [archiving](../../../../buildscripts/resmokeconfig/suites/README.md#executorarchive) is + configured, but that shouldn't be assumed because that is a per-suite configuration (and your + test can run in many passthrough suite combinations). + - One easy approach to restoring your state is to use the + [Mocha-style](#use-mocha-style-constructs) `after` hooks in your test content. ## Modern JS: Modules in Practice -We have fully migrated to the modularized JavaScript world so any new test should use modules and adapt the new style. +We have fully migrated to the modularized JavaScript world so any new test should use modules and +adapt the new style. ### Only import/export what you need. It's always important to keep the test context clean so we should only import/export what we need. -- The unused import is against [no-unused-vars](https://eslint.org/docs/latest/rules/no-unused-vars) rule in ESLint though we haven't enforced it. -- We don't have a linter to check export since it's hard to tell the necessity, but we should only export the modules that are imported by other tests or will be needed in the future. +- The unused import is against [no-unused-vars](https://eslint.org/docs/latest/rules/no-unused-vars) + rule in ESLint though we haven't enforced it. +- We don't have a linter to check export since it's hard to tell the necessity, but we should only + export the modules that are imported by other tests or will be needed in the future. ### Declare variables in proper scope. -In the past, we have seen tests referring some "undeclared" or "redeclared" variables, which are actually introduced through `load()`. Now with modules, the scope is more clear. 
We can use global variables properly to setup the test and don't need to worry about polluting other tests. +In the past, we have seen tests referring some "undeclared" or "redeclared" variables, which are +actually introduced through `load()`. Now with modules, the scope is more clear. We can use global +variables properly to setup the test and don't need to worry about polluting other tests. ### Name variables properly when exporting. -To avoid naming conflicts, we should not make the name of exported variables too general which could easily conflict with another variable from the test which import your module. For example, in the following case, the module exported a variable named `alphabet` and it will lead to a re-declaration error. +To avoid naming conflicts, we should not make the name of exported variables too general which could +easily conflict with another variable from the test which import your module. For example, in the +following case, the module exported a variable named `alphabet` and it will lead to a re-declaration +error. ``` import {alphabet} from "/matts/module.js"; @@ -87,7 +163,9 @@ const alphabet = "xyz"; // ERROR ### Prefer let/const over var -`let/const` should be preferred over `var` since these can help detect double declaration at the first place. Like, in the naming conflict example, if the second line is using var, it could easily mess up without throwing an error. +`let/const` should be preferred over `var` since these can help detect double declaration at the +first place. Like, in the naming conflict example, if the second line is using var, it could easily +mess up without throwing an error. 
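The difference can be sketched in plain JavaScript. This is a minimal illustration, assuming ordinary script scope rather than the import case above; the helper `parseErrorName` and the two snippet strings are hypothetical names introduced only for this example. It parses each snippet with `new Function` and reports whether the duplicate declaration is rejected.

```js
// Returns the name of the error thrown while parsing `source`,
// or null if the source parses without error.
function parseErrorName(source) {
    try {
        new Function(source);
        return null;
    } catch (e) {
        return e.name;
    }
}

// Hypothetical snippets: the same name declared twice in one scope.
const withVar = 'var alphabet = "abc"; var alphabet = "xyz"; return alphabet;';
const withConst = 'const alphabet = "abc"; const alphabet = "xyz"; return alphabet;';

// `var` accepts the second declaration silently; `const` is rejected at parse time.
const varResult = parseErrorName(withVar);
const constResult = parseErrorName(withConst);
```

Here `varResult` is `null` because the redeclaration is accepted, while `constResult` is `"SyntaxError"`, so the mistake surfaces before any test logic runs.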
### Export in ES6 style @@ -116,7 +194,8 @@ This can help the language server to discover the methods and provide code navig ### Use Mocha-style Constructs -The [mochalite.js](../jstests/libs/mochalite.js) library ports over a subset of [MochaJS](https://mochajs.org/) functionality for the shell, including: +The [mochalite.js](../jstests/libs/mochalite.js) library ports over a subset of +[MochaJS](https://mochajs.org/) functionality for the shell, including: - `it` test contruction - `describe` suite structures @@ -125,19 +204,13 @@ The [mochalite.js](../jstests/libs/mochalite.js) library ports over a subset of - `before` and `after` hooks, to run _once_ around _all_ `it` tests - `beforeEach` and `afterEach` hooks, to run around _each_ `it` test - The above (excluding `describe` variants) also support `async` functions -- Resmoke test filtering using the `--mochagrep` flag, which mirrors the [`grep`](https://mochajs.org/#-grep-regexp-g-regexp) flag from MochaJS +- Resmoke test filtering using the `--mochagrep` flag, which mirrors the + [`grep`](https://mochajs.org/#-grep-regexp-g-regexp) flag from MochaJS Example using several APIs: ```js -import { - after, - afterEach, - before, - beforeEach, - describe, - it, -} from "jstests/libs/mochalite.js"; +import {after, afterEach, before, beforeEach, describe, it} from "jstests/libs/mochalite.js"; describe("simple inserts and finds", () => { before(() => { @@ -157,9 +230,7 @@ describe("simple inserts and finds", () => { assert.eq(this.fixtureDB.find({name: "test"}).count(), 1); }); it("should error on invalid data", () => { - const e = assert.throws(() => - this.fixtureDB.insert({notafield: undefined}), - ); + const e = assert.throws(() => this.fixtureDB.insert({notafield: undefined})); assert.eq(e.message, "Field 'notafield' not found"); }); }); @@ -182,7 +253,9 @@ buildscripts/resmoke.py run --suites=no_passthrough --mochagrep "do something" j ## Test Tags -JS Test files can leverage "tags" that suites can key off of to 
include and/or exclude as necessary. Not scheduling a test to run is much faster than the test doing an early-return when preconditions are not met. +JS Test files can leverage "tags" that suites can key off of to include and/or exclude as necessary. +Not scheduling a test to run is much faster than the test doing an early-return when preconditions +are not met. The simplest use case is having something like the following at the top of your js test file: diff --git a/jstests/libs/property_test_helpers/README.md b/jstests/libs/property_test_helpers/README.md index 942794afc0d..6c7a2c5a7c1 100644 --- a/jstests/libs/property_test_helpers/README.md +++ b/jstests/libs/property_test_helpers/README.md @@ -4,19 +4,31 @@ For a short introduction to property-based testing or fast-check, see [Appendix] ## Core PBT Design -The 'Core PBTs' are a subset of our property-based tests that use a shared schema and models. Their purpose is to provide basic coverage of our query language that may not be tested by the rest of our jstests. This means only simple stages such as $project, $match, $sort, etc are covered. More complicated stages such as $lookup or $facet are not tested. PBTs outside of the core set may test these more complex features. +The 'Core PBTs' are a subset of our property-based tests that use a shared schema and models. Their +purpose is to provide basic coverage of our query language that may not be tested by the rest of our +jstests. This means only simple stages such as $project, $match, $sort, etc are covered. More +complicated stages such as $lookup or $facet are not tested. PBTs outside of the core set may test +these more complex features. -These tests have been highly effective at finding bugs. As of writing they have caught 24 bugs in 8 months. See [SERVER-89308](https://jira.mongodb.org/browse/SERVER-89308) for a full list of issues. +These tests have been highly effective at finding bugs. As of writing they have caught 24 bugs in 8 +months. 
See [SERVER-89308](https://jira.mongodb.org/browse/SERVER-89308) for a full list of issues. The Core PBT design is built off of a few key principles about randomized testing: ### Properties Dictate the Models -In our fuzzer, we have grammar for most of MQL. While this provides more coverage, it means the property we assert is weaker. We can add as much as we'd like to the model, because the property comes second to the model. We're willing to add exceptions to the property to make it work. +In our fuzzer, we have grammar for most of MQL. While this provides more coverage, it means the +property we assert is weaker. We can add as much as we'd like to the model, because the property +comes second to the model. We're willing to add exceptions to the property to make it work. -However, the "model dictates the property" design also backfired, because in addition to exceptions in the property, we need to post-process the generated queries. Adding $sort to several places throughout an aggregation pipeline means we are no longer testing MQL, but rather an artificial subset of MQL that a user would never write. +However, the "model dictates the property" design also backfired, because in addition to exceptions +in the property, we need to post-process the generated queries. Adding $sort to several places +throughout an aggregation pipeline means we are no longer testing MQL, but rather an artificial +subset of MQL that a user would never write. -For this reason, the properties come first in our Core PBTs, and have few exceptions. They dictate what model we use so no postprocessing is needed. The PBT models are significantly smaller than the fuzzer models. +For this reason, the properties come first in our Core PBTs, and have few exceptions. They dictate +what model we use so no postprocessing is needed. The PBT models are significantly smaller than the +fuzzer models. 
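As a rough illustration of letting the property choose the model, here is a minimal sketch in plain JavaScript. It does not use fast-check or the actual Core PBT models; `makeRng`, `genDocs`, `byBThenId`, and `propertyHolds` are hypothetical names, and the in-memory filter and sort only stand in for `$match` and `$sort`. The property is fixed first (filtering commutes with sorting), and the model is kept narrow enough (integer values, every field present, a fully determined sort order) that it holds with no exceptions and no post-processing.

```js
// Small deterministic generator so every run sees the same cases (a fixed seed).
function makeRng(seed) {
    let state = seed >>> 0;
    return () => {
        state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
        return state / 4294967296;
    };
}

// Hypothetical model: every document has both fields, and values are small integers.
function genDocs(rng, count) {
    const docs = [];
    for (let i = 0; i < count; i++) {
        docs.push({_id: i, a: Math.floor(rng() * 5), b: Math.floor(rng() * 5)});
    }
    return docs;
}

// Sort on b with _id as a tie-breaker, so the order is fully determined.
const byBThenId = (x, y) => x.b - y.b || x._id - y._id;

// The property: filtering on `a` commutes with sorting on `b`.
function propertyHolds(docs, value) {
    const matchThenSort = docs.filter((d) => d.a === value).sort(byBThenId);
    const sortThenMatch = [...docs].sort(byBThenId).filter((d) => d.a === value);
    return JSON.stringify(matchThenSort) === JSON.stringify(sortThenMatch);
}

// Check the property against 100 generated collections and count counterexamples.
const rng = makeRng(42);
let failures = 0;
for (let run = 0; run < 100; run++) {
    const docs = genDocs(rng, 20);
    const value = Math.floor(rng() * 5);
    if (!propertyHolds(docs, value)) {
        failures++;
    }
}
```

If the model also generated documents with ties on the sort key and no tie-breaker, the property would need an exception for result order, which is the kind of weakening this design tries to avoid.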
### Small Schema @@ -24,19 +36,32 @@ For this reason, the properties come first in our Core PBTs, and have few except A small number of fields in our schema allows us to find interesting interactions more easily. -An example of an interaction could be query optimizations. Let's say an optimization on `[{$match: {*field*: 5}}, {$sort: {*field*: 1}}]` only kicks in when the two fields are the same. In a PBT where there are one thousand possible fields (`a`, `b`, `c`, but also `a.b.c`, `a.a.a` and all combinations), the probability of finding this optimization is `1/1000`. With six fields, it's increased to `1/6`. +An example of an interaction could be query optimizations. Let's say an optimization on +`[{$match: {*field*: 5}}, {$sort: {*field*: 1}}]` only kicks in when the two fields are the same. In +a PBT where there are one thousand possible fields (`a`, `b`, `c`, but also `a.b.c`, `a.a.a` and all +combinations), the probability of finding this optimization is `1/1000`. With six fields, it's +increased to `1/6`. -Another interaction is between queries and indexes. Queries and indexes generated from a small schema make the indexes more likely to be used. +Another interaction is between queries and indexes. Queries and indexes generated from a small +schema make the indexes more likely to be used. -Bugs tend to come from interactions and special cases. A query that has no optimizations applied and does not use an index requires much less complicated logic, which is correlated to less bugs. +Bugs tend to come from interactions and special cases. A query that has no optimizations applied and +does not use an index requires much less complicated logic, which is correlated to less bugs. #### Simple Values to Avoid MQL Inconsistencies -Related to [Properties Dictate the Models](#properties-dictate-the-models), a simpler document model also allows for stronger properties. 
+Related to [Properties Dictate the Models](#properties-dictate-the-models), a simpler document model +also allows for stronger properties. -There are inconsistencies in our query language that are accepted behavior, but cause issues in property-based testing. We can work around them by being careful about the values we allow in documents. +There are inconsistencies in our query language that are accepted behavior, but cause issues in +property-based testing. We can work around them by being careful about the values we allow in +documents. -[SERVER-12869](https://jira.mongodb.org/browse/SERVER-12869) is an issue that stems from null and missing being encoded the same way in our index format. This means a covering plan (a plan with no `FETCH` node) cannot distinguish between null and missing. This inconsistency is the cause of lots of noise from our fuzzer, since one differing value in a query result can propogate. In our Core PBTs, we do not allow missing fields. This means: +[SERVER-12869](https://jira.mongodb.org/browse/SERVER-12869) is an issue that stems from null and +missing being encoded the same way in our index format. This means a covering plan (a plan with no +`FETCH` node) cannot distinguish between null and missing. This inconsistency is the cause of lots +of noise from our fuzzer, since one differing value in a query result can propogate. In our Core +PBTs, we do not allow missing fields. This means: - Documents must have all fields in the schema - We can only index fields in the schema @@ -44,7 +69,9 @@ There are inconsistencies in our query language that are accepted behavior, but `null` is allowed. -Floating point values are another area the PBTs avoid. Results can differ depending on the order of floating point operations. These differences can propogate. For this reason the only number values allowed are integers. +Floating point values are another area the PBTs avoid. Results can differ depending on the order of +floating point operations. 
These differences can propogate. For this reason the only number values +allowed are integers. ## Modeling Workloads @@ -62,8 +89,9 @@ A workload consists of a collection model and an aggregation model, in the follo } ``` -Using one workload model instead of separate (and independent) collection models and agg models allows them to be interrelated. -For example, if we want to model a PBT to test partial indexes where every query should satisfy the partial index filter, we can write: +Using one workload model instead of separate (and independent) collection models and agg models +allows them to be interrelated. For example, if we want to model a PBT to test partial indexes where +every query should satisfy the partial index filter, we can write: ``` fc.record({ @@ -78,7 +106,8 @@ fc.record({ }); ``` -and this is a valid workload model. If the collection and aggregation models are passed separately, they would be independent an unable to coordinate with shared arbitraries (like `partialFilter`). +and this is a valid workload model. If the collection and aggregation models are passed separately, +they would be independent an unable to coordinate with shared arbitraries (like `partialFilter`). ### Schema @@ -95,11 +124,13 @@ The Core PBT schema is: } ``` -For now, this is also a valid model for a document in a time-series collection (where `t` is the time field and `m` is the meta field), but the models may diverge. +For now, this is also a valid model for a document in a time-series collection (where `t` is the +time field and `m` is the meta field), but the models may diverge. ### Query Generation -These models cover a limited number of aggregation stages, located in `jstests/libs/property_test_helpers/models`. The supported stages are: +These models cover a limited number of aggregation stages, located in +`jstests/libs/property_test_helpers/models`. 
The supported stages are: - $project - $addFields @@ -112,7 +143,8 @@ These models cover a limited number of aggregation stages, located in `jstests/l #### Query Families Rather than generating single, standalone queries, our query model generates a "family" of queries. -At its leaves, a query family contains multiple values that the leaf could take on. For example instead of generating a single query with a concrete value `1` at the leaf: +At its leaves, a query family contains multiple values that the leaf could take on. For example +instead of generating a single query with a concrete value `1` at the leaf: ``` [{$match: {a: 1}}, {$project: {b: 0}}] @@ -133,7 +165,8 @@ Then we extract several queries that have the same shape. ``` This allows us to write properties that use the plan cache more often rather than relying on chance. -Properties can use the `getQuery` interface to ask for queries with different shapes, or the same shape with different leaf values plugged in. +Properties can use the `getQuery` interface to ask for queries with different shapes, or the same +shape with different leaf values plugged in. ## Core PBTs @@ -143,15 +176,15 @@ Details are provided at the top of each file. ## Debugging a PBT Failure -Currently, all PBTs have a fixed seed. -This means that as long as the bug it found is deterministic on the server's side, the PBT will consistently run into the issue. -If the bug is not deterministic, the PBT may or may not fail. +Currently, all PBTs have a fixed seed. This means that as long as the bug it found is deterministic +on the server's side, the PBT will consistently run into the issue. If the bug is not deterministic, +the PBT may or may not fail. ### Shrinking (Minimizing) -Once a counterexample (a failing case) to the property is found, fast-check tests will automatically attempt to shrink the issue. -Shrinking often does not reach the global minimum counterexample, since fast-check cannot make certain jumps. 
-For example it has no way of knowing that +Once a counterexample (a failing case) to the property is found, fast-check tests will automatically +attempt to shrink the issue. Shrinking often does not reach the global minimum counterexample, since +fast-check cannot make certain jumps. For example it has no way of knowing that `{$and: [{a: {$eq: 1}}]}` @@ -163,30 +196,39 @@ or even `{a: 1}` -This could be solved if fast-check had domain-specific knowledge about MQL or if it fuzzed counterexamples during shrinking. -However the counterexamples are usually small enough where there isn't much left to shrink. +This could be solved if fast-check had domain-specific knowledge about MQL or if it fuzzed +counterexamples during shrinking. However the counterexamples are usually small enough where there +isn't much left to shrink. -For non-deterministic issues, fast-check's shrinking is not as effective because it receives mixed signals from the property on whether the shrunk counterexamples fail or not. +For non-deterministic issues, fast-check's shrinking is not as effective because it receives mixed +signals from the property on whether the shrunk counterexamples fail or not. ### Failure Output -After a failure is minimized, the counterexample is printed out. -This includes debug data such as the counterexample that fast-check found and the error it ran into. -The counterexample will be a workload (see [Modeling Workloads](#modeling-workloads)), containing all information about the collection and queries run against it. +After a failure is minimized, the counterexample is printed out. This includes debug data such as +the counterexample that fast-check found and the error it ran into. The counterexample will be a +workload (see [Modeling Workloads](#modeling-workloads)), containing all information about the +collection and queries run against it. 
-To reproduce the issue, the workload can be copied and pasted into the failing property-based test, specifically by passing it in as the `examples` argument to `testProperty`. -fast-check will take these hand-written examples and run them before trying randomized examples. -See `partial_index_pbt.js` (which references `pbt_resolved_bugs.js`) for an example of this. -`partial_index_pbt.js` uses the `examples` argument to ensure workloads that previously would fail are run. -It can be used in the same way to repro existing bugs from BFs. +To reproduce the issue, the workload can be copied and pasted into the failing property-based test, +specifically by passing it in as the `examples` argument to `testProperty`. fast-check will take +these hand-written examples and run them before trying randomized examples. See +`partial_index_pbt.js` (which references `pbt_resolved_bugs.js`) for an example of this. +`partial_index_pbt.js` uses the `examples` argument to ensure workloads that previously would fail +are run. It can be used in the same way to repro existing bugs from BFs. # Appendix ## Property-Based Testing (PBT) -Property-based testing is a testing method that asserts properties hold over many example inputs. In our use of PBT, it involves two components, a "model" and a "property function". The model is a description of the object we are testing. It is used to generate examples of what the object looks like. These examples are routed into the property function, which asserts that the object has the characteristics we expect them to have. +Property-based testing is a testing method that asserts properties hold over many example inputs. In +our use of PBT, it involves two components, a "model" and a "property function". The model is a +description of the object we are testing. It is used to generate examples of what the object looks +like. 
These examples are routed into the property function, which asserts that the object has the +characteristics we expect them to have. -Let's say we wrote a new integer addition function `add` that we'd like to test. We could calculate the correct answer to different addition problems, and assert that `add` behaves correctly. +Let's say we wrote a new integer addition function `add` that we'd like to test. We could calculate +the correct answer to different addition problems, and assert that `add` behaves correctly. ``` assert.eq(add(1, 2), 3); @@ -194,7 +236,9 @@ assert.eq(add(-1, 1), 0); ... ``` -In addition to tests written with concrete values, we could also write a PBT to test for characteristics we expect `add` to have. Addition is commutative for example, meaning `add(a, b)` should always equal `add(b, a)`. We can write a function for this: +In addition to tests written with concrete values, we could also write a PBT to test for +characteristics we expect `add` to have. Addition is commutative for example, meaning `add(a, b)` +should always equal `add(b, a)`. We can write a function for this: ``` function testAdd(a, b){ @@ -202,12 +246,20 @@ function testAdd(a, b){ } ``` -The input to `testAdd` could use the builtin Javascript `Random` package, or a PBT library such as fast-check. +The input to `testAdd` could use the builtin Javascript `Random` package, or a PBT library such as +fast-check. -The way the query team uses PBT tends to be more complex, and almost always involves modeling a subset of our query language, documents, and indexes. Our fuzzer is a form of property-based testing, since we generate random queries and assert correctness against different controls (an older mongo version, a collection without indexes, etc) +The way the query team uses PBT tends to be more complex, and almost always involves modeling a +subset of our query language, documents, and indexes. 
Our fuzzer is a form of property-based +testing, since we generate random queries and assert correctness against different controls (an +older mongo version, a collection without indexes, etc) ## fast-check -fast-check (located in jstests/third_party/fast_check/fc-3.1.0.js) is a property-based testing framework for javascript/typescript. It provides building-block components to use for larger models, and has functionality to test properties against these models. It also has built-in logic for shrinking (minimizing) counterexamples to properties. +fast-check (located in jstests/third_party/fast_check/fc-3.1.0.js) is a property-based testing +framework for javascript/typescript. It provides building-block components to use for larger models, +and has functionality to test properties against these models. It also has built-in logic for +shrinking (minimizing) counterexamples to properties. -For an example of how to use fast-check to write a property-based test, see [project_coalescing.js](../../aggregation/sources/project/project_coalescing.js) +For an example of how to use fast-check to write a property-based test, see +[project_coalescing.js](../../aggregation/sources/project/project_coalescing.js) diff --git a/jstests/multiVersion/README.md b/jstests/multiVersion/README.md index c9e9714cea3..d146f88d88c 100644 --- a/jstests/multiVersion/README.md +++ b/jstests/multiVersion/README.md @@ -4,5 +4,7 @@ These tests test upgrade/downgrade behavior expected between different versions Those that begin failing upon branching should be assessed by the owner teams: -- Is the test only applicable to specific versions during specific development cycles? If so, delete it from irrelevant branches and master. -- Does the test add value for "last" (dynamic) version features? If so, modify the test to be more robust. These should always pass regardless of MongoDB version. +- Is the test only applicable to specific versions during specific development cycles? 
If so, delete + it from irrelevant branches and master. +- Does the test add value for "last" (dynamic) version features? If so, modify the test to be more + robust. These should always pass regardless of MongoDB version. diff --git a/jstests/multiVersion/genericSetFCVUsage/fcv_core/README.md b/jstests/multiVersion/genericSetFCVUsage/fcv_core/README.md index fb157a68ae5..692930596e0 100644 --- a/jstests/multiVersion/genericSetFCVUsage/fcv_core/README.md +++ b/jstests/multiVersion/genericSetFCVUsage/fcv_core/README.md @@ -1,3 +1,4 @@ # FCV / setFCV core infrastructure -This folder contains tests the core FCV and setFCV upgrade/downgrade infrastructure. It does not contain tests linked to any other particular feature. +This folder contains tests the core FCV and setFCV upgrade/downgrade infrastructure. It does not +contain tests linked to any other particular feature. diff --git a/jstests/query_golden/README.plan_stability.md b/jstests/query_golden/README.plan_stability.md index dbd0d5faa98..77b0e910ffd 100644 --- a/jstests/query_golden/README.plan_stability.md +++ b/jstests/query_golden/README.plan_stability.md @@ -1,6 +1,8 @@ # Introduction -The plan_stability tests record the current winning plan for a set of ~ 1K queries produced by SPM-3816. If those plans ever change, the test is expected to fail at which point a human would decide if the changed plans are for the better or for the worse. +The plan_stability tests record the current winning plan for a set of ~ 1K queries produced by +SPM-3816. If those plans ever change, the test is expected to fail at which point a human would +decide if the changed plans are for the better or for the worse. 
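The idea of comparing recorded plans against current ones can be sketched as follows. This is only an illustration in plain JavaScript, not how the golden test framework is implemented; `expectedPlans`, `actualPlans`, `changedPlans`, and the plan strings are all made up for the example.

```js
// Hypothetical recorded ("expected") and freshly computed ("actual") winning plans,
// keyed by the pipeline text. The plan strings are invented for illustration.
const expectedPlans = {
    '[{"$match":{"a":1}}]': "FETCH IXSCAN a_1",
    '[{"$match":{"b":2}}]': "COLLSCAN",
};
const actualPlans = {
    '[{"$match":{"a":1}}]': "FETCH IXSCAN a_1",
    '[{"$match":{"b":2}}]': "FETCH IXSCAN b_1",
};

// Lists every pipeline whose winning plan differs from the recorded one.
function changedPlans(expected, actual) {
    return Object.keys(expected)
        .filter((pipeline) => expected[pipeline] !== actual[pipeline])
        .map((pipeline) => ({
            pipeline,
            before: expected[pipeline],
            after: actual[pipeline],
        }));
}

// A non-empty list is what would make the test fail and prompt a human review.
const changes = changedPlans(expectedPlans, actualPlans);
```

In this sketch only the second pipeline is reported, and whether moving from `COLLSCAN` to an index scan is an improvement is left to the reviewer.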
# Running @@ -13,7 +15,8 @@ $ buildscripts/resmoke.py run \ jstests/query_golden/plan_stability.js ``` -There are several resmoke suites predefined for different plan ranking modes, for which it is not needed to add mongod parameters: +There are several resmoke suites predefined for different plan ranking modes, for which it is not +needed to add mongod parameters: ```bash query_golden_cbr_automatic @@ -42,7 +45,9 @@ To obtain a diff that contains an individual diff fragment for each changed plan 2. Edit the `~/.golden_test_config.yml` to use a customized diff command: ```yml -diffCmd: 'git -c diff.plan_stability.xfuncname=">>>pipeline" diff --unified=0 --function-context --no-index "{{expected}}" "{{actual}}"' +diffCmd: + 'git -c diff.plan_stability.xfuncname=">>>pipeline" diff --unified=0 --function-context --no-index + "{{expected}}" "{{actual}}"' ``` 3. You can now run `buildscripts/golden_test.py diff` as usual and the output will look like this: @@ -68,15 +73,20 @@ This provides the plan that changed, the pipeline it belonged to, and the execut ## Using the summarization scripts -The `feature-extractor` internal repository contains a summarization script that can be used to obtain a summary of the failed test as well as information on the individual regressions that should be looked into. Please see `scripts/cbr/README.md` in that repository for more information. +The `feature-extractor` internal repository contains a summarization script that can be used to +obtain a summary of the failed test as well as information on the individual regressions that should +be looked into. Please see `scripts/cbr/README.md` in that repository for more information. # Debugging failures ## Which pipeline is the problematic one? -In Evergreen, the diff will most likely show a pipeline **below** the counters. This is however the following pipeline in the test, not the one you are looking for. 
The problematic pipeline is the one that comes **before** it in the `expected_output` file. +In Evergreen, the diff will most likely show a pipeline **below** the counters. This is however the +following pipeline in the test, not the one you are looking for. The problematic pipeline is the one +that comes **before** it in the `expected_output` file. -In local execution, if your environment is configured as described above, the diff will show the actual pipeline of interest, **above** the counters. +In local execution, if your environment is configured as described above, the diff will show the +actual pipeline of interest, **above** the counters. ## Running the offending pipelines manually @@ -98,7 +108,8 @@ and wait until the script has advanced to the following log line: [js_test:plan_stability] [jsTest] ---- ``` -2. Connect to `mongodb://127.0.0.1:20000` and run the offending pipeline against the `db.plan_stability` collection. +2. Connect to `mongodb://127.0.0.1:20000` and run the offending pipeline against the + `db.plan_stability` collection. ```bash mongosh mongodb://127.0.0.1:20000 @@ -113,7 +124,10 @@ db.plan_stability.aggregate(pipeline).explain().queryPlanner.rejectedPlans.sort( ## Converting the pipeline to JavaScript -The pipelines in the diff are **EJSON**-ish, while the mongosh shell expects **JavaScript**. EJSON-ish and JavaScript are identical when it comes to basic types, such as strings and integers, but if the pipeline contains timestamps and decimals, the JSON needs to be converted to JavaScript using `EJSON.parse()`: +The pipelines in the diff are **EJSON**-ish, while the mongosh shell expects **JavaScript**. 
+EJSON-ish and JavaScript are identical when it comes to basic types, such as strings and integers, +but if the pipeline contains timestamps and decimals, the JSON needs to be converted to JavaScript +using `EJSON.parse()`: ```js > pipelineStr = '[{"$match":{"field20_Timestamp_idx":{"$gt":{"$timestamp":{"t":1760551205,"i":0}}}},"field12_Decimal128_idx":{"$lte":{"$numberDecimal":"35.1"}}}]'; @@ -130,23 +144,26 @@ The pipelines in the diff are **EJSON**-ish, while the mongosh shell expects **J db.plan_stability2.aggregate(pipeline); ``` -Note that **ISO Timestamps** need to be handled separately. JSON will store those as strings, resulting in loss of typing information that `EJSON.parse()` can not recover. This will result in a semantic change in the query unless manually converted to an `ISODate` object: +Note that **ISO Timestamps** need to be handled separately. JSON will store those as strings, +resulting in loss of typing information that `EJSON.parse()` can not recover. This will result in a +semantic change in the query unless manually converted to an `ISODate` object: ```js // Manually convert // [{"$match":{"field19_datetime_idx":{"$gte":"2024-01-27T00:00:00.000Z"}}}] // to the correct JavaScript -pipeline = [ - {$match: {field19_datetime_idx: {$gte: ISODate("2024-01-27T00:00:00.000Z")}}}, -]; +pipeline = [{$match: {field19_datetime_idx: {$gte: ISODate("2024-01-27T00:00:00.000Z")}}}]; ``` ## Is the new plan better or worse? -For the majority of the plans, it will be obvious if the new plan is better or worse because all the execution counters would have moved in the same direction without any ambiguity. +For the majority of the plans, it will be obvious if the new plan is better or worse because all the +execution counters would have moved in the same direction without any ambiguity. -Some plans, such as those involving `$sort` or `$limit` will sometimes change in a way that makes some counters better while others become worse. 
For those queries, consider running them manually multiple times to compare their wallclock execution times: +Some plans, such as those involving `$sort` or `$limit` will sometimes change in a way that makes +some counters better while others become worse. For those queries, consider running them manually +multiple times to compare their wallclock execution times: ```javascript pipeline = [...]; @@ -162,11 +179,15 @@ You can also modify `collSize` in `plan_stability.js` to temporarily use a large If you want to run a comparison between estimation methods `X` and `Y`: -1. If method `X` is not multi-planning, place the `jstests/query_golden/expected_files/X` for estimation method `X` in the root of `expected_files`, so that they are used as the base for the comparison; +1. If method `X` is not multi-planning, place the `jstests/query_golden/expected_files/X` for + estimation method `X` in the root of `expected_files`, so that they are used as the base for the + comparison; -2. Temporary remove the expected files for method `Y` from `expected_files/query_golden/expected_files/Y` so that they are not considered; +2. Temporary remove the expected files for method `Y` from + `expected_files/query_golden/expected_files/Y` so that they are not considered; -3. Run the test as described above, specifying `featureFlagCostBasedRanker`/`internalQueryCBRCEMethod`; +3. Run the test as described above, specifying + `featureFlagCostBasedRanker`/`internalQueryCBRCEMethod`; 4. Use the summarization script as described above to produce a report. @@ -179,5 +200,5 @@ To accept the new plans, use `buildscripts/golden_test.py accept`, as with any o ## Removing individual pipelines If a given pipeline proves flaky, that is, is flipping between one plan and another for no reason, -you can comment it out from the test with a note. Re-run the test and then run `buildscripts/golden_test.py accept` -to persist the change. +you can comment it out from the test with a note. 
Re-run the test and then run +`buildscripts/golden_test.py accept` to persist the change. diff --git a/jstests/query_golden/join_opt/README.plan_stability.join_opt.md b/jstests/query_golden/join_opt/README.plan_stability.join_opt.md index 2a3c602fbb2..94738e43106 100644 --- a/jstests/query_golden/join_opt/README.plan_stability.join_opt.md +++ b/jstests/query_golden/join_opt/README.plan_stability.join_opt.md @@ -1,21 +1,26 @@ # Introduction -The plan stability tests for join optimization are golden tests that execute a number of joins against the TPC-H dataset. +The plan stability tests for join optimization are golden tests that execute a number of joins +against the TPC-H dataset. For each pipeline we persist the following in the golden test output: - the MQL command, including the base table and the pipeline - a concise representation of the winning plan for the query -- execution counters that quantify the effort it took to execute the query in terms of docs and keys examined +- execution counters that quantify the effort it took to execute the query in terms of docs and keys + examined - data about the resultset, such as the number of rows returned ## Prerequisites This test requires the following: -- The `mongorestore` tool, accessible on the $PATH. This tool is part of the [MongoDB Database Tools](https://www.mongodb.com/try/download/database-tools) package. +- The `mongorestore` tool, accessible on the $PATH. This tool is part of the + [MongoDB Database Tools](https://www.mongodb.com/try/download/database-tools) package. -- The TPC-H dataset, located in a directory named `tpc-h` that is on the same level as the mongodb repository. The dataset is available from the `query-benchmark-data` S3 bucket. You can retrieve it as follows: +- The TPC-H dataset, located in a directory named `tpc-h` that is on the same level as the mongodb + repository. The dataset is available from the `query-benchmark-data` S3 bucket. 
You can retrieve + it as follows: ```bash mkdir ~/tpc-h @@ -26,7 +31,8 @@ aws sso login aws s3 cp s3://query-benchmark-data/tpc-h/tpch-0.1-normalized.archive.gz tpc-h/tpch-0.1-normalized.archive.gz --region us-east-1 ``` -In evergreen, tasks such as `query_golden_join_optimization_plan_stability` make sure the prerequisites are already in place. +In evergreen, tasks such as `query_golden_join_optimization_plan_stability` make sure the +prerequisites are already in place. - The golden test framework configured with a custom diff rule @@ -77,13 +83,16 @@ The report contains the following information: - the most-improved queries, useful for obtaining examples for presentation purposes; - all individual failures, categorized and pretty-printed. -The report has one section per jstest -- if you are running multiple tests, each one will be processed and reported separately. +The report has one section per jstest -- if you are running multiple tests, each one will be +processed and reported separately. ## Debugging -> [!WARNING] > **_WARNING:_** The queries dumped by this test, the diff tooling or the summary report may contain EJSON constructs, such as $numberDecimal -> that are not properly processed by `coll.aggregate()` unless converted using `EJSON.parse()`. Typing information around ISO dates may have also been lost, so manually recreate those as `ISODate(...)`. -> See the "A note on the queries" section below for more information. +> [!WARNING] > **_WARNING:_** The queries dumped by this test, the diff tooling or the summary +> report may contain EJSON constructs, such as $numberDecimal that are not properly processed by +> `coll.aggregate()` unless converted using `EJSON.parse()`. Typing information around ISO dates may +> have also been lost, so manually recreate those as `ISODate(...)`. See the "A note on the queries" +> section below for more information. 
### Determining the offending query @@ -91,7 +100,9 @@ Each query has an `idx` key that can be used to track it across files and report ### Starting a populated MongoDB instance -To obtain a running, populated MongoDB instance, run `resmoke.py run` with the `--pauseAfterPopulate` option. This will start mongod, load the data and then pause resmoke at the following line: +To obtain a running, populated MongoDB instance, run `resmoke.py run` with the +`--pauseAfterPopulate` option. This will start mongod, load the data and then pause resmoke at the +following line: ``` [js_test:plan_stability_join_opt_tpch] [jsTest] TestData.pauseAfterPopulate is set. Pausing indefinitely ... @@ -124,15 +135,18 @@ The collections will be restored to the `tpch` database. ## A note on the queries -The queries you see in files, diffs, bug reports may be in various formats, depending on whether they were dumped using JavaScript, python, or some other method. +The queries you see in files, diffs, bug reports may be in various formats, depending on whether +they were dumped using JavaScript, python, or some other method. -Therefore, it is important to obtain the query plan of the query and make sure that what you are seeing locally matches the plan from the bug report. +Therefore, it is important to obtain the query plan of the query and make sure that what you are +seeing locally matches the plan from the bug report. 
The following caveats are currently known: ### Typing information for timestamps -Typing information for timestamps is frequently lost, so a query may contain ISO timestamps as strings: +Typing information for timestamps is frequently lost, so a query may contain ISO timestamps as +strings: ```json {"l_commitdate": {"$lt": "1993-03-17T00:00:00"}} @@ -146,7 +160,8 @@ You will need to manually convert this into a timestamp: {'l_commitdate': {'$lt': new ISODate('1993-03-17T00:00:00')}} ``` -Since the typing information has been lost somewhere along the pipeline, no existing library is available to restore it for you. +Since the typing information has been lost somewhere along the pipeline, no existing library is +available to restore it for you. ### EJSON output @@ -158,6 +173,8 @@ Sometimes the query will be provided in EJSON, so you will see: in the output. -In mongosh, `aggregate()` does not support EJSON directly, so passing EJSON to it will succeed but will not produce the expected results. +In mongosh, `aggregate()` does not support EJSON directly, so passing EJSON to it will succeed but +will not produce the expected results. -Either pass this output as `EJSON.parse()` (if your input is a string), `EJSON.deserialize()` (if your input is parsed already) or manually convert it to standard MQL. +Either pass this output as `EJSON.parse()` (if your input is a string), `EJSON.deserialize()` (if +your input is parsed already) or manually convert it to standard MQL. diff --git a/jstests/suites/README.md b/jstests/suites/README.md index 9ec94a60b1a..f39527e48bd 100644 --- a/jstests/suites/README.md +++ b/jstests/suites/README.md @@ -2,15 +2,18 @@ Bazel test targets for resmoke suites. -For documentation of the `resmoke_suite_test` rule, see [bazel/resmoke/README.md](bazel/resmoke/README.md). +For documentation of the `resmoke_suite_test` rule, see +[bazel/resmoke/README.md](bazel/resmoke/README.md). 
## Configuring -In addition to attributes for `resmoke_suite_test`, the following are options for configuring test targets. +In addition to attributes for `resmoke_suite_test`, the following are options for configuring test +targets. ### tags -Arbitrary tags may also be added to group test targets for batch execution. For example, a custom tag lets you run all matching suites at once: +Arbitrary tags may also be added to group test targets for batch execution. For example, a custom +tag lets you run all matching suites at once: ``` bazel test //jstests/suites/... --test_tag_filters=my_tag @@ -26,7 +29,8 @@ The following tags have special meaning: ### target_compatible_with -Configure platforms/build options that the test is compatible with. Use this to exclude the test suite from platforms in CI. +Configure platforms/build options that the test is compatible with. Use this to exclude the test +suite from platforms in CI. Example — exclude the test on PPC/S390x, MacOS, and TSAN builds: diff --git a/jstests/tags.md b/jstests/tags.md index 7fbd9c5f4f3..0cd41d6f446 100644 --- a/jstests/tags.md +++ b/jstests/tags.md @@ -1,6 +1,8 @@ # JS Test Tags -JS Test files can leverage "tags" that suites can key off of to include and/or exclude as necessary. Not scheduling a test to run is much faster than the test doing an early-return when preconditions are not met. +JS Test files can leverage "tags" that suites can key off of to include and/or exclude as necessary. +Not scheduling a test to run is much faster than the test doing an early-return when preconditions +are not met. 
The simplest use case is having something like the following at the top of your js test file: @@ -38,7 +40,10 @@ and can also include (meta) comments: */ ``` -The tags are meant to be used in suite configurations, to [`include_with_any_tags`](../buildscripts/resmokeconfig/suites/README.md#selectorinclude_with_any_tags) and/or [`exclude_with_any_tags`](../buildscripts/resmokeconfig/suites/README.md#selectorexclude_with_any_tags): +The tags are meant to be used in suite configurations, to +[`include_with_any_tags`](../buildscripts/resmokeconfig/suites/README.md#selectorinclude_with_any_tags) +and/or +[`exclude_with_any_tags`](../buildscripts/resmokeconfig/suites/README.md#selectorexclude_with_any_tags): ```bash test_kind: js_test @@ -50,7 +55,8 @@ selector: - disabled_for_fcv_6_1_upgrade ``` -Build variants can also use tags via the `test_flags` expansion, which facilitates tag-exclusions _across suites_ that run with the variant: +Build variants can also use tags via the `test_flags` expansion, which facilitates tag-exclusions +_across suites_ that run with the variant: ``` expansions: @@ -60,6 +66,9 @@ Build variants can also use tags via the `test_flags` expansion, which facilitat ## Available Tags -There is no current exhaustive list, since tags are arbitrary labels and do not need to be "registered". However, tags are always "global", and many are reused. Names should have communicate clear intent; and be reused/consolidated when appropriate. +There is no current exhaustive list, since tags are arbitrary labels and do not need to be +"registered". However, tags are always "global", and many are reused. Names should have communicate +clear intent; and be reused/consolidated when appropriate. -> Use `buildscripts/resmoke.py list-tags` to find which tags are actively referenced by suite configs, although there may be more in JS files and Build Variant expansions. 
+> Use `buildscripts/resmoke.py list-tags` to find which tags are actively referenced by suite +> configs, although there may be more in JS files and Build Variant expansions. diff --git a/jstests/with_mongot/cross_repo_testing_requirements.md b/jstests/with_mongot/cross_repo_testing_requirements.md index a2a97872e9a..267e3ac3b25 100644 --- a/jstests/with_mongot/cross_repo_testing_requirements.md +++ b/jstests/with_mongot/cross_repo_testing_requirements.md @@ -1,4 +1,6 @@ -Server engineers working on `search`, `changeStreams`, or any code that interacts with these features on v8.1+ branches are required to run all mongot integration tests defined on `10gen/mongod` and `10gen/mongot` before committing. +Server engineers working on `search`, `changeStreams`, or any code that interacts with these +features on v8.1+ branches are required to run all mongot integration tests defined on +`10gen/mongod` and `10gen/mongot` before committing. The simplest way to do this is to use this command to create your evergreen patch: @@ -6,14 +8,23 @@ The simplest way to do this is to use this command to create your evergreen patc This will auto select the e2e tests defined on both repos. -Unfortunately, evergreen doesn't support multiple alias options. For that reason, if you would like to create a patch that selects e2e tests defined on both repos AND the server's required variants in one fell swoop: +Unfortunately, evergreen doesn't support multiple alias options. 
For that reason, if you would like +to create a patch that selects e2e tests defined on both repos AND the server's required variants in +one fell swoop: evergreen patch -p mongodb-mongo-master --trigger-alias search-integration --alias required-and-mongot-e2e-tests -If your evergreen patch shows your changes failed a search e2e test defined on `10gen/mongod`, you can follow [these instructions](https://github.com/mongodb/mongo/blob/master/jstests/with_mongot/e2e/mongot_testing_instructions.md) for running that test locally on your VM. +If your evergreen patch shows your changes failed a search e2e test defined on `10gen/mongod`, you +can follow +[these instructions](https://github.com/mongodb/mongo/blob/master/jstests/with_mongot/e2e/mongot_testing_instructions.md) +for running that test locally on your VM. -If your evergreen patch shows your changes failed an e2e test defined on `10gen/mongot`, please reach out in #search-query-engineering for assistance from mongot engineers in translating and addressing the failure. +If your evergreen patch shows your changes failed an e2e test defined on `10gen/mongot`, please +reach out in #search-query-engineering for assistance from mongot engineers in translating and +addressing the failure. ### Didn't Find What You're Looking For? -Visit [the landing page](https://github.com/mongodb/mongo/blob/master/src/mongo/db/query/search/README.md) for all `$search`/`$vectorSearch`/`$searchMeta` related documentation for server contributors. +Visit +[the landing page](https://github.com/mongodb/mongo/blob/master/src/mongo/db/query/search/README.md) +for all `$search`/`$vectorSearch`/`$searchMeta` related documentation for server contributors. 
diff --git a/jstests/with_mongot/e2e/mongot_testing_instructions.md b/jstests/with_mongot/e2e/mongot_testing_instructions.md index ac9aaaad5db..26edd3d4a02 100644 --- a/jstests/with_mongot/e2e/mongot_testing_instructions.md +++ b/jstests/with_mongot/e2e/mongot_testing_instructions.md @@ -1,12 +1,20 @@ # Introduction -To run aggregation pipelines containing $search or $vectorSearch stages, you will need a mongot binary. You have the choice of running tests with a mongot binary currently running in production on Atlas (release), the latest mongot binary created via the most recent merge to 10gen/mongot repo (latest), or a mongot binary with unmerged local changes. +To run aggregation pipelines containing $search or $vectorSearch stages, you will need a mongot +binary. You have the choice of running tests with a mongot binary currently running in production on +Atlas (release), the latest mongot binary created via the most recent merge to 10gen/mongot repo +(latest), or a mongot binary with unmerged local changes. ## Using release or latest mongot -In order to acquire a release or latest mongot binary, from your ~/mongo directory you will need to know your virtual workstations OS and architecture. Assuming your VM is on ubuntu (the default), run `lscpu` in your terminal and inspect the first line of the response to confirm your VM's architecture. +In order to acquire a release or latest mongot binary, from your ~/mongo directory you will need to +know your virtual workstations OS and architecture. Assuming your VM is on ubuntu (the default), run +`lscpu` in your terminal and inspect the first line of the response to confirm your VM's +architecture. -The default behavior of setup-mongot-repro assume you want to download the latest version of mongot binary compatible with linux x86_64. 
In which case, if this works for your VM/testing needs, you can run: +The default behavior of setup-mongot-repro assume you want to download the latest version of mongot +binary compatible with linux x86_64. In which case, if this works for your VM/testing needs, you can +run: ###### @@ -48,7 +56,8 @@ If your VM is running macos, you can install the latest macos compatible mongot bazel run db-contrib-tool -- setup-mongot-repro-env --platform macos --installDir build/install/bin -Clearly, many options to play around with! To learn more about setup-mongot-repro-env command line options, use +Clearly, many options to play around with! To learn more about setup-mongot-repro-env command line +options, use ###### @@ -56,7 +65,8 @@ Clearly, many options to play around with! To learn more about setup-mongot-repr ## Compiling mongot from source -If you want to need to include unmerged changes in your mongot binary, you can compile a mongot with said changes locally on your VM. You will need to: +If you want to need to include unmerged changes in your mongot binary, you can compile a mongot with +said changes locally on your VM. You will need to: 1. **Clone the mongot repo to your VM** @@ -65,8 +75,7 @@ If you want to need to include unmerged changes in your mongot binary, you can c git clone git@github.com:10gen/mongot.git 2. **cd into your mongot repo and checkout the in-development branch you're interested in** -3. **Compile mongot** - If your VM is linux x86_64: +3. **Compile mongot** If your VM is linux x86_64: ###### @@ -84,7 +93,8 @@ If your VM is linux aarch64: tar -xvzf bazel-bin/deploy/mongot-localdev.tgz -5. **Move the resulting mongot binary** into the build directory that the server build system places mongod, mongos and shell binaries: +5. 
**Move the resulting mongot binary** into the build directory that the server build system places + mongod, mongos and shell binaries: ###### @@ -92,7 +102,8 @@ If your VM is linux aarch64: ## Adding Tests -To create a new search integration test, add a jstest to **jstests/with_mongot/e2e**. Any tests added there can be run in **single node replica set** or **sharded cluster environment**. +To create a new search integration test, add a jstest to **jstests/with_mongot/e2e**. Any tests +added there can be run in **single node replica set** or **sharded cluster environment**. **To run your test as a single node replica set:** @@ -112,19 +123,40 @@ To note, until SERVER-86616 is completed, your test will have to follow a partic 2. Create a search index 3. Run your queries -This order is required to ensure correctness. This is due to the nature of data replication between mongot and mongod. Mongot replicates data from mongod via a $changeStream and is thus eventually consistent with mongod collection data. Currently, the testing infrastructure ensures correctness by expecting engineers do not make document changes after index creation (as dictated by above order) + by having the createSearchIndex shell helper wait until mongot confirms the requested mongot index is queryable before returning. More specifically, createSearchIndex uses the status of the search index (READY) generated from $listSearchIndexes to know that the collection data has been fully replicated and indexed. If we update documents or add documents after index creation, the status of $listSearchIndexes doesn't guarantee anything about the status of data replication and queries could return incorrect results. +This order is required to ensure correctness. This is due to the nature of data replication between +mongot and mongod. Mongot replicates data from mongod via a +$changeStream and is thus eventually +consistent with mongod collection data. 
Currently, the testing infrastructure ensures correctness by +expecting engineers do not make document changes after index creation (as dictated by above order) + +by having the createSearchIndex shell helper wait until mongot confirms the requested mongot index +is queryable before returning. More specifically, createSearchIndex uses the status of the search +index (READY) generated from $listSearchIndexes to know that the collection data has been fully +replicated and indexed. If we update documents or add documents after index creation, the status of +$listSearchIndexes +doesn't guarantee anything about the status of data replication and queries could return incorrect +results. ## Downloading a mongot binary from an evergreen artifact -You can download the mongot binary that a specific evergreen patch or version utilized, which can be useful for trying to replicate errors. +You can download the mongot binary that a specific evergreen patch or version utilized, which can be +useful for trying to replicate errors. -You can download the mongot binary from any build variant that compiles mongot--i.e., variants which include the expansion `build_mongot: true` ([example](https://github.com/mongodb/mongo/blob/848b5264be2d0f93d21ffe2e4058e810f8ea18f2/etc/evergreen_yml_components/variants/amazon/test_dev_master_branch_only.yml#L194)). More specifically, that includes: +You can download the mongot binary from any build variant that compiles mongot--i.e., variants which +include the expansion `build_mongot: true` +([example](https://github.com/mongodb/mongo/blob/848b5264be2d0f93d21ffe2e4058e810f8ea18f2/etc/evergreen_yml_components/variants/amazon/test_dev_master_branch_only.yml#L194)). +More specifically, that includes: -- Compile variants that are depended upon by variants which run the search end to end tests, such as the variant `amazon-linux2023-arm64-static-compile` _(! 
Amazon Linux 2023 arm64 Enterprise Shared Library Compile & Static Analysis)_, which is depended upon by _! Amazon Linux 2023 arm64 Atlas Enterprise (all feature flags)_ -- Variants that compile mongot **and** run the search end to end tests, such as: `amazon-linux2023-arm64-mongot-integration-patchable` _(AL2023 arm64 mongot integration tasks)_ - - Note that this will be true of any of the build variants that include `mongot` in the name, such as _Enterprise RHEL 8.0 Mongot Integration_ +- Compile variants that are depended upon by variants which run the search end to end tests, such as + the variant `amazon-linux2023-arm64-static-compile` _(! Amazon Linux 2023 arm64 Enterprise Shared + Library Compile & Static Analysis)_, which is depended upon by _! Amazon Linux 2023 arm64 Atlas + Enterprise (all feature flags)_ +- Variants that compile mongot **and** run the search end to end tests, such as: + `amazon-linux2023-arm64-mongot-integration-patchable` _(AL2023 arm64 mongot integration tasks)_ + - Note that this will be true of any of the build variants that include `mongot` in the name, such + as _Enterprise RHEL 8.0 Mongot Integration_ -If you're confused about evergreen build variants, check out [Intro to Evergreen Concepts](https://docs.google.com/document/d/1kHi0YuzuRcMs1sRgXRRwy5-cSF4vasAT8lQjkg2hXCU/edit?usp=sharing). +If you're confused about evergreen build variants, check out +[Intro to Evergreen Concepts](https://docs.google.com/document/d/1kHi0YuzuRcMs1sRgXRRwy5-cSF4vasAT8lQjkg2hXCU/edit?usp=sharing). 
The general format of the command is: @@ -132,22 +164,33 @@ The general format of the command is: bazel run db-contrib-tool -- setup-repro-env --variant -Specifically, to download from the `AL2023 x86 mongot integration tasks cron only` build variant, you could run: +Specifically, to download from the `AL2023 x86 mongot integration tasks cron only` build variant, +you could run: ###### bazel run db-contrib-tool -- setup-repro-env --variant amazon-linux-2023-x86-mongot-integration-cron-only 23b790a2a81767b8edbbc266043a205029867b74 -By default, the download will be placed in `build/multiversion_bin//dist_test/`, but you can also specify a location via the `--installDir` option. For example: +By default, the download will be placed in +`build/multiversion_bin//dist_test/`, but you can also specify a +location via the `--installDir` option. For example: ###### bazel run db-contrib-tool -- setup-repro-env --variant amazon-linux2023-arm64-static-compile 23b790a2a81767b8edbbc266043a205029867b74 --installDir=build/multiversion_bin/my_variant -Will place the mongot binary in `build/multiversion_bin/my_variant/23b790a2a81767b8edbbc266043a205029867b74/dist_test/bin/mongot-localdev` +Will place the mongot binary in +`build/multiversion_bin/my_variant/23b790a2a81767b8edbbc266043a205029867b74/dist_test/bin/mongot-localdev` -General information about the `setup-repro-env` command can be found in its [README](https://github.com/10gen/db-contrib-tool/blob/main/src/db_contrib_tool/setup_repro_env/README.md#setting-up-a-specific-mongodb-version). Note that if you want to download the mongot binary, you'll have to pass in an appropriate `--variant`: if you don't specify, a variant that makes sense for your machine's architecture will be automatically chosen for you, and will very likely will not be one of the variants that compiles mongot. 
+General information about the `setup-repro-env` command can be found in its +[README](https://github.com/10gen/db-contrib-tool/blob/main/src/db_contrib_tool/setup_repro_env/README.md#setting-up-a-specific-mongodb-version). +Note that if you want to download the mongot binary, you'll have to pass in an appropriate +`--variant`: if you don't specify, a variant that makes sense for your machine's architecture will +be automatically chosen for you, and will very likely will not be one of the variants that compiles +mongot. ### Didn't Find What You're Looking For? -Visit [the landing page](https://github.com/mongodb/mongo/blob/master/src/mongo/db/query/search/README.md) for all $search/$vectorSearch/$searchMeta related documentation for server contributors. +Visit +[the landing page](https://github.com/mongodb/mongo/blob/master/src/mongo/db/query/search/README.md) +for all $search/$vectorSearch/$searchMeta related documentation for server contributors. diff --git a/modules_poc/README.md b/modules_poc/README.md index 172539782ec..52cb1a639c4 100644 --- a/modules_poc/README.md +++ b/modules_poc/README.md @@ -1,9 +1,9 @@ # Modules POC -This folder contains a POC implementation of a module metrics tracker and enforcement. This documentation includes -basic information about modules, and commands which will run the scanner across the entire first-party codebase and merge the results. All -commands are assumed to run at the root of the checkout, inside of a correctly activated python -virtual env. +This folder contains a POC implementation of a module metrics tracker and enforcement. This +documentation includes basic information about modules, and commands which will run the scanner +across the entire first-party codebase and merge the results. All commands are assumed to run at the +root of the checkout, inside of a correctly activated python virtual env. 
## What is a module @@ -19,16 +19,16 @@ TODO ## Assigning files to modules -The file `modules_poc/modules.yaml` contains a list of modules, each containing -a list of files. Each file must be contained in only one module. Note that -module assignment is not required to map neatly to team ownership. +The file `modules_poc/modules.yaml` contains a list of modules, each containing a list of files. +Each file must be contained in only one module. Note that module assignment is not required to map +neatly to team ownership. -In cases where multiple globs match a file, the current rule is that the -longest glob wins. This is used as a simpler-to-implement version of -most-specific glob wins, which we may switch to in the future. +In cases where multiple globs match a file, the current rule is that the longest glob wins. This is +used as a simpler-to-implement version of most-specific glob wins, which we may switch to in the +future. -When submitting a review, you are strongly encouraged to include -a generated diff of the changes to the modules list. This can be done by running: +When submitting a review, you are strongly encouraged to include a generated diff of the changes to +the modules list. This can be done by running: ```bash modules_poc/mod_mapping.py --dump-modules-list > modules.old @@ -53,19 +53,17 @@ Github will nicely format the diff if you put it in a block like this: ### Showing assigned and unassigned files -Run `modules_poc/mod_mapping.py --dump-modules` to produce a `modules_dump.yaml` -file in current directory. This file is a multi-level map from -module name to team name to directory path to list of file names. -For unassigned files it uses `__NONE__` as the module name, and for unowned -files it uses `__NO_OWNER__` as the team, both of which conveniently sort first. -For owned files it uses the part of the team-name after `@10gen/` with `-` -replaced with `_` to be friendlier to querying. 
In cases where multiple teams -own a file, the file is duplicated to each team's list. +Run `modules_poc/mod_mapping.py --dump-modules` to produce a `modules_dump.yaml` file in current +directory. This file is a multi-level map from module name to team name to directory path to list of +file names. For unassigned files it uses `__NONE__` as the module name, and for unowned files it +uses `__NO_OWNER__` as the team, both of which conveniently sort first. For owned files it uses the +part of the team-name after `@10gen/` with `-` replaced with `_` to be friendlier to querying. In +cases where multiple teams own a file, the file is duplicated to each team's list. -This file can be viewed directly in VSCode. The yaml plugin's breadcrumbs and -folding are very helpful. [`yq`](https://github.com/kislyuk/yq) -([`jq`](https://jqlang.org) for yaml) is also a powerful tool. Here are a few -examples using it, some of which produce enough output to be worth opening in vscode: +This file can be viewed directly in VSCode. The yaml plugin's breadcrumbs and folding are very +helpful. [`yq`](https://github.com/kislyuk/yq) ([`jq`](https://jqlang.org) for yaml) is also a +powerful tool. Here are a few examples using it, some of which produce enough output to be worth +opening in vscode: ```bash # list of teams @@ -88,8 +86,7 @@ yq '.core | with_entries(select(.key != "server_programmability"))' modules_dump ## Specifying public and private module APIs -To make an API or class available for use by other modules, add a -tag to its header declaration. +To make an API or class available for use by other modules, add a tag to its header declaration. ``` class MONGO_MOD_PUBLIC Foo { @@ -107,17 +104,15 @@ namespace MONGO_MOD_PRIVATE my_details { } // namespace MONGO_MOD_PRIVATE my details ``` -Elements inside a class or namespace default to the visibility of the -enclosing scope. 
Note that the canonical version of "inside" can be -subtle, with, e.g., member functions being "inside" the class definition, -not the location the member function is defined. All forward -declarations of the same function or class should have the same visibility -tags, and forward declarations across module boundaries should be avoided. +Elements inside a class or namespace default to the visibility of the enclosing scope. Note that the +canonical version of "inside" can be subtle, with, e.g., member functions being "inside" the class +definition, not the location the member function is defined. All forward declarations of the same +function or class should have the same visibility tags, and forward declarations across module +boundaries should be avoided. -If visibility is not specified at any containing scope, -it defaults to `MONGO_MOD_PRIVATE` (except in cases where the -header doesn't include `mongo/util/modules.h`, where the default is `UNKNOWN` -to facilitate incrementally tagging APIs). +If visibility is not specified at any containing scope, it defaults to `MONGO_MOD_PRIVATE` (except +in cases where the header doesn't include `mongo/util/modules.h`, where the default is `UNKNOWN` to +facilitate incrementally tagging APIs). Documentation for individual `MONGO_MOD_*` tags is present in [`mongo/util/modules.h`](../src/mongo/util/modules.h). @@ -131,14 +126,13 @@ buildscripts/poetry_sync.sh # make sure the python env has the right packages in python modules_poc/merge_decls.py ``` -`merge_decls.py` takes an optional flag `--[no-]intra-module` to indicate whether you want to include -intra module accesses and declarations that are only used from within their module or submodules. It -defaults to `--intra-module`, which provides the most information to consumers. +`merge_decls.py` takes an optional flag `--[no-]intra-module` to indicate whether you want to +include intra module accesses and declarations that are only used from within their module or +submodules. 
It defaults to `--intra-module`, which provides the most information to consumers. -Running `merge_decls.py` also validates that private APIs aren't being used -outside of where they are permitted. If any are, the script will fail, though -`merged_decls.json` will still be generated, and the -invalid uses will be printed to stdout. +Running `merge_decls.py` also validates that private APIs aren't being used outside of where they +are permitted. If any are, the script will fail, though `merged_decls.json` will still be generated, +and the invalid uses will be printed to stdout. FETCH[a < 5]` - Use index `{b: 1}` to find all documents where `b < 50` and fetch `a` and `_id`, where `a < 5`. -Because the predicate on `a` is significantly more selective than the predicate on `b`, it is likely that plan (1) will return all results during the trial period. In doing so, we hit the "EOF" end condition of multiplanning and plan (1) becomes the winning plan. +Because the predicate on `a` is significantly more selective than the predicate on `b`, it is likely +that plan (1) will return all results during the trial period. In doing so, we hit the "EOF" end +condition of multiplanning and plan (1) becomes the winning plan. ## Plan Ranking -After the trial period is complete, [`plan_ranker::pickBestPlan()`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/query/plan_ranker_util.h#L210) assigns each `QuerySolution` a score based on its performance during the trial period. The [formula for the scores](https://github.com/mongodb/mongo/blob/23fcb16382bb8962d5648adfa7a7a2a828c555ec/src/mongo/db/query/plan_ranker.h#L160) is as follows: +After the trial period is complete, +[`plan_ranker::pickBestPlan()`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/query/plan_ranker_util.h#L210) +assigns each `QuerySolution` a score based on its performance during the trial period. 
The +[formula for the scores](https://github.com/mongodb/mongo/blob/23fcb16382bb8962d5648adfa7a7a2a828c555ec/src/mongo/db/query/plan_ranker.h#L160) +is as follows: ``` score = 1 @@ -116,41 +205,61 @@ score = 1 + tieBreakers ``` -When all the scores are tabulated, the plan with the highest score is deemed as the winning plan and is executed in full. +When all the scores are tabulated, the plan with the highest score is deemed as the winning plan and +is executed in full. ### Productivity Ratio -The productivity ratio is defined as `advanced / works`. This value represents the efficiency of the query plan, or "results per unit of effort". +The productivity ratio is defined as `advanced / works`. This value represents the efficiency of the +query plan, or "results per unit of effort". - `advanced`: The number of query results produced by the plan - `works`: Units of effort performed by the plan -Consider an indexed sort versus a blocking sort. An indexed sort will have a higher productivity than a blocking sort, because the blocking sort will need to sort the entire collection, continuously calling `work()` and returning `NEED_TIME` until the collection is sorted and `ADVANCED` can be returned. +Consider an indexed sort versus a blocking sort. An indexed sort will have a higher productivity +than a blocking sort, because the blocking sort will need to sort the entire collection, +continuously calling `work()` and returning `NEED_TIME` until the collection is sorted and +`ADVANCED` can be returned. > ### Aside: Blocking Sort > -> If an index cannot be used to obtain the sort order for a query, a "blocking sort" must be used. This operation implies that the executor must process all input documents to the sort before returning any results. +> If an index cannot be used to obtain the sort order for a query, a "blocking sort" must be used. +> This operation implies that the executor must process all input documents to the sort before +> returning any results. 
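
As a rough illustration of the productivity ratio described above, the sketch below computes `advanced / works` for two toy plans. The `TrialStats` type, the `productivity` helper, and the numbers are all hypothetical and are not taken from `plan_ranker.h`; the zero-works guard is likewise an assumption made so the example is safe to run.

```python
from dataclasses import dataclass


@dataclass
class TrialStats:
    """Counters for one candidate plan during the trial period (hypothetical type)."""

    advanced: int  # number of query results the plan produced
    works: int  # units of effort the plan performed


def productivity(stats: TrialStats) -> float:
    """Return advanced / works, or 0.0 for a plan that did no work (assumed guard)."""
    if stats.works == 0:
        return 0.0
    return stats.advanced / stats.works


# Toy numbers: an indexed sort can return a result on most work() calls.
indexed_sort = TrialStats(advanced=90, works=100)
# Toy numbers: a blocking sort is still consuming input and has returned nothing.
blocking_sort = TrialStats(advanced=0, works=100)
```

With these made-up counters the indexed sort has the higher ratio, which matches the indexed versus blocking sort comparison in the text.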
### EOF Bonus -During multiplanning, it is possible for a `QuerySolution` to produce its entire correct result if the result set's size is less than one `batchsize` and it hasn't exceeded the maximum number of `works`. In this case, the multiplanner returns `isEOF` for this `QuerySolution` and the completed plan is awarded an EOF bonus of 1. +During multiplanning, it is possible for a `QuerySolution` to produce its entire correct result if +the result set's size is less than one `batchsize` and it hasn't exceeded the maximum number of +`works`. In this case, the multiplanner returns `isEOF` for this `QuerySolution` and the completed +plan is awarded an EOF bonus of 1. -Because the productivity ratio and tie-breakers will never surpass the EOF Bonus, it is guaranteed that a plan that finishes executing during multiplanning will be the winning plan, if one exists. +Because the productivity ratio and tie-breakers will never surpass the EOF Bonus, it is guaranteed +that a plan that finishes executing during multiplanning will be the winning plan, if one exists. ### Tie-Breakers -In the event of a tie between plans, there are a few metrics regarding the shape of the candidate `QuerySolution`s that can be used to determine which is marginally better. +In the event of a tie between plans, there are a few metrics regarding the shape of the candidate +`QuerySolution`s that can be used to determine which is marginally better. -[Some such metrics](https://github.com/mongodb/mongo/blob/95d1830ce1acffd0108932d04538ed9dc995ade5/src/mongo/db/query/plan_ranker.h#L120-L158) include: +[Some such metrics](https://github.com/mongodb/mongo/blob/95d1830ce1acffd0108932d04538ed9dc995ade5/src/mongo/db/query/plan_ranker.h#L120-L158) +include: - `noFetchBonus`: Bonus for covered plans (i.e. plans that do not have a `FETCH` stage) - `noSortBonus`: Bonus for plans that do not have a [blocking `SORT`](#aside-blocking-sort) stage. 
-- `noIxisectBonus`: Bonus for plans that do not have an `AND_HASH` or `AND_SORTED` index intersection stage. -- `groupByDistinctBonus`: Bonus for using a `DISTINCT_SCAN` in an aggregation context, where the bonus is [proportional](https://github.com/mongodb/mongo/blob/340adb94bc0d348f42f7b427a06418dfd27f4bfc/src/mongo/db/query/plan_ranker.h#L156) to the productivity of the `DISTINCT_SCAN`. +- `noIxisectBonus`: Bonus for plans that do not have an `AND_HASH` or `AND_SORTED` index + intersection stage. +- `groupByDistinctBonus`: Bonus for using a `DISTINCT_SCAN` in an aggregation context, where the + bonus is + [proportional](https://github.com/mongodb/mongo/blob/340adb94bc0d348f42f7b427a06418dfd27f4bfc/src/mongo/db/query/plan_ranker.h#L156) + to the productivity of the `DISTINCT_SCAN`. -Standard bonus is a value of [`epsilon`](https://github.com/mongodb/mongo/blob/340adb94bc0d348f42f7b427a06418dfd27f4bfc/src/mongo/db/query/plan_ranker.h#L115), unless noted otherwise. +Standard bonus is a value of +[`epsilon`](https://github.com/mongodb/mongo/blob/340adb94bc0d348f42f7b427a06418dfd27f4bfc/src/mongo/db/query/plan_ranker.h#L115), +unless noted otherwise. -Each `QuerySolution` that meets any of these specifications is awarded an additional point value to break the tie. +Each `QuerySolution` that meets any of these specifications is awarded an additional point value to +break the tie. ## The Runtime Planning Process @@ -194,17 +303,31 @@ flowchart TD ## Alternative Planners -Although `MultiPlanner` is our "standard" case, not all queries utilize a `MultiPlanner`. Under certain conditions, we may use a different planner that is a subclass of the abstract class [`ClassicPlannerInterface`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/query/classic_runtime_planner/planner_interface.h#L59); [`CachedPlanner`](../../../query/plan_cache/README.md) is one such example. 
Each subclass of `ClassicPlannerInterface` overrides [`doPlan()`](https://github.com/mongodb/mongo/blob/6b012bcbe4610ef1e88f9f75d171faa017503713/src/mongo/db/query/classic_runtime_planner/planner_interface.h#L117). `MultiPlanner`'s override, for example, calls [`MultiPlanStage::pickBestPlan()`](https://github.com/mongodb/mongo/blob/6b012bcbe4610ef1e88f9f75d171faa017503713/src/mongo/db/query/classic_runtime_planner/multi_planner.cpp#L55). +Although `MultiPlanner` is our "standard" case, not all queries utilize a `MultiPlanner`. Under +certain conditions, we may use a different planner that is a subclass of the abstract class +[`ClassicPlannerInterface`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/query/classic_runtime_planner/planner_interface.h#L59); +[`CachedPlanner`](../../../query/plan_cache/README.md) is one such example. Each subclass of +`ClassicPlannerInterface` overrides +[`doPlan()`](https://github.com/mongodb/mongo/blob/6b012bcbe4610ef1e88f9f75d171faa017503713/src/mongo/db/query/classic_runtime_planner/planner_interface.h#L117). +`MultiPlanner`'s override, for example, calls +[`MultiPlanStage::pickBestPlan()`](https://github.com/mongodb/mongo/blob/6b012bcbe4610ef1e88f9f75d171faa017503713/src/mongo/db/query/classic_runtime_planner/multi_planner.cpp#L55). Each subclass is detailed below: ### [`SingleSolutionPassthroughPlanner`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/query/classic_runtime_planner/planner_interface.h#L146) -If only one `QuerySolution` exists for a query, no decision needs to be made about which plan to use. In this case, we initialize a `SingleSolutionPassthroughPlanner`, which [does no planning](https://github.com/mongodb/mongo/blob/6b012bcbe4610ef1e88f9f75d171faa017503713/src/mongo/db/query/classic_runtime_planner/single_solution_passthrough_planner.cpp#L44) and creates a classic plan executor for its plan immediately. 
+If only one `QuerySolution` exists for a query, no decision needs to be made about which plan to +use. In this case, we initialize a `SingleSolutionPassthroughPlanner`, which +[does no planning](https://github.com/mongodb/mongo/blob/6b012bcbe4610ef1e88f9f75d171faa017503713/src/mongo/db/query/classic_runtime_planner/single_solution_passthrough_planner.cpp#L44) +and creates a classic plan executor for its plan immediately. ### [`IdHackPlanner`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/query/classic_runtime_planner/planner_interface.h#L132) -An `IDHACK` query is a special query of the form `{_id: {$eq: }}`. If a query meets this specification, we can bypass query planning and construct a "find-by-\_id" plan. This is a ["fast path" optimization](https://github.com/mongodb/mongo/blob/2ceaa45d0b28d1a057c7538327958bdd3c7222db/src/mongo/db/query/get_executor.cpp#L1170) and therefore does not consult the [plan cache](#cachedplanner). It is a guarantee that we will only use the default `_id` index to execute the query. +An `IDHACK` query is a special query of the form `{_id: {$eq: }}`. If a query meets this +specification, we can bypass query planning and construct a "find-by-\_id" plan. This is a +["fast path" optimization](https://github.com/mongodb/mongo/blob/2ceaa45d0b28d1a057c7538327958bdd3c7222db/src/mongo/db/query/get_executor.cpp#L1170) +and therefore does not consult the [plan cache](#cachedplanner). It is a guarantee that we will only +use the default `_id` index to execute the query. ### [`SubPlanner`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/query/classic_runtime_planner/planner_interface.h#L199) @@ -212,7 +335,13 @@ Rooted `$or` queries are eligible for subplanning. > **Rooted `$or`**: A query that contains an `$or` operator at its top level. 
-In these cases, a [`SubplanStage`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/exec/subplan.h#L82) calls [`buildSubPlan()`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/query/get_executor.cpp#L720), which constructs a special `PlanStage` tree. Here, each branch undergoes multiplanning separately. After a winner has been determined from each branch, the branches are ranked against each other, until a winning plan prevails for the entire rooted `$or`. +In these cases, a +[`SubplanStage`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/exec/subplan.h#L82) +calls +[`buildSubPlan()`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/query/get_executor.cpp#L720), +which constructs a special `PlanStage` tree. Here, each branch undergoes multiplanning separately. +After a winner has been determined from each branch, the branches are ranked against each other, +until a winning plan prevails for the entire rooted `$or`. For example: @@ -220,11 +349,18 @@ For example: db.c.find({$or: [{a: 1, x: 1}, {b: 1, y: 2}]}) ``` -In this query, the two branches: `{a: 1, x: 1}` and `{b: 1, y: 2}` would be multiplanned separately. Once a winning plan exists for each branch, the plan with the better score becomes the plan for the whole query. +In this query, the two branches: `{a: 1, x: 1}` and `{b: 1, y: 2}` would be multiplanned separately. +Once a winning plan exists for each branch, the plan with the better score becomes the plan for the +whole query. -Note that subplanning's interaction with the [plan cache](../../../query/plan_cache/README.md#subplanning) is done on a _per-clause basis_. For details on this interaction, see [here](https://github.com/mongodb/mongo/blob/21e7b8cfb79f3a7baae651055d5e1b3549dbdfda/src/mongo/db/exec/subplan.h#L70-L80). 
+Note that subplanning's interaction with the +[plan cache](../../../query/plan_cache/README.md#subplanning) is done on a _per-clause basis_. For +details on this interaction, see +[here](https://github.com/mongodb/mongo/blob/21e7b8cfb79f3a7baae651055d5e1b3549dbdfda/src/mongo/db/exec/subplan.h#L70-L80). -If, for any reason, subplanning fails on one of the children individually, [`choosePlanWholeQuery()`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/exec/subplan.cpp#L115) is called which falls back to multiplanning on the query as a whole. +If, for any reason, subplanning fails on one of the children individually, +[`choosePlanWholeQuery()`](https://github.com/mongodb/mongo/blob/12390d154c1d06b6082a03d2410ff2b3578a323e/src/mongo/db/exec/subplan.cpp#L115) +is called which falls back to multiplanning on the query as a whole. --- diff --git a/src/mongo/db/exec/runtime_planners/classic_runtime_planner_for_sbe/README.md b/src/mongo/db/exec/runtime_planners/classic_runtime_planner_for_sbe/README.md index 17ad2225e62..48e3c5fd327 100644 --- a/src/mongo/db/exec/runtime_planners/classic_runtime_planner_for_sbe/README.md +++ b/src/mongo/db/exec/runtime_planners/classic_runtime_planner_for_sbe/README.md @@ -4,8 +4,10 @@ Runtime planning is an algorithm for plan selection. It runs a set of candidate period, measures their productivity based on the number of documents returned and picks the most productive plan as the result. -Classic query execution engine uses [PlanStage::work()](https://github.com/mongodb/mongo/blob/bec23e4e782bae764122dfa1931cd0d2ad5a1e07/src/mongo/db/exec/plan_stage.h#L207) as the main entry point. This method has a semantic of "do some small unit of work" and returns -one of 3 results: +Classic query execution engine uses +[PlanStage::work()](https://github.com/mongodb/mongo/blob/bec23e4e782bae764122dfa1931cd0d2ad5a1e07/src/mongo/db/exec/plan_stage.h#L207) +as the main entry point. 
This method has a semantic of "do some small unit of work" and returns one +of 3 results: - ADVANCED - I have returned a document - NEED_TIME - no document is returned, but I did some work @@ -14,10 +16,12 @@ one of 3 results: This makes runtime planning easy: just call work() in a round-robin and count ADVANCED / (number of works) ratio as productivity. -SBE query execution engine (see [README](https://github.com/mongodb/mongo/blob/bec23e4e782bae764122dfa1931cd0d2ad5a1e07/src/mongo/db/exec/sbe/README.md)) +SBE query execution engine (see +[README](https://github.com/mongodb/mongo/blob/bec23e4e782bae764122dfa1931cd0d2ad5a1e07/src/mongo/db/exec/sbe/README.md)) while better at executing plans, doesn't support this algorithm well. -SBE's entry point is [sbe::PlanStage::getNext()](https://github.com/mongodb/mongo/blob/bec23e4e782bae764122dfa1931cd0d2ad5a1e07/src/mongo/db/exec/sbe/stages/stages.h#L743) +SBE's entry point is +[sbe::PlanStage::getNext()](https://github.com/mongodb/mongo/blob/bec23e4e782bae764122dfa1931cd0d2ad5a1e07/src/mongo/db/exec/sbe/stages/stages.h#L743) method that has only two results: ADVANCED or IS_EOF. Which means that unlike work(), a call to getNext() may take an arbitrary large amount of time. Especially if there are blocking stages like SORT. @@ -25,15 +29,17 @@ SORT. This means any attempt to round-robin between SBE plans can take time proportional to the longest plan, instead of the shortest plan. -In prior attempts to use SBE's execution model for planning, we had to lower the trial period limits to keep planning time in check, but that reduced the quality of selected plans. +In prior attempts to use SBE's execution model for planning, we had to lower the trial period limits +to keep planning time in check, but that reduced the quality of selected plans. To get both performance benefits of SBE engine and quality of runtime planning from Classic engine, we use Classic Runtime Planners and execute the best plan using SBE engine. 
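
The round-robin idea described above for the Classic engine can be sketched as follows. This is a toy model, not the `MultiPlanStage` implementation: plans are modeled as Python iterators that yield one state per `work()` call, `run_trial` and both toy plans are hypothetical, and the name `IS_EOF` for the third state is an assumption borrowed from the SBE description.

```python
from enum import Enum, auto


class State(Enum):
    ADVANCED = auto()  # a document was returned
    NEED_TIME = auto()  # work was done, but no document was returned
    IS_EOF = auto()  # assumed name for "no more results"


def run_trial(plans, max_works):
    """Call work() on each plan in turn; stop on the first EOF or when the budget ends.

    `plans` maps a plan name to an iterator of State values, one per work() call.
    Returns {name: (advanced, works)}. An exhausted iterator is treated as IS_EOF.
    """
    stats = {name: [0, 0] for name in plans}
    for _ in range(max_works):
        for name, plan in plans.items():
            state = next(plan, State.IS_EOF)
            if state is State.IS_EOF:
                return {n: tuple(s) for n, s in stats.items()}
            stats[name][1] += 1  # count one unit of work
            if state is State.ADVANCED:
                stats[name][0] += 1  # count one returned document
    return {n: tuple(s) for n, s in stats.items()}


def index_scan():
    # Toy plan: alternates between doing work and returning a document.
    while True:
        yield State.NEED_TIME
        yield State.ADVANCED


def blocking_sort():
    # Toy plan: never returns a document within the trial budget.
    while True:
        yield State.NEED_TIME


result = run_trial({"ixscan": index_scan(), "sort": blocking_sort()}, max_works=10)
```

Because every `work()` call is a small, bounded step, each round costs about the same for every plan. A `getNext()`-style call has no such bound, which is the difficulty with SBE that the text describes.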
## Planners -Depending on the query and current state of the database, there are 4 main cases for planning. -Each case represented as a [PlannerInterface](https://github.com/mongodb/mongo/blob/bec23e4e782bae764122dfa1931cd0d2ad5a1e07/src/mongo/db/query/planner_interface.h#L77) +Depending on the query and current state of the database, there are 4 main cases for planning. Each +case represented as a +[PlannerInterface](https://github.com/mongodb/mongo/blob/bec23e4e782bae764122dfa1931cd0d2ad5a1e07/src/mongo/db/query/planner_interface.h#L77) implementation. ### `SingleSolutionPassthroughPlanner` @@ -44,59 +50,85 @@ If there is only one solution: 2. Creates pinned cache entry for it, if SBE plan cache is enabled. 3. Returns SBE plan executor. -"Pinned" means the query won't be considered for replanning; it's appropriate here because this is currently the only possible plan. +"Pinned" means the query won't be considered for replanning; it's appropriate here because this is +currently the only possible plan. ### `MultiPlanner` If there are multiple solutions: 1. Builds Classic plans for each solution. -2. Uses [MultiPlanStage::pickBestPlan()](https://github.com/mongodb/mongo/blob/bec23e4e782bae764122dfa1931cd0d2ad5a1e07/src/mongo/db/exec/multi_plan.h#L115) +2. Uses + [MultiPlanStage::pickBestPlan()](https://github.com/mongodb/mongo/blob/bec23e4e782bae764122dfa1931cd0d2ad5a1e07/src/mongo/db/exec/multi_plan.h#L115) to pick the best solution. 3. If aggregation pipeline is present, extends the solution with it. 4. Builds SBE plan for the best solution. -5. If best solution reached EOF during planning and there is no aggregation pipeline, returns the existing Classic plan executor, which just outputs the documents already found during multiplanning. +5. If best solution reached EOF during planning and there is no aggregation pipeline, returns the + existing Classic plan executor, which just outputs the documents already found during + multiplanning. 6. 
Otherwise, returns SBE plan executor that will restart the query from scratch in SBE. ### `SubPlanner` -Subplanning is a process where each clause in a rooted $or query is planned separately. -For example, in match expression `{$or: [{a: 1}, {b: 1}]}`parts`{a: 1}`and`{b: 1}` will be -planned independently. This is determined by [SubplanStage::canUseSubplanning()](https://github.com/mongodb/mongo/blob/59bfa0cc51bfbdaf0cde7184e63db77f5015c0a6/src/mongo/db/exec/subplan.cpp#L81) +Subplanning is a process where each clause in a rooted +$or query is planned separately. +For example, in match expression `{$or: [{a: 1}, {b: 1}]}`parts`{a: +1}`and`{b: 1}` will be planned independently. This is determined by +[SubplanStage::canUseSubplanning()](https://github.com/mongodb/mongo/blob/59bfa0cc51bfbdaf0cde7184e63db77f5015c0a6/src/mongo/db/exec/subplan.cpp#L81) If subplanning can be used: -1. Uses [SubplanStage::pickBestPlan()](https://github.com/mongodb/mongo/blob/bec23e4e782bae764122dfa1931cd0d2ad5a1e07/src/mongo/db/exec/subplan.h#L129) +1. Uses + [SubplanStage::pickBestPlan()](https://github.com/mongodb/mongo/blob/bec23e4e782bae764122dfa1931cd0d2ad5a1e07/src/mongo/db/exec/subplan.h#L129) to pick the best solution. 2. If aggregation pipeline is present, extends the solution with it. 3. Returns SBE plan executor. ### `ValidCandidatePlanner` -This planner is used after we've taken a plan from the cache and the trial period has ended. This planner merely continues execution of the plan that was generated from the cache entry. +This planner is used after we've taken a plan from the cache and the trial period has ended. This +planner merely continues execution of the plan that was generated from the cache entry. ## Plan Caching -Classic Runtime Planning for SBE supports both the SBE plan cache and the classic plan cache. By default, the classic plan cache is always used. The SBE plan cache is used only when `featureFlagSbeFull` is enabled. 
+Classic Runtime Planning for SBE supports both the SBE plan cache and the classic plan cache. By +default, the classic plan cache is always used. The SBE plan cache is used only when +`featureFlagSbeFull` is enabled. ### Writing to the Cache -Each planner is responsible for writing to the correct cache, if necessary. For example, the MultiPlanner will write to the plan cache after the underlying (classic) MultiPlanStage picks the best plan. If the SBE cache is enabled, the entire SBE plan, including any pushed-down agg pipeline, will be written. Otherwise, the find() portion of the query which was multi-planned will get written to the classic cache. +Each planner is responsible for writing to the correct cache, if necessary. For example, the +MultiPlanner will write to the plan cache after the underlying (classic) MultiPlanStage picks the +best plan. If the SBE cache is enabled, the entire SBE plan, including any pushed-down agg pipeline, +will be written. Otherwise, the find() portion of the query which was multi-planned will get written +to the classic cache. -During sub-planning, with classic cache enabled, we sometimes write cache entries which are used only for future sub-planning. For example, if an OR query with an equality predicate on `a` and `b` is run, a cache entry will be written for both branches. If another OR query with a predicate on `a` and `c` is run, the cache entry for `a` may be re-used here, to avoid multi planning that branch again. +During sub-planning, with classic cache enabled, we sometimes write cache entries which are used +only for future sub-planning. For example, if an OR query with an equality predicate on `a` and `b` +is run, a cache entry will be written for both branches. If another OR query with a predicate on `a` +and `c` is run, the cache entry for `a` may be re-used here, to avoid multi planning that branch +again. 
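
The per-branch reuse described above can be pictured with a small sketch. Everything here is hypothetical: the cache is a plain dict keyed by an invented branch "shape" string, and `toy_multiplan` stands in for multi-planning a single branch. The real plan cache keys and entries are more involved.

```python
def plan_rooted_or(branches, cache, multiplan):
    """Plan each $or branch separately, consulting a per-branch cache first.

    `branches` is a list of hashable branch shapes, `cache` is a dict from
    shape to chosen plan, and `multiplan` picks a plan for one shape.
    All three are illustrative stand-ins.
    """
    chosen = []
    for shape in branches:
        if shape not in cache:
            cache[shape] = multiplan(shape)  # cache miss: multi-plan this branch
        chosen.append(cache[shape])
    return chosen


calls = []  # records which branches actually went through multi-planning


def toy_multiplan(shape):
    calls.append(shape)
    return f"IXSCAN({shape})"


cache = {}
# First query: $or over `a` and `b` writes one entry per branch.
first = plan_rooted_or(["a_eq", "b_eq"], cache, toy_multiplan)
# Second query: $or over `a` and `c` reuses the entry for `a`.
second = plan_rooted_or(["a_eq", "c_eq"], cache, toy_multiplan)
```

In this toy run only three branches are ever multi-planned, because the branch on `a` is served from the entry written by the first query.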
### Reading from the Cache -At the `get_executor` level, we determine which cache to read from based on the `featureFlagSbeFull` value. If a cache entry is found, a `PlannerGenerator` is then created which does the job of translating the cache entry into a `PlannerInterface` which can then be used to generate a plan. There are two `PlannerGenerator` types: +At the `get_executor` level, we determine which cache to read from based on the `featureFlagSbeFull` +value. If a cache entry is found, a `PlannerGenerator` is then created which does the job of +translating the cache entry into a `PlannerInterface` which can then be used to generate a plan. +There are two `PlannerGenerator` types: -`PlannerGeneratorFromSbeCacheEntry` will use the cache entry's SBE plan to create a clone with the parameters filled in, then run the trial period. +`PlannerGeneratorFromSbeCacheEntry` will use the cache entry's SBE plan to create a clone with the +parameters filled in, then run the trial period. -`PlannerGeneratorFromClassicCacheEntry` will take the QSN tree for the given query + pipeline, lower it to SBE via the stage builders, and run the trial period. +`PlannerGeneratorFromClassicCacheEntry` will take the QSN tree for the given query + pipeline, lower +it to SBE via the stage builders, and run the trial period. The PlannerGenerators will then produce one of: 1. A `SingleSolutionPassthroughPlanner` if the cached plan is the only option. -2. A `ValidCandidatePlanner`, if the trial period was successful and we should continue using the cached plan. -3. A `MultiPlanner` if the trial was not successful, or replanning is necessary for any other reason. +2. A `ValidCandidatePlanner`, if the trial period was successful and we should continue using the + cached plan. +3. A `MultiPlanner` if the trial was not successful, or replanning is necessary for any other + reason. -After this, the returned `Planner` encapsulates any work that was cached, and we continue planning as usual. 
+After this, the returned `Planner` encapsulates any work that was cached, and we continue planning +as usual. diff --git a/src/mongo/db/exec/sbe/README.md b/src/mongo/db/exec/sbe/README.md index 9c0eaed3ea2..ad1f70cd3c4 100644 --- a/src/mongo/db/exec/sbe/README.md +++ b/src/mongo/db/exec/sbe/README.md @@ -17,21 +17,21 @@ A value is any entity that we are interested in accessing or manipulating during These can range from simple values like integers and strings to more complex values like objects and arrays. They closely resemble values in functional programming languages, that is, they are neither shared, nor do they have identity (i.e. variables with the same numeric value are not conceptually -different entities). Some SBE values are [modeled off of -BSONTypes](https://github.com/mongodb/mongo/blob/f2b093acd48aee3c63d1a0e80a101eeb9925834a/src/mongo/bson/bsontypes.h#L63-L114) +different entities). Some SBE values are +[modeled off of BSONTypes](https://github.com/mongodb/mongo/blob/f2b093acd48aee3c63d1a0e80a101eeb9925834a/src/mongo/bson/bsontypes.h#L63-L114) while others represent internal C++ types such as [collators](https://github.com/mongodb/mongo/blob/d19ea3f3ff51925e3b45c593217f8901373e4336/src/mongo/db/exec/sbe/values/value.h#L216-L217). One type that deserves a special mention is `Nothing`, which indicates the absence of a value. It is often used in SBE to indicate that a result cannot be computed instead of raising an exception -(similar to the [Maybe -Monad]() in many -functional programming languages). +(similar to the +[Maybe Monad]() in +many functional programming languages). 
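
To make the role of `Nothing` more concrete, here is a minimal sketch of "return an absent value instead of raising". The `NOTHING` sentinel and both helpers are illustrative stand-ins written for this example. How individual SBE functions treat `Nothing` is not specified here, so the propagation rule in `add` is an assumption.

```python
NOTHING = object()  # stand-in sentinel for SBE's Nothing (illustrative only)


def get_field(doc, name):
    """Return doc[name], or NOTHING when doc is not an object or lacks the field."""
    if isinstance(doc, dict) and name in doc:
        return doc[name]
    return NOTHING


def add(lhs, rhs):
    """Add two values; propagate NOTHING instead of raising (assumed rule)."""
    if lhs is NOTHING or rhs is NOTHING:
        return NOTHING
    return lhs + rhs


# Field "b" is missing, so the whole sum is absent rather than an error.
total = add(get_field({"a": 1}, "a"), get_field({"a": 1}, "b"))
```

The caller can then test for the sentinel once at the end, much like unwrapping a Maybe value, instead of handling an exception at every step.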
-Values are identified by a [1 byte -TypeTag](https://github.com/mongodb/mongo/blob/d19ea3f3ff51925e3b45c593217f8901373e4336/src/mongo/db/exec/sbe/values/value.h#L132-L254) -, which denotes the type of the value that we are looking at, and an [8 byte -value](https://github.com/mongodb/mongo/blob/d19ea3f3ff51925e3b45c593217f8901373e4336/src/mongo/db/exec/sbe/values/value.h#L328-L331), +Values are identified by a +[1 byte TypeTag](https://github.com/mongodb/mongo/blob/d19ea3f3ff51925e3b45c593217f8901373e4336/src/mongo/db/exec/sbe/values/value.h#L132-L254) +, which denotes the type of the value that we are looking at, and an +[8 byte value](https://github.com/mongodb/mongo/blob/d19ea3f3ff51925e3b45c593217f8901373e4336/src/mongo/db/exec/sbe/values/value.h#L328-L331), which is the value itself. If the value is shallow (that is, it requires 8 bytes or less to represent), then the 8 bytes are used to store the value itself. If the value requires more than 8 bytes, the 8 bytes are used to store a pointer to a heap-allocated block of memory which contains @@ -41,8 +41,7 @@ the value. In order to use values to implement the semantics of a given query language, we need a mechanism to compute over them. To accomplish this, SBE provides an expression language defined by the -[EExpression -class](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L55-L66). +[EExpression class](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L55-L66). EExpressions form a tree and their goal is to produce values during evaluation. It's worth noting that EExpressions aren't tied to expressions in the Mongo Query Language, rather, they are meant to be building blocks that can be combined to express arbitrary query language semantics. 
Below is an @@ -60,12 +59,14 @@ overview of the different EExpression types: - [ESwitch](https://github.com/mongodb/mongo/blob/a04e7eea7dea44ee536703dbd98e7f832a495d11/src/mongo/db/exec/sbe/expressions/expression.h#L509-L567): Represents a multi-conditional switch expression (a.k.a. if-then-elif-...-else expression). - [EFunction](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L416-L438): - Represents a named, built-in function supported natively by the engine. At the time of writing, there are over [150 such - functions](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.cpp#L564-L567). + Represents a named, built-in function supported natively by the engine. At the time of writing, + there are over + [150 such functions](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.cpp#L564-L567). Note that function parameters are evaluated first and then are passed as arguments to the function. - [EFail](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L511-L534): - Represents an exception and produces a query fatal error if reached at query runtime. It supports numeric error codes and error strings. + Represents an exception and produces a query fatal error if reached at query runtime. It supports + numeric error codes and error strings. - [ENumericConvert](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L536-L566): Represents the conversion of an arbitrary value to a target numeric type. 
- [EVariable](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L281-L319) @@ -75,27 +76,27 @@ overview of the different EExpression types: when we want to reference some intermediate value multiple times. - [ELocalLambda](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L487-L507) Represents an anonymous function which takes one or two input parameters. Many `EFunctions` accept - these as parameters. A good example of this is the [`traverseF` - function](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.cpp#L1329-L1357): + these as parameters. A good example of this is the + [`traverseF` function](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.cpp#L1329-L1357): it accepts 2 parameters: an input and an `ELocalLambda`. If the input is an array, the `ELocalLambda` is applied to each element in the array, otherwise, it is applied to the input on - its own. The second argument of the lambda receives the 0-based position of the element being examined; - when the `ELocalLambda` is being applied to the entire input, the second argument will have a value of -1. + its own. The second argument of the lambda receives the 0-based position of the element being + examined; when the `ELocalLambda` is being applied to the entire input, the second argument will + have a value of -1. -EExpressions cannot be executed directly. Rather, [they are -compiled](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L81-L84) +EExpressions cannot be executed directly. 
Rather, +[they are compiled](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L81-L84) into executable [`ByteCode`](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.h#L1382) -in the SBE Virtual Machine, or VM. SBE `ByteCode` execution closely resembles the [fetch, decode, -execute](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.cpp#L9638-L9641) -cycle in assembly/machine code execution. In particular, EExpressions are compiled to [a linear -buffer of -instructions](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.h#L1356-L1357) -and execution state is represented by a [program -counter](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.cpp#L9635-L9638) +in the SBE Virtual Machine, or VM. SBE `ByteCode` execution closely resembles the +[fetch, decode, execute](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.cpp#L9638-L9641) +cycle in assembly/machine code execution. 
In particular, EExpressions are compiled to +[a linear buffer of instructions](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.h#L1356-L1357) +and execution state is represented by a +[program counter](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.cpp#L9635-L9638) (a pointer into the instruction buffer, which is computed by taking the address of the buffer and -adding an offset to it) and [a -stack](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.h#L2192-L2199) +adding an offset to it) and +[a stack](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.h#L2192-L2199) which maintains values produced during execution. Generally speaking, instructions will obtain their arguments by popping arguments off of the stack and will push the result of evaluation onto the stack. @@ -108,8 +109,7 @@ in detail, please reference [the Virtual Machine section below](#virtual-machine To make use of SBE values (either those produced by executing `ByteCode`, or those maintained elsewhere), we need a mechanism to reference them throughout query execution. This is where slots come into play: A slot is a mechanism for reading and writing values at query runtime. Each slot is -[uniquely identified by a numeric -SlotId](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/values/slot.h#L41-L48). +[uniquely identified by a numeric SlotId](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/values/slot.h#L41-L48). 
Put another way, slots conceptually represent values that we care about during query execution, including: @@ -118,9 +118,8 @@ including: - The individual components of a sort key (where each component is bound to its own slot) - The result of executing some computation expressed in the input query -SlotIds by themselves don't provide a means to access or set values, rather, [slots are associated -with -SlotAccessors](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/values/slot.h#L50-L55), +SlotIds by themselves don't provide a means to access or set values, rather, +[slots are associated with SlotAccessors](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/values/slot.h#L50-L55), which provide the API to read the values bound to slots as well as to write new values into slots. There are several types of SlotAccessors, but the most common are the following: @@ -163,15 +162,16 @@ perform query execution using values, EExpressions, and slots. They are the node execution tree when combined. PlanStages are pull-based in that they pull data from their child stages (as opposed to a push-based model where stages offer data to parent stages). -A single `PlanStage` may have any number of children and performs some action, implements some algorithm, -or maintains some execution state, such as: +A single `PlanStage` may have any number of children and performs some action, implements some +algorithm, or maintains some execution state, such as: - Computing values bound to slots - Managing the lifetime of values in slots - Executing compiled `ByteCode` - Buffering values into memory -SBE PlanStages also follow an iterator model and perform query execution through the following steps: +SBE PlanStages also follow an iterator model and perform query execution through the following +steps: - First, a caller prepares a PlanStage tree for execution by calling `prepare()`. 
- Once the tree is prepared, the caller then calls `open()` to initialize the tree with any state @@ -182,15 +182,17 @@ SBE PlanStages also follow an iterator model and perform query execution through values from slots. - Finally, `close()` is called to indicate that query execution is complete and release resources. -The following subsections describe [the PlanStage API](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/stages.h#L557-L651) introduced above in greater detail: +The following subsections describe +[the PlanStage API](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/stages.h#L557-L651) +introduced above in greater detail: ### `virtual void prepare(CompileCtx& ctx) = 0;` This method prepares a `PlanStage` (and, recursively, its children) for execution by: - Performing slot resolution, that is, obtaining `SlotAccessors` for all slots that this stage - references and verifying that all slot accesses are valid. Typically, this is done by asking - child stages for a `SlotAccessor*` via `getAccessor()`. + references and verifying that all slot accesses are valid. Typically, this is done by asking child + stages for a `SlotAccessor*` via `getAccessor()`. - Compiling `EExpressions` into executable `ByteCode`. Note that `EExpressions` can reference slots through the `ctx` parameter. @@ -206,8 +208,8 @@ good example of this is the [`HashAggStage`](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/hash_agg.cpp#L273): stages above a `HashAggStage` in the tree parent stages cannot access slots below the `HashAggStage`. This is because this stage will exhaust its PlanStage subtree, which renders all -slots in said subtree invalid. For more details on slot resolution, consult [the corresponding -section](#slot-resolution). +slots in said subtree invalid. 
For more details on slot resolution, consult +[the corresponding section](#slot-resolution). ### `virtual void open(bool reOpen) = 0;` @@ -233,8 +235,8 @@ expensive and ultimately redundant. This is where the `reOpen` parameter of `ope set to `true`, it provides the opportunity to execute an optimized a sequence of `close()` and `open()` calls. -A good example of this is the [HashAgg -stage](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/hash_agg.cpp#L426): +A good example of this is the +[HashAgg stage](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/hash_agg.cpp#L426): calling `HashAgg::open()` involves draining a child stage and buffering values into a hash table. Closing the plan and then immediately opening it would involve destroying the internal hash table and then reconstructing it, which wastes a lot of work. Instead, calling `open(reOpen = true)` @@ -258,26 +260,23 @@ At the time of writing, there are 36 PlanStages. As such, only a handful of comm Advances a storage cursor over a collection. This stage can function both as a 'scan' (read the contents of a collection from start to finish) or as a 'seek' (position the cursor to a specific RecordId, and read until EOF or a RecordId upper bound). It returns `IS_EOF` if the cursor is -exhausted or if the underlying storage cursor has advanced beyond the seek bounds. ScanStage [owns -slots for the Record and RecordId -returned](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/scan.h#L393-L398) +exhausted or if the underlying storage cursor has advanced beyond the seek bounds. 
ScanStage +[owns slots for the Record and RecordId returned](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/scan.h#L393-L398) from the cursor and will update them on each call to `getNext()` if these slots are defined. -The ScanStage [supports binding the values of top level fields from records to -slots](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/scan.cpp#L584-L640). +The ScanStage +[supports binding the values of top level fields from records to slots](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/scan.cpp#L584-L640). This is a very useful optimization as it saves parent stages from having to perform a linear time lookup over the input BSON for each top level field. ### [IndexScanStageBase::getNext()](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/ix_scan.cpp#L289-L345) -Advances a storage cursor over an index. Note that this PlanStage is abstract and must [be derived -from to describe how to seek the -index](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/ix_scan.h#L127-L128). -Much like `ScanStage`, `IndexScanStageBase` [maintains slots for the index key and the RecordId corresponding to the -key](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/ix_scan.h#L172-L186). -It also has an optimization that allows for [binding a subset of the components of the index key -returned by the index to -slots](https://github.com/mongodb/mongo/blob/dbbabbdc0f3ef6cbb47500b40ae235c1258b741a/src/mongo/db/exec/sbe/values/value.cpp#L889-L929). +Advances a storage cursor over an index. 
Note that this PlanStage is abstract and must +[be derived from to describe how to seek the index](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/ix_scan.h#L127-L128). +Much like `ScanStage`, `IndexScanStageBase` +[maintains slots for the index key and the RecordId corresponding to the key](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/ix_scan.h#L172-L186). +It also has an optimization that allows for +[binding a subset of the components of the index key returned by the index to slots](https://github.com/mongodb/mongo/blob/dbbabbdc0f3ef6cbb47500b40ae235c1258b741a/src/mongo/db/exec/sbe/values/value.cpp#L889-L929). ### [FilterStage::getNext()](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/filter.h#L127-L141) @@ -290,11 +289,11 @@ Implements a join over an outer subplan and an inner subplan. Though it implemen Join algorithm, it is not necessarily used to implement a `$lookup` or even a traditional join, rather, it models a runtime loop. More precisely, for every call to `getNext()` on the outer stage, `LoopJoinStage` reopens the inner stage and calls `getNext()` on it. The inner stage is iterated on -subsequent `getNext()` calls until `IS_EOF` is returned. This stage supports [Right, Left, and Outer -Joins](https://github.com/mongodb/mongo/blob/dbbabbdc0f3ef6cbb47500b40ae235c1258b741a/src/mongo/db/exec/sbe/stages/loop_join.h#L47). +subsequent `getNext()` calls until `IS_EOF` is returned. This stage supports +[Right, Left, and Outer Joins](https://github.com/mongodb/mongo/blob/dbbabbdc0f3ef6cbb47500b40ae235c1258b741a/src/mongo/db/exec/sbe/stages/loop_join.h#L47). 
-Note that slots from the outer stage can be made visible to [inner stage via -LoopJoinStage::\_outerCorrelated](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/loop_join.cpp#L105-L107), +Note that slots from the outer stage can be made visible to +[inner stage via LoopJoinStage::\_outerCorrelated](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/loop_join.cpp#L105-L107), which adds said slots to the `CompileCtx` during `prepare()`. Conceptually, this is similar to the rules around scoped variables in for loops in many programming languages: @@ -333,7 +332,8 @@ Indexes: db.alumni.find({"major" : "Computer Science", "year": 2020}); ``` -The query plan chosen by the classic optimizer, represented as a `QuerySolution` tree, to answer this query is as follows: +The query plan chosen by the classic optimizer, represented as a `QuerySolution` tree, to answer +this query is as follows: ``` { @@ -369,9 +369,10 @@ The query plan chosen by the classic optimizer, represented as a `QuerySolution` } ``` -In particular, it is an `IXSCAN` over the `{"major": 1}` index, followed by a `FETCH` and a filter of -`year = 2020`. The SBE plan (generated by the [SBE stage builder](../docs/sbe.md#sbe-stage-builders) with the [plan -cache](#sbe-plan-cache) disabled) for this query plan is as follows: +In particular, it is an `IXSCAN` over the `{"major": 1}` index, followed by a `FETCH` and a filter +of `year = 2020`. The SBE plan (generated by the +[SBE stage builder](../docs/sbe.md#sbe-stage-builders) with the [plan cache](#sbe-plan-cache) +disabled) for this query plan is as follows: ``` *** SBE runtime environment slots *** @@ -406,8 +407,8 @@ at a point in time: Initially, all slots hold a value of `Nothing`. 
Note also that some slots have been omitted for brevity, namely, s3, s4, and s6 (which correspond to a `SnapshotId`, an index identifier and an index key pattern, respectively). These slots are used to implement the index key consistency and -corruption checks and as such, are beyond the scope of this example (see the [yielding -section](#yielding) for more information on these checks). +corruption checks and as such, are beyond the scope of this example (see the +[yielding section](#yielding) for more information on these checks). Execution starts by calling `getNext()` on the `filter` stage, which will call `getNext()` on its child `nlj` stage. `nlj` will call `getNext()` once on its outer child (the `ixseek` stage) before @@ -454,14 +455,14 @@ to the reader. ## SBE Plan Cache -There exists a plan cache for SBE; see the [relevant -README](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/query/README.md#sbe-plan-cache) +There exists a plan cache for SBE; see the +[relevant README](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/query/README.md#sbe-plan-cache) for more details. ## Runtime Planners -See [Classic Runtime Planners for SBE -README](/src/mongo/db/exec/runtime_planners/classic_runtime_planner_for_sbe/README.md). +See +[Classic Runtime Planners for SBE README](/src/mongo/db/exec/runtime_planners/classic_runtime_planner_for_sbe/README.md). ## Incomplete Sections Below (TODO) diff --git a/src/mongo/db/extension/CLAUDE.md b/src/mongo/db/extension/CLAUDE.md index e288db653b9..64ce785f3a8 100644 --- a/src/mongo/db/extension/CLAUDE.md +++ b/src/mongo/db/extension/CLAUDE.md @@ -1,6 +1,10 @@ ## Extensions API -The MongoDB Extensions API is a dynamic plugin system that loads shared libraries (`.so` files) into the server at startup to provide additional aggregation stages. Extensions are developed, versioned, and deployed independently of the server. 
The primary use case is moving Atlas Search stages (`$vectorSearch`, `$search`, `$searchMeta`) out of the server codebase. Only Rust extensions are supported in production; the C++ SDK here is for internal testing. +The MongoDB Extensions API is a dynamic plugin system that loads shared libraries (`.so` files) into +the server at startup to provide additional aggregation stages. Extensions are developed, versioned, +and deployed independently of the server. The primary use case is moving Atlas Search stages +(`$vectorSearch`, `$search`, `$searchMeta`) out of the server codebase. Only Rust extensions are +supported in production; the C++ SDK here is for internal testing. ### Architecture @@ -19,11 +23,18 @@ The MongoDB Extensions API is a dynamic plugin system that loads shared librarie └─────────────────────────────────────────────────────────┘ ``` -- **`public/`** - The C API header (`api.h`). All types and function pointer vtables crossing the ABI boundary. Written in C for ABI stability. Never add C++ types here. -- **`host/`** - Server-side integration: `DocumentSourceExtension`, extension loading, host services, unit tests. Must only use `host_connector` abstractions, never raw C API types. Namespace: `mongo::extension::host`. -- **`host_connector/`** - C++ wrappers for the host to safely call extension code. **Adapters** (in `adapter/`) wrap host C++ for extensions to call. **Handles** (in `handle/`) wrap extension C pointers for the host to call. Namespace: `mongo::extension::host_connector`. -- **`sdk/`** - C++ SDK for writing test extensions. Mirrors host_connector on the extension side. Namespace: `mongo::extension::sdk`. -- **`shared/`** - Utilities used by both host and SDK: `Handle` template, `ByteBuf`, `ExtensionStatus`, `GetNextResult`. +- **`public/`** - The C API header (`api.h`). All types and function pointer vtables crossing the + ABI boundary. Written in C for ABI stability. Never add C++ types here. 
+- **`host/`** - Server-side integration: `DocumentSourceExtension`, extension loading, host + services, unit tests. Must only use `host_connector` abstractions, never raw C API types. + Namespace: `mongo::extension::host`. +- **`host_connector/`** - C++ wrappers for the host to safely call extension code. **Adapters** (in + `adapter/`) wrap host C++ for extensions to call. **Handles** (in `handle/`) wrap extension C + pointers for the host to call. Namespace: `mongo::extension::host_connector`. +- **`sdk/`** - C++ SDK for writing test extensions. Mirrors host_connector on the extension side. + Namespace: `mongo::extension::sdk`. +- **`shared/`** - Utilities used by both host and SDK: `Handle` template, `ByteBuf`, + `ExtensionStatus`, `GetNextResult`. - **`test_examples/`** - Reference test extensions. Pattern-match these when adding new extensions. ### Key Design Rules @@ -40,43 +51,63 @@ The MongoDB Extensions API is a dynamic plugin system that loads shared librarie - **Adapters** wrap C++ implementations and expose them through C API vtables - **Handles** wrap C API pointers and provide type-safe C++ access to vtable functions -- Error conversion: `wrapCXXAndConvertExceptionToStatus()` (adapters) and `invokeCAndConvertStatusToException()` (handles) +- Error conversion: `wrapCXXAndConvertExceptionToStatus()` (adapters) and + `invokeCAndConvertStatusToException()` (handles) **Layering:** -- `mongo::extension::host` must only use types from `mongo::extension::host_connector`, never from `public/api.h` directly +- `mongo::extension::host` must only use types from `mongo::extension::host_connector`, never from + `public/api.h` directly - Exception: host services connector logic (HostPortal) lives in `host/` due to server dependencies **SDK constraints:** -- The SDK depends on `mongo/base` for BSONObj and DBException (SERVER-107651 tracks removal). Do NOT add new `mongo/base` usages beyond BSON and exception handling. 
-- In extension code, use `sdk_uassert()`, `sdk_tassert()` instead of the server's `uassert`/`tassert` +- The SDK depends on `mongo/base` for BSONObj and DBException (SERVER-107651 tracks removal). Do NOT + add new `mongo/base` usages beyond BSON and exception handling. +- In extension code, use `sdk_uassert()`, `sdk_tassert()` instead of the server's + `uassert`/`tassert` ### Common Mistakes -- **Using server assertion macros in extension code.** Use `sdk_uassert(code, msg, cond)` / `sdk_tassert(code, msg, cond)` from `sdk/assert_util.h`, not the server's `uassert`/`tassert`. -- **Referencing `public/api.h` types from `host/` code.** Host logic must go through `host_connector/` abstractions. +- **Using server assertion macros in extension code.** Use `sdk_uassert(code, msg, cond)` / + `sdk_tassert(code, msg, cond)` from `sdk/assert_util.h`, not the server's `uassert`/`tassert`. +- **Referencing `public/api.h` types from `host/` code.** Host logic must go through + `host_connector/` abstractions. - **Adding C++ types to `public/api.h`.** The public API is pure C for ABI stability. -- **Forgetting `--linkstatic=False` when building/testing extensions.** Extensions require dynamic linking. -- **Forgetting to add a new passthrough extension to `dist_test_extensions`** in `test_examples/BUILD.bazel`. If you add a `_mongo_extension` target but don't list it there, it won't be loaded in passthrough suites. -- **Letting BSONObj cross the C API boundary.** Serialize to `ByteView`/`ByteBuf` first, deserialize on the other side. -- **Letting C++ exceptions escape across the boundary.** Adapter code must catch all exceptions and convert to `MongoExtensionStatus*`. +- **Forgetting `--linkstatic=False` when building/testing extensions.** Extensions require dynamic + linking. +- **Forgetting to add a new passthrough extension to `dist_test_extensions`** in + `test_examples/BUILD.bazel`. 
If you add a `_mongo_extension` target but don't list it there, it + won't be loaded in passthrough suites. +- **Letting BSONObj cross the C API boundary.** Serialize to `ByteView`/`ByteBuf` first, deserialize + on the other side. +- **Letting C++ exceptions escape across the boundary.** Adapter code must catch all exceptions and + convert to `MongoExtensionStatus*`. ### Aggregation Stage Lifecycle Extension stages go through these phases, each modeled by a C API type: -1. **StageDescriptor** (`MongoExtensionAggStageDescriptor`) - Static factory registered at startup. Owns stage name and `parse()`. Lives for entire extension lifetime. -2. **ParseNode** (`MongoExtensionAggStageParseNode`) - Validates syntax, generates query shapes, **expands** (desugars) into resolved nodes. -3. **AstNode** (`MongoExtensionAggStageAstNode`) - Post-expansion. Provides static properties, binds to catalog context (namespace, UUID, explain verbosity). -4. **LogicalStage** (`MongoExtensionLogicalAggStage`) - Bound to instance context. Serialization, explain, optimization, distributed plan logic. Compiles to executable. -5. **ExecutableStage** (`MongoExtensionExecAggStage`) - Runtime: `open()`, `getNext()`, `reopen()`, `close()`. +1. **StageDescriptor** (`MongoExtensionAggStageDescriptor`) - Static factory registered at startup. + Owns stage name and `parse()`. Lives for entire extension lifetime. +2. **ParseNode** (`MongoExtensionAggStageParseNode`) - Validates syntax, generates query shapes, + **expands** (desugars) into resolved nodes. +3. **AstNode** (`MongoExtensionAggStageAstNode`) - Post-expansion. Provides static properties, binds + to catalog context (namespace, UUID, explain verbosity). +4. **LogicalStage** (`MongoExtensionLogicalAggStage`) - Bound to instance context. Serialization, + explain, optimization, distributed plan logic. Compiles to executable. +5. **ExecutableStage** (`MongoExtensionExecAggStage`) - Runtime: `open()`, `getNext()`, `reopen()`, + `close()`. 
-Stage types: **source** (produce documents), **transform** (input -> output), **desugar** (expand into pipeline of other stages during parsing). +Stage types: **source** (produce documents), **transform** (input -> output), **desugar** (expand +into pipeline of other stages during parsing). ### API Versioning -Uses MAJOR.MINOR (current: `0.1`). At startup, the host passes supported versions to `get_mongodb_extension`; the extension negotiates a compatible version. MAJOR must match; host minor >= extension minor. The server supports two major versions simultaneously (N and N-1). Minor bumps add default SDK implementations; major bumps maintain frozen old API header snapshots. +Uses MAJOR.MINOR (current: `0.1`). At startup, the host passes supported versions to +`get_mongodb_extension`; the extension negotiates a compatible version. MAJOR must match; host +minor >= extension minor. The server supports two major versions simultaneously (N and N-1). Minor +bumps add default SDK implementations; major bumps maintain frozen old API header snapshots. ### Where to Start Reading @@ -97,11 +128,15 @@ Uses MAJOR.MINOR (current: `0.1`). At startup, the host passes supported version This is the most sensitive file in the extensions system. Changes here affect ABI compatibility. -- **Adding a new vtable function:** Add it at the END of the vtable struct. Never reorder existing entries. -- **Adding a new type:** Follow the vtable + struct pattern. Include `destroy()` if ownership is transferred. +- **Adding a new vtable function:** Add it at the END of the vtable struct. Never reorder existing + entries. +- **Adding a new type:** Follow the vtable + struct pattern. Include `destroy()` if ownership is + transferred. - **Never remove or rename** existing functions or types - this breaks ABI. - **Bump the minor version** for backward-compatible additions. Bump major for breaking changes. 
-- When adding a new API function, you must also add: a host_connector handle method, an SDK base class virtual method (with default implementation for minor bumps), and adapter implementations on both sides. +- When adding a new API function, you must also add: a host_connector handle method, an SDK base + class virtual method (with default implementation for minor bumps), and adapter implementations on + both sides. ### Building @@ -116,7 +151,8 @@ bazel build install-extensions bazel build //src/mongo/db/extension/test_examples:foo_mongo_extension --linkstatic=False ``` -Extensions use Bazel transitions (`bazel/transitions.bzl`) for `--allocator=system` and `shared_archive=True`. Extensions are GPG-signed at build time. +Extensions use Bazel transitions (`bazel/transitions.bzl`) for `--allocator=system` and +`shared_archive=True`. Extensions are GPG-signed at build time. ### Unit Testing @@ -134,7 +170,8 @@ Host-side tests: `host/BUILD.bazel`. SDK-side tests: `sdk/tests/BUILD.bazel`. **Passthrough suites** run all `jstests/extensions/` tests across topologies: -- `extensions_standalone`, `extensions_single_node`, `extensions_single_shard`, `extensions_sharded_cluster`, `extensions_sharded_collections` +- `extensions_standalone`, `extensions_single_node`, `extensions_single_shard`, + `extensions_sharded_cluster`, `extensions_sharded_collections` ```bash # Run a passthrough suite @@ -152,20 +189,30 @@ python3 buildscripts/resmoke.py run --suites=no_passthrough --runAllFeatureFlagT Always use `--runAllFeatureFlagTests` - extension tests require `featureFlagExtensionsAPI`. -Resmoke auto-discovers `*_mongo_extension.so`, generates `.conf` files in `/tmp/mongo/extensions/`, and passes `loadExtensions` to the server. Extension options come from `test_examples/configurations.yml`. +Resmoke auto-discovers `*_mongo_extension.so`, generates `.conf` files in `/tmp/mongo/extensions/`, +and passes `loadExtensions` to the server. 
Extension options come from +`test_examples/configurations.yml`. ### Test Extension Naming Conventions -- **Passthrough extensions** (loaded in all passthrough suites): MUST have `_mongo_extension` suffix (e.g., `foo_mongo_extension`). Add to first section of `dist_test_extensions` in `test_examples/BUILD.bazel`. -- **No-passthrough-only extensions**: MUST NOT have `_mongo_extension` suffix (e.g., `vector_search_extension`). Add to second section. +- **Passthrough extensions** (loaded in all passthrough suites): MUST have `_mongo_extension` suffix + (e.g., `foo_mongo_extension`). Add to first section of `dist_test_extensions` in + `test_examples/BUILD.bazel`. +- **No-passthrough-only extensions**: MUST NOT have `_mongo_extension` suffix (e.g., + `vector_search_extension`). Add to second section. - **Bad extensions** (expected to fail loading): Use `_bad_extension` suffix. ### Adding a New Test Extension -1. Create `your_stage.cpp` in `test_examples/` (or a subdirectory). Pattern-match `foo.cpp` (simple transform) or `desugar/add_fields_match.cpp` (desugar). -2. Use macros: `DEFAULT_PARSE_NODE(Name)`, `DEFAULT_EXTENSION(Name)`, `REGISTER_EXTENSION(NameExtension)`, `DEFINE_GET_EXTENSION()`. +1. Create `your_stage.cpp` in `test_examples/` (or a subdirectory). Pattern-match `foo.cpp` (simple + transform) or `desugar/add_fields_match.cpp` (desugar). +2. Use macros: `DEFAULT_PARSE_NODE(Name)`, `DEFAULT_EXTENSION(Name)`, + `REGISTER_EXTENSION(NameExtension)`, `DEFINE_GET_EXTENSION()`. 3. If the extension needs options, add to `test_examples/configurations.yml`. -4. Add build targets to `test_examples/BUILD.bazel` using `signed_mongo_cc_extension_shared_library`. -5. Add the signed lib to `dist_test_extensions` (passthrough or no-passthrough section per naming convention). +4. Add build targets to `test_examples/BUILD.bazel` using + `signed_mongo_cc_extension_shared_library`. +5. 
Add the signed lib to `dist_test_extensions` (passthrough or no-passthrough section per naming + convention). 6. Write unit tests in `host/` and/or `sdk/tests/`, updating respective `BUILD.bazel`. -7. Write integration tests in `jstests/extensions/` (passthrough) or `jstests/noPassthrough/extensions/` (custom topology). +7. Write integration tests in `jstests/extensions/` (passthrough) or + `jstests/noPassthrough/extensions/` (custom topology). diff --git a/src/mongo/db/extension/README.md b/src/mongo/db/extension/README.md index 417a5bc1949..71deee694eb 100644 --- a/src/mongo/db/extension/README.md +++ b/src/mongo/db/extension/README.md @@ -1,11 +1,11 @@ # MongoDB Extensions API -This document aims to provide a high-level overview for the MongoDB Extensions API. -An extension is an ahead-of-time compiled object that is dynamically loaded into the server -to provide additional functionality. This object provides a handful of functions the server -may invoke to setup/teardown the extension and register new functionality. Each extension may be -updated independently from the server, meaning that functionality can be added or altered without -building and releasing a new version of the server. +This document aims to provide a high-level overview for the MongoDB Extensions API. An extension is +an ahead-of-time compiled object that is dynamically loaded into the server to provide additional +functionality. This object provides a handful of functions the server may invoke to setup/teardown +the extension and register new functionality. Each extension may be updated independently from the +server, meaning that functionality can be added or altered without building and releasing a new +version of the server. This is a work in progress and more sections will be added gradually. @@ -28,14 +28,14 @@ interact with it directly. The Host Connector layer is responsible for creating a safe interface for the C++ host code to interact with the extension using the C Public API. 
The host does not need to be aware of any of the C types that are introduced in the Public API. Instead, the Host Connector provides C++ classes and -functions which abstract away the complexity and memory ownership concerns of interfacing with the -C API. +functions which abstract away the complexity and memory ownership concerns of interfacing with the C +API. In general, every abstraction in the Public API has a respective C++ interface implemented in the Host Connector which the host is expected to use. This allows us to encapsulate and control where conversions across the API boundary between C and C++ take place, leading to more maintainable code -and minimizing the risk of programmer errors in the host code. The Host Connector code lives within the -C++ namespace `mongo::extension::host_connector` and can be found under the +and minimizing the risk of programmer errors in the host code. The Host Connector code lives within +the C++ namespace `mongo::extension::host_connector` and can be found under the `mongo/db/extension/host_connector` directory. The core host logic lives in `mongo/db/extension/host` within the C++ namespace @@ -44,9 +44,9 @@ In other words, logic in `mongo::extension::host` should only refer to data stru `mongo::extension::host_connector` and should _not_ refer to any data structures from the Public API directly. -**NOTE:** The exception to the `host`/`host_connector` division is the connector logic that wraps host services -(like the HostPortal). Since that logic has many other server dependencies, the host services -connector logic lives with the `host` logic. +**NOTE:** The exception to the `host`/`host_connector` division is the connector logic that wraps +host services (like the HostPortal). Since that logic has many other server dependencies, the host +services connector logic lives with the `host` logic. ## C++ SDK @@ -56,8 +56,8 @@ API. The Extensions API initiative will only support Rust extensions in production. 
The Search team will own the Rust SDK. However, the Query team develops and maintains a C++ SDK for the purpose of -writing internal unit and integration tests. The C++ SDK can be found under -`mongo/db/extension/sdk` directory. +writing internal unit and integration tests. The C++ SDK can be found under `mongo/db/extension/sdk` +directory. In general, every abstraction in the Public API has a respective C++ interface implemented in the C++ SDK which extension developers are expected to use to build their extension. This includes @@ -69,9 +69,9 @@ maintainable code and minimizing the risk of programmer errors in extension code TODO SERVER-107651 Remove SDK dependency on mongo/base library. -Currently, the SDK relies on the mongo/base library for BSON/BSONObj representation, -as well as DBException and other exception handling functionality. Ideally we should remove that -dependency since it's possible that linking mongo/base in an extension library could cause issues like +Currently, the SDK relies on the mongo/base library for BSON/BSONObj representation, as well as +DBException and other exception handling functionality. Ideally we should remove that dependency +since it's possible that linking mongo/base in an extension library could cause issues like host<>extension symbol conflicts at load time or run time. SERVER-107651 tracks the work to remove -that dependency entirely. In the meantime, please avoid adding new usages of that library outside -of BSON representation and exception handling. +that dependency entirely. In the meantime, please avoid adding new usages of that library outside of +BSON representation and exception handling. diff --git a/src/mongo/db/extension/host/README.md b/src/mongo/db/extension/host/README.md index 73909ac6571..611489c1b5c 100644 --- a/src/mongo/db/extension/host/README.md +++ b/src/mongo/db/extension/host/README.md @@ -26,8 +26,8 @@ comprehensive set of interfaces in the API which the extension must service. 
.so file and find the `get_mongodb_extension` symbol. Once the host has access to the top-level extension logic exposed through `get_mongodb_extension`, -the extension and host perform a version negotiation to agree upon an Extensions API version that -is supported by both modules. Last, the host calls `initialize()` for the extension to register its +the extension and host perform a version negotiation to agree upon an Extensions API version that is +supported by both modules. Last, the host calls `initialize()` for the extension to register its aggregation stages through the host portal. If there are any issues while loading an extension, an error will be logged and startup will fail. @@ -39,7 +39,9 @@ startup option were successfully loaded, and we will log a success message. Each extension loaded at startup must have: 1. A `SharedLibrary` file (`*.so`) - the compiled extension -2. A configuration file (`.conf`) - located under `/etc/mongo/extensions`. In test environments (when test commands are enabled), the config directory is `/tmp/mongo/extensions` instead. +2. A configuration file (`.conf`) - located under `/etc/mongo/extensions`. In test + environments (when test commands are enabled), the config directory is `/tmp/mongo/extensions` + instead. Configuration files use YAML syntax and must define: diff --git a/src/mongo/db/extension/host_connector/README.md b/src/mongo/db/extension/host_connector/README.md index e1e5d3aa511..c610bb3f457 100644 --- a/src/mongo/db/extension/host_connector/README.md +++ b/src/mongo/db/extension/host_connector/README.md @@ -13,11 +13,12 @@ boundary crossing is in host_connector (and shared). - **Handles** (in `shared/handle/`): Thin wrappers around C API pointers using the `VTableAPI` pattern and the `c_api_to_cpp_api` trait. They validate vtables and expose type-safe C++ methods. Used by both host_connector and SDK. -- **Adapters** (in `host_connector/adapter/`): Implement C API structs (e.g. 
`MongoExtensionHostPortal`) - by holding C++ implementations and forwarding C callbacks into them. They bridge the C API boundary - for host-provided services. +- **Adapters** (in `host_connector/adapter/`): Implement C API structs (e.g. + `MongoExtensionHostPortal`) by holding C++ implementations and forwarding C callbacks into them. + They bridge the C API boundary for host-provided services. **Reference:** `shared/handle/handle.h` for `VTableAPI` and the trait; `host_connector/handle/` for extension-level handles; `host_connector/adapter/` for adapters (e.g. `HostPortalAdapter`, -`HostServicesAdapter`). Most C API handles live under `shared/handle/` because **both** the host and the C++ SDK need -the same `VTableAPI` wrappers when calling across the boundary, so `host_connector/handle` is knowingly small. +`HostServicesAdapter`). Most C API handles live under `shared/handle/` because **both** the host and +the C++ SDK need the same `VTableAPI` wrappers when calling across the boundary, so +`host_connector/handle` is knowingly small. diff --git a/src/mongo/db/extension/public/README.md b/src/mongo/db/extension/public/README.md index fd77186b9a7..769fe72dec5 100644 --- a/src/mongo/db/extension/public/README.md +++ b/src/mongo/db/extension/public/README.md @@ -4,15 +4,15 @@ The canonical Public API header is `mongo/db/extension/public/api.h`. ## Implementing Polymorphism Across API Boundary -The API aims to provide flexibility to extension developers in choosing how an implementation -looks on the extension side of the API boundary. To this end, we provide an additional layer of +The API aims to provide flexibility to extension developers in choosing how an implementation looks +on the extension side of the API boundary. To this end, we provide an additional layer of indirection when defining the data types that comprise this API, which allows us to hide data members of API objects and implementation details entirely on the extension side of the API boundary. 
-We achieve this by implementing polymorphism in the C API, such that the vast majority of the -data structures that cross the API boundary only hold a single member: a pointer to a -virtual table that represents the common interface for the polymorphic type. +We achieve this by implementing polymorphism in the C API, such that the vast majority of the data +structures that cross the API boundary only hold a single member: a pointer to a virtual table that +represents the common interface for the polymorphic type. For example, `APIStruct` below only has a single member, a pointer to `APIStructVTable`, which requires that an extension implements the `foo` function, and assign it to the function pointer in @@ -46,8 +46,8 @@ For this reason, when allocated memory is passed across the API boundary with th transferring ownership to the caller, it must be done so via an interface that offers the functionality required to delegate the deallocation back to the original allocation context. -This API adopts the convention that all data structures that intend to transfer ownership to -the caller must provide a `destroy()` function pointer in their interface, as shown in the example +This API adopts the convention that all data structures that intend to transfer ownership to the +caller must provide a `destroy()` function pointer in their interface, as shown in the example below: ``` @@ -67,40 +67,40 @@ extern C { ``` In the Extensions API, the presence of a `destroy()` function in an interface indicates that the -type is associated with long lived memory whose ownership can be transferred across the API -boundary (i.e. from the extension to the host). It is important to note that when a function intends -to transfer ownership across the boundary, it must be explicitly stated and made clear in the +type is associated with long lived memory whose ownership can be transferred across the API boundary +(i.e. from the extension to the host). 
It is important to note that when a function intends to +transfer ownership across the boundary, it must be explicitly stated and made clear in the function’s documentation. ## MongoExtensionStatus Exceptions must never cross beyond the extension’s API boundary. This means that extension -developers must guarantee that no exceptions escape from the extension, and any such exceptions -must be converted to errors that can be passed across the boundary and interpreted by the host. +developers must guarantee that no exceptions escape from the extension, and any such exceptions must +be converted to errors that can be passed across the boundary and interpreted by the host. For the most part, this API adopts a convention that all function calls across the API boundary must return a `MongoExtensionStatus` which will inform the caller whether the API call was successful or -not. A zero error code indicates success, while a non-zero error code indicates an error during -the function execution. `MongoExtensionStatus` is a long-lived allocated object, since it needs to -to provide additional error information in the failure case. +not. A zero error code indicates success, while a non-zero error code indicates an error during the +function execution. `MongoExtensionStatus` is a long-lived allocated object, since it needs to to +provide additional error information in the failure case. Note that when a `MongoExtensionStatus` is returned by a function call, ownership is always -transferred to the caller of the function. Once the error is no longer needed by the caller, -its deallocation must be delegated to the other side of the API boundary. +transferred to the caller of the function. Once the error is no longer needed by the caller, its +deallocation must be delegated to the other side of the API boundary. ## MongoExtensionAggStageDescriptor -A `MongoExtensionAggStageDescriptor` describes features of an aggregation stage that are -not bound to the stage definition. 
This object functions as a factory to create a logical stage -through parsing. Note, that a `MongoExtensionAggStageDescriptor` is always fully owned by -the extension, and is expected to remain valid for the entire time an extension is loaded. +A `MongoExtensionAggStageDescriptor` describes features of an aggregation stage that are not bound +to the stage definition. This object functions as a factory to create a logical stage through +parsing. Note, that a `MongoExtensionAggStageDescriptor` is always fully owned by the extension, and +is expected to remain valid for the entire time an extension is loaded. ## MongoExtensionLogicalAggStage -A `MongoExtensionLogicalAggStage` describes a stage that has been parsed and bound to -instance specific context -- the stage definition and other context data from the pipeline. -These objects are suitable for pipeline optimization. Once optimization is complete they can -be used to generate objects for execution. +A `MongoExtensionLogicalAggStage` describes a stage that has been parsed and bound to instance +specific context -- the stage definition and other context data from the pipeline. These objects are +suitable for pipeline optimization. Once optimization is complete they can be used to generate +objects for execution. ## Extension Initialization diff --git a/src/mongo/db/ftdc/README.md b/src/mongo/db/ftdc/README.md index 459cc21bf5e..f3b561f610e 100644 --- a/src/mongo/db/ftdc/README.md +++ b/src/mongo/db/ftdc/README.md @@ -92,13 +92,12 @@ reference to the old file and starts writing immediately to the new file by call FTDC writes two types of files in `diagnostic.data`. -**Archive files**: `metrics.%Y-%m-%dT%H-%M-%SZ-CCCCC` -where +**Archive files**: `metrics.%Y-%m-%dT%H-%M-%SZ-CCCCC` where - `%Y-%m-%dT%H-%M-%SZ` - `strftime` format string to format a UTC date time string. Ex: `2024-03-08T22-58-41Z` -- `CCCCC` - is a five digit uniquifier in case multiple files are opened in one second. 
It is - always `00000` except in unit tests. +- `CCCCC` - is a five digit uniquifier in case multiple files are opened in one second. It is always + `00000` except in unit tests. It is an append only file which can be read with `bsondump`. It is composed of several types of bson documents. See [`Archive Format`](#archive-file-format). FTDC creates new archive files on server @@ -217,7 +216,8 @@ document as its baseline. #### Run length encoding of zeros A sequence of zeros is compressed to a pair of numbers `[0, x]` where `x` is non-zero positive -integer that indicates the number of zeros in a sequence. For instance, an array of zeros `[0, 0, 0, 0]` is transformed to `[0, 4]`. +integer that indicates the number of zeros in a sequence. For instance, an array of zeros +`[0, 0, 0, 0]` is transformed to `[0, 4]`. #### Varint compression @@ -242,8 +242,8 @@ For instance, for the following sequence of documents: {"a": 4, "x" : 2, "s" : "t"} ``` -The first `a : 1` is stored as the reference document. FTDC then builds an array of `[2, 3, 4, 2, 2, -2]` to represent the `a` field followed by the `x` field. +The first `a : 1` is stored as the reference document. FTDC then builds an array of +`[2, 3, 4, 2, 2, 2]` to represent the `a` field followed by the `x` field. Next, FTDC computes the delta for each sample in the chunk from the previous chunk. Nothing changes in the reference document but array is transformed to `[1, 1, 1, 0, 0, 0]`. diff --git a/src/mongo/db/fts/README.md b/src/mongo/db/fts/README.md index a82cfcc9327..b01e2e3e7bf 100644 --- a/src/mongo/db/fts/README.md +++ b/src/mongo/db/fts/README.md @@ -4,8 +4,8 @@ MongoDB has had support for creating 'text' indexes for a long time (since at le recently, this support has been deprioritized in favor of Atlas Search and the $search aggregation stage. -Please refer to our documentation for some more detailed examples of using the feature. -Here we will go into the implementation a bit. 
+Please refer to our documentation for some more detailed examples of using the feature. Here we will +go into the implementation a bit. An FTS index in MongoDB is implemented as a multikey, sparse B-Tree in terms of the actual data structure. As an example, consider this index: @@ -64,8 +64,8 @@ Note: - the score is calculated according to some fairly complex logic, involving the relative frequency of each term. Generally, rarer (more unique) terms get higher scores. - The index spec has two 'text' terms, but they are consolidated into one part of the index. Any - number of 'text' index components can occur next to each other, and will always result in a '\_fts' - component for the term and an '\_ftsx' component for the score. + number of 'text' index components can occur next to each other, and will always result in a + '\_fts' component for the term and an '\_ftsx' component for the score. - The scores are computed **with only one document considered at a time**. This means that if a particular token is quite rare within a single document, it'll get a high score even if it is otherwise quite common in the broader dataset. This is one notable benefit of the Atlas Search diff --git a/src/mongo/db/global_catalog/README_timeseries.md b/src/mongo/db/global_catalog/README_timeseries.md index 95c8b0e4811..916ed9c0a9e 100644 --- a/src/mongo/db/global_catalog/README_timeseries.md +++ b/src/mongo/db/global_catalog/README_timeseries.md @@ -1,15 +1,18 @@ # Sharded Time-Series Collections -For a general overview about how time-series collection are implemented, see [db/timeseries/README.md](../timeseries/README.md). -For an overview of query optimizations for time-series collections see [query/timeseries/README](../query/timeseries/README.md). -This section will focus on the implementation of sharded time-series collection, and assumes knowledge of time-series collections and sharding basics. 
+For a general overview about how time-series collection are implemented, see +[db/timeseries/README.md](../timeseries/README.md). For an overview of query optimizations for +time-series collections see [query/timeseries/README](../query/timeseries/README.md). This section +will focus on the implementation of sharded time-series collection, and assumes knowledge of +time-series collections and sharding basics. ## Creating a sharded time-series collection ### shardCollection command -Users can create a sharded time-series collection by running `shardCollection` on the **view** namespace -with the `timeseries` option. This will implicitly create a time-series collection if it doesn't exist. +Users can create a sharded time-series collection by running `shardCollection` on the **view** +namespace with the `timeseries` option. This will implicitly create a time-series collection if it +doesn't exist. The shard key pattern must meet all the existing restrictions and the following unique restrictions: @@ -17,14 +20,16 @@ The shard key pattern must meet all the existing restrictions and the following 2. The `timeField` must appear last if the shard key is compound. 3. The `timeField` can only have an ascending range key. -The primary shard (within the create collection DDL coordinator) will transform the `shardCollection` -command to a command on the **buckets** namespace, and convert the shard key to be on the buckets collection. -The command will then run as a typical `shardCollection` command. This means that the information -persisted on the sharding catalog (collection name and shard key in `config.collections`, -chunk boundaries in `config.chunks`) will reference the buckets namespace and its metadata. +The primary shard (within the create collection DDL coordinator) will transform the +`shardCollection` command to a command on the **buckets** namespace, and convert the shard key to be +on the buckets collection. 
The command will then run as a typical `shardCollection` command. This +means that the information persisted on the sharding catalog (collection name and shard key in +`config.collections`, chunk boundaries in `config.chunks`) will reference the buckets namespace and +its metadata. The table below shows how the shard key and the index backing the shard key is converted for sharded -time-series collections. For this example, the `timeseries` options are `{timeField: "t", metaField: "m"}`. +time-series collections. For this example, the `timeseries` options are +`{timeField: "t", metaField: "m"}`. | shard key on the view (as specified by the user in create/shardCollection) | shard key on the buckets (as persisted in config.collections) | Index on the buckets | | -------------------------------------------------------------------------- | ------------------------------------------------------------- | ------------------------------------------------------- | @@ -37,44 +42,47 @@ time-series collections. For this example, the `timeseries` options are `{timeFi ### Sharding metadata For viewful timeseries collections, only the buckets collection (not the view) is stored on the -config server. The config server has a -`timeseriesFields` parameter set for each buckets collection, which is identical to `timeseriesOptions`. -The `timeseriesFields` parameter is then loaded in memory by the `CatalogCache`, `ChunkManager`, and -the collection metadata. +config server. The config server has a `timeseriesFields` parameter set for each buckets collection, +which is identical to `timeseriesOptions`. The `timeseriesFields` parameter is then loaded in memory +by the `CatalogCache`, `ChunkManager`, and the collection metadata. For viewless timeseries collections, there is no view namespace, so the config server has all of the metadata already in the `timeseriesFields` parameter. 
-If the granularity is updated through a `collMod` command, the config server `timeseriesFields` parameter -will also be updated. We treat the granularity value on the config server as the source of truth for the -granularity value of the collection. +If the granularity is updated through a `collMod` command, the config server `timeseriesFields` +parameter will also be updated. We treat the granularity value on the config server as the source of +truth for the granularity value of the collection. -During the `collMod` command, all queries must run with the updated granularity value. It's important -because the `control.min.` is set by rounding down the document's `timeField` value -by the specified granularity. If the shard key is on the `timeField`, documents are routed to shards based -on `control.min.` which relies on the granularity value. +During the `collMod` command, all queries must run with the updated granularity value. It's +important because the `control.min.` is set by rounding down the document's `timeField` +value by the specified granularity. If the shard key is on the `timeField`, documents are routed to +shards based on `control.min.` which relies on the granularity value. -That is why before performing any CRUD operations or aggregations on the shards, the granularity -on the config server is checked through the cached `CollectionRoutingInfo`. Therefore, operations will +That is why before performing any CRUD operations or aggregations on the shards, the granularity on +the config server is checked through the cached `CollectionRoutingInfo`. Therefore, operations will always run with the most up to date granularity value, and thus predicates will be routed correctly. ### How chunks are formatted The shard key can be on the `metaField` or any number of subfields of the `metaField`. This is the -recommended approach, since users should choose a `metaField` that will partition the measurements in -a slightly uniform way. 
In contrast, `timeField` values are monotonically increasing and thus could route all inserts to a single shard. +recommended approach, since users should choose a `metaField` that will partition the measurements +in a slightly uniform way. In contrast, `timeField` values are monotonically increasing and thus +could route all inserts to a single shard. If the shard key is on the `timeField` the chunk ranges will be defined on the buckets collection on -the `control.min.` field. The `control.min.` value is a rounded down lower boundary -for the bucket. It's possible (and likely) that no measurements (user documents) have this value. +the `control.min.` field. The `control.min.` value is a rounded down lower +boundary for the bucket. It's possible (and likely) that no measurements (user documents) have this +value. -Unlike normal sharded collections, a measurement’s location is not tightly bound to the chunk range, because the chunk range defines -where the **buckets** not measurements should go. Usually, a chunk range would never overlap with another chunk; we can assume all -measurements with a certain value defined by the chunk exist only on one chunk. However, this is not the case for time-series. -Bucket ranges do overlap and can belong to different chunks. This means that measurements that have values +Unlike normal sharded collections, a measurement’s location is not tightly bound to the chunk range, +because the chunk range defines where the **buckets** not measurements should go. Usually, a chunk +range would never overlap with another chunk; we can assume all measurements with a certain value +defined by the chunk exist only on one chunk. However, this is not the case for time-series. Bucket +ranges do overlap and can belong to different chunks. This means that measurements that have values exceeding the maximum value of the chunk range can exist on the chunk. 
-We will illustrate this with an example where the shard key is on the `timeField` and the `timeField = time`: +We will illustrate this with an example where the shard key is on the `timeField` and the +`timeField = time`: ``` // We have the following measurements: @@ -101,48 +109,52 @@ Chunk3 contains no buckets ``` -`Chunk1` contains `Bucket2`, but `Bucket2` has a measurement (`Doc2`) which is outside the chunk boundary, -but the measurement fits inside the bucket. Therefore, chunk boundaries in time-series collections do not -define where measurements are stored. +`Chunk1` contains `Bucket2`, but `Bucket2` has a measurement (`Doc2`) which is outside the chunk +boundary, but the measurement fits inside the bucket. Therefore, chunk boundaries in time-series +collections do not define where measurements are stored. ## CRUD operations -For inserts/updates/delete requests, mongos will receive the request on the **view** namespace and check -if the chunk manager has a routing table for the buckets collection. If it doesn't find one, it will check if a -buckets collection exists in the `CatalogCache` (see `CollectionRoutingInfoTargeter::_init`). If it finds a -buckets collection in either location, mongos will do the following: +For inserts/updates/delete requests, mongos will receive the request on the **view** namespace and +check if the chunk manager has a routing table for the buckets collection. If it doesn't find one, +it will check if a buckets collection exists in the `CatalogCache` (see +`CollectionRoutingInfoTargeter::_init`). If it finds a buckets collection in either location, mongos +will do the following: 1. Translate the request to be on the buckets collection namespace. 2. Extract the buckets collection's shard key. 3. Set the `isTimeSeriesNameSpace` flag to `true`. -4. For updates and deletes: Rewrite the query predicate (see `getBucketLevelPredicateForRouting`). 
The - field names are changed to match the buckets (`metaField` will become `"meta"`, and `timeField` will - become `control.min.` and `control.max.`). This rewritten predicate is used - for routing. +4. For updates and deletes: Rewrite the query predicate (see `getBucketLevelPredicateForRouting`). + The field names are changed to match the buckets (`metaField` will become `"meta"`, and + `timeField` will become `control.min.` and `control.max.`). This rewritten + predicate is used for routing. -These steps allow mongos to decide which shards to target, or if to broadcast the command. For example, -if the shard key is on the `metaField`, and there is no predicate on the `metaField` step 4 won't rewrite the -predicate and will pass an empty object into the shard key extractor. An empty object will also be passed -into the shard key extractor if there is a shard key on the `timeField` and no predicate on the `timeField`. -This will trigger mongos to broadcast the update/delete request. +These steps allow mongos to decide which shards to target, or if to broadcast the command. For +example, if the shard key is on the `metaField`, and there is no predicate on the `metaField` step 4 +won't rewrite the predicate and will pass an empty object into the shard key extractor. An empty +object will also be passed into the shard key extractor if there is a shard key on the `timeField` +and no predicate on the `timeField`. This will trigger mongos to broadcast the update/delete +request. -After mongos routes or broadcasts the request, the shards receive it. Then the shards check if the `isTimeSeriesNameSpace` -is set (see `timeseries::isTimeseriesViewRequest`). If it is set, the shards call specific time-series functions, just -like unsharded time-series collections. For example, for inserts, measurements will try to be inserted into an open bucket -in the bucket catalog, then a reopened bucket, and finally a new bucket will be opened if necessary. 
Updates -and deletes occur one bucket at a time, and the buckets will be unpacked if necessary. See -[db/timeseries/README.md](../timeseries/README.md) for more details about the specific implementations -of each CRUD operation. +After mongos routes or broadcasts the request, the shards receive it. Then the shards check if the +`isTimeSeriesNameSpace` is set (see `timeseries::isTimeseriesViewRequest`). If it is set, the shards +call specific time-series functions, just like unsharded time-series collections. For example, for +inserts, measurements will try to be inserted into an open bucket in the bucket catalog, then a +reopened bucket, and finally a new bucket will be opened if necessary. Updates and deletes occur one +bucket at a time, and the buckets will be unpacked if necessary. See +[db/timeseries/README.md](../timeseries/README.md) for more details about the specific +implementations of each CRUD operation. ## Query routing for aggregation ### Viewful timeseries collections This works similarly to queries on a view of a normal sharded collection. Users write a query on the -**view** namespace. Mongos routes the query to the primary shard. The primary shard resolves the view, -rewrites the query to be on the buckets collection, and throws a `CommandOnShardedViewNotSupportedOnMongod` -with the entire pipeline view definition in the returned error message. Mongos receives the expanded -view definition, and then routes the query as it typically would. +**view** namespace. Mongos routes the query to the primary shard. The primary shard resolves the +view, rewrites the query to be on the buckets collection, and throws a +`CommandOnShardedViewNotSupportedOnMongod` with the entire pipeline view definition in the returned +error message. Mongos receives the expanded view definition, and then routes the query as it +typically would. The aggregation stage to handle time-series buckets collections (`$_internalUnpackBucket`) is pushed down to the shards. 
For more information about `$_internalUnpackBucket` and query rewrites see @@ -158,32 +170,34 @@ in [query/timeseries/README](../query/timeseries/README.md). ## DDL operations -Users run DDL operations (`collMod`, `createIndexes`, `listIndexes`, `dropIndexes`, and etc...) on the -**view** namespace. The buckets collection is meant to be invisible to the end user: special permissions are -required to run DDL operations directly on it. The DDL coordinator translates the operation to the -buckets namespace using the function `setBucketNss`, stores it in the `ShardingCoordinatorMetadata`, -and sets the `isTimeseriesNamespace` flag. Specific DDL coordinators will do further time-series rewrites -as necessary. For example, the `CreateCollectionCoordinator` will check for the presence of `timeseriesFields` -in the `ChunkManager` to decide if the shard key needs to be rewritten before forwarding the request to the shards. +Users run DDL operations (`collMod`, `createIndexes`, `listIndexes`, `dropIndexes`, and etc...) on +the **view** namespace. The buckets collection is meant to be invisible to the end user: special +permissions are required to run DDL operations directly on it. The DDL coordinator translates the +operation to the buckets namespace using the function `setBucketNss`, stores it in the +`ShardingCoordinatorMetadata`, and sets the `isTimeseriesNamespace` flag. Specific DDL coordinators +will do further time-series rewrites as necessary. For example, the `CreateCollectionCoordinator` +will check for the presence of `timeseriesFields` in the `ChunkManager` to decide if the shard key +needs to be rewritten before forwarding the request to the shards. When the shards receive the DDL operation, the shards will decide if the operation body needs to be -translated to the buckets namespace. For example, `listIndexes` checks if the `isTimeseriesNamespace` -flag is set to return all of the indexes on the buckets collection. +translated to the buckets namespace. 
For example, `listIndexes` checks if the +`isTimeseriesNamespace` flag is set to return all of the indexes on the buckets collection. -Additionally, there are operations that must perform a "reverse" translation (from buckets to the view). -`listIndexes` will return the existing indexes (created by `shardCollection/createIndexes`) after translating the index -from the buckets collection to the time-series view. This is because the buckets collection should be -invisible to the end-user. +Additionally, there are operations that must perform a "reverse" translation (from buckets to the +view). `listIndexes` will return the existing indexes (created by `shardCollection/createIndexes`) +after translating the index from the buckets collection to the time-series view. This is because the +buckets collection should be invisible to the end-user. ## Sharding administrative commands -All sharding admin commands, such as `split` and `moveChunk` must be run on the **buckets** collection -directly. These are some of the only commands that users run on the **buckets** namespace, and not the **view** namespace. +All sharding admin commands, such as `split` and `moveChunk` must be run on the **buckets** +collection directly. These are some of the only commands that users run on the **buckets** +namespace, and not the **view** namespace. ## Orphan buckets and the bucket catalog Open time-series buckets are stored in memory in the `BucketCatalog`. Incoming measurements will be inserted into the open buckets in the catalog. If a chunk migration occurs, and a bucket becomes an orphan on a specific shard, the `BucketCatalog` cannot insert any new measurements into these newly -orphaned buckets. Therefore, the bucket catalog must consider if buckets are orphaned. To achieve this, -after a chunk migration has succeeded, the `BucketCatalog` is cleared. +orphaned buckets. Therefore, the bucket catalog must consider if buckets are orphaned. 
To achieve +this, after a chunk migration has succeeded, the `BucketCatalog` is cleared. diff --git a/src/mongo/db/global_catalog/ddl/README_ddl_operations.md b/src/mongo/db/global_catalog/ddl/README_ddl_operations.md index de3006ab4d0..f71e31e7dba 100644 --- a/src/mongo/db/global_catalog/ddl/README_ddl_operations.md +++ b/src/mongo/db/global_catalog/ddl/README_ddl_operations.md @@ -1,14 +1,29 @@ # DDL Operations -On the Sharding team, we use the term _DDL_ to mean any operation that needs to update any subset of [catalog containers](../../shard_role/shard_catalog/README_sharding_catalog.md#catalog-containers). Within this definition, there are standard DDLs that use the DDL coordinator infrastructure as well as non-standard DDLs that each have their own implementations. +On the Sharding team, we use the term _DDL_ to mean any operation that needs to update any subset of +[catalog containers](../../shard_role/shard_catalog/README_sharding_catalog.md#catalog-containers). +Within this definition, there are standard DDLs that use the DDL coordinator infrastructure as well +as non-standard DDLs that each have their own implementations. ## Standard DDLs -Most DDL operations are built upon the DDL coordinator infrastructure which provides some [retriability](#retriability), [synchronization](#synchronization), and [recoverability](#recovery) guarantees. +Most DDL operations are built upon the DDL coordinator infrastructure which provides some +[retriability](#retriability), [synchronization](#synchronization), and [recoverability](#recovery) +guarantees. -Each of these operations has a _coordinator_ - a node that drives the execution of the operation. 
In [most operations](https://github.com/mongodb/mongo/blob/e61bf27c2f6a83fed36e5a13c008a32d563babe2/src/mongo/db/s/sharding_ddl_coordinator_service.cpp#L60-L120), this coordinator is the database primary, but in [a few others](https://github.com/mongodb/mongo/blob/e61bf27c2f6a83fed36e5a13c008a32d563babe2/src/mongo/db/s/config/configsvr_coordinator_service.cpp#L75-L94) the coordinator is the CSRS. These coordinators extend either the [RecoverableShardingDDLCoordinator class](https://github.com/mongodb/mongo/blob/9fe03fd6c85760920398b7891fde74069f5457db/src/mongo/db/s/sharding_ddl_coordinator.h#L266) or the [ConfigSvrCoordinator class](https://github.com/mongodb/mongo/blob/9fe03fd6c85760920398b7891fde74069f5457db/src/mongo/db/s/config/configsvr_coordinator.h#L47), which make up the DDL coordinator infrastructure. +Each of these operations has a _coordinator_ - a node that drives the execution of the operation. In +[most operations](https://github.com/mongodb/mongo/blob/e61bf27c2f6a83fed36e5a13c008a32d563babe2/src/mongo/db/s/sharding_ddl_coordinator_service.cpp#L60-L120), +this coordinator is the database primary, but in +[a few others](https://github.com/mongodb/mongo/blob/e61bf27c2f6a83fed36e5a13c008a32d563babe2/src/mongo/db/s/config/configsvr_coordinator_service.cpp#L75-L94) +the coordinator is the CSRS. These coordinators extend either the +[RecoverableShardingDDLCoordinator class](https://github.com/mongodb/mongo/blob/9fe03fd6c85760920398b7891fde74069f5457db/src/mongo/db/s/sharding_ddl_coordinator.h#L266) +or the +[ConfigSvrCoordinator class](https://github.com/mongodb/mongo/blob/9fe03fd6c85760920398b7891fde74069f5457db/src/mongo/db/s/config/configsvr_coordinator.h#L47), +which make up the DDL coordinator infrastructure. -The diagram below shows a simplified example of a DDL operation's execution. The coordinator can be one of the shards or the config server, and the commands sent to that node will just be applied locally. 
+The diagram below shows a simplified example of a DDL operation's execution. The coordinator can be +one of the shards or the config server, and the commands sent to that node will just be applied +locally. ```mermaid sequenceDiagram @@ -39,36 +54,72 @@ loop DDL Coordinator Infrastructure end ``` -The outer loop is common to all standard DDL operations and ensures that different DDLs within the same database/namespace serialize properly by acquiring [DDL locks](#synchronization). The inner loop is specific to each operation, and will perform some set of updates to the sharding metadata while under the critical section. Some operations may not involve all shards or may have more complex phases on the participant shards, but all follow the same general pattern of acquiring the critical section, updating some metadata, and releasing the critical section, checkpointing their progress along the way. +The outer loop is common to all standard DDL operations and ensures that different DDLs within the +same database/namespace serialize properly by acquiring [DDL locks](#synchronization). The inner +loop is specific to each operation, and will perform some set of updates to the sharding metadata +while under the critical section. Some operations may not involve all shards or may have more +complex phases on the participant shards, but all follow the same general pattern of acquiring the +critical section, updating some metadata, and releasing the critical section, checkpointing their +progress along the way. -The checkpoints are majority write concern updates to a persisted document on the coordinator. This document - called a state document - contains all the information about the running operation including the operation type, namespaces involved, and which checkpoint the operation has reached. An initial checkpoint must be the first thing any coordinator does in order to ensure that the operation will continue to run even [in the presence of failovers](#recovery). 
Subsequent checkpoints allow retries to skip phases that have already been completed. +The checkpoints are majority write concern updates to a persisted document on the coordinator. This +document - called a state document - contains all the information about the running operation +including the operation type, namespaces involved, and which checkpoint the operation has reached. +An initial checkpoint must be the first thing any coordinator does in order to ensure that the +operation will continue to run even [in the presence of failovers](#recovery). Subsequent +checkpoints allow retries to skip phases that have already been completed. ### Retriability -Most DDL operations must complete after they have started. An exception to this is often a _CheckPreconditions_ phase at the beginning of a coordinator in which the operation will check some conditions and will be allowed to exit if these conditions are not met. After this, however, the operation will continue to retry until it succeeds. This is because the updates to the sharding metadata would cause inconsistencies if the critical section were released partially through the operation. For this reason, DDL operations should not throw non-retriable errors after the initial phase of checking preconditions. +Most DDL operations must complete after they have started. An exception to this is often a +_CheckPreconditions_ phase at the beginning of a coordinator in which the operation will check some +conditions and will be allowed to exit if these conditions are not met. After this, however, the +operation will continue to retry until it succeeds. This is because the updates to the sharding +metadata would cause inconsistencies if the critical section were released partially through the +operation. For this reason, DDL operations should not throw non-retriable errors after the initial +phase of checking preconditions. 
### Synchronization -DDL operations are serialized on the coordinator by acquisition of the DDL locks, handled by the [DDL Lock Manager](https://github.com/mongodb/mongo/blob/r7.0.0-rc7/src/mongo/db/s/ddl_lock_manager.h). DDL locks are local to the coordinator and only in memory, so they must be reacquired during [recovery](#recovery). +DDL operations are serialized on the coordinator by acquisition of the DDL locks, handled by the +[DDL Lock Manager](https://github.com/mongodb/mongo/blob/r7.0.0-rc7/src/mongo/db/s/ddl_lock_manager.h). +DDL locks are local to the coordinator and only in memory, so they must be reacquired during +[recovery](#recovery). -The DDL locks follow a [multiple granularity hierarchical approach](https://en.wikipedia.org/wiki/Multiple_granularity_locking), which means DDL locks must be acquired in a specific order using the [intentional locking protocol](https://en.wikipedia.org/wiki/Multiple_granularity_locking#:~:text=MGL%20also%20uses%20intentional%20%22locks%22) appropriately. With that, we ensure that a DDL operation acting over a whole database serializes with another DDL operation targeting a collection from that database, and, at the same time, two DDL operations targeting different collections can run concurrently. +The DDL locks follow a +[multiple granularity hierarchical approach](https://en.wikipedia.org/wiki/Multiple_granularity_locking), +which means DDL locks must be acquired in a specific order using the +[intentional locking protocol](https://en.wikipedia.org/wiki/Multiple_granularity_locking#:~:text=MGL%20also%20uses%20intentional%20%22locks%22) +appropriately. With that, we ensure that a DDL operation acting over a whole database serializes +with another DDL operation targeting a collection from that database, and, at the same time, two DDL +operations targeting different collections can run concurrently. Every DDL lock resource should be taken in the following order: 1. DDL Database lock 2. 
DDL Collection lock -Therefore, if a DDL operation needs to update collection metadata, a DDL lock will be acquired first on the database in IX mode and then on the collection in X mode. On the other hand, if a DDL operation only updates the database metadata (like dropDatabase), only the DDL lock on the database will be taken in X mode. +Therefore, if a DDL operation needs to update collection metadata, a DDL lock will be acquired first +on the database in IX mode and then on the collection in X mode. On the other hand, if a DDL +operation only updates the database metadata (like dropDatabase), only the DDL lock on the database +will be taken in X mode. -Some operations also acquire additional DDL locks, such as renameCollection, which will acquire the database and collection DDL locks for the target namespace after acquiring the DDL locks on the source collection. +Some operations also acquire additional DDL locks, such as renameCollection, which will acquire the +database and collection DDL locks for the target namespace after acquiring the DDL locks on the +source collection. Finally, at the end of the operation, all of the locks are released in reverse order. ### Recovery -DDL coordinators are resilient to elections and sudden crashes because they are implemented as [primary only services](https://github.com/mongodb/mongo/blob/r6.0.0/docs/primary_only_service.md#primaryonlyservice) that - by definition - get automatically resumed when the node of a shard steps up. +DDL coordinators are resilient to elections and sudden crashes because they are implemented as +[primary only services](https://github.com/mongodb/mongo/blob/r6.0.0/docs/primary_only_service.md#primaryonlyservice) +that - by definition - get automatically resumed when the node of a shard steps up. -When a new primary node is elected, the DDL primary only service is rebuilt, and any ongoing coordinators will be restarted based on their persisted state document. 
During this recovery phase, any new requests for DDL operations are put on hold, waiting for existing coordinators to be re-instatiated to avoid conflicts with the DDL locks. +When a new primary node is elected, the DDL primary only service is rebuilt, and any ongoing +coordinators will be restarted based on their persisted state document. During this recovery phase, +any new requests for DDL operations are put on hold, waiting for existing coordinators to be +re-instatiated to avoid conflicts with the DDL locks. ### Sections about specific standard DDL operations @@ -76,11 +127,26 @@ When a new primary node is elected, the DDL primary only service is rebuilt, and ## Non-Standard DDLs -Some DDL operations do not follow the structure outlined in the section above. These operations are [chunk migration](../../s/README_migrations.md), resharding, and refine collection shard key. There are also other operations such as add and remove shard that do not modify the sharding catalog but do modify local metadata and need to coordinate with ddl operations. These operations also do not use the DDL coordinator infrastructure, but they do take the DDl lock to synchronize with other ddls. +Some DDL operations do not follow the structure outlined in the section above. These operations are +[chunk migration](../../s/README_migrations.md), resharding, and refine collection shard key. There +are also other operations such as add and remove shard that do not modify the sharding catalog but +do modify local metadata and need to coordinate with ddl operations. These operations also do not +use the DDL coordinator infrastructure, but they do take the DDl lock to synchronize with other +ddls. -Both chunk migration and resharding have to copy user data across shards. This is too time intensive to happen entirely while holding the collection critical section, so these operations have separate machinery to transfer the data and commit the changes. 
These commands do not commit transactionally across the shards and the config server, rather they commit on the config server and rely on shards pulling the updated commit information from the config server after learning via a router that there is new information. They also do not have the same requirement as standard DDL operations that they must complete after starting except after entering their commit phases. +Both chunk migration and resharding have to copy user data across shards. This is too time intensive +to happen entirely while holding the collection critical section, so these operations have separate +machinery to transfer the data and commit the changes. These commands do not commit transactionally +across the shards and the config server, rather they commit on the config server and rely on shards +pulling the updated commit information from the config server after learning via a router that there +is new information. They also do not have the same requirement as standard DDL operations that they +must complete after starting except after entering their commit phases. -Refine shard key commits only on the config server, again relying on shards to pull updated information from the config server after hearing about this more recent information from a router. In this case, this was done not because of the cost of transfering data, but so that refine shard key did not need to involve the shards. This allows the refineShardKey command to run quickly and not block operations. +Refine shard key commits only on the config server, again relying on shards to pull updated +information from the config server after hearing about this more recent information from a router. +In this case, this was done not because of the cost of transfering data, but so that refine shard +key did not need to involve the shards. This allows the refineShardKey command to run quickly and +not block operations. 
### Sections explaining specific non-standard DDL operations diff --git a/src/mongo/db/global_catalog/ddl/README_transactions_and_ddl.md b/src/mongo/db/global_catalog/ddl/README_transactions_and_ddl.md index 8a253a48a33..aa42d591944 100644 --- a/src/mongo/db/global_catalog/ddl/README_transactions_and_ddl.md +++ b/src/mongo/db/global_catalog/ddl/README_transactions_and_ddl.md @@ -1,27 +1,29 @@ # Sharded Transactions and DDLs -This guide describes the consistency protocols specific to multi-statement transactions in the presence -of DDLs. Refer to [this architecture guide](https://github.com/mongodb/mongo/blob/8a79395deff895f18b8878ff4567c9fb309a7c64/src/mongo/db/s/README_sessions_and_transactions.md#transactions) for more general information about MongoDB multi-statement transactions. +This guide describes the consistency protocols specific to multi-statement transactions in the +presence of DDLs. Refer to +[this architecture guide](https://github.com/mongodb/mongo/blob/8a79395deff895f18b8878ff4567c9fb309a7c64/src/mongo/db/s/README_sessions_and_transactions.md#transactions) +for more general information about MongoDB multi-statement transactions. -In a sharded cluster, a multi-statement transaction (also referred to as a transaction) -establishes a storage snapshot on a participant shard when the first statement targets the shard. -The snapshot remains open and does not advance for the lifetime of the transaction. Two-phase -locking prevents DDLs from committing on the namespaces involved in the transaction until the -transaction commits or aborts. DDL in this context refers to both schema and data distribution -changes. +In a sharded cluster, a multi-statement transaction (also referred to as a transaction) establishes +a storage snapshot on a participant shard when the first statement targets the shard. The snapshot +remains open and does not advance for the lifetime of the transaction. 
Two-phase locking prevents +DDLs from committing on the namespaces involved in the transaction until the transaction commits or +aborts. DDL in this context refers to both schema and data distribution changes. -From a storage snapshot perspective, transactions behave as follows, depending on the read -concern level: +From a storage snapshot perspective, transactions behave as follows, depending on the read concern +level: -- "local" and "majority" read concern establish snapshots on each participant shards at an - arbitrary timestamp that is not guaranteed to be the same across all shards. +- "local" and "majority" read concern establish snapshots on each participant shards at an arbitrary + timestamp that is not guaranteed to be the same across all shards. - "snapshot" read concern establishes snapshots on participant shards at the same timestamp. Participant shards validate the data ownership of the received statements, and reject requests that -use stale routing information, via the [placement versioning protocol](https://github.com/mongodb/mongo/blob/8a79395deff895f18b8878ff4567c9fb309a7c64/src/mongo/db/s/README_versioning_protocols.md). +use stale routing information, via the +[placement versioning protocol](https://github.com/mongodb/mongo/blob/8a79395deff895f18b8878ff4567c9fb309a7c64/src/mongo/db/s/README_versioning_protocols.md). Routers operate with a cached version of the routing table, which can be stale, and is lazily -refreshed when a shard informs the router that its table is stale. The routers refresh to the -latest version of the routing table. +refreshed when a shard informs the router that its table is stale. The routers refresh to the latest +version of the routing table. From an isolation perspective: @@ -29,28 +31,29 @@ From an isolation perspective: distribution change occurring after the snapshot is established is invisible. 
- The placement versioning protocol, and more broadly, the routing protocol, operate with _read committed_ level: routing continuously observe data distribution changes. This routing protocol - behavior is also true on the shard end: the shard validates that the incoming request originates from - a router that has the last data ownership information, regardless of whether this ownership + behavior is also true on the shard end: the shard validates that the incoming request originates + from a router that has the last data ownership information, regardless of whether this ownership information is visible in the existing data snapshot. In practical terms, the routing protocol covers the case where the router is stale compared to the shard's view of the catalog, however it is not designed to address the case where the router uses -information that is newer than (and incompatible with) the shard's snapshot. The routing protocol -in itself cannot forbid the following anomalies: +information that is newer than (and incompatible with) the shard's snapshot. The routing protocol in +itself cannot forbid the following anomalies: - **Data placement anomaly:** The router forwards a request to a shard using data ownership - information that is newer than (and invisible in) the shard's snapshot. The shard's data snapshot is - unable to observe the incoming range. Processing this request would miss data belonging to that - range. This anomaly could occur when a range migration interleaves with an uncommitted transaction. + information that is newer than (and invisible in) the shard's snapshot. The shard's data snapshot + is unable to observe the incoming range. Processing this request would miss data belonging to that + range. This anomaly could occur when a range migration interleaves with an uncommitted + transaction. 
- **Collection generation anomaly:** The router forwards a request to the shard relative to a - [collection generation](../../shard_role/shard_catalog/README_terminology.md) - that is newer than the one in the shard's snapshot. This could occur, for instance, when the - collection's namespace is recreated on the shard after the transaction has established a snapshot. + [collection generation](../../shard_role/shard_catalog/README_terminology.md) that is newer than + the one in the shard's snapshot. This could occur, for instance, when the collection's namespace + is recreated on the shard after the transaction has established a snapshot. - **Collection incarnation anomaly:** Similar to the collection generation anomaly, but concerning - the local catalog. The router forwards a request with ShardVersion::UNTRACKED, bypassing collection - generation checks. The request might be for a namespace that was sharded when the transaction - established the snapshot. Processing this request would incorrectly return partial data for the - collection as the router only targeted the primary shard. + the local catalog. The router forwards a request with ShardVersion::UNTRACKED, bypassing + collection generation checks. The request might be for a namespace that was sharded when the + transaction established the snapshot. Processing this request would incorrectly return partial + data for the collection as the router only targeted the primary shard. The sections below describe the protocols transactions use along with the placement versioning protocol to forbid the anomalies above. @@ -65,33 +68,38 @@ On the first statement, the router chooses an `atClusterTime` (i.e., it selects For **tracked** collections, the protocol is as follows: 1. 
The router forwards statements using its latest routing table, but interprets it as of - `atClusterTime` by consulting the [chunk ownership history](https://github.com/mongodb/mongo/blob/6afc3207668d5dca4e7168bdb089f74bc299ef06/src/mongo/s/catalog/type_chunk.h#L295-L296). + `atClusterTime` by consulting the + [chunk ownership history](https://github.com/mongodb/mongo/blob/6afc3207668d5dca4e7168bdb089f74bc299ef06/src/mongo/s/catalog/type_chunk.h#L295-L296). 1. The targeted shard checks the attached placement version. All the following conditions must be met for the request to be considered valid: 1. It must match the current (latest) placement version for this shard. - 1. The received `atClusterTime` must not be earlier than the latest placement version's [timestamp field](https://github.com/mongodb/mongo/blob/8a79395deff895f18b8878ff4567c9fb309a7c64/src/mongo/db/s/README_versioning_protocols.md#shard-version) known by the shard. - This field represents the commit timestamp of the latest collection generation operation - (e.g. shardCollection, renameCollection, etc) on this sharded collection. + 1. The received `atClusterTime` must not be earlier than the latest placement version's + [timestamp field](https://github.com/mongodb/mongo/blob/8a79395deff895f18b8878ff4567c9fb309a7c64/src/mongo/db/s/README_versioning_protocols.md#shard-version) + known by the shard. This field represents the commit timestamp of the latest collection + generation operation (e.g. shardCollection, renameCollection, etc) on this sharded collection. Notes: - (2.i) ensures that the routing table used by the router (including its history) is not stale. This is part of the placement versioning protocol. - (2.ii) ensures that the collection did not undergo any collection generation change at a timestamp - later than `atClusterTime`, which would make the current routing/filtering metadata invalid to be used - with the point-in-time storage snapshot. 
This proscribes the collection generation anomaly. + later than `atClusterTime`, which would make the current routing/filtering metadata invalid to be + used with the point-in-time storage snapshot. This proscribes the collection generation anomaly. For **untracked** collections, the protocol is as follows: 1. The router forwards statements using the latest database version, and targets its primary shard. ShardVersion::UNTRACKED is attached in addition to the DatabaseVersion. -1. The targeted shard checks the attached metadata. All the following conditions must be - met for the request to be considered valid: +1. The targeted shard checks the attached metadata. All the following conditions must be met for the + request to be considered valid: 1. The received database version must match the current (latest) database version. - 1. The received `atClusterTime` must not be earlier than the latest database version's [timestamp field](https://github.com/mongodb/mongo/blob/eeef1763cb0ff77757bb60eabb8ad1233c990786/src/mongo/db/s/README_versioning_protocols.md#database-version) known by the shard. - This field represents the commit timestamp of the latest reincarnation (drop/create) or movePrimary operation for this database. + 1. The received `atClusterTime` must not be earlier than the latest database version's + [timestamp field](https://github.com/mongodb/mongo/blob/eeef1763cb0ff77757bb60eabb8ad1233c990786/src/mongo/db/s/README_versioning_protocols.md#database-version) + known by the shard. This field represents the commit timestamp of the latest reincarnation + (drop/create) or movePrimary operation for this database. 1. The received placement version is UNTRACKED, and the shard checks the latest version matches. - 1. The collection in the snapshot must be the same incarnation (same UUID) as in the latest CollectionCatalog. + 1. The collection in the snapshot must be the same incarnation (same UUID) as in the latest + CollectionCatalog. 
Notes: @@ -99,22 +107,24 @@ Notes: database versioning protocol. - (2.ii) ensures that the database did not undergo any reincarnation at a timestamp later than `atClusterTime`. The router always routes requests for untracked collections based on the latest - database primary shard knowledge, but this decision might not be valid at the specified cluster time. - E.g. if the shard was not the primary shard for the database at that point in time. + database primary shard knowledge, but this decision might not be valid at the specified cluster + time. E.g. if the shard was not the primary shard for the database at that point in time. - (2.iii) ensures that the collection generation anomaly is detected for cases where an untracked collection becomes tracked. There will be a mismatch between the attached ShardVersion::UNTRACKED and the actual placement version on the shard. - (2.iv) ensures that the collection incarnation anomaly is detected by the primary shard after a - sharded collection is reincarnated as unsharded (by definition, ShardVersion::UNTRACKED always conforms with 2.ii). + sharded collection is reincarnated as unsharded (by definition, ShardVersion::UNTRACKED always + conforms with 2.ii). ## Transactions with readConcern="local" or "majority" This protocol is more complex, because with read concern levels weaker than "snapshot", each -participant shard can open their read snapshot at different timestamps, and the router is unaware -of the timestamp they chose. Both data placement and collection generation anomalies apply. +participant shard can open their read snapshot at different timestamps, and the router is unaware of +the timestamp they chose. Both data placement and collection generation anomalies apply. -On the first statement, the router chooses a `placementConflictTime` (i.e., it selects its latest known -`VectorClock::clusterTime`) at the beginning of the transaction and uses the same `placementConflictTime` for all statements. 
+On the first statement, the router chooses a `placementConflictTime` (i.e., it selects its latest +known `VectorClock::clusterTime`) at the beginning of the transaction and uses the same +`placementConflictTime` for all statements. For **tracked** collections, the protocol is as follows: @@ -126,16 +136,17 @@ For **tracked** collections, the protocol is as follows: 1. The targeted shard checks the attached placement version. All the following conditions must be met for the request to be considered valid: 1. It must match the current (latest) placement version for this shard. - 1. `placementConflictTime` must not be earlier than the placement version's [timestamp field](https://github.com/mongodb/mongo/blob/8a79395deff895f18b8878ff4567c9fb309a7c64/src/mongo/db/s/README_versioning_protocols.md#shard-version). + 1. `placementConflictTime` must not be earlier than the placement version's + [timestamp field](https://github.com/mongodb/mongo/blob/8a79395deff895f18b8878ff4567c9fb309a7c64/src/mongo/db/s/README_versioning_protocols.md#shard-version). This field represents the commit timestamp of the latest DDL operation (e.g. create, rename, etc) on this collection. - 1. `placementConflictTime` must not be earlier than the latest _incoming_ migration commit timestamp on this - shard for this collection. + 1. `placementConflictTime` must not be earlier than the latest _incoming_ migration commit + timestamp on this shard for this collection. Notes: -- (4.i) ensures that the routing table used by the router is not stale. This is part of the placement - versioning protocol. +- (4.i) ensures that the routing table used by the router is not stale. This is part of the + placement versioning protocol. - (4.ii) ensures that the collection did not undergo any collection generation change at a timestamp later than `placementConflictTime`, which would make the current routing/filtering metadata invalid to be used with the open snapshot. 
This proscribes the collection generation anomaly. @@ -143,43 +154,47 @@ Notes: placement anomaly. - The `afterClusterTime` selected at (2) imposes a lower bound for each shard's snapshot read timestamp. The (4.i) and (4.ii) assertions check that the metadata/placement has not changed since - that lower bound, therefore guaranteeing that the assertions are valid for whatever timestamp could - have been ultimately selected. The lower bound imposed by `afterClusterTime` is necessary because - there is otherwise no guarantee that the shard would open a snapshot that is at least inclusive of - the `placementConflictTime`. Consider this scenario involving a readConcern:"majority" transaction: + that lower bound, therefore guaranteeing that the assertions are valid for whatever timestamp + could have been ultimately selected. The lower bound imposed by `afterClusterTime` is necessary + because there is otherwise no guarantee that the shard would open a snapshot that is at least + inclusive of the `placementConflictTime`. Consider this scenario involving a + readConcern:"majority" transaction: - The config server commits the range migration at timestamp T100, and majority replicates it. - The shard learns the commit timestamp from the config server, sets its latest migration commit timestamp to T100, and rescinds the critical section. - - The first transaction statement comes in, with `placementConflictTime=T100`. If the - shard's majority commit point has not yet advanced to T100, in the absence of `afterClusterTime=T100`, + - The first transaction statement comes in, with `placementConflictTime=T100`. If the shard's + majority commit point has not yet advanced to T100, in the absence of `afterClusterTime=T100`, the shard could open a snapshot at T99, and miss the incoming range. 
-- By design, it is not possible for a router to cache routing information accounting for the - latest migration, without having gossiped in a clusterTime that is at least as recent as that - migration's commit timestamp. +- By design, it is not possible for a router to cache routing information accounting for the latest + migration, without having gossiped in a clusterTime that is at least as recent as that migration's + commit timestamp. For **untracked** collections, the protocol is as follows: 1. For each statement, the router uses the latest database version. -1. For each statement, the router sends the command to the database primary shard. It attaches the database - version as usual, and additionally attaches the selected `placementConflictTime`. It also - attaches an `afterClusterTime` = `placementConflictTime`, and ShardVersion::UNTRACKED. +1. For each statement, the router sends the command to the database primary shard. It attaches the + database version as usual, and additionally attaches the selected `placementConflictTime`. It + also attaches an `afterClusterTime` = `placementConflictTime`, and ShardVersion::UNTRACKED. 1. The targeted shard will open its storage snapshot with a timestamp at least `afterClusterTime`. -1. The targeted shard checks the attached metadata. All the following conditions must be - met for the request to be considered valid: +1. The targeted shard checks the attached metadata. All the following conditions must be met for the + request to be considered valid: 1. The received database version must match the current (latest) database version. - 1. `placementConflictTime` must not be earlier than the database version's [timestamp field](https://github.com/mongodb/mongo/blob/eeef1763cb0ff77757bb60eabb8ad1233c990786/src/mongo/db/s/README_versioning_protocols.md#database-version). - This field represents the commit timestamp of the latest reincarnation (drop/create) or movePrimary operation for this database. + 1. 
`placementConflictTime` must not be earlier than the database version's + [timestamp field](https://github.com/mongodb/mongo/blob/eeef1763cb0ff77757bb60eabb8ad1233c990786/src/mongo/db/s/README_versioning_protocols.md#database-version). + This field represents the commit timestamp of the latest reincarnation (drop/create) or + movePrimary operation for this database. 1. The received placement version is UNTRACKED, and the shard checks the latest version matches. - 1. The collection in the snapshot must be the same incarnation (same UUID) as in the latest CollectionCatalog. + 1. The collection in the snapshot must be the same incarnation (same UUID) as in the latest + CollectionCatalog. Notes: -- (4.i) ensures that the database version used by the router is not stale. This is part of the database - versioning protocol. +- (4.i) ensures that the database version used by the router is not stale. This is part of the + database versioning protocol. - (4.ii) ensures that the database did not undergo any reincarnation at a timestamp later than - `placementConflictTime`. The router always routes requests for untracked collections based on the latest - database primary shard knowledge, but this decision might not be valid at for snapshots opened at - a timestamp before the reincarnation. + `placementConflictTime`. The router always routes requests for untracked collections based on the + latest database primary shard knowledge, but this decision might not be valid at for snapshots + opened at a timestamp before the reincarnation. - (4.iii) ensures that the collection generation anomaly is detected for cases where an untracked collection becomes tracked. There will be a mismatch between the attached ShardVersion::UNTRACKED and the actual placement version on the shard. 
diff --git a/src/mongo/db/index_builds/README.md b/src/mongo/db/index_builds/README.md index 5cc841bb710..33530e5054a 100644 --- a/src/mongo/db/index_builds/README.md +++ b/src/mongo/db/index_builds/README.md @@ -7,24 +7,23 @@ At a high level, omitting details that will be elaborated upon in further sectio have the following procedure: - While holding a collection X lock, write a new index entry to the array of indexes included as - part of a durable catalog entry. This entry has a `ready: false` component. See [Durable - Catalog](../shard_role/shard_catalog/README.md#durable-catalog). + part of a durable catalog entry. This entry has a `ready: false` component. See + [Durable Catalog](../shard_role/shard_catalog/README.md#durable-catalog). - Downgrade to a collection IX lock. - Scan all documents on the collection to be indexed - - Generate [keys](../storage/key_string/README.md) for the indexed fields for each - document + - Generate [keys](../storage/key_string/README.md) for the indexed fields for each document - Periodically yield locks and storage engine snapshots - Insert the generated keys into the [external sorter](../sorter/README.md) -- Read the sorted keys from the external sorter and load them into the storage engine index. - For performance reasons, we insert the keys into the index table in sorted order. +- Read the sorted keys from the external sorter and load them into the storage engine index. For + performance reasons, we insert the keys into the index table in sorted order. - While holding a collection X lock, make a final `ready: true` write to the durable catalog. ## Hybrid Index Builds Hybrid index builds refer to the default procedure introduced in 4.2 that produces efficient index data structures without blocking reads or writes for extended periods of time. This is achieved by -performing a full collection scan and loading keys while concurrently -intercepting new writes into an internal storage engine table. 
+performing a full collection scan and loading keys while concurrently intercepting new writes into +an internal storage engine table. ### Internal Table For Side Writes @@ -55,9 +54,8 @@ Once the collection scan and key-loading phases of the index build are complete, keys are applied directly to the index in three phases: - Drain the side table while holding a collection IX lock to allow concurrent reads and writes. - - Since writes are still accepted, new keys may appear at the end of the _side-writes_ table. - They will be applied in subsequent steps. - (Signal commit readiness to the primary) + - Since writes are still accepted, new keys may appear at the end of the _side-writes_ table. They + will be applied in subsequent steps. (Signal commit readiness to the primary) - Continue draining the side table while holding a collection IX lock to allow concurrent reads and writes, while waiting for other replicas to become commit-ready. - Drain the side table while holding a collection X lock to block all reads and writes. @@ -75,8 +73,8 @@ and Unique indexes created with `{unique: true}` enforce a constraint that there are no duplicate keys in an index. The hybrid index procedure makes it challenging to detect duplicates because keys are -split between the index and the side-writes table. Additionally, during the lifetime of -an index build, concurrent writes may introduce and resolve duplicate key conflicts on the index. +split between the index and the side-writes table. Additionally, during the lifetime of an index +build, concurrent writes may introduce and resolve duplicate key conflicts on the index. For those reasons, during an index build we temporarily allow duplicate key violations, and record any detected violations in an internal table, the _duplicate key tracker_. Each record is keyed by a @@ -86,8 +84,8 @@ excluded because the tracker only cares whether the key still has duplicates at which specific document owns the key. 
The type bits are preserved so that a human-readable key can be reconstructed for the error message if the violation persists. -At the conclusion of the index build, under a collection X lock, [duplicate keys are -re-checked](https://github.com/mongodb/mongo/blob/e0efdcfb5020b802da043b955e922d0995109619/src/mongo/db/index_builds/index_builds_coordinator.cpp#L3730). +At the conclusion of the index build, under a collection X lock, +[duplicate keys are re-checked](https://github.com/mongodb/mongo/blob/e0efdcfb5020b802da043b955e922d0995109619/src/mongo/db/index_builds/index_builds_coordinator.cpp#L3730). If the duplicate has been resolved (e.g. the conflicting document was deleted during the build), the record is deleted. If it persists, an error is thrown. @@ -102,30 +100,29 @@ to generate a key for `{a: [1, 2, 3], b: [4, 5, 6]}`. On a primary under normal circumstances, index builds fail immediately after encountering a key generation error (as opposed to duplicate key errors), and the error is returned to the user. Since -secondaries apply oplog entries [out of -order](../repl/README.md#oplog-entry-application), however, spurious key generation errors may be -encountered on otherwise consistent data. To solve this problem, we relax key constraints and -suppress key generation errors on secondaries. +secondaries apply oplog entries [out of order](../repl/README.md#oplog-entry-application), however, +spurious key generation errors may be encountered on otherwise consistent data. To solve this +problem, we relax key constraints and suppress key generation errors on secondaries. With the introduction of simultaneous index builds, an index build may be started on a secondary node, but complete while it is a primary after a state transition. If we ignored constraints while in the secondary state, we would not be able to commit the index build and guarantee its consistency since we may have suppressed valid key generation errors. 
-To solve this problem, on secondaries, the records associated with key generation errors are -skipped and recorded in an internal table, the _skipped records tracker_. Each record is a BSON -document containing the record identifier of the collection document that failed key generation. No -index key is stored because the key failed to generate in the first place. +To solve this problem, on secondaries, the records associated with key generation errors are skipped +and recorded in an internal table, the _skipped records tracker_. Each record is a BSON document +containing the record identifier of the collection document that failed key generation. No index key +is stored because the key failed to generate in the first place. -If a secondary node becomes primary and then commits the index build, it re-generates and -re-inserts keys for the [skipped -records](https://github.com/mongodb/mongo/blob/e0efdcfb5020b802da043b955e922d0995109619/src/mongo/db/index_builds/index_builds_coordinator.cpp#L2037) +If a secondary node becomes primary and then commits the index build, it re-generates and re-inserts +keys for the +[skipped records](https://github.com/mongodb/mongo/blob/e0efdcfb5020b802da043b955e922d0995109619/src/mongo/db/index_builds/index_builds_coordinator.cpp#L2037) under a collection X lock: for each stored record identifier, the current version of the document is read and keys are re-generated. If there are still constraint violations, an error is thrown and the index build aborts. Primaries do not suppress key generation errors, so they do not use the skipped records tracker; they abort immediately when a key generation error occurs. Secondaries that remain -secondary rely on the primary's decision to commit as assurance that skipped records do not need -to be checked. +secondary rely on the primary's decision to commit as assurance that skipped records do not need to +be checked. 
See [SkippedRecordTracker](https://github.com/mongodb/mongo/blob/e0efdcfb5020b802da043b955e922d0995109619/src/mongo/db/index_builds/skipped_record_tracker.h#L54). @@ -151,16 +148,15 @@ primary is done with its indexing, it will decide to replicate either an `abortI Each node independently builds the index by scanning its own collection data. The external sorter spills to local unreplicated temporary files under `dbpath/_tmp` when the memory limit is reached. -Once sorted, the keys are -[bulk-loaded](https://source.wiredtiger.com/develop/tune_bulk_load.html) into the WiredTiger index -table. Bulk-loading requires keys to be inserted in sorted order, but builds a B-tree structure that -is more efficiently filled than with random insertion. +Once sorted, the keys are [bulk-loaded](https://source.wiredtiger.com/develop/tune_bulk_load.html) +into the WiredTiger index table. Bulk-loading requires keys to be inserted in sorted order, but +builds a B-tree structure that is more efficiently filled than with random insertion. Simultaneous index builds are resilient to replica set state transitions. The node that starts an index build does not need to be the same node that decides to commit it. -See [Index Builds in Replicated Environments - MongoDB -Manual](https://www.mongodb.com/docs/manual/core/index-creation/#index-builds-in-replicated-environments). +See +[Index Builds in Replicated Environments - MongoDB Manual](https://www.mongodb.com/docs/manual/core/index-creation/#index-builds-in-replicated-environments). Server 7.1 introduces the following improvements: @@ -168,8 +164,8 @@ Server 7.1 introduces the following improvements: 7.1, index builds aborted the index build close to completion, potentially long after detection. - A secondary member can abort a two-phase index build. Before 7.1, a secondary was forced to crash instead. See the [Voting for Abort](#voting-for-abort) section. -- Index builds are cancelled if there isn't enough storage space available. 
See the [Disk - Space](#disk-space) section. +- Index builds are cancelled if there isn't enough storage space available. See the + [Disk Space](#disk-space) section. ### Commit Quorum @@ -235,8 +231,7 @@ which defaults to 500MB. On clean shutdown, index builds save their progress in internal idents that will be used for resuming the index builds when the server starts up. The persisted information includes: -- [Phase of the index - build](https://github.com/mongodb/mongo/blob/0d45dd9d7ba9d3a1557217a998ad31c68a897d47/src/mongo/db/resumable_index_builds.idl#L43) +- [Phase of the index build](https://github.com/mongodb/mongo/blob/0d45dd9d7ba9d3a1557217a998ad31c68a897d47/src/mongo/db/resumable_index_builds.idl#L43) when it was interrupted for shutdown: - initialized - collection scan @@ -257,20 +252,20 @@ resumability can be found in [IndexBuildsCoordinator::isIndexBuildResumable()](https://github.com/mongodb/mongo/blob/0d45dd9d7ba9d3a1557217a998ad31c68a897d47/src/mongo/db/index_builds_coordinator.cpp#L375). Generally, index builds are resumable under the following conditions: -- The index build is running on a voting member of the replica set with the default [commit - quorum](#commit-quorum) `"votingMembers"`. +- The index build is running on a voting member of the replica set with the default + [commit quorum](#commit-quorum) `"votingMembers"`. - Majority read concern is enabled. -The [Recover To A Timestamp (RTT) rollback -algorithm](https://github.com/mongodb/mongo/blob/04b12743cbdcfea11b339e6ad21fc24dec8f6539/src/mongo/db/repl/README.md#rollback) +The +[Recover To A Timestamp (RTT) rollback algorithm](https://github.com/mongodb/mongo/blob/04b12743cbdcfea11b339e6ad21fc24dec8f6539/src/mongo/db/repl/README.md#rollback) supports resuming index builds interrupted at any phase. On entering rollback, the resumable index information is persisted to disk using the same mechanism as shutdown. 
We resume the index build using the startup recovery logic that RTT uses to bring the node back to a writable state. For improved rollback semantics, resumable index builds require a majority read cursor during collection scan phase. Index builds wait for the majority commit point to advance before starting -the collection scan. The majority wait happens after installing the [side table for intercepting new -writes](#internal-table-for-side-writes). +the collection scan. The majority wait happens after installing the +[side table for intercepting new writes](#internal-table-for-side-writes). See [MultiIndexBlock::\_constructStateObject()](https://github.com/mongodb/mongo/blob/0d45dd9d7ba9d3a1557217a998ad31c68a897d47/src/mongo/db/catalog/multi_index_block.cpp#L900) @@ -288,12 +283,11 @@ index build are explicitly replicated through the oplog via `ci` and `cd` oplog simply apply oplog to build the index. Thus, the concept of commit quorum does not apply to primary-driven index builds. -Both primary-driven and two-phase builds use the same [bulk -builder](https://github.com/mongodb/mongo/blob/e0efdcfb5020b802da043b955e922d0995109619/src/mongo/db/index/index_access_method.cpp#L920) +Both primary-driven and two-phase builds use the same +[bulk builder](https://github.com/mongodb/mongo/blob/e0efdcfb5020b802da043b955e922d0995109619/src/mongo/db/index/index_access_method.cpp#L920) for key insertion, but differ in how sorted keys are spilled. 
Compared to two-phase builds, which -spill to local unreplicated temporary files, primary-driven builds spill into a [replicated storage -engine -table](https://github.com/mongodb/mongo/blob/e0efdcfb5020b802da043b955e922d0995109619/src/mongo/db/index_builds/index_build_interceptor.h#L221) +spill to local unreplicated temporary files, primary-driven builds spill into a +[replicated storage engine table](https://github.com/mongodb/mongo/blob/e0efdcfb5020b802da043b955e922d0995109619/src/mongo/db/index_builds/index_build_interceptor.h#L221) so that sorted keys are propagated to secondaries through the oplog. ## Single-Phase Index Builds diff --git a/src/mongo/db/matcher/README.md b/src/mongo/db/matcher/README.md index ac46456918d..dbf1495afb2 100644 --- a/src/mongo/db/matcher/README.md +++ b/src/mongo/db/matcher/README.md @@ -2,26 +2,51 @@ ## Overview -A query is first parsed into a [logical model](../query/README_logical_models.md), representing the query's semantics in a structured format. This logical model is then normalized, transformed, and optimized, making it more amenable to efficient execution. These rewrites are based on predefined rules, or heuristics. We call them **heuristic rewrites** because we don't know for sure whether the rewritten result will be better than the original - we just have a best guess. Once the logical model is optimized, the query planner [generates](../query/plan_enumerator/README.md) multiple candidate physical representations and [selects](../exec/runtime_planners/classic_runtime_planner/README.md) the most efficient plan for execution. +A query is first parsed into a [logical model](../query/README_logical_models.md), representing the +query's semantics in a structured format. This logical model is then normalized, transformed, and +optimized, making it more amenable to efficient execution. These rewrites are based on predefined +rules, or heuristics. 
We call them **heuristic rewrites** because we don't know for sure whether the +rewritten result will be better than the original - we just have a best guess. Once the logical +model is optimized, the query planner [generates](../query/plan_enumerator/README.md) multiple +candidate physical representations and +[selects](../exec/runtime_planners/classic_runtime_planner/README.md) the most efficient plan for +execution. -This README will cover rewrites on the `MatchExpression` component of a find query or `$match` stage. It does not cover subsequent stages such as query compilation (stage builders). To learn more about rewrites on aggregate pipelines, refer to the [Pipeline Rewrites README](../pipeline/README.md). +This README will cover rewrites on the `MatchExpression` component of a find query or `$match` +stage. It does not cover subsequent stages such as query compilation (stage builders). To learn more +about rewrites on aggregate pipelines, refer to the +[Pipeline Rewrites README](../pipeline/README.md). ## MatchExpression Optimization -The entrypoint to [`MatchExpression`](../query/README_logical_models.md#matchexpression) optimization is the [`MatchExpression::optimize()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression.cpp#L138) function, which is called by [`MatchExpression::normalize()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression.h#L619). 
This is called on the root of the `MatchExpression` tree and makes simplifying changes to the tree's structure without altering its semantics, returning one of the following: +The entrypoint to [`MatchExpression`](../query/README_logical_models.md#matchexpression) +optimization is the +[`MatchExpression::optimize()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression.cpp#L138) +function, which is called by +[`MatchExpression::normalize()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression.h#L619). +This is called on the root of the `MatchExpression` tree and makes simplifying changes to the tree's +structure without altering its semantics, returning one of the following: 1. The original, unmodified `MatchExpression`, 1. The original `MatchExpression` that has been mutated, 1. A new `MatchExpression`. -`MatchExpression` optimization is done during query [canonicalization](../query/README_logical_models.md#canonicalquery), and involves the following processes: +`MatchExpression` optimization is done during query +[canonicalization](../query/README_logical_models.md#canonicalquery), and involves the following +processes: -1. [**Expression-specific Optimizations**](#expression-specific-optimizations): Individual `MatchExpression` nodes (e.g. `LeafMatchExpression` and `ArrayMatchExpression`) have specific logic for optimizations to perform on themselves. -1. [**Boolean Simplification**](#boolean-simplification): Simplifies the logical structure of `MatchExpression`s containing Boolean operators like `$and`, `$or`, `$nor`, and `$not`. +1. [**Expression-specific Optimizations**](#expression-specific-optimizations): Individual + `MatchExpression` nodes (e.g. `LeafMatchExpression` and `ArrayMatchExpression`) have specific + logic for optimizations to perform on themselves. +1. 
[**Boolean Simplification**](#boolean-simplification): Simplifies the logical structure of + `MatchExpression`s containing Boolean operators like `$and`, `$or`, `$nor`, and `$not`. > ### Aside: Disabling Optimizations > -> You may want to disable `MatchExpression` optimization for testing purposes, say if you added a new rewrite and want to verify that it's semantically correct. You can compare the results with optimizations on against results from an unoptimized form of the query by toggling the `disableMatchExpressionOptimization` failpoint. +> You may want to disable `MatchExpression` optimization for testing purposes, say if you added a +> new rewrite and want to verify that it's semantically correct. You can compare the results with +> optimizations on against results from an unoptimized form of the query by toggling the +> `disableMatchExpressionOptimization` failpoint. > > In a mongo shell, run the following command: > @@ -31,13 +56,25 @@ The entrypoint to [`MatchExpression`](../query/README_logical_models.md#matchexp ### Expression-specific Optimizations -Subclasses of `MatchExpression` that represent different types define specific optimization behavior by overriding the [`MatchExpression::getOptimizer()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression.h#L619) function, which takes in an input `MatchExpression` and passes the same `MatchExpression` to the resulting `ExpressionOptimizerFunc`. If the subclass holds children `MatchExpression` objects, it is responsible for returning an `ExpressionOptimizerFunc` that recursively calls `MatchExpression::optimize()` on those children. 
+Subclasses of `MatchExpression` that represent different types define specific optimization behavior +by overriding the +[`MatchExpression::getOptimizer()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression.h#L619) +function, which takes in an input `MatchExpression` and passes the same `MatchExpression` to the +resulting `ExpressionOptimizerFunc`. If the subclass holds children `MatchExpression` objects, it is +responsible for returning an `ExpressionOptimizerFunc` that recursively calls +`MatchExpression::optimize()` on those children. -Generally, we optimize the logical representation through a bottom-up approach. This is more efficient: by handling subtrees first and potentially eliminating redundant child nodes, unnecessary work is avoided at the parent level. However, this isn't always enforced. It is permissible for an implementation to optimize itself first (e.g. pruning child expressions) before optimizing the children themselves. +Generally, we optimize the logical representation through a bottom-up approach. This is more +efficient: by handling subtrees first and potentially eliminating redundant child nodes, unnecessary +work is avoided at the parent level. However, this isn't always enforced. It is permissible for an +implementation to optimize itself first (e.g. pruning child expressions) before optimizing the +children themselves. **Example 1: `ListOfMatchExpression`** -Let's examine how a [`ListOfMatchExpression`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_tree.h#L56), such as `$and` and `$or`, is rewritten. Consider the following query: +Let's examine how a +[`ListOfMatchExpression`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_tree.h#L56), +such as `$and` and `$or`, is rewritten. 
Consider the following query: ``` { @@ -66,21 +103,26 @@ Let's examine how a [`ListOfMatchExpression`](https://github.com/mongodb/mongo/b } ``` -The top-level `AndMatchExpression` has two children - an `OrMatchExpression` (Child 0) and an `AndMatchExpression` (Child 1). +The top-level `AndMatchExpression` has two children - an `OrMatchExpression` (Child 0) and an +`AndMatchExpression` (Child 1). For Child 0, the following rewrites are performed: -1. Since Child 0 is an OR with EQ conditions on the same path, it can be rewritten into an `InMatchExpression`. -2. Since the rewritten AND now only has one operand, we will simplify the expression to just the operand. +1. Since Child 0 is an OR with EQ conditions on the same path, it can be rewritten into an + `InMatchExpression`. +2. Since the rewritten AND now only has one operand, we will simplify the expression to just the + operand. ``` // Child 0 after rewrites { "age": { $in: [ 50, 30 ] } } ``` -Child 1 is an `AndMatchExpression` with a trivially true predicate and a nested `AndMatchExpression`. The following rewrites are performed: +Child 1 is an `AndMatchExpression` with a trivially true predicate and a nested +`AndMatchExpression`. The following rewrites are performed: -1. The associativity of AND means that AND absorbs the children of any ANDs among its children, so the nested predicate is lifted to the top-level. +1. The associativity of AND means that AND absorbs the children of any ANDs among its children, so + the nested predicate is lifted to the top-level. 1. All trivially true children are removed from the AND. ``` @@ -93,7 +135,8 @@ Child 1 is an `AndMatchExpression` with a trivially true predicate and a nested } ``` -At the root, the AND in Child 1 can now be absorbed into the top-most AND. The final, rewritten query is now simplified and ready for further query planning: +At the root, the AND in Child 1 can now be absorbed into the top-most AND. 
The final, rewritten +query is now simplified and ready for further query planning: ``` { @@ -107,7 +150,9 @@ At the root, the AND in Child 1 can now be absorbed into the top-most AND. The f **Example 2: `InMatchExpression`** -For an [`InMatchExpression`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_leaf.h#L764), we perform the following optimizations: +For an +[`InMatchExpression`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_leaf.h#L764), +we perform the following optimizations: 1. An IN with exactly one regex becomes a `RegexMatchExpression`. 2. An IN of exactly one equality becomes an `EqualityMatchExpression`. @@ -126,7 +171,12 @@ For an [`InMatchExpression`](https://github.com/mongodb/mongo/blob/28df8e56046e4 **Example 3: `ExprMatchExpression`** -There are also instances where we perform rewrites to make an expression that otherwise can't take advantage of indexes indexable. For instance, `$expr` expressions are not themselves indexable, but may contain children expressions that could use indexes to produce a superset of the expected result, minimizing the number of documents that need to be filtered. We attempt to rewrite them as a conjunction of internal `MatchExpression`s so that the query planner can potentially generate an index scan. +There are also instances where we perform rewrites to make an expression that otherwise can't take +advantage of indexes indexable. For instance, `$expr` expressions are not themselves indexable, but +may contain children expressions that could use indexes to produce a superset of the expected +result, minimizing the number of documents that need to be filtered. We attempt to rewrite them as a +conjunction of internal `MatchExpression`s so that the query planner can potentially generate an +index scan. 
We will rewrite the following `$expr`: @@ -159,75 +209,145 @@ into } ``` -Notice that `{ $eq: ['$x', 1] }` is representable as a `MatchExpression`, whereas `{ $eq: ['$y', '$z'] }` compares multiple field path references, requiring their values from each input document. `MatchExpression`s don't allow for more than one local document field path, so this part cannot be extracted. +Notice that `{ $eq: ['$x', 1] }` is representable as a `MatchExpression`, whereas +`{ $eq: ['$y', '$z'] }` compares multiple field path references, requiring their values from each +input document. `MatchExpression`s don't allow for more than one local document field path, so this +part cannot be extracted. -Unlike regular comparison operators, `$_internalExpr` operators have non-type bracketed semantics to match non-type bracketed comparison operators inside `$expr`. It will match either an identical set or superset of the documents matched by `$expr` due to semantic differences between the rewritten `MatchExpression` and the [`ExprMatchExpression`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_expr.h#L67). +Unlike regular comparison operators, `$_internalExpr` operators have non-type bracketed semantics to +match non-type bracketed comparison operators inside `$expr`. It will match either an identical set +or superset of the documents matched by `$expr` due to semantic differences between the rewritten +`MatchExpression` and the +[`ExprMatchExpression`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_expr.h#L67). > ### Aside: Type Bracketing > -> **Type bracketing** ensures that comparisons are performed between values of the same type, avoiding unexpected results. Given the query `{field: {$gt: 5}}`, type bracketing would ensure only numeric types are considered for comparison. For instance, documents with `{field: 10}` and `{field: 100.01}` would match. 
This is the default behavior for `MatchExpression`s. +> **Type bracketing** ensures that comparisons are performed between values of the same type, +> avoiding unexpected results. Given the query `{field: {$gt: 5}}`, type bracketing would ensure +> only numeric types are considered for comparison. For instance, documents with `{field: 10}` and +> `{field: 100.01}` would match. This is the default behavior for `MatchExpression`s. > -> On the other hand, without type bracketing, we would consider all types in the comparison. For instance, strings are higher in the [`BSON` sort order](https://www.mongodb.com/docs/manual/reference/bson-type-comparison-order/) than numerics, so the document `{field: "string"}` would also match the query above. This is the default comparison mode for `$expr`. +> On the other hand, without type bracketing, we would consider all types in the comparison. For +> instance, strings are higher in the +> [`BSON` sort order](https://www.mongodb.com/docs/manual/reference/bson-type-comparison-order/) +> than numerics, so the document `{field: "string"}` would also match the query above. This is the +> default comparison mode for `$expr`. -For example, `$_internalExprEq` in `MatchExpression` reaches into arrays, whereas `$eq` in `ExprMatchExpression` does not. Thus, the original `$expr` is still included as a second level of filtering to ensure that the returned results match expected `$expr` semantics. For the full description of `InternalExprMatchExpression` semantics, refer to [`expression_internal_expr_comparison.h`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_internal_expr_comparison.h#L71). +For example, `$_internalExprEq` in `MatchExpression` reaches into arrays, whereas `$eq` in +`ExprMatchExpression` does not. Thus, the original `$expr` is still included as a second level of +filtering to ensure that the returned results match expected `$expr` semantics. 
For the full +description of `InternalExprMatchExpression` semantics, refer to +[`expression_internal_expr_comparison.h`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_internal_expr_comparison.h#L71). > ### Aside: Array Traversal Semantics > -> **Array traversal** semantics define how arrays are traversed when walking through the `MatchExpression` tree. Different types of `MatchExpression`s can provide their array traversal requirements for leaf arrays (e.g. how we handle the path `a.b` when `b` is an array) and non-leaf arrays (e.g. how we handle the path `a.b` when `a` is an array). +> **Array traversal** semantics define how arrays are traversed when walking through the +> `MatchExpression` tree. Different types of `MatchExpression`s can provide their array traversal +> requirements for leaf arrays (e.g. how we handle the path `a.b` when `b` is an array) and non-leaf +> arrays (e.g. how we handle the path `a.b` when `a` is an array). > -> Generally, traversing arrays means that the elements of the array are considered along with the entire array object. When walking through the path `f` in the document `{f: [1, 2]}`, the path iterator would return 1, 2, and [1, 2]. +> Generally, traversing arrays means that the elements of the array are considered along with the +> entire array object. When walking through the path `f` in the document `{f: [1, 2]}`, the path +> iterator would return 1, 2, and [1, 2]. > -> If the behavior is no array traversal, then only the entire array object (`[1, 2]`) will be returned. There is also a mode where only the array elements are returned, while the array itself is omitted. 
For the full definition of array traversal modes, refer to [`LeafArrayBehavior`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/path.h#L53) and [`NonLeafArrayBehavior`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/path.h#L77). +> If the behavior is no array traversal, then only the entire array object (`[1, 2]`) will be +> returned. There is also a mode where only the array elements are returned, while the array itself +> is omitted. For the full definition of array traversal modes, refer to +> [`LeafArrayBehavior`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/path.h#L53) +> and +> [`NonLeafArrayBehavior`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/path.h#L77). > -> For `MatchExpression`s, the default mode is to traverse arrays, while for `$expr`, the default mode is not to traverse arrays. For instance, `{$expr: {$eq: ["$f", [1, 2]]}}` will only match documents where the value at `f` is the entire array `[1, 2]`. +> For `MatchExpression`s, the default mode is to traverse arrays, while for `$expr`, the default +> mode is not to traverse arrays. For instance, `{$expr: {$eq: ["$f", [1, 2]]}}` will only match +> documents where the value at `f` is the entire array `[1, 2]`. ## Boolean Simplification -After calling `MatchExpression::getOptimizer()`, it may be that we still have a complex Boolean expression that could be further simplified. If the resulting `MatchExpression` falls within the limits set by the knobs in `ExpressionSimplifierSettings`, then it will undergo further simplifications in the Boolean expression simplifier with the goal of reducing computational overhead and enabling better plan generation. +After calling `MatchExpression::getOptimizer()`, it may be that we still have a complex Boolean +expression that could be further simplified. 
If the resulting `MatchExpression` falls within the +limits set by the knobs in `ExpressionSimplifierSettings`, then it will undergo further +simplifications in the Boolean expression simplifier with the goal of reducing computational +overhead and enabling better plan generation. > ### Aside: ExpressionSimplifierSettings > -> There are some cases when we may want to skip Boolean simplification, such as when the query is deemed to complex to simplify effectively. Additionally, even if simplification is attempted, we might discard the result because it is more complex than the original query. Some knobs that affect this decision are: +> There are some cases when we may want to skip Boolean simplification, such as when the query is +> deemed to complex to simplify effectively. Additionally, even if simplification is attempted, we +> might discard the result because it is more complex than the original query. Some knobs that +> affect this decision are: > -> - **maximumNumberOfUniquePredicates** - if the number of unique predicates in an expression exceeds this number, the expression is considered too big to be simplified. +> - **maximumNumberOfUniquePredicates** - if the number of unique predicates in an expression +> exceeds this number, the expression is considered too big to be simplified. > - **maximumNumberOfMinterms** - maximum number of minterms allowed during boolean transformations. -> - **maxSizeFactor** - if the simplified expression is larger than the original expression's size \* `maxSizeFactor`, the simplified one will be rejected. +> - **maxSizeFactor** - if the simplified expression is larger than the original expression's +> size \* `maxSizeFactor`, the simplified one will be rejected. > -> For the full list of settings, refer to [expression_simplifier.h](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_simplifier.h#L38). 
+> For the full list of settings, refer to +> [expression_simplifier.h](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_simplifier.h#L38). -The entrypoint into Boolean simplification is the [`simplifyMatchExpression()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_simplifier.cpp#L229) function. Broadly, it is implemented in the following steps: +The entrypoint into Boolean simplification is the +[`simplifyMatchExpression()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_simplifier.cpp#L229) +function. Broadly, it is implemented in the following steps: 1. **Convert the `MatchExpression` to a bitset tree.** > ### Aside: Bitset > -> A **bitset** is a compact representation of a set, where each element is represented by a bit in a sequence of binary digits. It indicates whether an element is present (1) or absent (0) in a set. For instance, +> A **bitset** is a compact representation of a set, where each element is represented by a bit in a +> sequence of binary digits. It indicates whether an element is present (1) or absent (0) in a set. +> For instance, > > - 0001 represents a set containing the first element. > - 1010 represents a set containing the second and fourth elements. > -> Bitset operations tend to be faster and more straightforward than working with a complex AST structure. +> Bitset operations tend to be faster and more straightforward than working with a complex AST +> structure. -- The query filter is transformed into a bitset tree where predicates are in leaf nodes stored as bitsets, while internal nodes represent the tree structure. An internal node may be a conjunction (AND) or disjunction (OR) of its children. -- MQL logical operators are represented like `BitsetTreeNode{ type: , isNegated: }`. 
For specific representations of each [logical operator](https://www.mongodb.com/docs/manual/reference/operator/query-logical/), refer to [bitset_tree.h](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/boolean_simplification/bitset_tree.h#L50). +- The query filter is transformed into a bitset tree where predicates are in leaf nodes stored as + bitsets, while internal nodes represent the tree structure. An internal node may be a conjunction + (AND) or disjunction (OR) of its children. +- MQL logical operators are represented like + `BitsetTreeNode{ type: , isNegated: }`. For + specific representations of each + [logical operator](https://www.mongodb.com/docs/manual/reference/operator/query-logical/), refer + to + [bitset_tree.h](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/boolean_simplification/bitset_tree.h#L50). 2. **Simplify the bitset tree to DNF.** > ### Aside: DNF (Disjunctive Normal Form) > -> DNF (Disjunctive Normal Form) is a canonical normal form of a logical expression consisting of a disjunction of conjunctions, or an OR of ANDs. +> DNF (Disjunctive Normal Form) is a canonical normal form of a logical expression consisting of a +> disjunction of conjunctions, or an OR of ANDs. > -> For instance, a query of the form A ∧ (B ∨ C) can be transfored into DNF by distributing A => (A ∧ B) ∨ (A ∧ C). +> For instance, a query of the form A ∧ (B ∨ C) can be transfored into DNF by distributing A => (A ∧ +> B) ∨ (A ∧ C). > -> The expression in DNF can also be thought of as a maxterm of minterms, where the **maxterm** is the top disjunction of an expression in DNF and the **minterm** is a conjunction of the expression. +> The expression in DNF can also be thought of as a maxterm of minterms, where the **maxterm** is +> the top disjunction of an expression in DNF and the **minterm** is a conjunction of the +> expression. 
-- Once the bitset tree is in DNF, it can be further reduced to a minimal set of minterms, or sum of products. +- Once the bitset tree is in DNF, it can be further reduced to a minimal set of minterms, or sum of + products. -3. **Apply the [Quine McCluskey](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/boolean_simplification/quine_mccluskey.h) reduction operation of DNF terms**: (x ∧ y) ∨ (x ∧ ~y) = x -4. **Apply [Absorption's Law](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/boolean_simplification/bitset_algebra.h#L117)**: x ∨ (x ∧ y) = x -5. **Use [Petrick's method](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/boolean_simplification/petrick.h) for further simplification**: This is used to find the minimal "coverage", or the smallest set of minterms such that the predicates evaluate to true. - - For example, given the input list of minterms `[[0, 1, 2], [2, 3], [0, 3]]`, we can derive two minimal coverages: `[0, 1]` and `[0, 2]`. The result is a vector of indices to the required minterms. We can "cover" the predicates 0, 1, 2, and 3 with either pairs of the original list of minterms. -6. **Restore the original MatchExpression**: Finally, we [restore](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_simplifier.cpp#L259) the `MatchExpression` tree from the bitset tree and a list of expressions representing bits in the bitset tree. +3. **Apply the + [Quine McCluskey](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/boolean_simplification/quine_mccluskey.h) + reduction operation of DNF terms**: (x ∧ y) ∨ (x ∧ ~y) = x +4. 
**Apply + [Absorption's Law](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/boolean_simplification/bitset_algebra.h#L117)**: + x ∨ (x ∧ y) = x +5. **Use + [Petrick's method](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/boolean_simplification/petrick.h) + for further simplification**: This is used to find the minimal "coverage", or the smallest set of + minterms such that the predicates evaluate to true. + - For example, given the input list of minterms `[[0, 1, 2], [2, 3], [0, 3]]`, we can derive two + minimal coverages: `[0, 1]` and `[0, 2]`. The result is a vector of indices to the required + minterms. We can "cover" the predicates 0, 1, 2, and 3 with either pairs of the original list + of minterms. +6. **Restore the original MatchExpression**: Finally, we + [restore](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/matcher/expression_simplifier.cpp#L259) + the `MatchExpression` tree from the bitset tree and a list of expressions representing bits in + the bitset tree. **Example**: @@ -265,7 +385,8 @@ Input query after calling `MatchExpression::getOptimizer()`: } ``` -First, we convert the equivalent `MatchExpression` to a bitset tree. Let's say this is the bit mapping: +First, we convert the equivalent `MatchExpression` to a bitset tree. Let's say this is the bit +mapping: - `field0 = A`: Bit 0 - `field1 = B`: Bit 1 @@ -296,25 +417,30 @@ BitsetTreeNode{ } ``` -Next, we simplify this bitset tree into DNF. This tree is already in DNF, so no further work simplifications are done here: +Next, we simplify this bitset tree into DNF. This tree is already in DNF, so no further work +simplifications are done here: ``` (0 AND 1) OR (0 AND 2 AND 3) OR (0 AND 2) OR (1 AND 2) ``` -We then apply Absorption's Law. `(0 AND 2)` absorbs `(0 AND 2 AND 3)` because the latter is a subset of the former. 
The simplified query becomes: +We then apply Absorption's Law. `(0 AND 2)` absorbs `(0 AND 2 AND 3)` because the latter is a subset +of the former. The simplified query becomes: ``` (0 AND 1) OR (0 AND 2) OR (1 AND 2) ``` -The last optimization we can make is to find the minimal coverage of minterms. The predicates mapped to the bits 0, 1, and 2 can still evaluate to true with either `[0, 1]` or `[1, 2]`. If we choose the former, we get: +The last optimization we can make is to find the minimal coverage of minterms. The predicates mapped +to the bits 0, 1, and 2 can still evaluate to true with either `[0, 1]` or `[1, 2]`. If we choose +the former, we get: ``` (0 AND 1) OR (0 AND 2) ``` -Finally, at the end of Boolean simplification, we restore the original `MatchExpression`, which rougly maps to the following MQL query: +Finally, at the end of Boolean simplification, we restore the original `MatchExpression`, which +rougly maps to the following MQL query: ``` { @@ -336,7 +462,11 @@ Finally, at the end of Boolean simplification, we restore the original `MatchExp ``` -For more information on the design of the Boolean simplifier, refer to the blog post: [Improving MongoDB Queries by Simplifying Boolean Expressions](https://www.mongodb.com/blog/post/improving-mongodb-queries-by-simplifying-boolean-expressions). Libraries can be found in the [`boolean_simplification`](https://github.com/mongodb/mongo/tree/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/boolean_simplification) directory. +For more information on the design of the Boolean simplifier, refer to the blog post: +[Improving MongoDB Queries by Simplifying Boolean Expressions](https://www.mongodb.com/blog/post/improving-mongodb-queries-by-simplifying-boolean-expressions). +Libraries can be found in the +[`boolean_simplification`](https://github.com/mongodb/mongo/tree/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/boolean_simplification) +directory. 
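
The absorption and minimal-coverage steps described above can be sketched in a few lines of plain JavaScript. This is only an illustration, not the server's C++ implementation: the helper names (`toMask`, `absorb`, `minimalCoverages`) are hypothetical, each minterm is assumed to fit in a 32-bit mask, and the coverage search is a brute-force enumeration standing in for Petrick's method rather than the method itself.

```javascript
// Hypothetical helpers for illustration only; not part of the MongoDB codebase.

// Encode a minterm (list of predicate bit indices) as a bitmask.
function toMask(bits) {
  return bits.reduce((mask, bit) => mask | (1 << bit), 0);
}

// Absorption law: x OR (x AND y) = x.
// Drop every minterm whose predicates are a strict superset of another minterm's.
function absorb(masks) {
  return masks.filter((candidate, i) =>
    !masks.some((other, j) =>
      i !== j && other !== candidate && (candidate & other) === other));
}

// Brute-force stand-in for Petrick's method: return every smallest set of
// minterm indices whose union covers all predicates that appear in the input.
function minimalCoverages(minterms) {
  const masks = minterms.map(toMask);
  const all = masks.reduce((acc, m) => acc | m, 0);
  let best = [];
  let bestSize = Infinity;
  for (let subset = 1; subset < (1 << masks.length); subset++) {
    const chosen = [];
    let covered = 0;
    for (let i = 0; i < masks.length; i++) {
      if (subset & (1 << i)) {
        chosen.push(i);
        covered |= masks[i];
      }
    }
    if (covered !== all) continue;
    if (chosen.length < bestSize) {
      best = [chosen];
      bestSize = chosen.length;
    } else if (chosen.length === bestSize) {
      best.push(chosen);
    }
  }
  return best;
}

// (0 AND 1) OR (0 AND 2 AND 3) OR (0 AND 2) OR (1 AND 2)
const dnf = [[0, 1], [0, 2, 3], [0, 2], [1, 2]].map(toMask);
console.log(absorb(dnf)); // [ 3, 5, 6 ], i.e. (0 AND 1) OR (0 AND 2) OR (1 AND 2)

// The coverage example from step 5.
console.log(minimalCoverages([[0, 1, 2], [2, 3], [0, 3]])); // [ [ 0, 1 ], [ 0, 2 ] ]
```

Representing each minterm as a bitmask makes the absorption test a single AND plus a comparison, which is the kind of operation the bitset aside refers to when it says bitset operations are faster than walking an AST.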
```mermaid graph TD diff --git a/src/mongo/db/pipeline/README.md b/src/mongo/db/pipeline/README.md index 0ede49e762a..b98c3b966d3 100644 --- a/src/mongo/db/pipeline/README.md +++ b/src/mongo/db/pipeline/README.md @@ -2,16 +2,28 @@ ## Overview -After an aggregate command issued by a user is parsed into a `Pipeline`, it undergoes **heuristic rewrites** to transform the whole pipeline as well as individual stages within the pipeline into a more efficient form. The entrypoint to pipeline optimization is the [`pipeline_optimization::optimizePipeline()`](https://github.com/mongodb/mongo/blob/f072451dc0301232b4748410242565f53f8cb6cf/src/mongo/db/pipeline/optimization/optimize.cpp#L59) function, which is comprised of two primary steps: +After an aggregate command issued by a user is parsed into a `Pipeline`, it undergoes **heuristic +rewrites** to transform the whole pipeline as well as individual stages within the pipeline into a +more efficient form. The entrypoint to pipeline optimization is the +[`pipeline_optimization::optimizePipeline()`](https://github.com/mongodb/mongo/blob/f072451dc0301232b4748410242565f53f8cb6cf/src/mongo/db/pipeline/optimization/optimize.cpp#L59) +function, which is comprised of two primary steps: -1. [**Inter-stage Optimization**](#inter-stage-optimization): optimizes the entire `Pipeline` object, which is represented internally as a container of `DocumentSource`s. This modifies the container by combining, swapping, dropping, and/or inserting stages. -1. [**Stage-specific Optimization**](#stage-specific-optimization): optimizes each stage, or `DocumentSource` individually. +1. [**Inter-stage Optimization**](#inter-stage-optimization): optimizes the entire `Pipeline` + object, which is represented internally as a container of `DocumentSource`s. This modifies the + container by combining, swapping, dropping, and/or inserting stages. +1. 
[**Stage-specific Optimization**](#stage-specific-optimization): optimizes each stage, or + `DocumentSource` individually. -For information on how to register new rewrites, see [Registering new rewrites](#registering-new-rewrites). +For information on how to register new rewrites, see +[Registering new rewrites](#registering-new-rewrites). > ### Aside: Disabling Optimizations > -> You may want to disable `Pipeline` optimization for testing purposes, say if you added a new rewrite and want to verify that it's semantically correct. You can compare the results with optimizations on against results from an unoptimized form of the query by toggling the `disablePipelineOptimization` failpoint, which prevents individual `DocumentSource`s from being optimized. +> You may want to disable `Pipeline` optimization for testing purposes, say if you added a new +> rewrite and want to verify that it's semantically correct. You can compare the results with +> optimizations on against results from an unoptimized form of the query by toggling the +> `disablePipelineOptimization` failpoint, which prevents individual `DocumentSource`s from being +> optimized. > > In a mongo shell, run the following command: > @@ -19,16 +31,35 @@ For information on how to register new rewrites, see [Registering new rewrites]( > db.adminCommand({configureFailPoint: "disablePipelineOptimization", mode: "alwaysOn"}) > ``` > -> You should also add the `{$_internalInhibitOptimation: {}}` stage before each stage in the pipeline to ensure that stages don't participate in whole `Pipeline` optimization (i.e. stage pushdown or swapping). +> You should also add the `{$_internalInhibitOptimation: {}}` stage before each stage in the +> pipeline to ensure that stages don't participate in whole `Pipeline` optimization (i.e. stage +> pushdown or swapping). ## Inter-stage Optimization -First, we attempt to optimize the overall `Pipeline` by modifying stages in relation with each other. 
The entrypoint is [`pipeline_optimization::optimizeContainer()`](https://github.com/mongodb/mongo/blob/f072451dc0301232b4748410242565f53f8cb6cf/src/mongo/db/pipeline/optimization/optimize.cpp#L82), which invokes the [rule-based rewrite engine](https://github.com/mongodb/mongo/blob/787278f00ef495a4311c69b7b913fdc3fd32e4cd/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L196) to run all rules that may combine or re-order adjacent stages, or otherwise modify the structure of the pipeline. Currently all inter-stage rewrites, except for `$match`, `$sample`, `$project`, and `$redact` pushdowns, are implemented inside a public `optimizeAt()` method on each `DocumentSource` and registered as [unconditional rules](https://github.com/mongodb/mongo/blob/787278f00ef495a4311c69b7b913fdc3fd32e4cd/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L90) (see [qo_rules_to_move.cpp](https://github.com/mongodb/mongo/blob/787278f00ef495a4311c69b7b913fdc3fd32e4cd/src/mongo/db/pipeline/optimization/qo_rules_to_move.cpp#L46-L68) for an example). +First, we attempt to optimize the overall `Pipeline` by modifying stages in relation with each +other. The entrypoint is +[`pipeline_optimization::optimizeContainer()`](https://github.com/mongodb/mongo/blob/f072451dc0301232b4748410242565f53f8cb6cf/src/mongo/db/pipeline/optimization/optimize.cpp#L82), +which invokes the +[rule-based rewrite engine](https://github.com/mongodb/mongo/blob/787278f00ef495a4311c69b7b913fdc3fd32e4cd/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L196) +to run all rules that may combine or re-order adjacent stages, or otherwise modify the structure of +the pipeline. 
Currently all inter-stage rewrites, except for `$match`, `$sample`, `$project`, and +`$redact` pushdowns, are implemented inside a public `optimizeAt()` method on each `DocumentSource` +and registered as +[unconditional rules](https://github.com/mongodb/mongo/blob/787278f00ef495a4311c69b7b913fdc3fd32e4cd/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L90) +(see +[qo_rules_to_move.cpp](https://github.com/mongodb/mongo/blob/787278f00ef495a4311c69b7b913fdc3fd32e4cd/src/mongo/db/pipeline/optimization/qo_rules_to_move.cpp#L46-L68) +for an example). These are the general classes of optimizations we attempt to make: 1. **Swapping stages**: - [`$match` pushdown](https://github.com/mongodb/mongo/blob/f072451dc0301232b4748410242565f53f8cb6cf/src/mongo/db/pipeline/optimization/match_rules.cpp#L230-L236) is a prime example. Generally, we want to filter documents before passing them into a more computationally-heavy stage to minimize the amount of work that needs to be done. If the user writes a pipeline that specifies a `$match` after a stage like a `$sort`, we will push it down when possible in order to minimize the number of documents to sort. For example, if the user specified this pipeline: + [`$match` pushdown](https://github.com/mongodb/mongo/blob/f072451dc0301232b4748410242565f53f8cb6cf/src/mongo/db/pipeline/optimization/match_rules.cpp#L230-L236) + is a prime example. Generally, we want to filter documents before passing them into a more + computationally-heavy stage to minimize the amount of work that needs to be done. If the user + writes a pipeline that specifies a `$match` after a stage like a `$sort`, we will push it down + when possible in order to minimize the number of documents to sort. For example, if the user + specified this pipeline: ``` { $sort: { age : -1 } }, @@ -42,10 +73,12 @@ the optimizer will rewrite it to: { $sort: { age : -1 } } ``` -2. 
**Coalescing stages**: - When possible, the optimizer coalesces a pipeline stage into its predecessor. This tends to occur after any sequence reordering optimizations. +2. **Coalescing stages**: When possible, the optimizer coalesces a pipeline stage into its + predecessor. This tends to occur after any sequence reordering optimizations. -For example, when a `$sort` precedes a `$limit`, the optimizer can coalesce the `$limit` into the `$sort` if no intervening stages modify the number of documents (e.g. `$unwind` or `$group`). Given a pipeline with the follwing stages: +For example, when a `$sort` precedes a `$limit`, the optimizer can coalesce the `$limit` into the +`$sort` if no intervening stages modify the number of documents (e.g. `$unwind` or `$group`). Given +a pipeline with the follwing stages: ``` { $sort : { age : -1 } }, @@ -72,9 +105,13 @@ the optimizer will rewrite it to: } ``` -This allows the sort to only maintain the top N results as it progresses, where N is the specified limit, reducing the number of documents that need to be stored in memory. +This allows the sort to only maintain the top N results as it progresses, where N is the specified +limit, reducing the number of documents that need to be stored in memory. -Two consecutive stages that are the same can also be coalesced (it may also be helpful to consider one stage as dropped). For instance, if a `$limit` stage is immediately followed by another `$limit` stage, we can keep the `$limit` stage where the limit amount is the _smaller_ of the two initial limit amounts, and drop the other stage. Given the pipeline: +Two consecutive stages that are the same can also be coalesced (it may also be helpful to consider +one stage as dropped). For instance, if a `$limit` stage is immediately followed by another `$limit` +stage, we can keep the `$limit` stage where the limit amount is the _smaller_ of the two initial +limit amounts, and drop the other stage. 
Given the pipeline: ``` { $limit: 100 }, @@ -87,15 +124,20 @@ the optimizer will rewrite it to: { $limit: 10 } ``` -3. **Inserting stages**: - While this may seem counter-intuitive, it can be beneficial to insert stages into the pipeline in some instances to constrain the stream of documents that are passed to the next stage as well as leverage better index usage. For example, when a pipeline has a `$redact` stage immediately followed by a `$match` stage, it may be possible to add a portion of the `$match` stage before the `$redact` stage. Given the pipeline: +3. **Inserting stages**: While this may seem counter-intuitive, it can be beneficial to insert + stages into the pipeline in some instances to constrain the stream of documents that are passed + to the next stage as well as leverage better index usage. For example, when a pipeline has a + `$redact` stage immediately followed by a `$match` stage, it may be possible to add a portion of + the `$match` stage before the `$redact` stage. Given the pipeline: ``` { $redact: { $cond: { if: { $gte: [ "$sensitivity", 3 ] }, then: "$$PRUNE", else: "$$DESCEND" } } }, { $match: { "status": "active", "sensitivity": { $lt: 5 } } } ``` -the optimizer could interpret that the "status" field is independent of the `$redact` stage, while the "sensitivity" field is dependent. It can split the `$match` stage into an independent and dependent portion, pushing the former before the `$redact` stage and keeping the latter after it: +the optimizer could interpret that the "status" field is independent of the `$redact` stage, while +the "sensitivity" field is dependent. 
It can split the `$match` stage into an independent and +dependent portion, pushing the former before the `$redact` stage and keeping the latter after it: ``` { $match: { "status": "active" } }, @@ -103,24 +145,37 @@ the optimizer could interpret that the "status" field is independent of the `$re { $match: "sensitivity": { $lt: 5 } } ``` -Although the pipeline has more stages than it initially started with, it can be executed more optimally because it both (1) minimizes the documents going into the resource-intensive `$redact` stage, as well as (2) enables the pipeline to use an index on "status" for the first `$match` stage. +Although the pipeline has more stages than it initially started with, it can be executed more +optimally because it both (1) minimizes the documents going into the resource-intensive `$redact` +stage, as well as (2) enables the pipeline to use an index on "status" for the first `$match` stage. -For an in-depth description of different pipeline optimizations, refer to the summary in our [docs](https://www.mongodb.com/docs/manual/core/aggregation-pipeline-optimization/#aggregation-pipeline-optimization). +For an in-depth description of different pipeline optimizations, refer to the summary in our +[docs](https://www.mongodb.com/docs/manual/core/aggregation-pipeline-optimization/#aggregation-pipeline-optimization). ### Dependency Tracking & Analysis -In order to determine whether a pipeline is valid, as well as whether stages can be pushed down before each other or if parts of a stage can be pruned, we use dependency tracking and analysis. This process identifies what fields or variables a stage depends on for execution. If a stage has dependencies on an earlier stage, it must remain sequentially after that stage throughout the rewrite process. On the other hand, if a stage is independent of a prior stage and it is performant to push it down, we know that it is safe to do so. 
+In order to determine whether a pipeline is valid, as well as whether stages can be pushed down +before each other or if parts of a stage can be pruned, we use dependency tracking and analysis. +This process identifies what fields or variables a stage depends on for execution. If a stage has +dependencies on an earlier stage, it must remain sequentially after that stage throughout the +rewrite process. On the other hand, if a stage is independent of a prior stage and it is performant +to push it down, we know that it is safe to do so. These are the major classes of dependencies: - **Document field dependencies**: specific document fields required by a pipeline stage. - _Example_: `{$project: {name: 1}}` depends on the `name` field. -- **Computed fields**: fields generated by stages like `$addFields` or `$group`. These newly-added fields may be dependencies for subsequent stages. - - _Example_: `{$addFields: {total: {$sum: ["price", "$tax"]}}}` computes `total`, which may be used as a dependency in later stages. -- **Renames**: mappings that change field names. Subsequent stages must account for these transformations to resolve the correct fields downstream. +- **Computed fields**: fields generated by stages like `$addFields` or `$group`. These newly-added + fields may be dependencies for subsequent stages. + - _Example_: `{$addFields: {total: {$sum: ["price", "$tax"]}}}` computes `total`, which may be + used as a dependency in later stages. +- **Renames**: mappings that change field names. Subsequent stages must account for these + transformations to resolve the correct fields downstream. - _Example_: `{$project: {newField: "$oldField"}}` renames `oldField` to `newField`. -- **Variable references**: dependencies on scoped variables, such as user-defined variables or system variables (`$$CURRENT`, `$$ROOT`). 
- - _Example_: In the following `$project` stage, `$$discount` is a variable reference defined and used within the `$let`: +- **Variable references**: dependencies on scoped variables, such as user-defined variables or + system variables (`$$CURRENT`, `$$ROOT`). + - _Example_: In the following `$project` stage, `$$discount` is a variable reference defined and + used within the `$let`: ``` { @@ -135,18 +190,34 @@ These are the major classes of dependencies: } ``` -- **Metadata**: additional information about a document that's not part of its core fields. This is often used in operations that rely on contextual information, such as text search scores, geographic proximity, or reserved fields. - - _Example_: The stage `{$project: {score: {$meta: "textScore"}}}` has a dependency on the `textScore` metadata, which is not a document field. +- **Metadata**: additional information about a document that's not part of its core fields. This is + often used in operations that rely on contextual information, such as text search scores, + geographic proximity, or reserved fields. + - _Example_: The stage `{$project: {score: {$meta: "textScore"}}}` has a dependency on the + `textScore` metadata, which is not a document field. -For implementation details about dependency tracking and validation, refer to the [`expression_algo`](https://github.com/mongodb/mongo/blob/d4a04783e727db7d533785689a7c92437cd05fdf/src/mongo/db/matcher/expression_algo.h), [`semantic_analysis`](https://github.com/mongodb/mongo/blob/d4a04783e727db7d533785689a7c92437cd05fdf/src/mongo/db/pipeline/semantic_analysis.h), and [`dependencies`](https://github.com/mongodb/mongo/blob/d4a04783e727db7d533785689a7c92437cd05fdf/src/mongo/db/pipeline/dependencies.h) files. 
+For implementation details about dependency tracking and validation, refer to the +[`expression_algo`](https://github.com/mongodb/mongo/blob/d4a04783e727db7d533785689a7c92437cd05fdf/src/mongo/db/matcher/expression_algo.h), +[`semantic_analysis`](https://github.com/mongodb/mongo/blob/d4a04783e727db7d533785689a7c92437cd05fdf/src/mongo/db/pipeline/semantic_analysis.h), +and +[`dependencies`](https://github.com/mongodb/mongo/blob/d4a04783e727db7d533785689a7c92437cd05fdf/src/mongo/db/pipeline/dependencies.h) +files. ## Stage-specific Optimization -Once we have the final order of the stages, we again invoke the [rule-based rewrite engine](../query/compiler/rewrites/README.md), but this time with a configuration that only run rules that perform stage-specific optimizations. These rules are currently implemented as public `optimize()` methods on `DocumentSource` subclasses and registered as unconditional rules. Each of these rules either returns an optimized `DocumentSource` that's semantically equivalent or removes the current stage if it's a no-op. For instance, a no-op stage like `{$match: {}}` would be removed. +Once we have the final order of the stages, we again invoke the +[rule-based rewrite engine](../query/compiler/rewrites/README.md), but this time with a +configuration that only run rules that perform stage-specific optimizations. These rules are +currently implemented as public `optimize()` methods on `DocumentSource` subclasses and registered +as unconditional rules. Each of these rules either returns an optimized `DocumentSource` that's +semantically equivalent or removes the current stage if it's a no-op. For instance, a no-op stage +like `{$match: {}}` would be removed. -The `MatchExpression` in a `$match` stage contains specific rewrite logic that is covered in greater detail [here](../matcher/README.md). +The `MatchExpression` in a `$match` stage contains specific rewrite logic that is covered in greater +detail [here](../matcher/README.md). 
-Additionally, a stage that contains an `Expression` with `ExpressionConstant` values may be eligible for constant folding. For example, the stage: +Additionally, a stage that contains an `Expression` with `ExpressionConstant` values may be eligible +for constant folding. For example, the stage: ``` { $project: { a: { $sum: [ 4, 5, 1 ] } } } @@ -160,15 +231,23 @@ contains constants than can be folded into one: > ### Aside: Constant Folding > -> **Constant folding** is the process of evaluating expressions that contain or resolve to constants, replacing them with their computed result. By simplifying the expression during query planning, we reduce computational overhead during execution. +> **Constant folding** is the process of evaluating expressions that contain or resolve to +> constants, replacing them with their computed result. By simplifying the expression during query +> planning, we reduce computational overhead during execution. > ### Aside: Expression > -> An **expression** is a component of a query that resolves to a value. It is stateless, meaning it returns a value without mutating any of the values used to build the expression. +> An **expression** is a component of a query that resolves to a value. It is stateless, meaning it +> returns a value without mutating any of the values used to build the expression. > -> For example, the expression `{$add: [3, "$inventory.total"]}` consists of the `$add` operator and two input expressions: the constant `3` and the field path expression `"$inventory.total`. It returns the result of adding 3 to the value at path `inventory.total` of the input document. +> For example, the expression `{$add: [3, "$inventory.total"]}` consists of the `$add` operator and +> two input expressions: the constant `3` and the field path expression `"$inventory.total`. It +> returns the result of adding 3 to the value at path `inventory.total` of the input document. 
-These optimizations may seem obvious, but oftentimes, the aggregation pipelines are computer-generated, and the application won't/shouldn't care to do such analysis. Additionally, the original query may have been more complex, but after earlier heuristic rewrites, we may find that more values could be folded together than we initially anticipated. +These optimizations may seem obvious, but oftentimes, the aggregation pipelines are +computer-generated, and the application won't/shouldn't care to do such analysis. Additionally, the +original query may have been more complex, but after earlier heuristic rewrites, we may find that +more values could be folded together than we initially anticipated. ```mermaid graph TD @@ -224,23 +303,58 @@ graph TD ## Registering new rewrites -All pipeline rewrites are invoked through the [rule-based rewrite engine](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L196) (see [README](../query/compiler/rewrites/README.md)). While most rewrites are still implemented inside `DocumentSource::optimizeAt()` and `optimize()` and registered as unconditional rules (i.e., rules where the precondition is always true), new rewrites should be implemented and registered as their own, separate rules. A rule is defined by a name, precondition and transform functions, a priority and a set of tags: https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L51-L81 +All pipeline rewrites are invoked through the +[rule-based rewrite engine](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L196) +(see [README](../query/compiler/rewrites/README.md)). 
While most rewrites are still implemented +inside `DocumentSource::optimizeAt()` and `optimize()` and registered as unconditional rules (i.e., +rules where the precondition is always true), new rewrites should be implemented and registered as +their own, separate rules. A rule is defined by a name, precondition and transform functions, a +priority and a set of tags: +https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L51-L81 ### Rule registry and registration macros -The [rule registry](https://github.com/mongodb/mongo/blob/f072451dc0301232b4748410242565f53f8cb6cf/src/mongo/db/pipeline/optimization/rule_based_rewriter.cpp#L159-L160) is a mapping between `DocumentSource` subtypes and rules that are applicable to them. It lives as a decoration on the service context. This means rule registrations [get called](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L61) on service context creation, i.e. when a new mongod or mongos process is started. Whenever the rewrite engine advances to a new element, [`PipelineRewriteContext::enqueueRules()`](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/rule_based_rewriter.cpp#L169-L185) is called, which looks up the current `DocumentSource`'s type in the registry and enqueues all applicable rules to be attempted on the current element. +The +[rule registry](https://github.com/mongodb/mongo/blob/f072451dc0301232b4748410242565f53f8cb6cf/src/mongo/db/pipeline/optimization/rule_based_rewriter.cpp#L159-L160) +is a mapping between `DocumentSource` subtypes and rules that are applicable to them. It lives as a +decoration on the service context. 
This means rule registrations +[get called](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L61) +on service context creation, i.e. when a new mongod or mongos process is started. Whenever the +rewrite engine advances to a new element, +[`PipelineRewriteContext::enqueueRules()`](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/rule_based_rewriter.cpp#L169-L185) +is called, which looks up the current `DocumentSource`'s type in the registry and enqueues all +applicable rules to be attempted on the current element. -Rules can be registered using the [`REGISTER_RULES`](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L49) macro. It accepts a `DocumentSource` subclass as its first argument, and then a comma-separated list of rules to register for that type. See the rule registration for `DocumentSourceMatch` for an example: https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/match_rules.cpp#L227-L236 +Rules can be registered using the +[`REGISTER_RULES`](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L49) +macro. It accepts a `DocumentSource` subclass as its first argument, and then a comma-separated list +of rules to register for that type. See the rule registration for `DocumentSourceMatch` for an +example: +https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/match_rules.cpp#L227-L236 -To gate rules behind a feature flag, use the [`REGISTER_RULES_WITH_FEATURE_FLAG`](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L60) macro. 
It's similar to `REGISTER_RULES`, but accepts a feature flag as its second argument. +To gate rules behind a feature flag, use the +[`REGISTER_RULES_WITH_FEATURE_FLAG`](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L60) +macro. It's similar to `REGISTER_RULES`, but accepts a feature flag as its second argument. -Another way to make a rule get called conditionally is to enqueue it from another rule's precondition or transform function by calling [`PipelineRewriteContext::addRule()`](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L122). One caveat to be aware of is that if the rewrite engine is invoked to only run a certain group of rewrites, the dynamically-enqueued rule will only run if it also belongs to that group. See the rule [`PUSH_MATCH_BEFORE_CHANGE_STREAMS`](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/match_rules.cpp#L113) for an example: https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/match_rules.cpp#L147 +Another way to make a rule get called conditionally is to enqueue it from another rule's +precondition or transform function by calling +[`PipelineRewriteContext::addRule()`](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L122). +One caveat to be aware of is that if the rewrite engine is invoked to only run a certain group of +rewrites, the dynamically-enqueued rule will only run if it also belongs to that group. 
See the rule +[`PUSH_MATCH_BEFORE_CHANGE_STREAMS`](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/match_rules.cpp#L113) +for an example: +https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/match_rules.cpp#L147 ### Tags -Some of our current rewrites rely on the assumption that all inter-stage optimizations are applied to the whole pipeline before any in-place optimizations are attempted. If this assumption is broken, some rewrites may interfere with each other. Hence our current pipeline rewrites are divided into [two categories](https://github.com/mongodb/mongo/blob/f072451dc0301232b4748410242565f53f8cb6cf/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L94-L99): `Reordering` and `InPlace`. Both groups of rewrites are run against the pipeline separately. +Some of our current rewrites rely on the assumption that all inter-stage optimizations are applied +to the whole pipeline before any in-place optimizations are attempted. If this assumption is broken, +some rewrites may interfere with each other. Hence our current pipeline rewrites are divided into +[two categories](https://github.com/mongodb/mongo/blob/f072451dc0301232b4748410242565f53f8cb6cf/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L94-L99): +`Reordering` and `InPlace`. Both groups of rewrites are run against the pipeline separately. -If your rule may change any other stages except the current one, it should have the `Reordering` tag. Otherwise, it should have the `InPlace` tag. +If your rule may change any other stages except the current one, it should have the `Reordering` +tag. Otherwise, it should have the `InPlace` tag. 
--- diff --git a/src/mongo/db/pipeline/search/vectorSearch_technical_overview.md b/src/mongo/db/pipeline/search/vectorSearch_technical_overview.md index 30ce7d7ae05..9af5f0b84cf 100644 --- a/src/mongo/db/pipeline/search/vectorSearch_technical_overview.md +++ b/src/mongo/db/pipeline/search/vectorSearch_technical_overview.md @@ -2,13 +2,23 @@ ## Introduction -[Atlas Search](https://www.mongodb.com/docs/atlas/atlas-search/) provides integrated full-text search for running syntactic search queries. However, this does not cover semantic searches, e.g. searching "cat" in hopes of matching "kitten". The vector search feature allows users to run semantic search queries directly in the database. Users may define vector indexes that are maintained externally on a `mongot` process backed by [Apache Lucene](https://lucene.apache.org/). Queries made via the aggregation framework specify a query vector and path to search over, which `mongod` funnels through to `mongot`. +[Atlas Search](https://www.mongodb.com/docs/atlas/atlas-search/) provides integrated full-text +search for running syntactic search queries. However, this does not cover semantic searches, e.g. +searching "cat" in hopes of matching "kitten". The vector search feature allows users to run +semantic search queries directly in the database. Users may define vector indexes that are +maintained externally on a `mongot` process backed by [Apache Lucene](https://lucene.apache.org/). +Queries made via the aggregation framework specify a query vector and path to search over, which +`mongod` funnels through to `mongot`. This document covers the `mongod` side of the vector search implementation. ## Overview -Vector search is implemented as an aggregation stage that behaves similarly to [`$search`](/src/mongo/db/query/search/README.md). The `$vectorSearch` stage must be the first stage in the pipeline, always run on `mongod`. Users specify the query vector and path to search over as well as several `mongot`-specific knobs. 
`$vectorSearch` fetches results from `mongot` via a cursor-based protocol that parallels (and reuses code from) `$search`. +Vector search is implemented as an aggregation stage that behaves similarly to +[`$search`](/src/mongo/db/query/search/README.md). The `$vectorSearch` stage must be the first stage +in the pipeline, always run on `mongod`. Users specify the query vector and path to search over as +well as several `mongot`-specific knobs. `$vectorSearch` fetches results from `mongot` via a +cursor-based protocol that parallels (and reuses code from) `$search`. ## Details @@ -16,7 +26,8 @@ Vector search is implemented as an aggregation stage that behaves similarly to [ #### Parameters -[`$vectorSearch`](/src/mongo/db/pipeline/search/document_source_vector_search.h) takes several parameters that are passed on to `mongot`. These include: +[`$vectorSearch`](/src/mongo/db/pipeline/search/document_source_vector_search.h) takes several +parameters that are passed on to `mongot`. These include: | Parameter | Description | | ------------- | ----------------------------------------------------------- | @@ -27,42 +38,71 @@ Vector search is implemented as an aggregation stage that behaves similarly to [ | index | index to use for the search | | filter | optional pre-filter to apply before searching | -Validation for most of these fields occurs on `mongot`, with the exception of `filter`. `mongot` does not yet support complex MQL semantics, so the `filter` is limited to simple comparisons (e.g. `$eq`, `$lt`, `$gte`) on basic field types. This is validated on `mongod` with a custom `MatchExpressionVisitor`. +Validation for most of these fields occurs on `mongot`, with the exception of `filter`. `mongot` +does not yet support complex MQL semantics, so the `filter` is limited to simple comparisons (e.g. +`$eq`, `$lt`, `$gte`) on basic field types. This is validated on `mongod` with a custom +`MatchExpressionVisitor`. 
-Additionally, `limit` may be used by `mongod` to ensure correct results in sharded clusters (described below). All other parameters are passed through to `mongot` for algorithm-specific behavior. +Additionally, `limit` may be used by `mongod` to ensure correct results in sharded clusters +(described below). All other parameters are passed through to `mongot` for algorithm-specific +behavior. #### getMore -The `$vectorSearch` stage supports sending `getMore` requests to `mongot` when a batch is exhausted, but this is not expected to happen often. Because `mongot` receives the maximum number of documents requested via the initial `limit` parameter, it is able to generate all results on the first call. The only situation in which a `getMore` is required is when the result set is large enough to exceed the 16MB response size limit. +The `$vectorSearch` stage supports sending `getMore` requests to `mongot` when a batch is exhausted, +but this is not expected to happen often. Because `mongot` receives the maximum number of documents +requested via the initial `limit` parameter, it is able to generate all results on the first call. +The only situation in which a `getMore` is required is when the result set is large enough to exceed +the 16MB response size limit. ### idLookup -An `$_internalSearchIdLookup` stage is [inserted into the pipeline](https://github.com/mongodb/mongo/blob/636d0c1ce26d905cc508a73ada598950e16860b5/src/mongo/db/pipeline/search/document_source_vector_search.cpp#L204) directly after the `$vectorSearch` stage (always on `mongod`) so that full documents can be returned to the user, as vector indexes do not support any kind of stored source functionality. 
+An `$_internalSearchIdLookup` stage is +[inserted into the pipeline](https://github.com/mongodb/mongo/blob/636d0c1ce26d905cc508a73ada598950e16860b5/src/mongo/db/pipeline/search/document_source_vector_search.cpp#L204) +directly after the `$vectorSearch` stage (always on `mongod`) so that full documents can be returned +to the user, as vector indexes do not support any kind of stored source functionality. -Note that there are no mitigations in place to handle idLookup reducing the size of the result set when it filters out orphans. The `limit` parameter passed to `$vectorSearch` is understood to be a maximum, so we may generate that number of results and then subsequently drop orphans, ending with fewer than `limit` documents. This differs from `$search`, where we would request more documents from `mongot` to make up for the orphans. +Note that there are no mitigations in place to handle idLookup reducing the size of the result set +when it filters out orphans. The `limit` parameter passed to `$vectorSearch` is understood to be a +maximum, so we may generate that number of results and then subsequently drop orphans, ending with +fewer than `limit` documents. This differs from `$search`, where we would request more documents +from `mongot` to make up for the orphans. ### Metadata -Results are returned in descending score order from `mongot`. A metadata field with this value, `$vectorSearchScore`, is allowed to be projected by the user. +Results are returned in descending score order from `mongot`. A metadata field with this value, +`$vectorSearchScore`, is allowed to be projected by the user. ### Sharding -In a sharded environment, results are merged and [sorted in descending order](https://github.com/mongodb/mongo/blob/636d0c1ce26d905cc508a73ada598950e16860b5/src/mongo/db/pipeline/search/document_source_vector_search.h#L62) on the `$vectorSearchScore` metadata field. 
Additionally, the `limit` parameter specified in `$vectorSearch` is applied after merging by inserting an additional `$limit` stage into the merging pipeline. +In a sharded environment, results are merged and +[sorted in descending order](https://github.com/mongodb/mongo/blob/636d0c1ce26d905cc508a73ada598950e16860b5/src/mongo/db/pipeline/search/document_source_vector_search.h#L62) +on the `$vectorSearchScore` metadata field. Additionally, the `limit` parameter specified in +`$vectorSearch` is applied after merging by inserting an additional `$limit` stage into the merging +pipeline. -If a user-specified `$limit` exists in the pipeline following `$vectorSearch` that is smaller than the `$vectorSearch` limit value, this is pushed down to the shards as well, although it is not sent to `mongot`. +If a user-specified `$limit` exists in the pipeline following `$vectorSearch` that is smaller than +the `$vectorSearch` limit value, this is pushed down to the shards as well, although it is not sent +to `mongot`. ### Index Management -Vector indexes are managed through the existing search index management commands, due to the fact that they are stored in the same way as search indexes on `mongot`. +Vector indexes are managed through the existing search index management commands, due to the fact +that they are stored in the same way as search indexes on `mongot`. ### Explain -'$vectorSearch' explains follow how $search/$searchMeta explains work. Check out [search_technical_overview.md](/src/mongo/db/query/search/search_technical_overview.md) for more information. +'$vectorSearch' explains follow how $search/$searchMeta explains work. Check out +[search_technical_overview.md](/src/mongo/db/query/search/search_technical_overview.md) for more +information. 
### Testing -The `vectorSearch` command is supported by [`mongotmock`](https://github.com/mongodb/mongo/blob/636d0c1ce26d905cc508a73ada598950e16860b5/src/mongo/db/query/search/mongotmock/mongotmock_commands.cpp#L194) for testing. +The `vectorSearch` command is supported by +[`mongotmock`](https://github.com/mongodb/mongo/blob/636d0c1ce26d905cc508a73ada598950e16860b5/src/mongo/db/query/search/mongotmock/mongotmock_commands.cpp#L194) +for testing. ### Didn't Find What You're Looking For? -Visit [the landing page](/src/mongo/db/query/search/README.md) for all $search/$vectorSearch/$searchMeta related documentation for server contributors. +Visit [the landing page](/src/mongo/db/query/search/README.md) for all +$search/$vectorSearch/$searchMeta related documentation for server contributors. diff --git a/src/mongo/db/process_health/README.md b/src/mongo/db/process_health/README.md index 177a2e4e43a..836ce972ec7 100644 --- a/src/mongo/db/process_health/README.md +++ b/src/mongo/db/process_health/README.md @@ -6,15 +6,19 @@ _Note:_ in 4.4 release only the mongos proxy server is supported ## Health Observers -_Health Observers_ are designed for every particular check to run. Each observer can be configured to be on/off and critical or not to be able to crash the serer on error. Each observer has a configurable interval of how often it will run the checks. +_Health Observers_ are designed for every particular check to run. Each observer can be configured +to be on/off and critical or not to be able to crash the serer on error. Each observer has a +configurable interval of how often it will run the checks. ## Health Observers Parameters -- healthMonitoringIntensities: main configuration for each observer. Can be set at startup and changed at runtime. Valid values: +- healthMonitoringIntensities: main configuration for each observer. Can be set at startup and + changed at runtime. 
Valid values: - off: this observer if off - critical: if the observer detects a failure, the process will crash - - non-critical: if the observer detects a failure, the error will be logged and the process will not crash + - non-critical: if the observer detects a failure, the error will be logged and the process will + not crash Example as startup parameter: @@ -48,15 +52,22 @@ _Health Observers_ are designed for every particular check to run. Each observer ## LDAP Health Observer -LDAP Health Observer checks all configured LDAP servers that at least one of them is up and running. At every run, it creates new connection to every configured LDAP server and runs a simple query. The LDAP health observer is using the same parameters as described in the **LDAP Authorization** section of the manual. +LDAP Health Observer checks all configured LDAP servers that at least one of them is up and running. +At every run, it creates new connection to every configured LDAP server and runs a simple query. The +LDAP health observer is using the same parameters as described in the **LDAP Authorization** section +of the manual. -To enable this observer, use the _healthMonitoringIntensities_ and _healthMonitoringIntervals_ parameters as described above. The recommended value for the LDAP monitoring interval is 30 seconds. +To enable this observer, use the _healthMonitoringIntensities_ and _healthMonitoringIntervals_ +parameters as described above. The recommended value for the LDAP monitoring interval is 30 seconds. ## Active Fault -When a failure is detected, and the observer is configured as _critical_, the server will wait for the configured interval before crashing. The interval from the failure detection and crash is configured with _activeFaultDurationSecs_ parameter: +When a failure is detected, and the observer is configured as _critical_, the server will wait for +the configured interval before crashing. 
The interval from the failure detection and crash is +configured with _activeFaultDurationSecs_ parameter: -- activeFaultDurationSecs: how long to wait from the failure detection to crash, in seconds. This can be configured at startup and changed at runtime. +- activeFaultDurationSecs: how long to wait from the failure detection to crash, in seconds. This + can be configured at startup and changed at runtime. Example: @@ -66,12 +77,15 @@ When a failure is detected, and the observer is configured as _critical_, the se ## Progress Monitor -_Progress Monitor_ detects that every health check is not stuck, without returning either success or failure. If a health check starts and does not complete the server will crash. This behavior could be configured with: +_Progress Monitor_ detects that every health check is not stuck, without returning either success or +failure. If a health check starts and does not complete the server will crash. This behavior could +be configured with: - progressMonitor: configure the progress monitor. Values: - _interval_: how often to run the liveness check, in milliseconds - - _deadline_: timeout before crashing the server if a health check is not making progress, in seconds + - _deadline_: timeout before crashing the server if a health check is not making progress, in + seconds Example: diff --git a/src/mongo/db/query/README.md b/src/mongo/db/query/README.md index fe3723d04f2..03f8635bde2 100644 --- a/src/mongo/db/query/README.md +++ b/src/mongo/db/query/README.md @@ -2,11 +2,16 @@ ## Overview -The query system is responsible for interpreting the user's request, finding an optimal way to satisfy it, and computing the final results. It is primarily exposed through the `find` and `aggregate` commands, but also used in associated read commands such as `count`, `distinct`, and `mapReduce` and write commands such as `update`, `delete`, and `findAndModify`. 
+The query system is responsible for interpreting the user's request, finding an optimal way to +satisfy it, and computing the final results. It is primarily exposed through the `find` and +`aggregate` commands, but also used in associated read commands such as `count`, `distinct`, and +`mapReduce` and write commands such as `update`, `delete`, and `findAndModify`. ## [Query Optimization Architecture Guide](README_QO.md) -The [QO Architecture Guide](README_QO.md) provides an overview of the query system components maintained by the QO team, including parsing, heuristic rewrites, query planning, plan caching, and query testing infrastructure. +The [QO Architecture Guide](README_QO.md) provides an overview of the query system components +maintained by the QO team, including parsing, heuristic rewrites, query planning, plan caching, and +query testing infrastructure. ## Additional Query Features @@ -25,24 +30,35 @@ The [QO Architecture Guide](README_QO.md) provides an overview of the query syst - **BSON**: Binary-encoded serialization of JSON-like documents. - A data format developed by MongoDB for data representation in its core. - **`CanonicalQuery`**: A standardized form for queries, in BSON. - - It works as a container for the parsed query, projection, and sort portions of the original query message. The filter portion is parsed into a **`MatchExpression`**. + - It works as a container for the parsed query, projection, and sort portions of the original + query message. The filter portion is parsed into a **`MatchExpression`**. - **`DocumentSource`**: Represents one stage in an **aggregation** **pipeline** - Not necessarily one-to-one with the stages in the user-defined pipeline. -- **`ExpressionContext`**: An object that stores state that may be useful to access throughout the lifespan of a query, but is probably not relevant to any other operations. This includes the collation, a time zone database, various random booleans and state, etc. 
+- **`ExpressionContext`**: An object that stores state that may be useful to access throughout the + lifespan of a query, but is probably not relevant to any other operations. This includes the + collation, a time zone database, various random booleans and state, etc. - **Find**: The subsystem that runs **find** stages and **pushed-down** **aggregate** stages. - **IDL**: Interface Definition Language. YAML-formatted files to generate C++ code. -- **`LiteParsedPipeline`**: A very simple model of an **aggregate** **pipeline**, constructed through a semi-parse that proceeds just enough to tease apart the stages that are involved. - - It has neither verified that the input is well-formed, nor parsed the expressions or detailed arguments to the stages. It can be used for requests that we want to inspect before proceeding and building a full model of the user's query or request. +- **`LiteParsedPipeline`**: A very simple model of an **aggregate** **pipeline**, constructed + through a semi-parse that proceeds just enough to tease apart the stages that are involved. + - It has neither verified that the input is well-formed, nor parsed the expressions or detailed + arguments to the stages. It can be used for requests that we want to inspect before proceeding + and building a full model of the user's query or request. - **`MatchExpression`**: The parsed Abstract Syntax Tree (AST) from the filter portion of the query. - **MQL**: MongoDB Query Language. -- **Plan Cache**: Stores previously generated query plans to allow for faster retrieval and execution of recurring queries by avoiding the need to generate and score possible query plans from scratch. -- **`PlanExecutor`**: An abstract type that executes a **`QuerySolution`** plan by cranking its tree of stages into execution. 
**`PlanExecutor`** has three primary subclasses: +- **Plan Cache**: Stores previously generated query plans to allow for faster retrieval and + execution of recurring queries by avoiding the need to generate and score possible query plans + from scratch. +- **`PlanExecutor`**: An abstract type that executes a **`QuerySolution`** plan by cranking its tree + of stages into execution. **`PlanExecutor`** has three primary subclasses: 1. `PlanExecutorImpl`: Executes **find** stages 1. `PlanExecutorPipeline`: Executes **aggregation** stages. 1. `PlanExecutorSBE`: Executes SBE plans. - **Pipeline**: A list of **`DocumentSource`s** which handles a part of the optimization. - **Pushdown**: Convert an **aggregate** stage in the **pipeline** to a **find** stage. -- **`QuerySolution`**: A tree structure of `QuerySolutionNode`s that represents one possible execution plan for a query. +- **`QuerySolution`**: A tree structure of `QuerySolutionNode`s that represents one possible + execution plan for a query. - Various operation nodes inherit from `QuerySolutionNode` - For example: `CollectionScanNode`, `FetchNode`, `IndexScanNode`, `OrNode`, etc. - - Generally speaking, one winning **`QuerySolution`** is the output of the QO system and input of the QE system. + - Generally speaking, one winning **`QuerySolution`** is the output of the QO system and input of + the QE system. diff --git a/src/mongo/db/query/README_QO.md b/src/mongo/db/query/README_QO.md index 1887fc6b561..4b05c3af378 100644 --- a/src/mongo/db/query/README_QO.md +++ b/src/mongo/db/query/README_QO.md @@ -1,6 +1,8 @@ # Query Optimization Architecture Guide -This page provides an overview of the source code architecture for MongoDB's Query Optimization system. It is designed for engineers working on the core server, with introductory sections offering low-level details particularly useful for new members of the QO team. 
+This page provides an overview of the source code architecture for MongoDB's Query Optimization +system. It is designed for engineers working on the core server, with introductory sections offering +low-level details particularly useful for new members of the QO team. ## Table of Contents @@ -22,7 +24,8 @@ This page provides an overview of the source code architecture for MongoDB's Que - [QueryTester](query_tester/README.md) - [Fuzzers](https://github.com/10gen/jstestfuzz/blob/master/HitchhikersGuide.md) - [Locust perf tests](https://github.com/10gen/dsi/tree/master/workloads/query-optimization) - - [Generic QO benchmark](https://github.com/10gen/dsi/blob/master/workloads/query-optimization/generic/README.md), a.k.a. Synthetic Benchmarks. + - [Generic QO benchmark](https://github.com/10gen/dsi/blob/master/workloads/query-optimization/generic/README.md), + a.k.a. Synthetic Benchmarks. ## High-Level Diagram diff --git a/src/mongo/db/query/README_explain.md b/src/mongo/db/query/README_explain.md index e027750ee5a..b0362caac62 100644 --- a/src/mongo/db/query/README_explain.md +++ b/src/mongo/db/query/README_explain.md @@ -2,18 +2,32 @@ ## Overview -The `explain` command provides observability into query plans and their execution stats, helping users analyze a query's performance and index usage. Explain queries don't return any documents for reads or modify the underlying collection for writes. Some modes will only perform query planning, while others will also run the query to gather execution stats. +The `explain` command provides observability into query plans and their execution stats, helping +users analyze a query's performance and index usage. Explain queries don't return any documents for +reads or modify the underlying collection for writes. Some modes will only perform query planning, +while others will also run the query to gather execution stats. 
-Moreover, `explain` ignores the [plan cache](plan_cache/README.md) during query planning, always generating a set of candidate plans and choosing a winner without consulting the plan cache. The query planner will also avoid caching the winning plan. +Moreover, `explain` ignores the [plan cache](plan_cache/README.md) during query planning, always +generating a set of candidate plans and choosing a winner without consulting the plan cache. The +query planner will also avoid caching the winning plan. This document will cover the following aspects of `explain`: -1. [**Explain Usage**](#explain-usage) to provide insight into how a query can be dissected for its query planning, index usage, and execution stats. This is especially helpful when investigating a performance regression or understanding the plan space given a dataset, set of indexes, and query. -1. [**Explain Implementation**](#explain-implementation) for a deep dive into how `explain` is implemented in the codebase. This section traces an `explain` command through parsing, query planning, and query execution. It also discusses how `explain` is implemented in a sharded cluster, as well as different plan explainers for `explain`. +1. [**Explain Usage**](#explain-usage) to provide insight into how a query can be dissected for its + query planning, index usage, and execution stats. This is especially helpful when investigating a + performance regression or understanding the plan space given a dataset, set of indexes, and + query. +1. [**Explain Implementation**](#explain-implementation) for a deep dive into how `explain` is + implemented in the codebase. This section traces an `explain` command through parsing, query + planning, and query execution. It also discusses how `explain` is implemented in a sharded + cluster, as well as different plan explainers for `explain`. 
## Explain Usage -You may want to run the `explain` command for a query to gather information on the amount of time a query took to complete, whether the query used an index, the number of documents and index keys scanned to fulfill a query, etc. This section provides a guide that will walk you through the following: +You may want to run the `explain` command for a query to gather information on the amount of time a +query took to complete, whether the query used an index, the number of documents and index keys +scanned to fulfill a query, etc. This section provides a guide that will walk you through the +following: 1. How to [issue an `explain` command](#syntax) 1. Choose a [verbosity mode](#verbosity-modes) @@ -21,10 +35,12 @@ You may want to run the `explain` command for a query to gather information on t ### Syntax -To use `explain`, you must have permission to run the underlying command. You can run `explain` using: +To use `explain`, you must have permission to run the underlying command. You can run `explain` +using: - `db.runCommand()`, wrapping the command to be explained - - This is supported for most explainable commands (`aggregate`, `count`, `delete`, `distinct`, `find`, `findAndModify`, `mapReduce`, `update`). + - This is supported for most explainable commands (`aggregate`, `count`, `delete`, `distinct`, + `find`, `findAndModify`, `mapReduce`, `update`). - The default [verbosity mode](#verbosity-modes) is "allPlansExecution": ``` @@ -38,7 +54,8 @@ db.runCommand( ``` - `db.collection.explain().` - - This is supported for `aggregate`, `count`, `distinct`, `find`, `findAndModify`, `mapReduce`, and `remove`. + - This is supported for `aggregate`, `count`, `distinct`, `find`, `findAndModify`, `mapReduce`, + and `remove`. - The default [verbosity mode](#verbosity-modes) is "queryPlanner". 
``` @@ -46,8 +63,10 @@ db.collection.explain().aggregate() ``` - `cursor.explain()` for the `db.collection.find()` method - - This is similar to the above (`db.collection.explain().find()`), but that is more expressive and allows for additional chaining of query modifiers. - - Meanwhile, this returns a cursor and may require a call to `.next()` to return full `explain` results: + - This is similar to the above (`db.collection.explain().find()`), but that is more expressive and + allows for additional chaining of query modifiers. + - Meanwhile, this returns a cursor and may require a call to `.next()` to return full `explain` + results: ``` db.collection.find().explain() @@ -64,11 +83,17 @@ The first two syntaxes are recommended as they support the most commands. ### Verbosity Modes -The **verbosity mode** describes the amount of information returned by the `explain` command. These are the available modes in ascending order: +The **verbosity mode** describes the amount of information returned by the `explain` command. These +are the available modes in ascending order: -1. **`queryPlanner`**: The query optimizer is run to choose the winning plan for the operation under evaluation. -1. **`executionStats`**: The query optimizer chooses the winning plan and executes it to completion. This mode returns its execution statistics. -1. **`allPlansExecution`**: The query optimizer chooses the winning plan and executes it to completion. This mode returns statistics about the execution of the winning plan as well as of the other candidate plans captured during [plan selection](../exec/runtime_planners/classic_runtime_planner/README.md). +1. **`queryPlanner`**: The query optimizer is run to choose the winning plan for the operation under + evaluation. +1. **`executionStats`**: The query optimizer chooses the winning plan and executes it to completion. + This mode returns its execution statistics. +1. 
**`allPlansExecution`**: The query optimizer chooses the winning plan and executes it to + completion. This mode returns statistics about the execution of the winning plan as well as of + the other candidate plans captured during + [plan selection](../exec/runtime_planners/classic_runtime_planner/README.md). ### Analyze Explain Output @@ -110,7 +135,9 @@ db.coll.explain().find({a: 1, b: 1}) - **explainVersion**: "1" is used for the classic engine, whereas "2" is used for SBE. > ### Aside: Execution Engine > - > Operations may use the classic execution engine or the [slot-based execution engine](https://www.mongodb.com/docs/manual/reference/sbe/#std-label-sbe-landing) (SBE). The `explain` output structure may differ in this case: + > Operations may use the classic execution engine or the + > [slot-based execution engine](https://www.mongodb.com/docs/manual/reference/sbe/#std-label-sbe-landing) + > (SBE). The `explain` output structure may differ in this case: > > **Classic Execution Engine**: > @@ -150,9 +177,13 @@ db.coll.explain().find({a: 1, b: 1}) > } > }, > ``` -- **queryShapeHash**: a hex string that represents the hash of the [query shape](https://www.mongodb.com/docs/manual/core/query-shapes/#std-label-query-shapes). -- **serverInfo**: For unsharded collections, the info is returned for the `mongod` instance. For sharded collections, the info is returned for each accessed shard; there is additionally a top-level `serverInfo` object for the `mongos`. -- **serverParameters**: details about several internal [query parameters](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/query_knobs.idl). +- **queryShapeHash**: a hex string that represents the hash of the + [query shape](https://www.mongodb.com/docs/manual/core/query-shapes/#std-label-query-shapes). +- **serverInfo**: For unsharded collections, the info is returned for the `mongod` instance. 
For + sharded collections, the info is returned for each accessed shard; there is additionally a + top-level `serverInfo` object for the `mongos`. +- **serverParameters**: details about several internal + [query parameters](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/query_knobs.idl). #### `queryPlanner` Mode @@ -231,7 +262,11 @@ db.coll.explain().find({a: 1, b: 1}) } ``` -This returns information in the `queryPlanner` section about the winning plan and rejected plans. It describes their data access patterns as well as details about the indexes under consideration. There is metadata about overall optimizer stats, such as the total time spent in query optimization (heuristic rewrites, plan enumeration, and plan selection) (`optimizationTimeMillis`), whether we hit the limit for an optimizer knob, and whether a particular plan was cached. +This returns information in the `queryPlanner` section about the winning plan and rejected plans. It +describes their data access patterns as well as details about the indexes under consideration. There +is metadata about overall optimizer stats, such as the total time spent in query optimization +(heuristic rewrites, plan enumeration, and plan selection) (`optimizationTimeMillis`), whether we +hit the limit for an optimizer knob, and whether a particular plan was cached. #### `executionStats` Mode @@ -291,9 +326,14 @@ This returns information in the `queryPlanner` section about the winning plan an } ``` -This contains all the information in the `queryPlanner` mode, with additional `executionStats` information from executing the `winningPlan`, such as the number of documents returned and the number of documents and index keys examined. The `executionTimeMillis` field describes the total amount of time required for both query optimization and query execution. It does not include the network time required to transmit the data back to the client. 
+This contains all the information in the `queryPlanner` mode, with additional `executionStats` +information from executing the `winningPlan`, such as the number of documents returned and the +number of documents and index keys examined. The `executionTimeMillis` field describes the total +amount of time required for both query optimization and query execution. It does not include the +network time required to transmit the data back to the client. -For write operations, query execution refers to the modifications that **would** be performed, but does **not** make those modifications to the database. +For write operations, query execution refers to the modifications that **would** be performed, but +does **not** make those modifications to the database. #### `allPlansExecution` Mode @@ -456,13 +496,28 @@ For write operations, query execution refers to the modifications that **would** } ``` -This contains all the information in the `executionStats` mode, with additional execution information for the rejected plans such as the estimated execution time (`executionTimeMillisEstimate`) and the candidate plan's `score`, which is calculated based on how [productive](../exec/runtime_planners/classic_runtime_planner/README.md#plan-ranking) it was during the trial period. +This contains all the information in the `executionStats` mode, with additional execution +information for the rejected plans such as the estimated execution time +(`executionTimeMillisEstimate`) and the candidate plan's `score`, which is calculated based on how +[productive](../exec/runtime_planners/classic_runtime_planner/README.md#plan-ranking) it was during +the trial period. -From the output above, we can see that the plans using the `{a: 1}` and `{a: 1, b: 1}` indexes resulted in higher-scoring plans than the `{b: 1}` index, which is expected because the predicate on `a` is far more selective than the predicate on `b` (100 unique values for the former vs 2 buckets of values for the latter). 
The better plans also have the `isEOF` flag set, meaning they ran to completion and returned all results in the trial period, whereas the worst plan did not run to completion and has a value of 0 in that field. +From the output above, we can see that the plans using the `{a: 1}` and `{a: 1, b: 1}` indexes +resulted in higher-scoring plans than the `{b: 1}` index, which is expected because the predicate on +`a` is far more selective than the predicate on `b` (100 unique values for the former vs 2 buckets +of values for the latter). The better plans also have the `isEOF` flag set, meaning they ran to +completion and returned all results in the trial period, whereas the worst plan did not run to +completion and has a value of 0 in that field. -The multiplanner has additional [tie-breaking](../exec/runtime_planners/classic_runtime_planner/README.md#tie-breakers) heuristics when two plans score equally well. While both the plan using the index on `{a: 1, b: 1}` and the plan using the index on `{a: 1}` only needed to examine one document and scan one index key, the latter plan required a residual filter predicate on `{ b: { '$eq': 1 } }` in its `FETCH` stage. Thus, the plan using the `{a: 1, b: 1}` index is the most optimal. +The multiplanner has additional +[tie-breaking](../exec/runtime_planners/classic_runtime_planner/README.md#tie-breakers) heuristics +when two plans score equally well. While both the plan using the index on `{a: 1, b: 1}` and the +plan using the index on `{a: 1}` only needed to examine one document and scan one index key, the +latter plan required a residual filter predicate on `{ b: { '$eq': 1 } }` in its `FETCH` stage. +Thus, the plan using the `{a: 1, b: 1}` index is the most optimal. -For a full list of fields and their definitions in the `explain` output, refer to the [docs](https://www.mongodb.com/docs/manual/reference/explain-results/). 
+For a full list of fields and their definitions in the `explain` output, refer to the +[docs](https://www.mongodb.com/docs/manual/reference/explain-results/). ## Explain Implementation @@ -472,79 +527,178 @@ For a full list of fields and their definitions in the `explain` output, refer t ### Tracing an `explain` command -Just like a [non-explain command](../commands/query_cmd/README.md), an `explain` command is parsed, canonicalized when possible, normalized, and sent to the query planner for plan enumeration and plan selection. +Just like a [non-explain command](../commands/query_cmd/README.md), an `explain` command is parsed, +canonicalized when possible, normalized, and sent to the query planner for plan enumeration and plan +selection. -The entrypoint to `explain` is [`CmdExplain::parse()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands/query_cmd/explain_cmd.cpp#L212), which takes in an [`OpMsgRequest`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/rpc/op_msg.h#L194). +The entrypoint to `explain` is +[`CmdExplain::parse()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands/query_cmd/explain_cmd.cpp#L212), +which takes in an +[`OpMsgRequest`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/rpc/op_msg.h#L194). -- The inner explained command is extracted from the outer command. The explained command's override of the [`parseForExplain()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands.h#L465) function returns a [`CommandInvocation`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands.h#L793). +- The inner explained command is extracted from the outer command. 
The explained command's override + of the + [`parseForExplain()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands.h#L465) + function returns a + [`CommandInvocation`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands.h#L793). -The [`CmdExplain::Invocation::run()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands/query_cmd/explain_cmd.cpp#L145) function executes the parsed `explain` command. The inner invocation, or explained command, calls its override of the [`explain()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands/query_cmd/explain_cmd.cpp#L145) function. Different commands have their own implementations for this function, but they generally share the following steps: +The +[`CmdExplain::Invocation::run()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands/query_cmd/explain_cmd.cpp#L145) +function executes the parsed `explain` command. The inner invocation, or explained command, calls +its override of the +[`explain()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands/query_cmd/explain_cmd.cpp#L145) +function. Different commands have their own implementations for this function, but they generally +share the following steps: -1. Begin the [query planning timer](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands/query_cmd/find_cmd.cpp#L432) after the command has been parsed. +1. Begin the + [query planning timer](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands/query_cmd/find_cmd.cpp#L432) + after the command has been parsed. 1. Acquire any locks that are required. 1. 
Construct a [`CanonicalQuery`](README_logical_models.md#canonicalquery) from the parsed request. -1. Get a plan executor for the query. This phase covers [plan enumeration](plan_enumerator/README.md) and [plan selection](../exec/runtime_planners/classic_runtime_planner/README.md). -1. Once the executor is returned, it [stops the query planning timer](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/get_executor.cpp#L1307). +1. Get a plan executor for the query. This phase covers + [plan enumeration](plan_enumerator/README.md) and + [plan selection](../exec/runtime_planners/classic_runtime_planner/README.md). +1. Once the executor is returned, it + [stops the query planning timer](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/get_executor.cpp#L1307). 1. Get a [`PlanExplainer`](#plan-explainers) with query planning stats from the `PlanExecutor`. -1. Given the [`PlanExecutor`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_executor.h#L168), if the verbosity level is `executionStats` or higher, the [`explainStages()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/explain.cpp#L471) function first calls [`executePlan()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/explain.cpp#L326), executing the `PlanExecutor` and discarding the resulting documents until it reaches `EOF`. It will log an error and throw an exception if the query doesn't run to completion successfully. This function calls [`getNextBatch()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_executor.cpp#L88) under the hood so that the `PlanExecutor` is executed in a tighter loop. -1. Generate human-readable `explain` `BSON` from the `PlanStats` tree. 
Any operation with a query component can be explained using this function. This will call helper functions to generate [query planner info](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/explain.cpp#L88), [execution stats info](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/explain.cpp#L261) for the winning plan, or [all plans execution info](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/explain.cpp#L261) depending on the verbosity. +1. Given the + [`PlanExecutor`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_executor.h#L168), + if the verbosity level is `executionStats` or higher, the + [`explainStages()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/explain.cpp#L471) + function first calls + [`executePlan()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/explain.cpp#L326), + executing the `PlanExecutor` and discarding the resulting documents until it reaches `EOF`. It + will log an error and throw an exception if the query doesn't run to completion successfully. + This function calls + [`getNextBatch()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_executor.cpp#L88) + under the hood so that the `PlanExecutor` is executed in a tighter loop. +1. Generate human-readable `explain` `BSON` from the `PlanStats` tree. Any operation with a query + component can be explained using this function. 
This will call helper functions to generate + [query planner info](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/explain.cpp#L88), + [execution stats info](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/explain.cpp#L261) + for the winning plan, or + [all plans execution info](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/explain.cpp#L261) + depending on the verbosity. > ### Aside: `Pipeline` Explain > -> Note that `explain` is implemented slightly differently for aggregate commands. The [`_runAggregate`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands/query_cmd/run_aggregate.cpp#L774) function calls [`executeExplain()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands/query_cmd/run_aggregate.cpp#L728) if we're in an `explain` command. If a `PlanExecutorPipeline` is provided, the [`explainPipeline()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/explain.cpp#L390) function is invoked, which turns `explain` `BSON` for document sources into a human-readable format. The underlying logic is similar despite the divergence in code paths. +> Note that `explain` is implemented slightly differently for aggregate commands. The +> [`_runAggregate`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands/query_cmd/run_aggregate.cpp#L774) +> function calls +> [`executeExplain()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/commands/query_cmd/run_aggregate.cpp#L728) +> if we're in an `explain` command. 
If a `PlanExecutorPipeline` is provided, the +> [`explainPipeline()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/explain.cpp#L390) +> function is invoked, which turns `explain` `BSON` for document sources into a human-readable +> format. The underlying logic is similar despite the divergence in code paths. > ### Aside: Lock Acquisition > -> The [`LockPolicy`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_executor.h#L187) describes whether callers should acquire locks when using a `PlanExecutor`. `find` executors using the legacy `PlanStage` engine require the caller to lock the collection `MODE_IS`. On the other hand, aggregate and `SBE` executors may access multiple collections and acquire their own locks on any involved collections when producing query results. In this case, callers don't need to explicitly acquire any locks ahead of time. +> The +> [`LockPolicy`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_executor.h#L187) +> describes whether callers should acquire locks when using a `PlanExecutor`. `find` executors using +> the legacy `PlanStage` engine require the caller to lock the collection `MODE_IS`. On the other +> hand, aggregate and `SBE` executors may access multiple collections and acquire their own locks on +> any involved collections when producing query results. In this case, callers don't need to +> explicitly acquire any locks ahead of time. > > These are the possible lock policies: > -> - `kLockExternally`: The caller is responsible for locking the collection over which this `PlanExecutor` executes. -> - `kLockInternally`: The caller need not hold any locks, the `PlanExecutor` will acquire all the required locks itself. +> - `kLockExternally`: The caller is responsible for locking the collection over which this +> `PlanExecutor` executes. 
+> - `kLockInternally`: The caller need not hold any locks; the `PlanExecutor` will acquire all the +> required locks itself. > -> Although an `explain` command never modifies the underlying collection, it must still acquire the same locks that are acquired by the explained command (if it were run normally) as it will at least partially execute the candidate plans during the trial period. Explains on write commands will hold write locks even though no writes are actually performed so that timing info is more accurate. +> Although an `explain` command never modifies the underlying collection, it must still acquire the +> same locks that are acquired by the explained command (if it were run normally) as it will at +> least partially execute the candidate plans during the trial period. Explains on write commands +> will hold write locks even though no writes are actually performed so that timing info is more +> accurate. ### Plan Explainers -The [`PlanExplainer`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_explainer.h#L46) interface defines an API to provide information on the execution plans generated by the query planner in various formats. At a high-level, it provides information such as: +The +[`PlanExplainer`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_explainer.h#L46) +interface defines an API to provide information on the execution plans generated by the query +planner in various formats. At a high level, it provides information such as: - Whether the multiplanner was used - What query knobs were hit during plan enumeration - What version of `explain` was used - Summary statistics for the following: - - The secondary collection, if it was referenced in a stage like `$lookup`. (Note that `explain` doesn't include any information about subpipelines executed on the `from` collection in `$lookup`.)
+ - The secondary collection, if it was referenced in a stage like `$lookup`. (Note that `explain` + doesn't include any information about subpipelines executed on the `from` collection in + `$lookup`.) - The winning plan - The rejected plans -It contains a [`PlanStatsDetails`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_explainer.h#L62) structure containing the plan selected by the query planner, as well as an optional [`PlanSummaryStats`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_summary_stats.h#L58) container if the verbosity is `executionStats` or higher. This tracks information that will be in the `explain` output, as well as other stats that the profiler, slow query log, and other non-explain output may want to collect such as: +It contains a +[`PlanStatsDetails`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_explainer.h#L62) +structure containing the plan selected by the query planner, as well as an optional +[`PlanSummaryStats`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_summary_stats.h#L58) +container if the verbosity is `executionStats` or higher. This tracks information that will be in +the `explain` output, as well as other stats that the profiler, slow query log, and other +non-explain output may want to collect such as: - `nReturned` - the number of results returned by the plan - `totalKeysExamined` - the number of index keys examined by the plan - `hasSortStage` - whether or not the plan used an in-memory sort stage - `replanReason` - If replanning was triggered, what caused it? 
-The [`accumulate()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_summary_stats_visitor.h#L147) function walks through a [`PlanStageStats`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/exec/sbe/stages/plan_stats.cpp#L52) tree to compute aggregate stats from the stats each stage has gathered during its execution. +The +[`accumulate()`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/query/plan_summary_stats_visitor.h#L147) +function walks through a +[`PlanStageStats`](https://github.com/mongodb/mongo/blob/28df8e56046e44f5977671e85fef7bcd38ffbea1/src/mongo/db/exec/sbe/stages/plan_stats.cpp#L52) +tree to compute aggregate stats from the stats each stage has gathered during its execution. -Different executors will provide overrides of the functions in the `PlanExplainer` interface, and will provide its own stats in the `PlanSummaryStats` container: +Different executors will provide overrides of the functions in the `PlanExplainer` interface, and +will provide its own stats in the `PlanSummaryStats` container: -- **Classic executor**: Uses [`PlanExplainerImpl`](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/db/query/plan_explainer_impl.h#L59). All the information required to generate `explain` output in various formats is stored in the execution tree. Starting from the root stage of the execution tree, plan summary stats are gathered by traversing through the rest of the tree. The `MultiPlanStage` is skipped, and stats are extracted from its children. Note that if [subplanning](../exec/runtime_planners/classic_runtime_planner_for_sbe/README.md#subplanner) was triggered, it doesn't include information about rejected plans. 
-- **Express executor**: Uses [`PlanExplainerExpress`](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/db/query/plan_explainer_express.h#L193). Since we don't build a plan tree for express queries, this doesn't include stage information that's typically included in other `PlanExplainer`s, such as whether shard filtering was required and what index bounds were used. It will, however, include the chosen index. +- **Classic executor**: Uses + [`PlanExplainerImpl`](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/db/query/plan_explainer_impl.h#L59). + All the information required to generate `explain` output in various formats is stored in the + execution tree. Starting from the root stage of the execution tree, plan summary stats are + gathered by traversing through the rest of the tree. The `MultiPlanStage` is skipped, and stats + are extracted from its children. Note that if + [subplanning](../exec/runtime_planners/classic_runtime_planner_for_sbe/README.md#subplanner) was + triggered, it doesn't include information about rejected plans. +- **Express executor**: Uses + [`PlanExplainerExpress`](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/db/query/plan_explainer_express.h#L193). + Since we don't build a plan tree for express queries, this doesn't include stage information + that's typically included in other `PlanExplainer`s, such as whether shard filtering was required + and what index bounds were used. It will, however, include the chosen index. 
> ### Aside: Express Executor > -> An [`ExpressPlan`](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/db/exec/express/express_plan.h#L1313) is a streamlined execution engine that supports a specific sequence of query stages: +> An +> [`ExpressPlan`](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/db/exec/express/express_plan.h#L1313) +> is a streamlined execution engine that supports a specific sequence of query stages: > > - Document iterator --> optional shard filter --> optional write --> optional projection > -> Notably, it skips the plan enumeration and plan selection phases, as well as the intermediate materialization of a `PlanStage` tree. +> Notably, it skips the plan enumeration and plan selection phases, as well as the intermediate +> materialization of a `PlanStage` tree. > -> Queries are [eligible](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/db/query/query_utils.h#L118) for the express executor if they are: +> Queries are +> [eligible](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/db/query/query_utils.h#L118) +> for the express executor if they are: > -> - A point query that can be fulfilled via a single lookup on the `_id` index or with a direct lookup into a clustered collection +> - A point query that can be fulfilled via a single lookup on the `_id` index or with a direct +> lookup into a clustered collection > - An simple equality query that has a suitable index -- **SBE executor**: Uses [`PlanExplainerClassicRuntimePlannerForSBE`](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/db/query/plan_explainer_sbe.h#L125), which extends [`PlanExplainerSBEBase`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/db/query/plan_explainer_sbe.h#L59). 
This will display the stringified SBE plan(s) (the `QuerySolutionNode`-derived format for the given plan). Since SBE also go through a trial period using the classic multiplanner, it holds a pointer to the `MultiPlanStage` that allows calling into `PlanExplainerImpl` to extract information about the trial period using the `PlanStage`-derived format. -- **Pipeline exector**: Uses [`PlanExplainerPipeline`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/db/pipeline/plan_explainer_pipeline.h#L46). This is used for aggregation pipelines and has special logic to iterate through the pipeline stages to accumulate general stats, as well as note down specific stats that certain stages provide. +- **SBE executor**: Uses + [`PlanExplainerClassicRuntimePlannerForSBE`](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/db/query/plan_explainer_sbe.h#L125), + which extends + [`PlanExplainerSBEBase`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/db/query/plan_explainer_sbe.h#L59). + This will display the stringified SBE plan(s) (the `QuerySolutionNode`-derived format for the + given plan). Since SBE also go through a trial period using the classic multiplanner, it holds a + pointer to the `MultiPlanStage` that allows calling into `PlanExplainerImpl` to extract + information about the trial period using the `PlanStage`-derived format. +- **Pipeline exector**: Uses + [`PlanExplainerPipeline`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/db/pipeline/plan_explainer_pipeline.h#L46). + This is used for aggregation pipelines and has special logic to iterate through the pipeline + stages to accumulate general stats, as well as note down specific stats that certain stages + provide. 
```mermaid graph TD @@ -616,24 +770,50 @@ graph TD ### Sharded Explain -When `explain` is run in a sharded cluster, it is parsed similarly to the non-sharded `explain`. However, there are some differences in what is measured by the timer and the execution output displayed. +When `explain` is run in a sharded cluster, it is parsed similarly to the non-sharded `explain`. +However, there are some differences in what is measured by the timer and the execution output +displayed. -The entrypoint to sharded `explain` is [`ClusterExplainCmd::parse()`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/s/commands/query_cmd/cluster_explain_cmd.cpp#L219), which is similar to `CmdExplain::parse()`. However, it calls [`makeExplainedObj()`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/s/commands/query_cmd/cluster_explain_cmd.cpp#L188) to synthesize a `BSONObj` for the explained command, copying [generic arguments](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/idl/generic_argument.idl#L49) from the outer `explain` command. +The entrypoint to sharded `explain` is +[`ClusterExplainCmd::parse()`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/s/commands/query_cmd/cluster_explain_cmd.cpp#L219), +which is similar to `CmdExplain::parse()`. However, it calls +[`makeExplainedObj()`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/s/commands/query_cmd/cluster_explain_cmd.cpp#L188) +to synthesize a `BSONObj` for the explained command, copying +[generic arguments](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/idl/generic_argument.idl#L49) +from the outer `explain` command. 
-The [`ClusterExplainCmd::run()`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/s/commands/query_cmd/cluster_explain_cmd.cpp#L141) function executes the parsed `explain` command. The inner invocation, or explained command, calls its override of the [`explain()`](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/db/commands.h#L825) function. Different sharded commands have their own implementations for this function, but they generally share the following steps: +The +[`ClusterExplainCmd::run()`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/s/commands/query_cmd/cluster_explain_cmd.cpp#L141) +function executes the parsed `explain` command. The inner invocation, or explained command, calls +its override of the +[`explain()`](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/db/commands.h#L825) +function. Different sharded commands have their own implementations for this function, but they +generally share the following steps: 1. Parse the command to a `CanonicalQuery`. 1. Begin the timer to measure how long it takes to run the command on the shards. -1. Call [`wrapAsExplain()`](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/s/commands/query_cmd/cluster_explain.cpp#L130) to wrap the passed in inner command into an `explain` command request. This step will prune any generic arguments in the inner command, only keeping what was provided in the outer command. -1. Call [`scatterGatherVersionedTargetByRoutingTable`](https://github.com/mongodb/mongo/blob/8e6b2afd632cbcc67a2a129da0b1393d7576367e/src/mongo/s/cluster_commands_helpers.h#L300), which dispatches versioned commands to the shards, deciding which ones to target by applying the passed-in query and collation to the local routing table cache. -1. 
Finally, call [`buildExplainResult`](https://github.com/mongodb/mongo/blob/8e6b2afd632cbcc67a2a129da0b1393d7576367e/src/mongo/s/commands/query_cmd/cluster_explain.cpp#L432) to construct the sharded `explain` output format based on the responses from the shards in `shardResponses`. A sharded `explain` will only succeed if all shards support the `explain` command. This will also return mock `mongos` execution stages to describe how results are combined for read operations, or if a write would be performed: +1. Call + [`wrapAsExplain()`](https://github.com/mongodb/mongo/blob/11b6fc54aaeddbb6dd85d2a808827f8048f366a1/src/mongo/s/commands/query_cmd/cluster_explain.cpp#L130) + to wrap the passed in inner command into an `explain` command request. This step will prune any + generic arguments in the inner command, only keeping what was provided in the outer command. +1. Call + [`scatterGatherVersionedTargetByRoutingTable`](https://github.com/mongodb/mongo/blob/8e6b2afd632cbcc67a2a129da0b1393d7576367e/src/mongo/s/cluster_commands_helpers.h#L300), + which dispatches versioned commands to the shards, deciding which ones to target by applying the + passed-in query and collation to the local routing table cache. +1. Finally, call + [`buildExplainResult`](https://github.com/mongodb/mongo/blob/8e6b2afd632cbcc67a2a129da0b1393d7576367e/src/mongo/s/commands/query_cmd/cluster_explain.cpp#L432) + to construct the sharded `explain` output format based on the responses from the shards in + `shardResponses`. A sharded `explain` will only succeed if all shards support the `explain` + command. This will also return mock `mongos` execution stages to describe how results are + combined for read operations, or if a write would be performed: - `kSingleShard` - only a single shard is targeted and returns results. - `kMergeFromShards` - multiple shards were targeted, and the results must be merged. - `kMergeSortFromShards`- similar to the above, but the results must be merged in a sorted order. 
- `kWriteOnShards` - this command would perform a write on the shards. -Here is the sample output for sharded `explain`. It includes the core query planner and server information for each accessed shard in the `shards` field: +Here is the sample output for sharded `explain`. It includes the core query planner and server +information for each accessed shard in the `shards` field: ``` { diff --git a/src/mongo/db/query/README_logical_models.md b/src/mongo/db/query/README_logical_models.md index 6f78e9b11eb..0e42a7e755b 100644 --- a/src/mongo/db/query/README_logical_models.md +++ b/src/mongo/db/query/README_logical_models.md @@ -2,37 +2,55 @@ ## Overview -A **logical model** is a representation of how data is structured and how its components relate to other data. In our case, the main logical model of a query is a [`CanonicalQuery`](#canonicalquery). `CanonicalQuery` is itself a logical model, but also delegates to four other logical models to represent its components: -| Query Component | Logical Model | -| ---------------------------|---------------| -| filter | [`MatchExpression`](#matchexpression) | -| projection | [`Projection`](#projection) | -| sort | [`SortPattern`](#sortpattern) | -| distinct | [`CanonicalDistinct`](#canonicaldistinct) | +A **logical model** is a representation of how data is structured and how its components relate to +other data. In our case, the main logical model of a query is a [`CanonicalQuery`](#canonicalquery). 
+`CanonicalQuery` is itself a logical model, but also delegates to four other logical models to +represent its components: | Query Component | Logical Model | | +---------------------------|---------------| | filter | [`MatchExpression`](#matchexpression) | | +projection | [`Projection`](#projection) | | sort | [`SortPattern`](#sortpattern) | | distinct | +[`CanonicalDistinct`](#canonicaldistinct) | -The first step of query planning will take a `CanonicalQuery` as input, but the goal of our logical models is to refine and simplify all the components of a `CanonicalQuery` into its most basic form through desugaring, normalization, and other rewrites. +The first step of query planning will take a `CanonicalQuery` as input, but the goal of our logical +models is to refine and simplify all the components of a `CanonicalQuery` into its most basic form +through desugaring, normalization, and other rewrites. > ### Aside: Desugaring > -> When writing MQL, you might use shorthand notation which simplifies more explicit expressions. This shorthand notation is considered _syntactic sugar_ for the full form equivalent expression. For example: -> | Implicit (Syntactic Sugar) | Explicit | -> |----------------------------|----------| -> | `{field: "value"}` | `{field: {$eq: "value"}}` | -> | `{a: {$gte: 0}, b: {$lt: 5}}` | `{$and: [{a: {$gte: 0}}, {b: {$lt: 5}}]}` | +> When writing MQL, you might use shorthand notation which simplifies more explicit expressions. +> This shorthand notation is considered _syntactic sugar_ for the full form equivalent expression. 
+> For example: | Implicit (Syntactic Sugar) | Explicit | |----------------------------|----------| | +> `{field: "value"}` | `{field: {$eq: "value"}}` | | `{a: {$gte: 0}, b: {$lt: 5}}` | +> `{$and: [{a: {$gte: 0}}, {b: {$lt: 5}}]}` | > -> During desugaring, the parser interprets the shorthand notation and [converts](https://github.com/mongodb/mongo/blob/8e6b2afd632cbcc67a2a129da0b1393d7576367e/src/mongo/db/matcher/expression_parser.cpp#L284) it into the explicit form that it processes internally. +> During desugaring, the parser interprets the shorthand notation and +> [converts](https://github.com/mongodb/mongo/blob/8e6b2afd632cbcc67a2a129da0b1393d7576367e/src/mongo/db/matcher/expression_parser.cpp#L284) +> it into the explicit form that it processes internally. -By converting all instances of syntactic sugar into their explicit forms, we simplify later optimization steps. For example, rather than including logic to handle both implicit and explicit `$eq` separately, we convert all implicit `$eq`s to explicit `$eq`s, and therefore can handle both cases in the same logic path. +By converting all instances of syntactic sugar into their explicit forms, we simplify later +optimization steps. For example, rather than including logic to handle both implicit and explicit +`$eq` separately, we convert all implicit `$eq`s to explicit `$eq`s, and therefore can handle both +cases in the same logic path. ## `CanonicalQuery` -A [`CanonicalQuery`](https://github.com/mongodb/mongo/blob/8e6b2afd632cbcc67a2a129da0b1393d7576367e/src/mongo/db/query/canonical_query.h#L72) is a container that represents a parsed and normalized query. It contains the filter, projection, and sort components of the original query message. Each of these components is generated via the [`CanonicalQuery` constructor](https://github.com/mongodb/mongo/blob/8e6b2afd632cbcc67a2a129da0b1393d7576367e/src/mongo/db/query/canonical_query.cpp#L94). 
In order to create the `CanonicalQuery` in its "base" form, the `CanonicalQuery` delegates to three more processes and related data structures: `MatchExpression`, `Projection`, and `SortPattern` to handle the simplification, each of which is discussed in detail below. +A +[`CanonicalQuery`](https://github.com/mongodb/mongo/blob/8e6b2afd632cbcc67a2a129da0b1393d7576367e/src/mongo/db/query/canonical_query.h#L72) +is a container that represents a parsed and normalized query. It contains the filter, projection, +and sort components of the original query message. Each of these components is generated via the +[`CanonicalQuery` constructor](https://github.com/mongodb/mongo/blob/8e6b2afd632cbcc67a2a129da0b1393d7576367e/src/mongo/db/query/canonical_query.cpp#L94). +In order to create the `CanonicalQuery` in its "base" form, the `CanonicalQuery` delegates to three +more processes and related data structures: `MatchExpression`, `Projection`, and `SortPattern` to +handle the simplification, each of which is discussed in detail below. -If a `CanonicalQuery` [cannot be generated](../commands/query_cmd/README.md#parsing-aggregations) after parsing, we move the query straight to the Query Execution layer without optimization. +If a `CanonicalQuery` [cannot be generated](../commands/query_cmd/README.md#parsing-aggregations) +after parsing, we move the query straight to the Query Execution layer without optimization. > ### Aside: Canonicalization > -> `ParsedFindCommand` already contains the parsed filter, projection, and sort in their _original_ form. `CanonicalQuery` does not create these data structures; it _simplifies_ them from their original form and stores the result. For example, these `ParsedFindCommand`s all result in the same `CanonicalQuery` because `$and` and `$or` are normalized away when they have only one child. +> `ParsedFindCommand` already contains the parsed filter, projection, and sort in their _original_ +> form. 
`CanonicalQuery` does not create these data structures; it _simplifies_ them from their +> original form and stores the result. For example, these `ParsedFindCommand`s all result in the +> same `CanonicalQuery` because `$and` and `$or` are normalized away when they have only one child. > > ``` > db.c.find({a: 1}) @@ -42,16 +60,30 @@ If a `CanonicalQuery` [cannot be generated](../commands/query_cmd/README.md#pars ## `MatchExpression` -The `CanonicalQuery` holds a reference to an abstract syntax tree (AST) called a [`MatchExpression`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/matcher/expression.h#L72), a representation of the query's `filter` component. +The `CanonicalQuery` holds a reference to an abstract syntax tree (AST) called a +[`MatchExpression`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/matcher/expression.h#L72), +a representation of the query's `filter` component. > ### Aside: Abstract Syntax Trees > -> An Abstract Syntax Tree (AST) is a tree representation of a program's syntax. While a Concrete Syntax Tree (CST) models the exact grammar of a language, an AST _abstracts_ away unnecessary syntax details and models logical structure. Using an AST generally makes it easier to perform optimizations and static analysis. For an example of how we model `MatchExpression`s as ASTs, see below. +> An Abstract Syntax Tree (AST) is a tree representation of a program's syntax. While a Concrete +> Syntax Tree (CST) models the exact grammar of a language, an AST _abstracts_ away unnecessary +> syntax details and models logical structure. Using an AST generally makes it easier to perform +> optimizations and static analysis. For an example of how we model `MatchExpression`s as ASTs, see +> below. -`MatchExpression` is also the abstract type from which all nodes inherit. 
All possible `MatchExpression` nodes are enumerated by the [`MatchType`](https://github.com/mongodb/mongo/blob/8e6b2afd632cbcc67a2a129da0b1393d7576367e/src/mongo/db/matcher/expression.h#L80) enum. For each of these types, there exists a subclass that inherits from `MatchExpression`. For example: +`MatchExpression` is also the abstract type from which all nodes inherit. All possible +`MatchExpression` nodes are enumerated by the +[`MatchType`](https://github.com/mongodb/mongo/blob/8e6b2afd632cbcc67a2a129da0b1393d7576367e/src/mongo/db/matcher/expression.h#L80) +enum. For each of these types, there exists a subclass that inherits from `MatchExpression`. For +example: -- [`EqualityMatchExpression`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/matcher/expression_leaf.h#L314) corresponds to `MatchType::EQ` and the `$eq` operator; it must have 0 children nodes, also known as a `LeafMatchExpression`. -- [`AndMatchExpression`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/matcher/expression_tree.h#L139) corresponds to `MatchType::AND` and the `$and` operator; it has N children nodes, where N is the number of conjuncts of the `$and`. +- [`EqualityMatchExpression`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/matcher/expression_leaf.h#L314) + corresponds to `MatchType::EQ` and the `$eq` operator; it must have 0 children nodes, also known + as a `LeafMatchExpression`. +- [`AndMatchExpression`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/matcher/expression_tree.h#L139) + corresponds to `MatchType::AND` and the `$and` operator; it has N children nodes, where N is the + number of conjuncts of the `$and`. 
Let's take a look at how this query would be modeled as an AST: @@ -65,15 +97,24 @@ db.c.find({ }) ``` -During `MatchExpressionParser::parse()`, the request's `filter` component (in this case, our entire query) is broken down into individual `BSONElement`s and parsed separately into individual `MatchExpression`s. The `BSONObj`'s structure is then rebuilt as an AST of `MatchExpression`s. For example, this section: +During `MatchExpressionParser::parse()`, the request's `filter` component (in this case, our entire +query) is broken down into individual `BSONElement`s and parsed separately into individual +`MatchExpression`s. The `BSONObj`'s structure is then rebuilt as an AST of `MatchExpression`s. For +example, this section: ``` {$and: [{b: {$gte: 0}}, {c: {$lt: 5}}]} ``` -is broken up as 3 `BSONElement`s, each with their own `MatchExpression`. See below for the tree representation. +is broken up as 3 `BSONElement`s, each with their own `MatchExpression`. See below for the tree +representation. -In this case, the structure becomes an `AndMatchExpression` with two children: `GTEMatchExpression` and `LTMatchExpression`. This process occurs in [`parseSub()`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/matcher/expression_parser.cpp#L2132) for each element. If something doesn't fit the expected `BSONElement` pattern for `MatchExpression` conversion, [desugaring](#aside-desugaring) can be performed to convert the `filter` component into a parser-compatible form. +In this case, the structure becomes an `AndMatchExpression` with two children: `GTEMatchExpression` +and `LTMatchExpression`. This process occurs in +[`parseSub()`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/matcher/expression_parser.cpp#L2132) +for each element. 
If something doesn't fit the expected `BSONElement` pattern for `MatchExpression` +conversion, [desugaring](#aside-desugaring) can be performed to convert the `filter` component into +a parser-compatible form. After `MatchExpressionParser::parse()`, the underlying AST for our query looks like this: @@ -83,12 +124,19 @@ flowchart TD n4 --> n5["GTEMatchExpression
b >= 0"] & n6["LTMatchExpression
c < 5"] ``` -While `MatchExpressionParser::parse()` creates the initial `MatchExpression` and `ParsedFindCommand` holds the AST in its unoptimized form, `CanonicalQuery` calls [`MatchExpression::normalize()`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/query/canonical_query.cpp#L166), which begins the simplification process. The goal of normalization is to convert the `MatchExpression` AST into its most simple form. By simplifying `MatchExpression`s as much as possible, we ensure that: +While `MatchExpressionParser::parse()` creates the initial `MatchExpression` and `ParsedFindCommand` +holds the AST in its unoptimized form, `CanonicalQuery` calls +[`MatchExpression::normalize()`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/query/canonical_query.cpp#L166), +which begins the simplification process. The goal of normalization is to convert the +`MatchExpression` AST into its most simple form. By simplifying `MatchExpression`s as much as +possible, we ensure that: 1. Any resulting `QuerySolution`s will be as simple as possible -1. The [plan cache](plan_cache/README.md) will recognize logically equivalent queries as equivalent and reuse a cached plan when possible, even if the initial queries are different. +1. The [plan cache](plan_cache/README.md) will recognize logically equivalent queries as equivalent + and reuse a cached plan when possible, even if the initial queries are different. -In this case, we can apply the `$or` to `$in` rewrite rule because we have two `$eq` disjuncts on the same field. After normalization, our AST looks like this: +In this case, we can apply the `$or` to `$in` rewrite rule because we have two `$eq` disjuncts on +the same field. After normalization, our AST looks like this: ```mermaid flowchart TD @@ -96,17 +144,30 @@ flowchart TD n4 --> n5["GTEMatchExpression
b >= 0"] & n6["LTMatchExpression
c < 5"] ``` -This simplified `MatchExpression` AST could stem from an infinite number of queries, as we could always write queries with more and more complex boolean logic. This is just one example of a [**heuristic rewrite**](../matcher/README.md). +This simplified `MatchExpression` AST could stem from an infinite number of queries, as we could +always write queries with more and more complex boolean logic. This is just one example of a +[**heuristic rewrite**](../matcher/README.md). ## `Projection` -Much like `MatchExpression`, [`Projection`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/query/projection.h#L74) is also represented as an AST. From the initial `BSONObj` of the projection, [`parseAndAnalyze()`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/query/projection_parser.h#L52) is called to convert the raw `BSONObj` representation of the projection into an AST. By representing the `Projection` as an AST, we can traverse the tree and optimize each node based on its necessary dependencies. +Much like `MatchExpression`, +[`Projection`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/query/projection.h#L74) +is also represented as an AST. From the initial `BSONObj` of the projection, +[`parseAndAnalyze()`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/query/projection_parser.h#L52) +is called to convert the raw `BSONObj` representation of the projection into an AST. By representing +the `Projection` as an AST, we can traverse the tree and optimize each node based on its necessary +dependencies. > ### Aside: Inclusion and Exclusion Projections > -> An MQL projection is classified as either an inclusion projection or exclusion projection. In an inclusion projection, all desired fields are marked with a `1`; all other fields are excluded. 
In an exclusion projection, all undesired fields are marked with a `0`; all other fields are included. Because of this, a projection must be entirely "inclusion-based" or "exclusion-based"; you cannot specify some fields as included and others as excluded. +> An MQL projection is classified as either an inclusion projection or exclusion projection. In an +> inclusion projection, all desired fields are marked with a `1`; all other fields are excluded. In +> an exclusion projection, all undesired fields are marked with a `0`; all other fields are +> included. Because of this, a projection must be entirely "inclusion-based" or "exclusion-based"; +> you cannot specify some fields as included and others as excluded. > -> The only exception to the inclusion/exclusion rule is `_id`. The `_id` field _can_ differ from other fields, since `_id` is always assumed to be included. For example: +> The only exception to the inclusion/exclusion rule is `_id`. The `_id` field _can_ differ from +> other fields, since `_id` is always assumed to be included. For example: > > ``` > // Assume db.c contains fields 'a', 'b', and 'c'. @@ -131,9 +192,13 @@ flowchart TD ``` -Here, each `BooleanConstantASTNode` represents whether our final result will contain the field that points to it. +Here, each `BooleanConstantASTNode` represents whether our final result will contain the field that +points to it. -A more complex projection that involves nested fields or arrays may require substantially more complex syntax. This may involve many recursive children nodes for nested fields, or uses of the `$`, `$elemMatch` or `$slice` operators for arrays. Each of these is defined as its own node in [`projection_ast.h`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/query/projection_ast.h#L61). +A more complex projection that involves nested fields or arrays may require substantially more +complex syntax. 
This may involve many recursive children nodes for nested fields, or uses of the +`$`, `$elemMatch` or `$slice` operators for arrays. Each of these is defined as its own node in +[`projection_ast.h`](https://github.com/mongodb/mongo/blob/b0816f32f1eff965ffe069fc556ea968cc8533a6/src/mongo/db/query/projection_ast.h#L61). For example, given this more complex query: @@ -153,9 +218,15 @@ flowchart TD n4 --> n7["MatchExpressionASTNode
a.b > 0"] ``` -Note that in this example, the `a.d` field is a new field that is created by the projection as the result of an expression. This is a third type of projection beyond inclusion and exclusion, called addition. +Note that in this example, the `a.d` field is a new field that is created by the projection as the +result of an expression. This is a third type of projection beyond inclusion and exclusion, called +addition. -After parsing, [`optimizeProjection()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/projection.cpp#L241) walks the AST to find areas for optimization. This optimization has the potential to modify the AST in-place in simplifying it. For this reason, a `ProjectionASTMutableVisitor` recursively traverses the AST, calling `preVisit`, `inVisit`, and `postVisit` on each node in the AST. +After parsing, +[`optimizeProjection()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/projection.cpp#L241) +walks the AST to find areas for optimization. This optimization has the potential to modify the AST +in-place in simplifying it. For this reason, a `ProjectionASTMutableVisitor` recursively traverses +the AST, calling `preVisit`, `inVisit`, and `postVisit` on each node in the AST. For example, given this `ExpressionASTNode`: @@ -163,11 +234,19 @@ For example, given this `ExpressionASTNode`: {x: {$and: [false, "$b"]}} ``` -the initial construction of the AST would imply that `x` is dependent on `b`. After optimization, however, it becomes clear that this expression is unsatisfiable, and `x`'s dependency on `b` is released. +the initial construction of the AST would imply that `x` is dependent on `b`. After optimization, +however, it becomes clear that this expression is unsatisfiable, and `x`'s dependency on `b` is +released. 
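As a rough illustration of why the dependency on `b` can be dropped, here is a minimal Python sketch of constant folding over a `$and` expression. It is not the server's implementation (which is C++ and operates on the projection AST); the dict-based expression encoding and the helper names `field_deps` and `fold_and` are hypothetical, made up for this example.

```python
def field_deps(expr):
    """Collect field paths (strings starting with '$') referenced by an expression."""
    if isinstance(expr, str) and expr.startswith("$"):
        return {expr[1:]}
    if isinstance(expr, dict):
        deps = set()
        # Each value is assumed to be the operand list of an operator such as $and.
        for operands in expr.values():
            for operand in operands:
                deps |= field_deps(operand)
        return deps
    return set()


def fold_and(expr):
    """Fold {'$and': [...]} to False when any operand is the constant False."""
    if isinstance(expr, dict) and "$and" in expr:
        operands = [fold_and(op) for op in expr["$and"]]
        if any(op is False for op in operands):
            return False
        return {"$and": operands}
    return expr


original = {"$and": [False, "$b"]}
optimized = fold_and(original)

print(field_deps(original))   # {'b'}
print(field_deps(optimized))  # set()
```

Before folding, the expression references `b`; after folding it is the constant `False`, so no field dependency remains.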
## `SortPattern` -If a query contains a sort specification, the `CanonicalQuery` will store the sort as a [`SortPattern`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/sort_pattern.h#L53), which is a vector of [`SortPatternPart`s](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/sort_pattern.h#L64). Each `SortPatternPart` represents one field and its sort order. This data structure can be used for reference throughout the optimization process and accessed via the `CanonicalQuery`. This is generally as simple as: +If a query contains a sort specification, the `CanonicalQuery` will store the sort as a +[`SortPattern`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/sort_pattern.h#L53), +which is a vector of +[`SortPatternPart`s](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/sort_pattern.h#L64). +Each `SortPatternPart` represents one field and its sort order. This data structure can be used for +reference throughout the optimization process and accessed via the `CanonicalQuery`. This is +generally as simple as: ``` db.c.find({}).sort({"a": 1, "b.c": -1}) @@ -186,17 +265,26 @@ SortPattern: [ > aggregate([{$match: {$text: {$search: 'Some Text'}}}, {$sort: {textScore: {$meta: 'textScore'}}}]) > ``` > -> Here, `{$meta: 'textScore'}` is an example of "document metadata" which sorts by descending relevance score. This sort order ignores its specifying field name and instead uses the metadata as an implicit top-level field. For more information on text score metadata sort, see [here](https://www.mongodb.com/docs/manual/reference/operator/aggregation/sort/#std-label-sort-pipeline-metadata). +> Here, `{$meta: 'textScore'}` is an example of "document metadata" which sorts by descending +> relevance score. 
This sort order ignores its specifying field name and instead uses the metadata +> as an implicit top-level field. For more information on text score metadata sort, see +> [here](https://www.mongodb.com/docs/manual/reference/operator/aggregation/sort/#std-label-sort-pipeline-metadata). ## `CanonicalDistinct` -When a `distinct()` query is run or an aggregation pipeline is eligible to use a `DISTINCT_SCAN`, the `CanonicalQuery` holds a [`CanonicalDistinct`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/canonical_distinct.h#L52). The `CanonicalDistinct` is effectively a container that holds all the data regarding a distinct query, just as `CanonicalQuery` is for find. Note that data such as the distinct command's `filter` is still maintained by the `CanonicalQuery`, however. For example, in this query: +When a `distinct()` query is run or an aggregation pipeline is eligible to use a `DISTINCT_SCAN`, +the `CanonicalQuery` holds a +[`CanonicalDistinct`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/canonical_distinct.h#L52). +The `CanonicalDistinct` is effectively a container that holds all the data regarding a distinct +query, just as `CanonicalQuery` is for find. Note that data such as the distinct command's `filter` +is still maintained by the `CanonicalQuery`, however. For example, in this query: ``` db.c.distinct("x", {x: {$gt: 0}}) ``` -the `CanonicalDistinct` holds the information that the distinct key is `"x"`, but the `CanonicalQuery` holds the `MatchExpression` representing `x > 0`. +the `CanonicalDistinct` holds the information that the distinct key is `"x"`, but the +`CanonicalQuery` holds the `MatchExpression` representing `x > 0`. 
--- diff --git a/src/mongo/db/query/README_query_feature_flags.md b/src/mongo/db/query/README_query_feature_flags.md index d6451a0de10..4262792f8cd 100644 --- a/src/mongo/db/query/README_query_feature_flags.md +++ b/src/mongo/db/query/README_query_feature_flags.md @@ -90,9 +90,9 @@ To answer this, we provide an abbreviated transcript of developers discussing th Parker Felix: -> Hi, I've been discussing a concern I've had around FCV transitions and v1/v2 oplog entries -> in #server-replication and wanted to move the discussion here. After adding an FCV check for -> applying v2 oplog entries, I noticed in a failure in a test that: +> Hi, I've been discussing a concern I've had around FCV transitions and v1/v2 oplog entries in +> #server-replication and wanted to move the discussion here. After adding an FCV check for applying +> v2 oplog entries, I noticed in a failure in a test that: > > 1. Starts a v2 eligible update > 2. Hangs the update after checking the feature flag and deciding it can use the v2 oplog path @@ -117,8 +117,8 @@ Parker Felix: Wenquin Ye: > What's the specific concern with acquiring a global IX lock. Are we afraid that it may block or -> starve some other important user operation? There is some precedent for user operations to acquire a -> global IX lock to serialize with setFCV. For example [transactions acquire the global IX +> starve some other important user operation? There is some precedent for user operations to acquire +> a global IX lock to serialize with setFCV. For example [transactions acquire the global IX > lock][txns_acquire_ix_lock_ref] when started. Also regarding, why we haven't seen this before, > maybe it's just because we haven't done extensive downgrade testing until now. 
For instance [I > recall finding another node crash issue that technically exists on the downgrade from 5.0 to @@ -126,12 +126,12 @@ Wenquin Ye: Parker Felix: -> I guess I'm concerned about performance implications of changing what locks updates -> use but I'm not that familiar with our locking mechanisms. I'm also not sure if the update -> completing before the setFCV takes effect gives us strong enough guarantees. If setFCV has an -> associated oplog entry and oplog entries need to be processed in order, that might be sufficient -> for ensuring that there are no outstanding v2 oplog entries once we have completed the FCV -> transition to 4.4 (and can then have 4.4 binaries in the replica set). +> I guess I'm concerned about performance implications of changing what locks updates use but I'm +> not that familiar with our locking mechanisms. I'm also not sure if the update completing before +> the setFCV takes effect gives us strong enough guarantees. If setFCV has an associated oplog entry +> and oplog entries need to be processed in order, that might be sufficient for ensuring that there +> are no outstanding v2 oplog entries once we have completed the FCV transition to 4.4 (and can then +> have 4.4 binaries in the replica set). huayu: @@ -161,12 +161,11 @@ josef: > 1. Starts a v2 eligible update > 2. Hangs the update after checking the feature flag and deciding it can use the v2 oplog path > 3. Starts an FCV 7.0->4.4 downgrade, disabling the feature flag on the primary and -> secondaries"checking the feature flag and deciding it can use the v2 oplog path" has to happen after -> acquiring the collection lock (which by extension acquires the global lock) and the lock must be -> held for the entire duration of the write unit of work WUOW. This way there's no interleaving with -> FCV changes. -> Oh and, if this is a multi-update, it should do the above after every yield/resume, that is, -> for each document it updates. 
+> secondaries"checking the feature flag and deciding it can use the v2 oplog path" has to happen +> after acquiring the collection lock (which by extension acquires the global lock) and the lock +> must be held for the entire duration of the write unit of work WUOW. This way there's no +> interleaving with FCV changes. Oh and, if this is a multi-update, it should do the above after +> every yield/resume, that is, for each document it updates. Parker Felix: @@ -176,9 +175,9 @@ Parker Felix: > driver][update_driver_decision_ref]. The update's MODE_IX lock will conflict with the [MODE_S > lock][MODE_S_lock_ref] taken by setFeatureCompatibilityVersion, so one will block after the other. > In the event of a yield during the update query, we [don't persist any data from the update -> driver][no_persistence_update_driver_ref] and will check the value of the flag again after the yield -> in case it has changed. I think this should then guarantee that the secondaries will process the -> remaining v2 oplog entries before the setFCV downgrade oplog entry. +> driver][no_persistence_update_driver_ref] and will check the value of the flag again after the +> yield in case it has changed. I think this should then guarantee that the secondaries will process +> the remaining v2 oplog entries before the setFCV downgrade oplog entry. huayu: @@ -193,9 +192,8 @@ Parker Felix: ianb: -> FWIW one of -> the tests has an assertion about the order of the oplog entries to this -> effect: [v2_delta_oplog_entries_fcv.js][v2_delta_oplog_entries_fcv_dot_js] +> FWIW one of the tests has an assertion about the order of the oplog entries to this effect: +> [v2_delta_oplog_entries_fcv.js][v2_delta_oplog_entries_fcv_dot_js] ```js // Check that the sequence of oplog entries is right. We expect to see the following @@ -220,11 +218,11 @@ huayu: Parker Felix: -> It seems like FCV gated feature flags are disabled when the FCV transition starts -> rather than upon completion of FCV downgrade. 
For this test, we hang the update after it checks the -> value of the feature flag and determines it can generate a v2 oplog. While the update is hanging, we -> initiate an FCV downgrade to 4.4, disabling the feature flag on the secondaries. When we resume the -> update, it still produces a v2 oplog entry that was then failing the feature flag check in oplog +> It seems like FCV gated feature flags are disabled when the FCV transition starts rather than upon +> completion of FCV downgrade. For this test, we hang the update after it checks the value of the +> feature flag and determines it can generate a v2 oplog. While the update is hanging, we initiate +> an FCV downgrade to 4.4, disabling the feature flag on the secondaries. When we resume the update, +> it still produces a v2 oplog entry that was then failing the feature flag check in oplog > application that I am going to remove. huayu: @@ -238,7 +236,8 @@ huayu: > 3. We start an FCV downgrade to 4.4. This will transition the FCV to the downgrading to 4.4. phase > which means the feature flag is disabled. This also writes an oplog entry to update the FCV doc > to downgrading to 4.4 to the oplog -> 4. The FCV downgrade will hang waiting to acquire the global lock in S mode [here][MODE_S_lock_ref] +> 4. The FCV downgrade will hang waiting to acquire the global lock in S mode +> [here][MODE_S_lock_ref] > 5. We resume the update which writes an v2 oplog entry to the oplog > 6. FCV downgrade can now acquire the global lock, and at the end it transitions the FCV to 4.4 and > writes an oplog entry for that @@ -286,17 +285,25 @@ cannot offer any more specific advice. 
[fcv_readme]: /src/mongo/db/repl/FCV_AND_FEATURE_FLAG_README.md [version_context]: /src/mongo/db/version_context.h -[good_parse_example]: https://github.com/mongodb/mongo/blob/8ac5a0c814a5e8a0f79825327fdf6c3aa118c0fa/src/mongo/db/pipeline/document_source.cpp#L135 +[good_parse_example]: + https://github.com/mongodb/mongo/blob/8ac5a0c814a5e8a0f79825327fdf6c3aa118c0fa/src/mongo/db/pipeline/document_source.cpp#L135 [SERVER-103028]: https://jira.mongodb.org/browse/SERVER-103028 [SERVER-91281]: https://jira.mongodb.org/browse/SERVER-91281 [SERVER-109985]: https://jira.mongodb.org/browse/SERVER-109985 [SERVER-91269]: https://jira.mongodb.org/browse/SERVER-91269 -[txns_acquire_ix_lock_ref]: https://github.com/mongodb/mongo/blob/965823ff377bc04ac0a4fce344aa9ab3f7e4eed0/src/mongo/db/transaction/transaction_participant.cpp#L1746-L1747 +[txns_acquire_ix_lock_ref]: + https://github.com/mongodb/mongo/blob/965823ff377bc04ac0a4fce344aa9ab3f7e4eed0/src/mongo/db/transaction/transaction_participant.cpp#L1746-L1747 [related_downgrade_crash_ref]: https://jira.mongodb.org/browse/SERVER-103343 -[dedicated_fcv_batch_ref]: https://github.com/mongodb/mongo/blob/770e79f6262294b67da4845a2872e123f7401a0b/src/mongo/db/namespace_string.cpp#L153-L162 -[ix_lock_ref]: https://github.com/mongodb/mongo/blob/5fca8916aebed11980bdb11437ade6e5baa49198/src/mongo/db/query/write_ops/write_ops_exec.cpp#L753-L757 -[update_driver_decision_ref]: https://github.com/mongodb/mongo/blob/da243b43b0879ff263a1d1ff68dcb204a5e40e47/src/mongo/db/update/update_driver.cpp#L296-L299 +[dedicated_fcv_batch_ref]: + https://github.com/mongodb/mongo/blob/770e79f6262294b67da4845a2872e123f7401a0b/src/mongo/db/namespace_string.cpp#L153-L162 +[ix_lock_ref]: + https://github.com/mongodb/mongo/blob/5fca8916aebed11980bdb11437ade6e5baa49198/src/mongo/db/query/write_ops/write_ops_exec.cpp#L753-L757 +[update_driver_decision_ref]: + 
https://github.com/mongodb/mongo/blob/da243b43b0879ff263a1d1ff68dcb204a5e40e47/src/mongo/db/update/update_driver.cpp#L296-L299 -[MODE_S_lock_ref]: https://github.com/mongodb/mongo/blob/339bd22e371a069a167db2b7ede52c6a299fa55d/src/mongo/db/commands/set_feature_compatibility_version_command.cpp#L1388-L1393   -[no_persistence_update_driver_ref]: https://github.com/mongodb/mongo/blob/5fca8916aebed11980bdb11437ade6e5baa49198/src/mongo/db/exec/update_stage.cpp#L534-L546 -[v2_delta_oplog_entries_fcv_dot_js]: https://github.com/mongodb/mongo/blob/da243b43b0879ff263a1d1ff68dcb204a5e40e47/jstests/multiVersion/s8/v2_delta_oplog_entries_fcv.js#L268-L274v2_delta_oplog_entries_fcv.js +[MODE_S_lock_ref]: +https://github.com/mongodb/mongo/blob/339bd22e371a069a167db2b7ede52c6a299fa55d/src/mongo/db/commands/set_feature_compatibility_version_command.cpp#L1388-L1393   +[no_persistence_update_driver_ref]: +https://github.com/mongodb/mongo/blob/5fca8916aebed11980bdb11437ade6e5baa49198/src/mongo/db/exec/update_stage.cpp#L534-L546 +[v2_delta_oplog_entries_fcv_dot_js]: +https://github.com/mongodb/mongo/blob/da243b43b0879ff263a1d1ff68dcb204a5e40e47/jstests/multiVersion/s8/v2_delta_oplog_entries_fcv.js#L268-L274v2_delta_oplog_entries_fcv.js diff --git a/src/mongo/db/query/README_query_shape_disambiguation.md b/src/mongo/db/query/README_query_shape_disambiguation.md index 1c81e989fa5..4229ce336c1 100644 --- a/src/mongo/db/query/README_query_shape_disambiguation.md +++ b/src/mongo/db/query/README_query_shape_disambiguation.md @@ -1,6 +1,8 @@ # Disambiguation of Various Query Shape Concepts -You have probably arrived here while reading or thinking about one query shape concept and are wondering how it relates to other similar concepts. This page aims to describe the different purposes and discriminating qualities of the following: +You have probably arrived here while reading or thinking about one query shape concept and are +wondering how it relates to other similar concepts. 
This page aims to describe the different +purposes and discriminating qualities of the following: - Query Shape - Query Stats Key @@ -34,11 +36,7 @@ Consider a query like the following. ```js db.runCommand({ aggregate: "foo", - pipeline: [ - {$match: {x: {$gte: 2}}}, - {$replaceWith: "$subDoc"}, - {$out: "bar"}, - ], + pipeline: [{$match: {x: {$gte: 2}}}, {$replaceWith: "$subDoc"}, {$out: "bar"}], comment: "I am just a humble example", readConcern: {level: "majority"}, }); @@ -49,14 +47,16 @@ Let's use this example to draw a contrast between the concepts which are observa > Note: when writing this example I chose `$replaceWith` as an example of a stage which cannot be > pushed into the access planner (the classic engine is not planned to ever support `$replaceWith`, > and SBE does not at time of this writing). There are plans to support this in SBE at which point -> this example will become stale, but at least for quite some time there will remain some stages which -> are unsupported for push down into the access plan, and the general theme/concept still holds. +> this example will become stale, but at least for quite some time there will remain some stages +> which are unsupported for push down into the access plan, and the general theme/concept still +> holds. ### The `planCacheShapeHash` (The Artist Formerly Known as `queryHash`) -For this query, the `planCacheShapeHash` will just consider the namespace and the $match predicate. -It would be the same shape or at least very similar to a find command with an `{x: {$gte: 10}}` -filter, since the access planner won't see/consider`$replaceWith` and everything after that. +For this query, the `planCacheShapeHash` will just consider the namespace and the +$match predicate. +It would be the same shape or at least very similar to a find command with an `{x: {$gte: +10}}` filter, since the access planner won't see/consider`$replaceWith` and everything after that. 
### The `planCacheKeyHash` @@ -82,36 +82,37 @@ have a simple diagnostic name like the others. This will shapify the whole comma There are several things to keep in mind when deciding this question: -1. The Query Stats Store Key is generally meant to be the most discriminating way to collect metrics. - It is always possible for consumers of the metrics to perform their own grouping operation to - collapse two or more groups back into one, but it is impossible to undo a grouping. That being said, - we also cannot afford to have infinite entries, so there is a balance. As an example, we do not want - to track each and every 'comment' separately, since we know of cases where customers may use the - comment field as a sort of request ID with very high cardinality. +1. The Query Stats Store Key is generally meant to be the most discriminating way to collect + metrics. It is always possible for consumers of the metrics to perform their own grouping + operation to collapse two or more groups back into one, but it is impossible to undo a grouping. + That being said, we also cannot afford to have infinite entries, so there is a balance. As an + example, we do not want to track each and every 'comment' separately, since we know of cases + where customers may use the comment field as a sort of request ID with very high cardinality. 2. The Query Shape is the key used for query settings application. Any two queries with the same - shape will have the same settings applied. It would then logically follow that any two queries with - the same query stats store key also have the same query settings applied, since the query shape - is part of the query stats store key. + shape will have the same settings applied. It would then logically follow that any two queries + with the same query stats store key also have the same query settings applied, since the query + shape is part of the query stats store key. 3. 
The Query Shape is generally meant to capture anything semantically important to the query. If an option may change the results, it should probably go here. If the option might only impact performance or isolation, it should not go here. A couple examples: - - The 'maxTimeMs' is not part of the query shape since it does not matter for the semantics of the - query - it's purely an operational concern. - - As a trickier example, 'readConcern' was also excluded from the query shape since it only impacts - isolation guarantees. A shapified version is included in the query stats store key. From a query - language perspective, it does not really dictate which documents will semantically match the query - - it's purely a matter of timing. It was deemed more of an operational option/concern. This is a close - call, because - considering shard filtering and the readConcern level 'available', which does not - apply shard filtering - a different readConcern level may actually impact which query plan is - appropriate, and so it probably should play a discriminating role in a plan cache key. A key thought - experiment here was that a query setting for two identical queries which differ only by readConcern - should probably apply to both. - - Finally, another borderline example, the 'hint' option is **not** part of the query shape because - it should only impact performance. The 'hint' is part of the query stats store key. Further, the - team decided that a query setting should likely impact all queries which differ only by hint (and it - should override that hint). If an operator sees that a particular query shape should prefer one - index type, it should logically apply to all queries of that shape. One day we may add the ability - to override a query setting via a hint, but this is the intended behavior for now. + - The 'maxTimeMs' is not part of the query shape since it does not matter for the semantics of + the query - it's purely an operational concern. 
+ - As a trickier example, 'readConcern' was also excluded from the query shape since it only + impacts isolation guarantees. A shapified version is included in the query stats store key. + From a query language perspective, it does not really dictate which documents will semantically + match the query - it's purely a matter of timing. It was deemed more of an operational + option/concern. This is a close call, because - considering shard filtering and the readConcern + level 'available', which does not apply shard filtering - a different readConcern level may + actually impact which query plan is appropriate, and so it probably should play a + discriminating role in a plan cache key. A key thought experiment here was that a query setting + for two identical queries which differ only by readConcern should probably apply to both. + - Finally, another borderline example, the 'hint' option is **not** part of the query shape + because it should only impact performance. The 'hint' is part of the query stats store key. + Further, the team decided that a query setting should likely impact all queries which differ + only by hint (and it should override that hint). If an operator sees that a particular query + shape should prefer one index type, it should logically apply to all queries of that shape. One + day we may add the ability to override a query setting via a hint, but this is the intended + behavior for now. diff --git a/src/mongo/db/query/benchmark/data_generator/README.md b/src/mongo/db/query/benchmark/data_generator/README.md index 332dc60ee5f..d5e32cf89de 100644 --- a/src/mongo/db/query/benchmark/data_generator/README.md +++ b/src/mongo/db/query/benchmark/data_generator/README.md @@ -1,8 +1,7 @@ # Query Data Generator -This is a data generation framework for Query benchmarks. -It is intended to be a way to easily produce workloads for Query testing use cases. -Key features of this work: +This is a data generation framework for Query benchmarks. 
It is intended to be a way to easily +produce workloads for Query testing use cases. Key features of this work: - Data schema are declarative. Relationships are interspersed in the schema, however. - Correlations are easy to model. This is done through a combination of three mechanisms. @@ -84,7 +83,8 @@ look for a default distribution for the type: - For basic python ``s, the default function is the Faker function `py`. See https://faker.readthedocs.io/en/master/providers/faker.providers.python.html -- For a nested object type, the data generator will generate instances of the type by looking its definition. +- For a nested object type, the data generator will generate instances of the type by looking its + definition. In this example, `InnerObject.f` will be populated by the `faker.pylist` function, and the fully populated `InnerObject` will be used to populate `OuterObject.f`. @@ -105,8 +105,8 @@ The specification class has two positional arguments: 1. `source`, which is the distribution itself. This can take three forms: - a. A function. The function signature should be `def -func(fac: datagen.util.CorrelatedDataFactory, **kwargs)`. Passing the CorrelatedDataFactory + a. A function. The function signature should be + `def func(fac: datagen.util.CorrelatedDataFactory, **kwargs)`. Passing the CorrelatedDataFactory through the function allows the values produced to be recorded. b. A type. In this case, the default distribution for the type will be used. @@ -119,9 +119,9 @@ func(fac: datagen.util.CorrelatedDataFactory, **kwargs)`. Passing the Correlated match `dependson` will be passed into `source`, along with any values that have been passed down from parent objects. - In the example below, `NestedObject2.i2_1` depends on `NestedObject.i1_1`. 
Because `dependson` can - only pass in a value of a field belonging to the same object, the value of `OuterObject.o_1` is - passed into the distribution function for `OuterObject.o_3` so that the `NestedObject2` + In the example below, `NestedObject2.i2_1` depends on `NestedObject.i1_1`. Because `dependson` + can only pass in a value of a field belonging to the same object, the value of `OuterObject.o_1` + is passed into the distribution function for `OuterObject.o_3` so that the `NestedObject2` distribution function can see it. ``` @@ -158,8 +158,8 @@ This data generator supports the following external distributions: - Any function available to a [`faker.Faker`](https://faker.readthedocs.io/en/master/index.html) instance. In order to use the same seed as the rest of the data generator, you should call the function from `datagen.random.global_faker()`. -- Any function available to a [`numpy Random -Generator`](https://numpy.org/doc/stable/reference/random/generator.html) In order +- Any function available to a + [`numpy Random Generator`](https://numpy.org/doc/stable/reference/random/generator.html) In order to use the same seed as the rest of the data generator, you should call the function from `datagen.random.numpy_random()`. Note that the numpy Random generator does not support built-in correlation capabilities because it uses a rng that we have not figured out how to override. @@ -305,12 +305,11 @@ and derivation. ### Linear scaling In linear scaling, two fields `a` and `b` (drawn from discrete, ordered sets) are correlated -indirectly by instead correlating them to a hidden value `n`. -Each distinct `n` maps to a potentially non-unique (`a`, `b`) pair. -In the common case, as `n` increases, so do the values of `a` and `b`, though they might not -increase at the same rates as each other. -Any linear correlation can be represented this way, including inverse correlations (by simply -reversing the order of traversal for that field). 
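The mapping from a hidden value to a pair of field values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `n` is taken to be a float in `[0, 1)`, the value sets are plain ordered lists, and the helper `pick` is hypothetical rather than part of `datagen`.

```python
import random


def pick(values, n, inverse=False):
    """Map a hidden value n in [0, 1) onto an ordered list by multiplication."""
    # Scale n to the list length; the min() guards against float rounding.
    index = min(int(n * len(values)), len(values) - 1)
    if inverse:
        # Traverse the list in reverse to model an inverse correlation.
        index = len(values) - 1 - index
    return values[index]


a_values = [1, 2, 3]
b_values = ["u", "v", "w", "x", "y", "z"]

rng = random.Random(42)
for _ in range(3):
    n = rng.random()  # the same hidden value drives both fields
    print(pick(a_values, n), pick(b_values, n))
```

Because `a` and `b` are both derived from the same `n`, a larger `n` selects values further along in each list, even though the lists differ in length and so advance at different rates.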
+indirectly by instead correlating them to a hidden value `n`. Each distinct `n` maps to a +potentially non-unique (`a`, `b`) pair. In the common case, as `n` increases, so do the values of +`a` and `b`, though they might not increase at the same rates as each other. Any linear correlation +can be represented this way, including inverse correlations (by simply reversing the order of +traversal for that field). This diagram provides a simple visual of how this works: @@ -326,11 +325,9 @@ Thus, `a` and `b` are directly correlated. ### Derivation Linear scaling alone is insufficient for representing many of the correlations in which we are -interested. -For example, salary might depend on not only the employee's level, but also their team and track -(whether they are a manager, etc.). -To represent such a correlation, one might think about salary as a multivariate function of the -employee's level, team, and track, such as: +interested. For example, salary might depend on not only the employee's level, but also their team +and track (whether they are a manager, etc.). To represent such a correlation, one might think about +salary as a multivariate function of the employee's level, team, and track, such as: ``` f: (level, team, track) -> salary @@ -341,8 +338,7 @@ Fortunately, this is fairly easy to model in Python as... regular functions. ## Implementation Linear scaling is primarily implemented using a custom random number generator, -`datagen.random.CorrelatedRng`. -This subclass of `random.Random` makes two important changes: +`datagen.random.CorrelatedRng`. This subclass of `random.Random` makes two important changes: 1. Replaces modulus-based arithmetic with multiplication-based arithmetic. 2. Maintains a cache of states so that random number generations are deterministically repeatable. 
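One way to picture the second point is a generator that records its state the first time a key is seen and replays it on later requests. The sketch below is an assumption about the general idea, not the actual `datagen.random.CorrelatedRng` code; the class name `CachedStateRng` and the method `random_for` are hypothetical.

```python
import random


class CachedStateRng(random.Random):
    """Hypothetical RNG that replays the same draw for a previously seen key."""

    def __init__(self, seed=None):
        super().__init__(seed)
        self._states = {}

    def random_for(self, key):
        """Return a float in [0, 1) that is always the same for a given key."""
        if key in self._states:
            # Rewind to the state recorded the first time this key was used,
            # draw again, then restore the current state.
            saved = self.getstate()
            self.setstate(self._states[key])
            value = self.random()
            self.setstate(saved)
            return value
        self._states[key] = self.getstate()
        return self.random()


rng = CachedStateRng(1)
first = rng.random_for("field_a")
rng.random_for("field_b")
print(first == rng.random_for("field_a"))  # True
```

This relies only on `random.Random.getstate()` and `setstate()` from the standard library; how the real generator keys and stores its states is not shown in this README.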
@@ -351,7 +347,8 @@ This subclass of `random.Random` makes two important changes: ## Generating completely uncorrelated data -Completely uncorrelated data is generated if a given specification does not use the `correlation` distribution: +Completely uncorrelated data is generated if a given specification does not use the `correlation` +distribution: ```python @dataclasses.dataclass @@ -360,8 +357,8 @@ class Uncorrelated: field2: Specification(source=uniform(["a", "b"])) ``` -The configuration above produces a collection where `field1` and `field2` are completely uncorrelated -so that all combinations of values are equally likely: +The configuration above produces a collection where `field1` and `field2` are completely +uncorrelated so that all combinations of values are equally likely: ``` Enterprise test> db.Uncorrelated.aggregate([{ $group: { _id: {field1: "$field1", field2:"$field2"}, count: { $count: {} }}}, {$sort: {"_id.field1":1, "_id.field2":1}}] ) ; @@ -375,9 +372,9 @@ Enterprise test> db.Uncorrelated.aggregate([{ $group: { _id: {field1: "$field1", ## Generating partially correlated data -By default, if a field specification is configured to use a particular correlation, all data generated -for it will be 100% correlated. To mix in some uncorrelated data, create a `choice` between a -`correlation` distribution and a non-`correlation` distribution: +By default, if a field specification is configured to use a particular correlation, all data +generated for it will be 100% correlated. To mix in some uncorrelated data, create a `choice` +between a `correlation` distribution and a non-`correlation` distribution: ```python from datagen.distribution import correlation, uniform @@ -407,7 +404,8 @@ Enterprise test> db.Uncorrelated.aggregate([{ $group: { _id: {field1: "$field1", ## Generating objects -There are two ways to generate objects -- either directly from a Python dict or from a class that is defined in the spec. 
+There are two ways to generate objects -- either directly from a Python dict or from a class that is +defined in the spec. ### Generating from a python dict @@ -484,14 +482,20 @@ produces the following `.schema` fragment: python extract_schema.py --db tpch --uri "mongodb://localhost:20000" > out/tpch.schema ``` -This will extract the schema from one or more collections in the database and dump the extracted metadata to stdout in `.schema` format. +This will extract the schema from one or more collections in the database and dump the extracted +metadata to stdout in `.schema` format. -As `.schema` files do not natively support multiple collections, so if no collection is specified via `--collection`, the metadata from -all the collections will be joined together in a single schema specification. +As `.schema` files do not natively support multiple collections, if no collection is specified +via `--collection`, the metadata from all the collections will be joined together in a single schema +specification. -Each collection will be sampled using `{"$sample": {"size": 10000}}` unless a different value is specified using `--sample-size`. Note that -the use of sample size that is smaller than the collection size has the following side effects: +Each collection will be sampled using `{"$sample": {"size": 10000}}` unless a different value is +specified using `--sample-size`. Note that the use of a sample size that is smaller than the +collection size has the following side effects: -- The `missing_count` and `unique` sections of the schema will reflect just the sample, and not the entire collection. -- The `min` and the `max` are based on the values from the sample and are not the global min and max for the collection. -- Tf a given field was not present in the sample at all, it will also not be present in the output schema. +- The `missing_count` and `unique` sections of the schema will reflect just the sample, and not the + entire collection.
+- The `min` and the `max` are based on the values from the sample and are not the global min and max + for the collection. +- If a given field was not present in the sample at all, it will also not be present in the output + schema. diff --git a/src/mongo/db/query/compiler/rewrites/README.md b/src/mongo/db/query/compiler/rewrites/README.md index e4e9564d181..05cefc74bc1 100644 --- a/src/mongo/db/query/compiler/rewrites/README.md +++ b/src/mongo/db/query/compiler/rewrites/README.md @@ -2,22 +2,54 @@ ## Overview -The rule-based rewrite engine is a simple but generic-purpose engine for applying sets of rewrite rules to a data structure. It is currently only used for [optimizing aggregation pipelines](https://github.com/mongodb/mongo/blob/e4bf22b6936f3795e11890c908521825120c8a05/src/mongo/db/pipeline/README.md). The following sections describe different components that make up the engine ([the rules](#rules), [the rewrite context](#rewrite-context), and [the engine](#rewrite-engine) itself). +The rule-based rewrite engine is a simple but generic-purpose engine for applying sets of rewrite +rules to a data structure. It is currently only used for +[optimizing aggregation pipelines](https://github.com/mongodb/mongo/blob/e4bf22b6936f3795e11890c908521825120c8a05/src/mongo/db/pipeline/README.md). +The following sections describe different components that make up the engine ([the rules](#rules), +[the rewrite context](#rewrite-context), and [the engine](#rewrite-engine) itself). ## Rules -The rewrite engine executes rules, which are defined by a name, precondition and transform functions, a priority and a set of tags. The precondition function determines whether the transform function should run. Priority is used to determine the order in which rules are applied when multiple rules may apply to the same element. The tags allow the engine to be invoked to only apply a certain subset of rules.
+The rewrite engine executes rules, which are defined by a name, precondition and transform +functions, a priority and a set of tags. The precondition function determines whether the transform +function should run. Priority is used to determine the order in which rules are applied when +multiple rules may apply to the same element. The tags allow the engine to be invoked to only apply +a certain subset of rules. https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L51-L81 ## Rewrite Engine -The engine is a [generic class](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L164-L165) responsible for driving the rewrite process and maintaining a priority queue of rules that are applicable to the element that is being rewritten. It can be specialized to work with any data structure by providing it with an implementation of the [rewrite context](#rewrite-context) that knows how to walk and modify that structure. The engine is invoked by calling the [`applyRules()`](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L181) method (see [`optimize.cpp`](https://github.com/mongodb/mongo/blob/126ab84794ef530fd2503453c9f8828743a4e7e7/src/mongo/db/pipeline/optimization/optimize.cpp#L44-L48) for example usage). The rewrite process is essentially a loop that asks the rewrite context for all rules that can apply to the current element, attempts them in priority order, and either advances to the next element or retries the rules on the same element depending on whether any transform reported that it changed the position of the current element. 
+The engine is a +[generic class](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L164-L165) +responsible for driving the rewrite process and maintaining a priority queue of rules that are +applicable to the element that is being rewritten. It can be specialized to work with any data +structure by providing it with an implementation of the [rewrite context](#rewrite-context) that +knows how to walk and modify that structure. The engine is invoked by calling the +[`applyRules()`](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L181) +method (see +[`optimize.cpp`](https://github.com/mongodb/mongo/blob/126ab84794ef530fd2503453c9f8828743a4e7e7/src/mongo/db/pipeline/optimization/optimize.cpp#L44-L48) +for example usage). The rewrite process is essentially a loop that asks the rewrite context for all +rules that can apply to the current element, attempts them in priority order, and either advances to +the next element or retries the rules on the same element depending on whether any transform +reported that it changed the position of the current element. https://github.com/mongodb/mongo/blob/126ab84794ef530fd2503453c9f8828743a4e7e7/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L181-L203 -Besides constructing the engine and calling [`applyRules()`](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L181), users of the engine should not interact with it directly. Rules never interact with the engine directly either. +Besides constructing the engine and calling +[`applyRules()`](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L181), +users of the engine should not interact with it directly. 
Rules never interact with the engine +directly either. ## Rewrite Context -The rewrite engine itself is agnostic to the details of the data structure that it is rewriting. It relies on the interface provided by a concrete [`RewriteContext`](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L90) implementation to walk and modify the structure, and to decide which rules can apply to which elements. The interface is defined as follows: https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L92-L112 +The rewrite engine itself is agnostic to the details of the data structure that it is rewriting. It +relies on the interface provided by a concrete +[`RewriteContext`](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L90) +implementation to walk and modify the structure, and to decide which rules can apply to which +elements. The interface is defined as follows: +https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L92-L112 -Similarly, rules have access to the context and can use it to enqueue additional rules. The context can also expose additional helpers to rules, e.g. for modifying the structure that is being rewritten. See [`rule_based_rewrites::pipeline::Transforms`](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L202) for an example. +Similarly, rules have access to the context and can use it to enqueue additional rules. The context +can also expose additional helpers to rules, e.g. for modifying the structure that is being +rewritten. 
See +[`rule_based_rewrites::pipeline::Transforms`](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L202) +for an example. diff --git a/src/mongo/db/query/plan_cache/README.md b/src/mongo/db/query/plan_cache/README.md index 420fb3ac25d..e1803dd171c 100644 --- a/src/mongo/db/query/plan_cache/README.md +++ b/src/mongo/db/query/plan_cache/README.md @@ -15,7 +15,8 @@ The query engine currently supports two plan cache implementations: 1. [Classic Plan Cache](#classic-plancache) 1. [Slot-Based Execution (SBE) Plan Cache](#sbe-plancache) -In both cases, logically, the plan cache is an in-memory `map`. See below for implementation details of their respective key-value pairs. +In both cases, logically, the plan cache is an in-memory `map`. See +below for implementation details of their respective key-value pairs. | | Classic | SBE | | ---------------- | -------------------------------------- | ------------------------------------ | @@ -32,19 +33,26 @@ In both cases, logically, the plan cache is an in-memory `map db.coll.aggregate([{$planCacheStats: {}}]); > ``` > -> At a high level, the output is a document that contains a list of entries, each of which contains the query shape's `CanonicalQuery`, `planCacheShapeHash`, `planCacheKey`, active/inactive status, and many other data points. +> At a high level, the output is a document that contains a list of entries, each of which contains +> the query shape's `CanonicalQuery`, `planCacheShapeHash`, `planCacheKey`, active/inactive status, +> and many other data points. > -> See the [docs](https://www.mongodb.com/docs/manual/reference/operator/aggregation/planCacheStats/#output) for an in-depth explanation of the output. +> See the +> [docs](https://www.mongodb.com/docs/manual/reference/operator/aggregation/planCacheStats/#output) +> for an in-depth explanation of the output. 
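As a rough illustration of consuming the `$planCacheStats` output described above, the sketch below runs the stage through PyMongo and counts entries by active/inactive status. It assumes PyMongo is installed and a `mongod` is reachable. The helper names `fetch_plan_cache_stats` and `count_by_state` are hypothetical, and the `isActive` field name is taken from the public `$planCacheStats` documentation rather than from this README.

```python
from collections import Counter


def fetch_plan_cache_stats(uri, db_name, coll_name):
    """Run $planCacheStats via PyMongo (assumes PyMongo and a reachable mongod)."""
    # Imported lazily so count_by_state() can be used without PyMongo installed.
    from pymongo import MongoClient

    with MongoClient(uri) as client:
        collection = client[db_name][coll_name]
        return list(collection.aggregate([{"$planCacheStats": {}}]))


def count_by_state(entries):
    """Count plan cache entries by active/inactive status."""
    return Counter("active" if entry.get("isActive") else "inactive" for entry in entries)


# Offline example using hand-written entries shaped like the documented output.
sample = [{"isActive": True}, {"isActive": False}, {"isActive": True}]
print(count_by_state(sample))
```

Against a live server, the same summary would come from `count_by_state(fetch_plan_cache_stats(uri, "test", "coll"))`, where the URI, database, and collection names are placeholders.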
## Classic [`PlanCache`](https://github.com/mongodb/mongo/blob/0765809bf08f0c55e37ab6d7ef496568b662cc33/src/mongo/db/query/plan_cache/classic_plan_cache.h#L289) -Each collection has its own instance of the Classic plan cache. The plan cache only exists on `mongod`. +Each collection has its own instance of the Classic plan cache. The plan cache only exists on +`mongod`. ### [`PlanCacheKey`](https://github.com/mongodb/mongo/blob/0765809bf08f0c55e37ab6d7ef496568b662cc33/src/mongo/db/query/plan_cache/classic_plan_cache.h#L54) A `PlanCacheKey` is a hash value that encodes the [query shape](../query_shape/README.md). -[`encodeClassic()`](https://github.com/mongodb/mongo/blob/0765809bf08f0c55e37ab6d7ef496568b662cc33/src/mongo/db/query/canonical_query_encoder.cpp#L1287) is used to convert a `CanonicalQuery` into a hexadecimal string representation of the query shape: the `PlanCacheKey`. +[`encodeClassic()`](https://github.com/mongodb/mongo/blob/0765809bf08f0c55e37ab6d7ef496568b662cc33/src/mongo/db/query/canonical_query_encoder.cpp#L1287) +is used to convert a `CanonicalQuery` into a hexadecimal string representation of the query shape: +the `PlanCacheKey`. For example, given this query: @@ -59,28 +67,55 @@ The classic encoding is "`an[eqa,eqb]|ac|||fc`". - `an` represents the implicit `MatchExpression::AND` - `[eqa, eqb]` represents the two children of the `AND`, "equals a" and "equals b". - `ac` represents an _ascending_ sort on `c`. -- `f` [indicates](https://github.com/mongodb/mongo/blob/0765809bf08f0c55e37ab6d7ef496568b662cc33/src/mongo/db/query/canonical_query_encoder.cpp#L1298-L1302) `apiStrict`=_false_. -- `c` [indicates](https://github.com/mongodb/mongo/blob/0765809bf08f0c55e37ab6d7ef496568b662cc33/src/mongo/db/query/canonical_query_encoder.cpp#L1305-L1336) _classic_ execution. +- `f` + [indicates](https://github.com/mongodb/mongo/blob/0765809bf08f0c55e37ab6d7ef496568b662cc33/src/mongo/db/query/canonical_query_encoder.cpp#L1298-L1302) + `apiStrict`=_false_. 
+- `c` + [indicates](https://github.com/mongodb/mongo/blob/0765809bf08f0c55e37ab6d7ef496568b662cc33/src/mongo/db/query/canonical_query_encoder.cpp#L1305-L1336) + _classic_ execution. -Note that `db.c.find({a: 1, b: 2}, {a: 0}).sort({c: 1})` has the same encoding as above, since the projection section is [left empty](https://github.com/mongodb/mongo/blob/0765809bf08f0c55e37ab6d7ef496568b662cc33/src/mongo/db/query/canonical_query_encoder.cpp#L671-L675) when the entire document is required to complete the projection regardless. +Note that `db.c.find({a: 1, b: 2}, {a: 0}).sort({c: 1})` has the same encoding as above, since the +projection section is +[left empty](https://github.com/mongodb/mongo/blob/0765809bf08f0c55e37ab6d7ef496568b662cc33/src/mongo/db/query/canonical_query_encoder.cpp#L671-L675) +when the entire document is required to complete the projection regardless. ### [`PlanCacheEntry`](https://github.com/mongodb/mongo/blob/0765809bf08f0c55e37ab6d7ef496568b662cc33/src/mongo/db/query/plan_cache/classic_plan_cache.h#L274) -A `PlanCacheEntry` is a wrapper for [`SolutionCacheData`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/db/query/plan_cache/classic_plan_cache.h#L223), which is a member of the winning `QuerySolution`. This `SolutionCacheData` is a "hint" that when joined with an actual query can be used to reconstruct a copy of the original `QuerySolution`. This is done by traversing `SolutionCacheData`'s internal [`PlanCacheIndexTree`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/db/query/plan_cache/classic_plan_cache.h#L242), which consists of [index tags](../plan_enumerator/README.md#i-index-tagging) that assign specific indexes to the `MatchExpression`'s leaf predicates. 
+A `PlanCacheEntry` is a wrapper for +[`SolutionCacheData`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/db/query/plan_cache/classic_plan_cache.h#L223), +which is a member of the winning `QuerySolution`. This `SolutionCacheData` is a "hint" that when +joined with an actual query can be used to reconstruct a copy of the original `QuerySolution`. This +is done by traversing `SolutionCacheData`'s internal +[`PlanCacheIndexTree`](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/src/mongo/db/query/plan_cache/classic_plan_cache.h#L242), +which consists of [index tags](../plan_enumerator/README.md#i-index-tagging) that assign specific +indexes to the `MatchExpression`'s leaf predicates. -For example, `SolutionCacheData` does not store index _bounds_, but it does store which indexes are best suited each leaf predicate. By combining this information with the actual query, we can reproduce the index bounds. +For example, `SolutionCacheData` does not store index _bounds_, but it does store which indexes are +best suited to each leaf predicate. By combining this information with the actual query, we can +reproduce the index bounds. -See [here](https://github.com/mongodb/mongo/blob/bc7d24c035466c435ade89b62f958a6fa4e22333/src/mongo/db/query/classic_plan_cache.h#L93-L112) for more information. +See +[here](https://github.com/mongodb/mongo/blob/bc7d24c035466c435ade89b62f958a6fa4e22333/src/mongo/db/query/classic_plan_cache.h#L93-L112) +for more information.
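The idea that a cached entry stores index choices but not index bounds can be sketched in a few lines. This is a conceptual model only, not the server's C++ implementation: `CachedHint` and `rebuild_bounds` are hypothetical names, the index name `a_1` is a made-up example, and only equality predicates (point bounds) are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedHint:
    """Hypothetical stand-in for SolutionCacheData: predicate path -> index name."""

    index_for_path: dict


def rebuild_bounds(hint, query):
    """Combine the cached index choices with this query's constants.

    Returns {path: (index_name, [low, high])} for each tagged equality predicate.
    """
    bounds = {}
    for path, value in query.items():
        index_name = hint.index_for_path.get(path)
        if index_name is not None:
            # An equality predicate yields point bounds [value, value].
            bounds[path] = (index_name, [value, value])
    return bounds


hint = CachedHint(index_for_path={"a": "a_1"})
# The same cached hint serves queries with different constants.
print(rebuild_bounds(hint, {"a": 1, "b": 2}))
print(rebuild_bounds(hint, {"a": 5, "b": 9}))
```

Because the hint carries no constants, one entry can serve every query of the same shape, and the bounds are derived fresh from each incoming query.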
## SBE [`PlanCache`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/plan_cache/sbe_plan_cache.h#L248) -Unlike the classic plan cache, the SBE plan cache exists as one instance for the entire `mongod` process, rather than on a per-collection basis; it [decorates](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/plan_cache/sbe_plan_cache.cpp#L50) the [`ServiceContext`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/service_context.h#L357). +Unlike the classic plan cache, the SBE plan cache exists as one instance for the entire `mongod` +process, rather than on a per-collection basis; it +[decorates](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/plan_cache/sbe_plan_cache.cpp#L50) +the +[`ServiceContext`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/service_context.h#L357). -This approach is beneficial to SBE's design because SBE is intended to execute queries that span multiple collections, thus avoiding the classic plan cache's restriction to one collection per instance. +This approach is beneficial to SBE's design because SBE is intended to execute queries that span +multiple collections, thus avoiding the classic plan cache's restriction to one collection per +instance. ### [`PlanCacheKey`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/plan_cache/sbe_plan_cache.h#L114) -The SBE Plan Cache's `PlanCacheKey`s are conceptually equivalent to the [Classic Plan Cache's](#plancachekey), but [`encodeSBE()`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/canonical_query_encoder.cpp#L1341) is used to generate `PlanCacheKey`s. 
+The SBE Plan Cache's `PlanCacheKey`s are conceptually equivalent to the +[Classic Plan Cache's](#plancachekey), but +[`encodeSBE()`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/canonical_query_encoder.cpp#L1341) +is used to generate `PlanCacheKey`s. For example, the same query: @@ -88,49 +123,89 @@ For example, the same query: db.c.find({a: 1, b: 2}).sort({c: 1}) ``` -after undergoing `encodeSBE()` looks like this: `an[eqa?,eqb?]|ac|||nnnnf|`. The encoding accounts for the auto-parameterization of the `MatchExpression` component, ensuring that identical expression trees have the same key. See [`encodeSBE()`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/canonical_query_encoder.cpp#L1341) for details. +after undergoing `encodeSBE()` looks like this: `an[eqa?,eqb?]|ac|||nnnnf|`. The encoding accounts +for the auto-parameterization of the `MatchExpression` component, ensuring that identical expression +trees have the same key. See +[`encodeSBE()`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/canonical_query_encoder.cpp#L1341) +for details. ### [`PlanCacheEntry`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/plan_cache/sbe_plan_cache.h#L233) -Unlike the Classic Plan Cache, the SBE Plan Cache directly stores [`CachedSbePlan`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/plan_cache/sbe_plan_cache.h#L210)s. Each `CachedSbePlan` is a wrapper for an entire `sbe::PlanStage` execution tree. See [below](#building-cached-plans) for more information. +Unlike the Classic Plan Cache, the SBE Plan Cache directly stores +[`CachedSbePlan`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/plan_cache/sbe_plan_cache.h#L210)s. 
+Each `CachedSbePlan` is a wrapper for an entire `sbe::PlanStage` execution tree. See +[below](#building-cached-plans) for more information. ## General Plan Cache Behavior ### Initial Writes -When a query is planned for the first time and a winning solution is selected, the [`ClassicPlanCacheWriter`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/exec/plan_cache_util.h#L148) [adds](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/exec/plan_cache_util.cpp#L380) the winning query plan to the plan cache for future use. When plans are initially added to the cache, they are marked as **inactive**. +When a query is planned for the first time and a winning solution is selected, the +[`ClassicPlanCacheWriter`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/exec/plan_cache_util.h#L148) +[adds](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/exec/plan_cache_util.cpp#L380) +the winning query plan to the plan cache for future use. When plans are initially added to the +cache, they are marked as **inactive**. > ### Aside: Inactive vs. Active Cache Entries > -> An _inactive_ cache entry exists but cannot be used (yet). Inactive cache entries can be [promoted to active](#updating-the-plan-cache) when a query of the same shape runs and exhibits similar or better trial period performance, as measured by the entry's number of ["works"](#aside-plan-cache-works). This behavior prevents situations where a plan cache entry is created with an unreasonably high works value. When this happens, the plan can get stuck in the cache since [replanning](#aside-replanning) will never kick in. +> An _inactive_ cache entry exists but cannot be used (yet). 
Inactive cache entries can be +> [promoted to active](#updating-the-plan-cache) when a query of the same shape runs and exhibits +> similar or better trial period performance, as measured by the entry's number of +> ["works"](#aside-plan-cache-works). This behavior prevents situations where a plan cache entry is +> created with an unreasonably high works value. When this happens, the plan can get stuck in the +> cache since [replanning](#aside-replanning) will never kick in. > ### Aside: Plan Cache Decision Metrics > -> The plan cache stores a [`PlanCacheDecisionMetrics`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/plan_cache/plan_cache.h#L112) values for each entry. They contain the amount of ["works"](../../exec/runtime_planners/classic_runtime_planner/README.md#aside-works) and "reads" associated with the plan at the time of caching. +> The plan cache stores a +> [`PlanCacheDecisionMetrics`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/plan_cache/plan_cache.h#L112) +> values for each entry. They contain the amount of +> ["works"](../../exec/runtime_planners/classic_runtime_planner/README.md#aside-works) and "reads" +> associated with the plan at the time of caching. > > - For Classic, this is the number of calls to `PlanStage::work()` > - For SBE, this is the number of individual _reads_ done from storage-level cursors. ### Updating the Plan Cache -If after multiplanning, a query's shape matches an existing entry, a few possible scenarios may occur, as described in [`getNewEntryState()`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/plan_cache/plan_cache.h#L763). 
+If after multiplanning, a query's shape matches an existing entry, a few possible scenarios may +occur, as described in +[`getNewEntryState()`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/plan_cache/plan_cache.h#L763). 1. If the existing entry is **inactive**: - 1. If the new winning plan's efficiency is greater than or equal to the existing entry (fewer or equal works), the plan is believed to be "valuable" and it replaces the existing entry. This new entry is given _active_ state. - 2. If the new winning plan is less efficient (more works) than the existing entry, the existing entry's "works" value is updated to `min(new works, 2 * existing works)`. The entry remains _inactive_. + 1. If the new winning plan's efficiency is greater than or equal to the existing entry (fewer or + equal works), the plan is believed to be "valuable" and it replaces the existing entry. This + new entry is given _active_ state. + 2. If the new winning plan is less efficient (more works) than the existing entry, the existing + entry's "works" value is updated to `min(new works, 2 * existing works)`. The entry remains + _inactive_. 2. If the existing entry is **active**: - 1. If the new winning plan is more efficient (fewer works), the cache is updated with a new (lower) works value and the plan from the new entry. + 1. If the new winning plan is more efficient (fewer works), the cache is updated with a new + (lower) works value and the plan from the new entry. 2. If the new winning plan is not as efficient, no change occurs. ### Using Active Entries -When a query is issued and an **active** plan cache entry already exists for the query, the cached plan cache entry is utilized and query planning is skipped altogether. See [Building Cached Plans](#building-cached-plans) for details on how cached query plans are reconstructed and executed. 
+When a query is issued and an **active** plan cache entry already exists for the query, the cached +plan cache entry is utilized and query planning is skipped altogether. See +[Building Cached Plans](#building-cached-plans) for details on how cached query plans are +reconstructed and executed. > ### Aside: Replanning > -> If we've decided to use a cached plan, a trial execution period is run to gather the first batch of results. If the number of works required exceeds the [`internalQueryCacheEvictionRatio`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/query_knobs.idl#L288) (10x by default), this cache entry is deactivated and replanning is triggered. +> If we've decided to use a cached plan, a trial execution period is run to gather the first batch +> of results. If the number of works required exceeds the +> [`internalQueryCacheEvictionRatio`](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/query_knobs.idl#L288) +> (10x by default), this cache entry is deactivated and replanning is triggered. > -> Replanning works by [returning a non-OK `ReplanningRequired` status](https://github.com/mongodb/mongo/blob/5f0af5b9f793b9a2e06e1d72e5553e7a7da891ed/src/mongo/db/exec/classic/cached_plan.cpp#L225-L229) from `CachedPlanStage`, carrying [`ReplanningRequiredInfo`](https://github.com/mongodb/mongo/blob/5f0af5b9f793b9a2e06e1d72e5553e7a7da891ed/src/mongo/db/query/replanning_required_info.h#L48) with the cache mode and old plan hash. This error is caught by [`retryMakePlanner`](https://github.com/mongodb/mongo/blob/5f0af5b9f793b9a2e06e1d72e5553e7a7da891ed/src/mongo/db/query/get_executor_helpers.cpp#L168), which re-runs the full query planning path, circumventing subplanning (TODO SERVER-120492: decide if this needs to be reworded). 
+> Replanning works by +> [returning a non-OK `ReplanningRequired` status](https://github.com/mongodb/mongo/blob/5f0af5b9f793b9a2e06e1d72e5553e7a7da891ed/src/mongo/db/exec/classic/cached_plan.cpp#L225-L229) +> from `CachedPlanStage`, carrying +> [`ReplanningRequiredInfo`](https://github.com/mongodb/mongo/blob/5f0af5b9f793b9a2e06e1d72e5553e7a7da891ed/src/mongo/db/query/replanning_required_info.h#L48) +> with the cache mode and old plan hash. This error is caught by +> [`retryMakePlanner`](https://github.com/mongodb/mongo/blob/5f0af5b9f793b9a2e06e1d72e5553e7a7da891ed/src/mongo/db/query/get_executor_helpers.cpp#L168), +> which re-runs the full query planning path, circumventing subplanning (TODO SERVER-120492: decide +> if this needs to be reworded). ```mermaid %%{ init: {'themeVariables':{'fontSize': '32px'}}}%% @@ -160,59 +235,110 @@ flowchart LR ### Classic -After a query command is converted into a `CanonicalQuery`, the plan cache is immediately consulted to confirm whether a `PlanCacheEntry` for this query already exists, using `encodeClassic()` to check if a `PlanCacheKey` is already in the cache. If an active cache entry already exists, the index tags from the `PlanCacheEntry`'s `PlanCacheIndexTree` are applied to the query's `MatchExpression` filter to [reconstruct](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/query_planner.cpp#L692) the corresponding `QuerySolution` and `PlanStage` trees, entirely bypassing plan enumeration and ranking. +After a query command is converted into a `CanonicalQuery`, the plan cache is immediately consulted +to confirm whether a `PlanCacheEntry` for this query already exists, using `encodeClassic()` to +check if a `PlanCacheKey` is already in the cache. 
If an active cache entry already exists, the +index tags from the `PlanCacheEntry`'s `PlanCacheIndexTree` are applied to the query's +`MatchExpression` filter to +[reconstruct](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/query_planner.cpp#L692) +the corresponding `QuerySolution` and `PlanStage` trees, entirely bypassing plan enumeration and +ranking. ### SBE Building an SBE `PlanCacheEntry` is more complex. The process is as follows: -1. Clone the original `CachedSbePlan` so that each query can have its own `sbe::PlanStage` tree, while the plan cache holds the original. +1. Clone the original `CachedSbePlan` so that each query can have its own `sbe::PlanStage` tree, + while the plan cache holds the original. 1. Compile any expressions in the execution tree to SBE bytecode. -1. Queries are [auto-parameterized](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/canonical_query.cpp#L227-L233), meaning that eligible constants in the incoming `MatchExpression` are automatically assigned input parameter ids. +1. Queries are + [auto-parameterized](https://github.com/mongodb/mongo/blob/aaef082f46f3a48a1134ae870b25da28f4d94e08/src/mongo/db/query/canonical_query.cpp#L227-L233), + meaning that eligible constants in the incoming `MatchExpression` are automatically assigned + input parameter ids. - Only the `filter` of the query is auto-parameterized. -1. Now as an `sbe::PlanStage` tree, replace any constants with references to slots in the SBE `RuntimeEnvironment`. - - The resulting plan is now parameterized; it can be rebound to new constants by assigning new values to `RuntimeEnvironment` slots via a [`inputParamToSlotMap`](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/stage_builder/sbe/builder_data.h#L229), a map from input parameter id to runtime environment slot. +1. 
Now as an `sbe::PlanStage` tree, replace any constants with references to slots in the SBE + `RuntimeEnvironment`. + - The resulting plan is now parameterized; it can be rebound to new constants by assigning new + values to `RuntimeEnvironment` slots via a + [`inputParamToSlotMap`](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/stage_builder/sbe/builder_data.h#L229), + a map from input parameter id to runtime environment slot. > ### Aside: `IntervalEvaluationTree` > -> The [Interval Evaluation Tree](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/interval_evaluation_tree.h#L59) (IET) is used to restore index bounds from a cached SBE plan, allowing index bounds to be evaluated dynamically from [`_inputParamIdToExpressionMap`](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/canonical_query.h#L429-L430), generated by [`MatchExpression::parameterize()`](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/matcher/expression.h#L343-L347). +> The +> [Interval Evaluation Tree](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/interval_evaluation_tree.h#L59) +> (IET) is used to restore index bounds from a cached SBE plan, allowing index bounds to be +> evaluated dynamically from +> [`_inputParamIdToExpressionMap`](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/canonical_query.h#L429-L430), +> generated by +> [`MatchExpression::parameterize()`](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/matcher/expression.h#L343-L347). 
> -> `translate()` has two function definitions, [one](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/index_bounds_builder.h#L118-L135) for generating an IET and [another](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/index_bounds_builder.h#L137-L145) for generating index bounds from a cached IET. +> `translate()` has two function definitions, +> [one](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/index_bounds_builder.h#L118-L135) +> for generating an IET and +> [another](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/index_bounds_builder.h#L137-L145) +> for generating index bounds from a cached IET. -See [here](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/stage_builder/sbe/builder_data.h#L217-L228) for an example. +See +[here](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/stage_builder/sbe/builder_data.h#L217-L228) +for an example. ## Cache Eviction and Invalidation -The Plan Cache supports a least-recently used (LRU) [eviction policy](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/lru_key_value.h#L76) in order to avoid growing too large. This value is controlled by [`internalQueryCacheMaxEntriesPerCollection`](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/query_knobs.idl#L258) +The Plan Cache supports a least-recently used (LRU) +[eviction policy](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/lru_key_value.h#L76) +in order to avoid growing too large. 
This value is controlled by +[`internalQueryCacheMaxEntriesPerCollection`](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/query_knobs.idl#L258) -Any DDL event (e.g. collection drop or index creation, deletion, or hiding) empties the plan cache for the relevant collection. In the Classic case, this is a full flush. +Any DDL event (e.g. collection drop or index creation, deletion, or hiding) empties the plan cache +for the relevant collection. In the Classic case, this is a full flush. > ### Aside: Multiplanning Storms > -> When the plan cache is fully flushed (or generally empty for a common query shape), the server is susceptible to a **multiplanning storm**. A multiplanning storm occurs when many queries of the same shape hit the server at once, all simultaneously multiplanning and using resources accordingly. Without a plan cache entry for this recurring query shape or enough time to store one query's winning plan for future use, the server may unnecessarily consume large amounts of memory. For example, if a query shape requires a multikey index scan or blocking sort and many are issued concurrently without a plan cache entry, the risk of an Out-of-Memory (OOM) error is high. See [SPM-3900](https://jira.mongodb.org/browse/SPM-3900) for mitigation efforts. +> When the plan cache is fully flushed (or generally empty for a common query shape), the server is +> susceptible to a **multiplanning storm**. A multiplanning storm occurs when many queries of the +> same shape hit the server at once, all simultaneously multiplanning and using resources +> accordingly. Without a plan cache entry for this recurring query shape or enough time to store one +> query's winning plan for future use, the server may unnecessarily consume large amounts of memory. 
+> For example, if a query shape requires a multikey index scan or blocking sort and many are issued +> concurrently without a plan cache entry, the risk of an Out-of-Memory (OOM) error is high. See +> [SPM-3900](https://jira.mongodb.org/browse/SPM-3900) for mitigation efforts. ## Single-Solution Queries -While the classic plan cache was designed specifically to avoid repeated multi-planning, the SBE plan cache skips all phases of query optimization and compilation. This skips the costly `QuerySolution` to a `sbe::PlanStage` conversion. +While the classic plan cache was designed specifically to avoid repeated multi-planning, the SBE +plan cache skips all phases of query optimization and compilation. This skips the costly +`QuerySolution` to a `sbe::PlanStage` conversion. -When a query using the Classic engine has only one `QuerySolution`, multiplanning does not occur, so no cache entry is created. However, in the SBE case, we store cache entries for single-solution queries to avoid re-converting to an `sbe::PlanStage` tree. +When a query using the Classic engine has only one `QuerySolution`, multiplanning does not occur, so +no cache entry is created. However, in the SBE case, we store cache entries for single-solution +queries to avoid re-converting to an `sbe::PlanStage` tree. -There is a long-standing request ([SERVER-13341](https://jira.mongodb.org/browse/SERVER-13341)) to cache single-solution queries in the Classic Plan Cache. +There is a long-standing request ([SERVER-13341](https://jira.mongodb.org/browse/SERVER-13341)) to +cache single-solution queries in the Classic Plan Cache. ## Subplanning -Rooted `$or` queries (queries that include a `$or` at the top level) interact differently with the [plan cache](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/exec/subplan.cpp#L188-L203). 
For an introduction to subplanning, refer to [Classic Runtime Planning](../../exec/runtime_planners/classic_runtime_planner_for_sbe/README.md#subplanner). +Rooted `$or` queries (queries that include a `$or` at the top level) interact differently with the +[plan cache](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/exec/subplan.cpp#L188-L203). +For an introduction to subplanning, refer to +[Classic Runtime Planning](../../exec/runtime_planners/classic_runtime_planner_for_sbe/README.md#subplanner). -Rooted `$or` queries interact with the plan cache on a [_per-clause basis_](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/exec/subplan.cpp#L247-L249); each branch of the `$or` uses the plan cache separately. +Rooted `$or` queries interact with the plan cache on a +[_per-clause basis_](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/exec/subplan.cpp#L247-L249); +each branch of the `$or` uses the plan cache separately. For each `$or` branch: - If there is a cache entry for this branch, recover and use it. - If there is no plan cache entry for this branch: - If there is only a single solution for this branch, use it. - - If there are multiple solutions for this branch, multiplan each branch individually. Cache the winner only if there is no tie and we return at least one document. + - If there are multiple solutions for this branch, multiplan each branch individually. Cache the + winner only if there is no tie and we return at least one document. -The restriction on when to cache a winning plan helps keep the plan cache contained to "good" plans, since [we don't replan (and evict) subplanned queries](https://jira.mongodb.org/browse/SERVER-18777). 
+The restriction on when to cache a winning plan helps keep the plan cache contained to "good" plans, +since +[we don't replan (and evict) subplanned queries](https://jira.mongodb.org/browse/SERVER-18777). For example, say we run these three queries, where `A`, `B`, `C`, and `D` are arbitrary predicates: @@ -222,11 +348,17 @@ db.c.find({$or: [A, B, D]}); db.c.find({$or: [B, C, D]}); ``` -Assuming all four predicates justify more than one solution, the plan cache will contain four entries, one for each predicate. +Assuming all four predicates justify more than one solution, the plan cache will contain four +entries, one for each predicate. ## Plan Pinning -The SBE plan cache supports ["plan pinning"](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/plan_cache/plan_cache.h#L183-L185), which gives pinned plans no works value and therefore does not subject them to replanning. For example, [single-solution queries](#single-solution-queries) and [rooted `$or`](#subplanning) queries are pinned. These plans can still be invalidated via [standard cache eviction](#cache-eviction-and-invalidation), however. +The SBE plan cache supports +["plan pinning"](https://github.com/mongodb/mongo/blob/17f71567688c266de1f9a4cfc20ef6a42570ba03/src/mongo/db/query/plan_cache/plan_cache.h#L183-L185), +which gives pinned plans no works value and therefore does not subject them to replanning. For +example, [single-solution queries](#single-solution-queries) and [rooted `$or`](#subplanning) +queries are pinned. These plans can still be invalidated via +[standard cache eviction](#cache-eviction-and-invalidation), however. 
--- diff --git a/src/mongo/db/query/plan_enumerator/README.md b/src/mongo/db/query/plan_enumerator/README.md index c56b89c0560..9271320484d 100644 --- a/src/mongo/db/query/plan_enumerator/README.md +++ b/src/mongo/db/query/plan_enumerator/README.md @@ -2,12 +2,32 @@ ## Overview -After a query is [canonicalized](../README_logical_models.md#canonicalquery) and optimized through [heuristic rewrites](../../matcher/README.md), the query planner generates multiple candidate plans, exploring various combinations of available indexes to optimize data access. The resulting physical plans, represented as a vector of `QuerySolution`s, are passed on to either the [multiplanner](../../exec/runtime_planners/classic_runtime_planner/README.md) or the cost-based ranker (link TODO SERVER-100250) to determine an efficient winning plan. +After a query is [canonicalized](../README_logical_models.md#canonicalquery) and optimized through +[heuristic rewrites](../../matcher/README.md), the query planner generates multiple candidate plans, +exploring various combinations of available indexes to optimize data access. The resulting physical +plans, represented as a vector of `QuerySolution`s, are passed on to either the +[multiplanner](../../exec/runtime_planners/classic_runtime_planner/README.md) or the cost-based +ranker (link TODO SERVER-100250) to determine an efficient winning plan. -The entrypoint to query planning is [`QueryPlanner::plan()`](https://github.com/mongodb/mongo/blob/3b45ca6c10c2a964ab7d606d4f4b04fc3d493bcc/src/mongo/db/query/query_planner.cpp#L938), which is invoked during the [process](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/get_executor.cpp#L521) of constructing a plan executor for a query. 
Given a `CanonicalQuery` and a list of available indices and other data in [`QueryPlannerParams`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/query_planner_params.h#L115), the function returns a list of possible query solutions. Broadly, planning involves two main phases: +The entrypoint to query planning is +[`QueryPlanner::plan()`](https://github.com/mongodb/mongo/blob/3b45ca6c10c2a964ab7d606d4f4b04fc3d493bcc/src/mongo/db/query/query_planner.cpp#L938), +which is invoked during the +[process](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/get_executor.cpp#L521) +of constructing a plan executor for a query. Given a `CanonicalQuery` and a list of available +indices and other data in +[`QueryPlannerParams`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/query_planner_params.h#L115), +the function returns a list of possible query solutions. Broadly, planning involves two main phases: -1. [**Index Tagging**](#i-index-tagging): This phase identifies the indexes that can be used to satisfy query predicates. The output is a [**rated tree**](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/query_planner.cpp#L1161-L1164) that contains annotations of relevant indices to indexable nodes in the query's `MatchExpression` tree. -1. [**Plan Enumeration**](#ii-plan-enumeration): Once the tree is tagged with index information, the plan enumeration phase generates a set of feasible execution plans from the power set (all possible subsets) of tagged indexes. A **data access plan** is built for each enumerated plan, with additional **coverage analysis** for sorts and projections. The output is a set of candidate query solutions. +1. [**Index Tagging**](#i-index-tagging): This phase identifies the indexes that can be used to + satisfy query predicates. 
The output is a + [**rated tree**](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/query_planner.cpp#L1161-L1164) + that contains annotations of relevant indices to indexable nodes in the query's `MatchExpression` + tree. +1. [**Plan Enumeration**](#ii-plan-enumeration): Once the tree is tagged with index information, the + plan enumeration phase generates a set of feasible execution plans from the power set (all + possible subsets) of tagged indexes. A **data access plan** is built for each enumerated plan, + with additional **coverage analysis** for sorts and projections. The output is a set of candidate + query solutions. ```mermaid graph TD @@ -45,11 +65,15 @@ graph TD K --> L ``` -Note that if no indexed plans are possible, we will fallback to a [collection scan](#collection-scan-plans). If indexed plans are present, we will always choose one of them over a collection scan. +Note that if no indexed plans are possible, we will fallback to a +[collection scan](#collection-scan-plans). If indexed plans are present, we will always choose one +of them over a collection scan. ## I. Index Tagging -The purpose of index tagging is to produce a set of indexes that can satisfy components of the query's filter, which will then be systematically combined and assigned to predicates in the plan enumeration phase. The process involves: +The purpose of index tagging is to produce a set of indexes that can satisfy components of the +query's filter, which will then be systematically combined and assigned to predicates in the plan +enumeration phase. The process involves: 1. Identifying which fields of the `MatchExpression` are satisfiable by an index. 1. Figuring out which indexes can be used to answer predicates over the indexable fields. @@ -57,94 +81,175 @@ The purpose of index tagging is to produce a set of indexes that can satisfy com ### 1. 
Identify indexable fields in the query -Given a `MatchExpression` tree, the [`QueryPlannerIXSelect::getFields()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L247) function recursively traverses the tree, identifies the fields that can potentially use an index, and populates a map ([`RelevantFieldIndexMap`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.h#L55)) with those fields and metadata about their indexability requirements. +Given a `MatchExpression` tree, the +[`QueryPlannerIXSelect::getFields()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L247) +function recursively traverses the tree, identifies the fields that can potentially use an index, +and populates a map +([`RelevantFieldIndexMap`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.h#L55)) +with those fields and metadata about their indexability requirements. The current node's type affects its traversal behavior in the following manner: -- If the node is a [**sargable**](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/indexability.h#L46) leaf node, i.e. it can use an index over its field path, then it's added to the map. [Metadata](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.h#L51) about whether it can use a sparse index is stored as well. +- If the node is a + [**sargable**](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/indexability.h#L46) + leaf node, i.e. it can use an index over its field path, then it's added to the map. 
+ [Metadata](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.h#L51) + about whether it can use a sparse index is stored as well. > ### Aside: Sparse Indexes > -> A **sparse index** only contains entries for documents where the indexed field is present, even if it contains a null value. It skips over any document that's missing the indexed field. Since it doesn't contain all the documents in a collection, it is considered "sparse." On the other hand, non-sparse indexes will store null values for documents that don't contain the indexed field. +> A **sparse index** only contains entries for documents where the indexed field is present, even if +> it contains a null value. It skips over any document that's missing the indexed field. Since it +> doesn't contain all the documents in a collection, it is considered "sparse." On the other hand, +> non-sparse indexes will store null values for documents that don't contain the indexed field. > -> Sparse indexes can answer queries presuming the existence of the sparse field. For instance, the following index +> Sparse indexes can answer queries presuming the existence of the sparse field. For instance, the +> following index > > ``` > db.coll.createIndex({field: 1}, {sparse: true}); > ``` > -> will be able to answer the query `{field: {$gt: 3}}` or `{field: {$type: "array"}}`. However, it cannot be used for the query `{field: {$exists: false}}`. +> will be able to answer the query `{field: {$gt: 3}}` or `{field: {$type: "array"}}`. However, it +> cannot be used for the query `{field: {$exists: false}}`. - If we're in a logical node (e.g. `AND` or `OR`), its children are traversed. -- If the node is an [`ELEM_MATCH_OBJECT`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L260) with a non-empty path component, then its children are traversed with the prefix prepended to the path. 
-- If the node is a [`NOR`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L252), the function stops traversal because any children nodes are not indexable. +- If the node is an + [`ELEM_MATCH_OBJECT`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L260) + with a non-empty path component, then its children are traversed with the prefix prepended to the + path. +- If the node is a + [`NOR`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L252), + the function stops traversal because any children nodes are not indexable. ### 2. Find relevant indexes #### a. Expand indexes -First, the planner [expands](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L334) the list of `IndexEntry`s present in the collection, resolving all wildcard (`$**`) indexes that may be able to answer the query. +First, the planner +[expands](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L334) +the list of `IndexEntry`s present in the collection, resolving all wildcard (`$**`) indexes that may +be able to answer the query. > ### Aside: Wildcard Indexes > -> Since MongoDB supports flexible schemas where document field names may differ within a collection, it may not always be possible or helpful to index a specific field. **Wildcard indexes** support queries against such arbitrary or unknown fields. It can be created using the wildcard specifier (`$**`) as the index key: +> Since MongoDB supports flexible schemas where document field names may differ within a collection, +> it may not always be possible or helpful to index a specific field. **Wildcard indexes** support +> queries against such arbitrary or unknown fields. 
It can be created using the wildcard specifier (`$**`) as the index key: +> Since MongoDB supports flexible schemas where document field names may differ within a collection, +> it may not always be possible or helpful to index a specific field. **Wildcard indexes** support +> queries against such arbitrary or unknown fields. It can be created using the wildcard specifier +> (`$**`) as the index key: > > ``` > db.coll.createIndex({"$**": <sortOrder>}); > ``` -This process will first filter out fields that aren't supported by a sparse index, because wildcard indexes are inherently sparse. For the remaining fields, the planner will build mock `IndexEntry`s by stubbing in the provided field into the wildcard field. In the case of a compound wildcard index, even if the wildcard field is irrelevant, it may be possible that the regular fields can be used to answer the query. The index is expanded for later analysis. +This process will first filter out fields that aren't supported by a sparse index, because wildcard +indexes are inherently sparse. For the remaining fields, the planner will build mock `IndexEntry`s +by stubbing in the provided field into the wildcard field. In the case of a compound wildcard index, +even if the wildcard field is irrelevant, it may be possible that the regular fields can be used to +answer the query. The index is expanded for later analysis. #### b. Filter relevant indexes -Next, the planner [filters](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L314) a list of all `IndexEntry`s in the collection and outputs only those indexes prefixed by fields we have predicates over. These are the indexes that could be useful in answering the query. 
Each +considered index must be either (1) non-sparse or (2) able to answer a field that is supported by a +sparse index. ### 3. Determine how useful each index is to each predicate -Finally, the planner calls the [`rateIndices()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L760) function to determine how useful each relevant index is to the predicates in the subtree rooted at 'node'. It outputs a **rated tree**, where each indexable node in the query's `MatchExpression` is affixed with [`RelevantTag`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/index_tag.h#L100)(s). +Finally, the planner calls the +[`rateIndices()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L760) +function to determine how useful each relevant index is to the predicates in the subtree rooted at +'node'. It outputs a **rated tree**, where each indexable node in the query's `MatchExpression` is +affixed with +[`RelevantTag`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/index_tag.h#L100)(s). -An index is considered useful if it is [**compatible**](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L369), which can depend on the index type, `MatchExpression` type, and other query metadata. For instance: +An index is considered useful if it is +[**compatible**](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L369), +which can depend on the index type, `MatchExpression` type, and other query metadata. For instance: - Hashed indexes can only be used with sets of equalities. -- For comparisons to collatable types (strings, arrays, objects), the query collation must match the index collation. 
+- For comparisons to collatable types (strings, arrays, objects), the query collation must match the + index collation. - Expression (`$expr`) language comparisons aren't indexable if the field has multikey components. - The `NOT` operator has specific restictions on when it can use an index. -Generally, compound indexes are more useful when the predicate is on the primary field because the data is grouped by the first field in the index and then by each subsequent field. However, if a compound index is not prefixed by a predicate's path, it may still be useful if there exists another predicate that (1) will use that index and (2) is connected to the original predicate by sharing an `AND` parent. Given a leaf node with the predicate `{b: 1}` and the compound index `{a: 1, b: 1}`, the predicate will still be tagged with the index if its part of a query like `{$and: [{a: 1}, {$or: [{b: 1}, {c: 1}]}]}`. +Generally, compound indexes are more useful when the predicate is on the primary field because the +data is grouped by the first field in the index and then by each subsequent field. However, if a +compound index is not prefixed by a predicate's path, it may still be useful if there exists another +predicate that (1) will use that index and (2) is connected to the original predicate by sharing an +`AND` parent. Given a leaf node with the predicate `{b: 1}` and the compound index `{a: 1, b: 1}`, +the predicate will still be tagged with the index if its part of a query like +`{$and: [{a: 1}, {$or: [{b: 1}, {c: 1}]}]}`. -Before the rated tree is passed into the plan enumeration phase, it goes through some additional post-processing to ensure a correct and minimal set of index assignments is applied to each node in the tree. The planner [removes](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L841) invalid assignments to text and geo indexes, as these can only be used to satisfy text and geo queries. 
It also strips invalid assignments to wildcard indexes and partial indexes if the assignment is incompatible with the index's filter expression. +Before the rated tree is passed into the plan enumeration phase, it goes through some additional +post-processing to ensure a correct and minimal set of index assignments is applied to each node in +the tree. The planner +[removes](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L841) +invalid assignments to text and geo indexes, as these can only be used to satisfy text and geo +queries. It also strips invalid assignments to wildcard indexes and partial indexes if the +assignment is incompatible with the index's filter expression. -It also applies an optimization that [removes](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L900) unnecessary index assignments, reducing time spent considering redundant or similar indexes in the plan enumeration and selection phases. For instance, imagine we have a query `{$and: [{a: 1}, {b: 1}]}`, where `a` is a unique field that doesn't contain any duplicate values. It can be satisfiable by the following indexes: +It also applies an optimization that +[removes](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.cpp#L900) +unnecessary index assignments, reducing time spent considering redundant or similar indexes in the +plan enumeration and selection phases. For instance, imagine we have a query +`{$and: [{a: 1}, {b: 1}]}`, where `a` is a unique field that doesn't contain any duplicate values. 
+It can be satisfiable by the following indexes: - `{a: 1}` with `{unique: true}` - `{a: 1, b: 1}` - `{a: 1, c: 1}` -Since there is a single-field unique index on `a`, we can strip all the other index assignments and just keep this one, as key-value lookups are the fastest plan and thus preferred over any more complex plan. Instead of having to generate multiple plans, the plan enumerator only needs to generate a plan containing an index scan over `a`. +Since there is a single-field unique index on `a`, we can strip all the other index assignments and +just keep this one, as key-value lookups are the fastest plan and thus preferred over any more +complex plan. Instead of having to generate multiple plans, the plan enumerator only needs to +generate a plan containing an index scan over `a`. ### 4. Special Cases -**`$elemMatch`**: -The [`ElemMatchContext`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.h#L70) is used during index selection to keep track of if any `$elemMatch` predicates were encountered when walking a `MatchExpression` tree. Special logic is required here because `$elemMatch` has different semantics regarding multikey indexes (indexes on fields containing array values). Additionally, if we're inside an `ELEM_MATCH_OBJECT`, every predicate in the current clause has an implicit prefix on the `$elemMatch` path, so we shouldn't tag indexes that don't account for the prefix. For instance, if we're evaluating the first clause of the `$or` in +**`$elemMatch`**: The +[`ElemMatchContext`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_ixselect.h#L70) +is used during index selection to keep track of if any `$elemMatch` predicates were encountered when +walking a `MatchExpression` tree. Special logic is required here because `$elemMatch` has different +semantics regarding multikey indexes (indexes on fields containing array values). 
Additionally, if +we're inside an `ELEM_MATCH_OBJECT`, every predicate in the current clause has an implicit prefix on +the `$elemMatch` path, so we shouldn't tag indexes that don't account for the prefix. For instance, +if we're evaluating the first clause of the `$or` in ``` {a: {$elemMatch: {$or: [{b: 1, f: 1}, ... ]}}} ``` -and we have a partial index on the clause `{b: 1, f: 1}`, we would see that the expression above is actually referring to `a.b` and `a.f`, and should thus remove the invalid index from consideration. +and we have a partial index on the clause `{b: 1, f: 1}`, we would see that the expression above is +actually referring to `a.b` and `a.f`, and should thus remove the invalid index from consideration. -**Hints**: -A user may override the default index selection and query optimization process by providing a [**hint**](https://www.mongodb.com/docs/manual/reference/method/cursor.hint/). In this case, the query planner will only consider the hinted index and skip the process above entirely. If the hint in a range query also contains valid values for `.min()` and `.max()`, we can skip index bounds building and return a single solution. +**Hints**: A user may override the default index selection and query optimization process by +providing a [**hint**](https://www.mongodb.com/docs/manual/reference/method/cursor.hint/). In this +case, the query planner will only consider the hinted index and skip the process above entirely. If +the hint in a range query also contains valid values for `.min()` and `.max()`, we can skip index +bounds building and return a single solution. -**Query Settings**: -A user may also pass in a list of allowed indices in the **indexHints** field of the **query settings**. This takes precedence if a hint is also provided. Usually, a hint can only be used for a specific query and is tied to the operation it is issued with, but the query settings can be applied to a query shape cluster-wide and are persisted after shutdown. 
+**Query Settings**: A user may also pass in a list of allowed indices in the **indexHints** field of +the **query settings**. This takes precedence if a hint is also provided. Usually, a hint can only +be used for a specific query and is tied to the operation it is issued with, but the query settings +can be applied to a query shape cluster-wide and are persisted after shutdown. -**Text and Geo Queries**: -Text and geo queries are unique in that they require a text and geo index, respectively, for successful execution. Each index type has special restrictions that require different index tagging logic than what is described above. For more information, refer to the docs on [text](https://www.mongodb.com/docs/manual/core/indexes/index-types/index-text/) and [geo](https://www.mongodb.com/docs/manual/core/indexes/index-types/index-geospatial/) indexes. +**Text and Geo Queries**: Text and geo queries are unique in that they require a text and geo index, +respectively, for successful execution. Each index type has special restrictions that require +different index tagging logic than what is described above. For more information, refer to the docs +on [text](https://www.mongodb.com/docs/manual/core/indexes/index-types/index-text/) and +[geo](https://www.mongodb.com/docs/manual/core/indexes/index-types/index-geospatial/) indexes. ## II. Plan Enumeration -If there are any relevant indexes at the end of the index tagging phase, the query planner tries to create indexed plans through the [`PlanEnumerator`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.h#L112). The output is a vector of `QuerySolution`s that will undergo multiplanning in order to pick one winning plan. 
The process is broken up into several steps: +If there are any relevant indexes at the end of the index tagging phase, the query planner tries to +create indexed plans through the +[`PlanEnumerator`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.h#L112). +The output is a vector of `QuerySolution`s that will undergo multiplanning in order to pick one +winning plan. The process is broken up into several steps: 1. Enumerate plan trees with `IndexTag`s. 1. Construct a data access plan for each tagged tree. @@ -153,7 +258,10 @@ If there are any relevant indexes at the end of the index tagging phase, the que > ### Aside: Enable Debug Logging > -> The whole plan enumeration process can be quite complex, and it can be difficult to reason about how a sample query on a collection with a set of indexes is processed. Thus, it can be helpful to turn on debug logging to visualize the outputs of the different phases (e.g. the rated tree, tagged tree, and memo). Start up a `mongod` with the following options: +> The whole plan enumeration process can be quite complex, and it can be difficult to reason about +> how a sample query on a collection with a set of indexes is processed. Thus, it can be helpful to +> turn on debug logging to visualize the outputs of the different phases (e.g. the rated tree, +> tagged tree, and memo). Start up a `mongod` with the following options: > > ``` > ./bazel-bin/install-dist-test/bin/mongod --setParameter logComponentVerbosity='{query: {verbosity: 5}}' @@ -163,11 +271,15 @@ If there are any relevant indexes at the end of the index tagging phase, the que #### a. Initialize the memo -First, the plan enumerator [initializes](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L312) the underlying **memo** structure starting from the tagged `_root` of the `MatchExpression`. 
+First, the plan enumerator +[initializes](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L312) +the underlying **memo** structure starting from the tagged `_root` of the `MatchExpression`. > ### Aside: Memo > -> The `_memo` structure tracks and manages reusable subplans during plan enumeration, avoiding redundant computations for parts of the query that share the same structure. It helps efficiently handle complex queries by storing and combining intermediate results. +> The `_memo` structure tracks and manages reusable subplans during plan enumeration, avoiding +> redundant computations for parts of the query that share the same structure. It helps efficiently +> handle complex queries by storing and combining intermediate results. > > **Key Components**: > @@ -176,14 +288,18 @@ First, the plan enumerator [initializes](https://github.com/mongodb/mongo/blob/e > - Defined as a `typedef` over `size_t` for readability. > - An entry in `_memo` is a `NodeAssignment`. > - `NodeAssignment` -> - Represents an association between query predicates and indexes. For a full description of the different types of `NodeAssignment`s, refer to [`enumerator_memo.h`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/enumerator_memo.h): -> - `OR` and array node assignments generally associate a `MemoID` to a `subnodes` vector of other `MemoID`s. -> - `AndAssignment` holds a `choices` vector of possible subplans represented by an `AndEnumerableState`, which itself holds a vector of `OneIndexAssignment`s. -> - This holds a single `index`, as well as a vector of `MatchExpression`s (`preds`) and a vector of `IndexPosition`s (`positions`). +> - Represents an association between query predicates and indexes. 
For a full description of the +> different types of `NodeAssignment`s, refer to +> [`enumerator_memo.h`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/enumerator_memo.h): +> - `OR` and array node assignments generally associate a `MemoID` to a `subnodes` vector of other +> `MemoID`s. +> - `AndAssignment` holds a `choices` vector of possible subplans represented by an +> `AndEnumerableState`, which itself holds a vector of `OneIndexAssignment`s. +> - This holds a single `index`, as well as a vector of `MatchExpression`s (`preds`) and a +> vector of `IndexPosition`s (`positions`). > - These are associated vectors: `preds[i]` uses the current index at position `positions[i]`. > -> **Example**: -> Given the following query: +> **Example**: Given the following query: > > ``` > { @@ -217,25 +333,68 @@ First, the plan enumerator [initializes](https://github.com/mongodb/mongo/blob/e > ├── Index 2: {c: 1} > ``` -The [`prepMemo()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L383) function traverses the `MatchExpression` recursively and generates the memo from it. The function returns true if the provided node uses an index, and false otherwise. A parent node may require information from its children nodes to determine whether or not it can be indexed. +The +[`prepMemo()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L383) +function traverses the `MatchExpression` recursively and generates the memo from it. The function +returns true if the provided node uses an index, and false otherwise. A parent node may require +information from its children nodes to determine whether or not it can be indexed. 
-For example, an `OR` requires all its children to be indexed in order to be indexed, because an `IXSCAN` on one branch doesn't provide a superset of the documents required by the other branches. An `AND`, on the other hand, only requires one of its fields to be indexed because the `IXSCAN` returns a superset of the documents required by the query. The rest of the query filter can be applied as a filter on top of a `FETCH` that's placed on top of the `IXSCAN`. If a node cannot be indexed, we immediately bail out and return to the parent node, so we don't do any additional work (e.g. in the case of an `OR` whose first child can't be indexed, we don't check to see if the remaining children can be indexed). +For example, an `OR` requires all its children to be indexed in order to be indexed, because an +`IXSCAN` on one branch doesn't provide a superset of the documents required by the other branches. +An `AND`, on the other hand, only requires one of its fields to be indexed because the `IXSCAN` +returns a superset of the documents required by the query. The rest of the query filter can be +applied as a filter on top of a `FETCH` that's placed on top of the `IXSCAN`. If a node cannot be +indexed, we immediately bail out and return to the parent node, so we don't do any additional work +(e.g. in the case of an `OR` whose first child can't be indexed, we don't check to see if the +remaining children can be indexed). -If the current node can be indexed, the memo [allocates](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L370) a `NodeAssignment` and associates it with the provided `MatchExpression`. +If the current node can be indexed, the memo +[allocates](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L370) +a `NodeAssignment` and associates it with the provided `MatchExpression`. 
-After filling out the memo, if the `internalQueryPlannerEnableIndexPruning` feature flag is enabled, the plan enumerator [prunes](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/memo_prune.cpp#L275) index assignments from the memo if they're interchangeable with another existing assignment. This ensures that we output a set of plans that are sufficiently distinct from one another. For example, given the query `{a: 1}`, the indexes `{a: 1}` and `{a: 1, b: 1}` are interchangeable, so only one will be kept for further consideration. +After filling out the memo, if the `internalQueryPlannerEnableIndexPruning` feature flag is enabled, +the plan enumerator +[prunes](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/memo_prune.cpp#L275) +index assignments from the memo if they're interchangeable with another existing assignment. This +ensures that we output a set of plans that are sufficiently distinct from one another. For example, +given the query `{a: 1}`, the indexes `{a: 1}` and `{a: 1, b: 1}` are interchangeable, so only one +will be kept for further consideration. #### b. Generate tagged trees -At this point, the memo is fully initialized and pruned so that the most relevant indexes remain. The plan enumerator generates successive tagged trees by calling [`getNext()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L349), which outputs a possible plan. Leaves in the plan that have a field name (not logical nodes) are tagged with an index to use. The function returns a `MatchExpression` representing a point in the query tree, or a `nullptr` if no more plans can be outputted. 
Plans are generated until there are no more plans left, or until we hit the [`maxIndexedSolutions`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/query_planner_params.h#L340) limit. +At this point, the memo is fully initialized and pruned so that the most relevant indexes remain. +The plan enumerator generates successive tagged trees by calling +[`getNext()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L349), +which outputs a possible plan. Leaves in the plan that have a field name (not logical nodes) are +tagged with an index to use. The function returns a `MatchExpression` representing a point in the +query tree, or a `nullptr` if no more plans can be outputted. Plans are generated until there are no +more plans left, or until we hit the +[`maxIndexedSolutions`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/query_planner_params.h#L340) +limit. -The [`tagMemo()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L1648) function traverses the memo structure and annotates the tree with `IndexTag`s for the chosen indices recursively, exploring the available indexes for each node, and for each index, exploring the available indexes for the node's children. The `IndexTag` contains information such as the index number, index type, relevant position in the index (for compound indexes), and whether or not it is safe to combine bounds for multiple leaf expressions on the same field in the index. Next, [`tagForSort()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L254) tags each node of the tree with the lowest numbered index that the subtree rooted at that node uses. 
+The +[`tagMemo()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L1648) +function traverses the memo structure and annotates the tree with `IndexTag`s for the chosen indices +recursively, exploring the available indexes for each node, and for each index, exploring the +available indexes for the node's children. The `IndexTag` contains information such as the index +number, index type, relevant position in the index (for compound indexes), and whether or not it is +safe to combine bounds for multiple leaf expressions on the same field in the index. Next, +[`tagForSort()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L254) +tags each node of the tree with the lowest numbered index that the subtree rooted at that node uses. -Finally, we move to the next enumeration state by calling [`nextMemo()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L1793). This step advances the enumeration state for a particular subtree of the query, attempting to move the current node (and its subtree) to the next valid state of enumeration. If the current node has exhausted all of its states, it restarts its enumeration at the beginning state and signals to its parent to advance to the next state. This ensures that valid query plans are enumerated efficiently while respecting certain constraints, such as an upper bound for `OR` enumerations. +Finally, we move to the next enumeration state by calling +[`nextMemo()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_enumerator/plan_enumerator.cpp#L1793). +This step advances the enumeration state for a particular subtree of the query, attempting to move +the current node (and its subtree) to the next valid state of enumeration. 
If the current node has +exhausted all of its states, it restarts its enumeration at the beginning state and signals to its +parent to advance to the next state. This ensures that valid query plans are enumerated efficiently +while respecting certain constraints, such as an upper bound for `OR` enumerations. > ### Aside: Or Pushdown > -> `OR` pushdown is an optimization that pushes down predicates in a `MatchExpression` deeper into the tree if it has a sibling `OR`, with the goal of enabling better index utilization. For example, imagine we had the following `MatchExpression` tree: +> `OR` pushdown is an optimization that pushes down predicates in a `MatchExpression` deeper into +> the tree if it has a sibling `OR`, with the goal of enabling better index utilization. For +> example, imagine we had the following `MatchExpression` tree: > > ``` > AND @@ -251,7 +410,10 @@ Finally, we move to the next enumeration state by calling [`nextMemo()`](https:/ > > Let's say the collection has an index `{a: 1}`. > -> At a first glance, it looks like we can't satisfy the query with the index because we are missing indexes on `c`, `d`, and `e`, which are part of an `OR`. But if we push down the `{a: 5}` predicate to `{c: 7}` and `{d: 8}` in the deepest `OR`, and to the `{e: 9}` in the mid-level `OR`, we can `AND`-combine the predicates. The query is now satisfiable with the given index: +> At a first glance, it looks like we can't satisfy the query with the index because we are missing +> indexes on `c`, `d`, and `e`, which are part of an `OR`. But if we push down the `{a: 5}` +> predicate to `{c: 7}` and `{d: 8}` in the deepest `OR`, and to the `{e: 9}` in the mid-level `OR`, +> we can `AND`-combine the predicates. 
The query is now satisfiable with the given index: > > ``` > OR @@ -274,7 +436,10 @@ Finally, we move to the next enumeration state by calling [`nextMemo()`](https:/ > ### Aside: Lockstep Or Enumeration > -> `$or` enumeration can generate an exponential number of plans, so it is usually limited at some arbitrary cutoff. To maximize our chances of enumerating better plans within this cutoff limit, the lockstep `OR` enumeration prefers plans which use the same index across all branches of a contained `$or`. Consider the following example: +> `$or` enumeration can generate an exponential number of plans, so it is usually limited at some +> arbitrary cutoff. To maximize our chances of enumerating better plans within this cutoff limit, +> the lockstep `OR` enumeration prefers plans which use the same index across all branches of a +> contained `$or`. Consider the following example: > > ``` > // Query @@ -290,36 +455,77 @@ Finally, we move to the next enumeration state by calling [`nextMemo()`](https:/ > [a_b, a_b], [a_c, a_c], [a_c, a_b], then [a_b, a_c] > ``` > -> This behavior is controlled by the `internalQueryEnumerationPreferLockstepOrEnumeration` knob, and is turned on by default. +> This behavior is controlled by the `internalQueryEnumerationPreferLockstepOrEnumeration` knob, and +> is turned on by default. ### 2. Construct data access plan for each tagged tree -For each tagged tree that's enumerated, the planner [builds](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_access.cpp#L1977) a **data access plan**. This involves mapping the logical plan into a physical plan that can be understood by the plan executor. The physical representation is encapsulated by a `QuerySolutionNode`. +For each tagged tree that's enumerated, the planner +[builds](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_access.cpp#L1977) +a **data access plan**. 
This involves mapping the logical plan into a physical plan that can be +understood by the plan executor. The physical representation is encapsulated by a +`QuerySolutionNode`. > ### Aside: `QuerySolutionNode` > -> A [`QuerySolutionNode`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/query_solution.h#L166) is an abstract representation of a query plan. It defines the hierarchy and sequence of operations needed to execute the query, such as index scans, collection scans, and fetches. Once constructed, this tree of `QuerySolutionNode` objects can be transcribed into a tree of [`PlanStage`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/exec/plan_stage.h#L125)s, each of which is a basic building block used in executing a compiled query plan. The `PlanStage` tree is what gets handed off to the [`PlanExecutor`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_executor_impl.cpp#L345) for execution. +> A +> [`QuerySolutionNode`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/query_solution.h#L166) +> is an abstract representation of a query plan. It defines the hierarchy and sequence of operations +> needed to execute the query, such as index scans, collection scans, and fetches. Once +> constructed, this tree of `QuerySolutionNode` objects can be transcribed into a tree of +> [`PlanStage`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/exec/plan_stage.h#L125)s, +> each of which is a basic building block used in executing a compiled query plan. The `PlanStage` +> tree is what gets handed off to the +> [`PlanExecutor`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/plan_executor_impl.cpp#L345) +> for execution.
> -> For a complete list of the different types of `QuerySolutionNode`s, refer to [`query_solution.h`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/query_solution.h). Here are some common ones: +> For a complete list of the different types of `QuerySolutionNode`s, refer to +> [`query_solution.h`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/query_solution.h). +> Here are some common ones: > -> - `IndexScanNode`: Represents a scan over a particular index. It retrieves index entries within the specified index bounds, minimizing the amount of data scanned. -> - `FetchNode`: Provides additional filtering or data retrieval after an index scan. It retrieves full documents from the collection matching the `RecordId`s returned from the index scan, and applies a residual filter from the query, if present. -> - `CollectionScanNode`: Represents a full collection scan, typically used when no suitable indexes are available. Tends to be the most expensive as it requires scanning all the documents in the collection from disk. +> - `IndexScanNode`: Represents a scan over a particular index. It retrieves index entries within +> the specified index bounds, minimizing the amount of data scanned. +> - `FetchNode`: Provides additional filtering or data retrieval after an index scan. It retrieves +> full documents from the collection matching the `RecordId`s returned from the index scan, and +> applies a residual filter from the query, if present. +> - `CollectionScanNode`: Represents a full collection scan, typically used when no suitable indexes +> are available. Tends to be the most expensive as it requires scanning all the documents in the +> collection from disk. 
Starting from the node at `root`, the data access plan is built recursively: -- **Leaf nodes**: If a leaf node is tagged with an index, the function builds an `IndexScanNode`, generating a set of index bounds and filling out their tightness information. If the bounds are exact, the set of documents that satisfy the predicate is equivalent to the set of documents that the scan provides, and the `QuerySolutionNode` is complete. If the bounds are inexact, then the set of documents returned by the scan is a superset of the documents satisfying the predicate. A `FETCH` must be placed above the `IXSCAN` to guarantee the correct results are returned. -- **Logical operators**: Logical operators like `AND` and `OR` delegate their access planning to [`buildIndexedAnd()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_access.cpp#L1691) and [`buildIndexedOr()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_access.cpp#L1841), respectively. - - The `AND` node requires that at least one child uses an index. All non-indexed predicates are placed above the `AND` with a fetch and filter, as the index provides a superset of results. If an index intersection plan is chosen, we choose between `AndHashNode` or an `AndSortedNode` depending on whether the data is already sorted by disk location. - - The `OR` node requires that all children use an index. An `OR` cannot have any filters hanging above it, so if any children are missing index tags, we immediately bail out of processing. There is a special case: if some children are unable to be satisfied by an index scan, it may be possible to use a `CLUSTERED_IXSCAN` plan for them. At the end, the node collapses any identical index scans across its branches into a single scan to prevent duplicate work.
+- **Leaf nodes**: If a leaf node is tagged with an index, the function builds an `IndexScanNode`, + generating a set of index bounds and filling out their tightness information. If the bounds are + exact, the set of documents that satisfy the predicate is equivalent to the set of documents that + the scan provides, and the `QuerySolutionNode` is complete. If the bounds are inexact, then the + set of documents returned by the scan is a superset of the documents satisfying the predicate. A + `FETCH` must be placed above the `IXSCAN` to guarantee the correct results are returned. +- **Logical operators**: Logical operators like `AND` and `OR` delegate their access planning to + [`buildIndexedAnd()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_access.cpp#L1691) + and + [`buildIndexedOr()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_access.cpp#L1841), + respectively. + - The `AND` node requires that at least one child uses an index. All non-indexed predicates are + placed above the `AND` with a fetch and filter, as the index provides a superset of results. If + an index intersection plan is chosen, we choose between `AndHashNode` or an `AndSortedNode` + depending on whether the data is already sorted by disk location. + - The `OR` node requires that all children use an index. An `OR` cannot have any filters hanging + above it, so if any children are missing index tags, we immediately bail out of processing. + There is a special case: if some children are unable to be satisfied by an index scan, it may be + possible to use a `CLUSTERED_IXSCAN` plan for them. At the end, the node collapses any identical + index scans across its branches into a single scan to prevent duplicate work.
> ### Aside: Clustered Index Scans > -> Clustered indexes are available on clustered collections, which store indexed documents in the same WiredTiger file as the index specification. This allows documents to be retrieved directly from the index file and avoids an expensive fetch from the disk. +> Clustered indexes are available on clustered collections, which store indexed documents in the +> same WiredTiger file as the index specification. This allows documents to be retrieved directly +> from the index file and avoids an expensive fetch from the disk. > > The clustered index key must be `{_id: 1}`. -- **Array operators**: Nodes like `ELEM_MATCH_OBJECT` are processed by recursively building subplans for child expressions. The result is wrapped in a `FetchNode` because it may return a superset of results. For instance, consider this example: +- **Array operators**: Nodes like `ELEM_MATCH_OBJECT` are processed by recursively building subplans + for child expressions. The result is wrapped in a `FetchNode` because it may return a superset of + results. For instance, consider this example: ``` // Query @@ -338,7 +544,10 @@ Starting from the node at `root`, the data access plan is built recursively: ] ``` -The index scan on `arr.a = 10` would return documents 1, 2, and 3; the index scan on `arr.b = 20` would return documents 1, 3, and 4. This is a superset of the desired documents, so a filter is required to match only documents where `arr` contains an element where both `a = 10` and `b = 20` are true. The final result after the fetch + filter would be: +The index scan on `arr.a = 10` would return documents 1, 2, and 3; the index scan on `arr.b = 20` +would return documents 1, 3, and 4. This is a superset of the desired documents, so a filter is +required to match only documents where `arr` contains an element where both `a = 10` and `b = 20` +are true.
The final result after the fetch + filter would be: ``` [ @@ -349,17 +558,31 @@ The index scan on `arr.a = 10` would return documents 1, 2, and 3; the index sca #### a. Index Bounds Building -The entrypoint to index bounds building is [`translate()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/index_bounds_builder.cpp#L359), which turns the `MatchExpression` in `expr` into a set of index bounds over a particular field. These index bounds are represented by an [`OrderedIntervalList`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/index_bounds.h#L53), which is an ordered list of intervals over one field. +The entrypoint to index bounds building is +[`translate()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/index_bounds_builder.cpp#L359), +which turns the `MatchExpression` in `expr` into a set of index bounds over a particular field. +These index bounds are represented by an +[`OrderedIntervalList`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/index_bounds.h#L53), +which is an ordered list of intervals over one field. Bounds will fall into one of the following categories: -- **Point interval**: These represent a point predicate over `[N, N]`. They represent equality matches. -- **All values interval**: These bounds span from `[MinKey, MaxKey]`, or `[MaxKey, MinKey]` if the index is sorted in the reverse order. These are non-type bracketing, meaning they scan across `BSON` type boundaries and consider all values. They are often used for `{$exists: true}` queries, and can be exact bounds if the index is sparse. -- **Range interval**: These bounds are commonly used for range predicates, and will span `start, end`. They also have `BoundsInclusion` information describing whether the bounds are inclusive or exclusive (e.g. for `GTE` vs `GT`). 
+- **Point interval**: These represent a point predicate over `[N, N]`. They represent equality + matches. +- **All values interval**: These bounds span from `[MinKey, MaxKey]`, or `[MaxKey, MinKey]` if the + index is sorted in the reverse order. These are non-type bracketing, meaning they scan across + `BSON` type boundaries and consider all values. They are often used for `{$exists: true}` queries, + and can be exact bounds if the index is sparse. +- **Range interval**: These bounds are commonly used for range predicates, and will span + `start, end`. They also have `BoundsInclusion` information describing whether the bounds are + inclusive or exclusive (e.g. for `GTE` vs `GT`). -Given an `OrderedIntervalList` of intervals containing index bounds, it is possible to **intersect**, **union**, and **complement** those intervals. +Given an `OrderedIntervalList` of intervals containing index bounds, it is possible to +**intersect**, **union**, and **complement** those intervals. -- **Intersect**: this compares intervals in an `AND` and merges overlaps, reducing the amount of work done during an index scan. For instance, the following query will scan a smaller range of values after its interval bounds are intersected: +- **Intersect**: this compares intervals in an `AND` and merges overlaps, reducing the amount of + work done during an index scan. For instance, the following query will scan a smaller range of + values after its interval bounds are intersected: ``` // Query @@ -372,7 +595,9 @@ OIL: [ [5, 25], [10, 30] ] OIL: [ [10, 25] ] ``` -- **Union**: this sorts and then compares intervals in an `OR`, merging overlapping or adjacent intervals. The output is a minimal set of non-overlapping intervals and removing redundancies. For example, the overlapping intervals below are merged and simplified: +- **Union**: this sorts and then compares intervals in an `OR`, merging overlapping or adjacent + intervals. 
The output is a minimal set of non-overlapping intervals with redundancies removed. For + example, the overlapping intervals below are merged and simplified: ``` // Query @@ -404,24 +629,47 @@ OIL: [ [3, 6) ] OIL: [ [MinKey, 3), [6, MaxKey] ] ``` -The direction of the scan will impact the bounds that are generated. The forward direction (`{field: 1}`) is the default, and would result in the intervals above. If the index is in the reverse direction (`{field: -1}`), the intervals must also scan the index in that order. For instance, the interval `[MinKey, MaxKey]` would become `[MaxKey, MinKey]`. +The direction of the scan will impact the bounds that are generated. The forward direction +(`{field: 1}`) is the default, and would result in the intervals above. If the index is in the +reverse direction (`{field: -1}`), the intervals must also scan the index in that order. For +instance, the interval `[MinKey, MaxKey]` would become `[MaxKey, MinKey]`. > ### Aside: `IntervalEvaluationTree` > -> The [Interval Evaluation Tree](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/interval_evaluation_tree.h#L59) (IET) is used to restore index bounds from a cached SBE plan, allowing index bounds to be evaluated dynamically from an `inputParamIdMap`. An `ietBuilder` can be passed into the `translate()` function, producing parameterized index bounds. For more information, refer to the Plan Cache [README](../plan_cache/README.md). +> The +> [Interval Evaluation Tree](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/interval_evaluation_tree.h#L59) +> (IET) is used to restore index bounds from a cached SBE plan, allowing index bounds to be +> evaluated dynamically from an `inputParamIdMap`. An `ietBuilder` can be passed into the +> `translate()` function, producing parameterized index bounds. For more information, refer to the +> Plan Cache [README](../plan_cache/README.md). #### b.
Bounds Tightness -**Tightness** describes the degree of precision with which predicates can be evaluated based on the index bounds. It contains two axes: (1) exactness, and (2) fetch requirement. +**Tightness** describes the degree of precision with which predicates can be evaluated based on the +index bounds. It contains two axes: (1) exactness, and (2) fetch requirement. -1. **Exact** vs **inexact** refers to whether a residual predicate needs to be applied after an index scan. Given the query `{$and: [{a: 1}, {b: 1}]}` and an index on `a`, the scan will return documents where `a = 1`. To satisfy the query predicate, we still need to apply the residual `b = 1` filter to return the desired results. -1. A **fetch** is required if the query requires data which is not in the index to be satisfied. **Covered** plans are able to access all the required data from the index. For instance, a query with a filter of `{a: 1}` and an inclusion projection on `{a: 1}` can be covered by a (non-multikey) index on `{a: 1}`, because the values of `a` are stored in the index file, and that is the only field the query cares about. We don't need to fetch any additional information about the documents from disk. (A multikey index couldn't cover this query because it stores each array element separately.) +1. **Exact** vs **inexact** refers to whether a residual predicate needs to be applied after an + index scan. Given the query `{$and: [{a: 1}, {b: 1}]}` and an index on `a`, the scan will return + documents where `a = 1`. To satisfy the query predicate, we still need to apply the residual + `b = 1` filter to return the desired results. +1. A **fetch** is required if the query requires data which is not in the index to be satisfied. + **Covered** plans are able to access all the required data from the index. 
For instance, a query + with a filter of `{a: 1}` and an inclusion projection on `{a: 1}` can be covered by a + (non-multikey) index on `{a: 1}`, because the values of `a` are stored in the index file, and + that is the only field the query cares about. We don't need to fetch any additional information + about the documents from disk. (A multikey index couldn't cover this query because it stores each + array element separately.) -The comprehensive list of `BoundsTightness` values can be found [here](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/index_bounds_builder.h#L80). +The comprehensive list of `BoundsTightness` values can be found +[here](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/index_bounds_builder.h#L80). > ### Aside: missing and null > -> Our current index format does not distinguish between missing fields, and fields with an explicit `null` value. This means that an index scan over `{field: '[null, null]'}` is inexact for the query `{field: null}`, as it returns documents where the field is both `missing` and has an exact value equal to `null`. Thus, a fetch with a filter over `{field: null}` is required to return the correct results. This is what the simplified explain looks like: +> Our current index format does not distinguish between missing fields, and fields with an explicit +> `null` value. This means that an index scan over `{field: '[null, null]'}` is inexact for the +> query `{field: null}`, as it returns documents where the field is both `missing` and has an exact +> value equal to `null`. Thus, a fetch with a filter over `{field: null}` is required to return the +> correct results. 
This is what the simplified explain looks like: > > ``` > winningPlan: { @@ -437,7 +685,10 @@ The comprehensive list of `BoundsTightness` values can be found [here](https://g > } > ``` > -> Note that the opposite is true for the inverse, `{field: {$ne: null}}`. Here, the semantics of the `$ne` implicitly require that the field is present in the document. The bounds can be inverted and are exact, so a filter is not required to return the correct results. This is what the simplified explain looks like: +> Note that the opposite is true for the inverse, `{field: {$ne: null}}`. Here, the semantics of the +> `$ne` implicitly require that the field is present in the document. The bounds can be inverted and +> are exact, so a filter is not required to return the correct results. This is what the simplified +> explain looks like: > > ``` > winningPlan: { @@ -452,21 +703,43 @@ The comprehensive list of `BoundsTightness` values can be found [here](https://g > } > ``` > -> Note that sparse indexes can provide an exact scan over explicit `null` values, but not over missing fields, because they distinguish the two by not indexing the missing values. Partial indexes can do the same thing, with even more flexibility. +> Note that sparse indexes can provide an exact scan over explicit `null` values, but not over +> missing fields, because they distinguish the two by not indexing the missing values. Partial +> indexes can do the same thing, with even more flexibility. ### 3. Perform sort and covering analysis for the `QuerySolutionNode` -After the planner generates a `QuerySolutionNode` that defines the data access plan, the solution root is passed to [`analyzeDataAccess()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_analysis.cpp#L1340) for sort and covering analysis.
The data access planning phase primarily uses the `MatchExpression` component of a query's filter to determine what indexes can be used to retrieve the required data, but it may be possible that the query's `sort` or `projection` can also be satisfied by those indexes even though they are independent of the data. Otherwise, additional stages may need to be added on top of the root to provide the sort, projection, etc. During this phase, the `QuerySolutionNode` root is wrapped into a [`QuerySolution`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/query_solution.h#L409), which is a container that holds the root node as well as metadata such as index usage and coverage analysis. +After the planner generates a `QuerySolutionNode` that defines the data access plan, the solution +root is passed to +[`analyzeDataAccess()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_analysis.cpp#L1340) +for sort and covering analysis. The data access planning phase primarily uses the `MatchExpression` +component of a query's filter to determine what indexes can be used to retrieve the required data, +but it may be possible that the query's `sort` or `projection` can also be satisfied by those +indexes even though they are independent of the data. Otherwise, additional stages may need to be +added on top of the root to provide the sort, projection, etc. During this phase, the +`QuerySolutionNode` root is wrapped into a +[`QuerySolution`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/query_solution.h#L409), +which is a container that holds the root node as well as metadata such as index usage and coverage +analysis. #### a. 
Analyze Sort -If the query contains a sort, the [`analyzeSort()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_analysis.cpp#L1249) function attempts to use the index to provide a sort to avoid falling back on an expensive in-memory blocking sort. For instance, if a traversal preference is provided, it tries to reverse the direction of the index scans to match that order. It may be the case that the indexed plan naturally provides the requested sort order, so no additional stages are required. +If the query contains a sort, the +[`analyzeSort()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_analysis.cpp#L1249) +function attempts to use the index to provide a sort to avoid falling back on an expensive in-memory +blocking sort. For instance, if a traversal preference is provided, it tries to reverse the +direction of the index scans to match that order. It may be the case that the indexed plan naturally +provides the requested sort order, so no additional stages are required. -At this point, if the plan cannot cover the sort through an index scan, it attempts an [`explodeForSort()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_analysis.cpp#L1021) optimization to "explode" index scans over point intervals to an `OR` of subscans to provide a sort. If that is still insufficient, we must add a blocking sort stage. +At this point, if the plan cannot cover the sort through an index scan, it attempts an +[`explodeForSort()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_analysis.cpp#L1021) +optimization to "explode" index scans over point intervals to an `OR` of subscans to provide a sort. +If that is still insufficient, we must add a blocking sort stage. 
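The idea behind exploding point-interval scans can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the planner's implementation: `INDEX`, `scan_point`, and `explode_for_sort` are hypothetical names, and the index on `{a: 1, b: 1}` is modelled as a sorted list of `(a, b, record_id)` tuples.

```python
import heapq

# Hypothetical entries of an index on {a: 1, b: 1}, kept in index order.
INDEX = sorted([(1, 5, "r1"), (1, 9, "r2"), (2, 3, "r3"), (2, 7, "r4"), (3, 1, "r5")])


def scan_point(index, a_value):
    """One sub-scan: a in [a_value, a_value], b unbounded. Entries come back sorted by b."""
    return [entry for entry in index if entry[0] == a_value]


def explode_for_sort(index, a_values):
    """Run one point scan per value of `a` and merge-sort the sub-scans on `b`."""
    sub_scans = [scan_point(index, a) for a in a_values]
    # heapq.merge keeps the output ordered as long as every input is already ordered on the key.
    return list(heapq.merge(*sub_scans, key=lambda entry: entry[1]))


# Models a filter of {a: {$in: [1, 2]}} with a sort on {b: 1}.
result = explode_for_sort(INDEX, [1, 2])
```

Because each sub-scan fixes `a` to a single value, its entries are already ordered by `b`, so a streaming merge can stand in for a blocking sort.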
> ### Aside: Explode for Sort Optimization > -> The "explode for sort" optimization tries to rewrite an index scan over many point intervals to an `OR` of several index scans to leverage an indexed sort. For example: +> The "explode for sort" optimization tries to rewrite an index scan over many point intervals to an +> `OR` of several index scans to leverage an indexed sort. For example: > > ``` > // Query @@ -483,25 +756,40 @@ At this point, if the plan cannot cover the sort through an index scan, it attem > Index Scan 2: a: [2, 2], b: [MinKey, MaxKey] > ``` > -> When the bounds on the primary key are on a point interval, each scan provides the sort order on the secondary key. If these scans are unioned with a merge sort instead of a hashing `OR`, the sort order provided by the scans is maintained. +> When the bounds on the primary key are on a point interval, each scan provides the sort order on +> the secondary key. If these scans are unioned with a merge sort instead of a hashing `OR`, the +> sort order provided by the scans is maintained. #### b. Analyze Projection -If the query contains a projection, the [`analyzeProjection()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_analysis.cpp#L501) function determines the most efficient way to apply a projection to the plan. If the query requires the entire document or other fields that aren't fully provided by the index scans, a `FETCH` is added if it's not present. Otherwise, we know that all the required fields are provided by the index scans. A projection may still be required if: +If the query contains a projection, the +[`analyzeProjection()`](https://github.com/mongodb/mongo/blob/e16bc2248a3410167e39d09bb9bc29a96f026ead/src/mongo/db/query/planner_analysis.cpp#L501) +function determines the most efficient way to apply a projection to the plan. 
If the query requires +the entire document or other fields that aren't fully provided by the index scans, a `FETCH` is +added if it's not present. Otherwise, we know that all the required fields are provided by the index +scans. A projection may still be required if: - The plan returns extra fields that must be removed. - The projection computes new values. - The projection requires reformatting data. -`ProjectionNodeDefault` will handle all these scenarios, but it is the slowest option. The planner may be able to leverage a fast path using `ProjectionNodeSimple`, which can be used for projections that operate only on top-level fields (no dotted fields), or even better, `ProjectionNodeCovered`, which can be used for plans whose index scans can cover the projection. +`ProjectionNodeDefault` will handle all these scenarios, but it is the slowest option. The planner +may be able to leverage a fast path using `ProjectionNodeSimple`, which can be used for projections +that operate only on top-level fields (no dotted fields), or even better, `ProjectionNodeCovered`, +which can be used for plans whose index scans can cover the projection. ## III. Cleanup and Refinement -After every plan is enumerated, converted into a physical `QuerySolution`, and added to the vector of all plans, the planner cleans up any index tags on the query's `MatchExpression` tree. Some final optimizations are made: +After every plan is enumerated, converted into a physical `QuerySolution`, and added to the vector +of all plans, the planner cleans up any index tags on the query's `MatchExpression` tree. Some final +optimizations are made: -- We will always use an `EOF` solution if the query is trivially false, avoiding an unbounded index scan where all fetched documents are filtered out. -- If a sort is provided, there may be an index that satisfies it, even if it wasn't over any predicates in the filter component of the query. 
-- Similarly, if a projection is required, there may be an index that allows for a covered plan, even if none were considered earlier. +- We will always use an `EOF` solution if the query is trivially false, avoiding an unbounded index + scan where all fetched documents are filtered out. +- If a sort is provided, there may be an index that satisfies it, even if it wasn't over any + predicates in the filter component of the query. +- Similarly, if a projection is required, there may be an index that allows for a covered plan, even + if none were considered earlier. - Distinct queries can use an index even in the absence of a filter or sort. ### Collection Scan Plans @@ -511,7 +799,8 @@ We will **only** consider a collection scan in the following cases: 1. A collection scan is explicitly requested by the user. 1. There are no indexed plans, so we fall back to a full collection scan. 1. The collection is clustered, and the clustered index is used by the query. -1. The collection is clustered, and the query contains a sort that is provided by the clustered index. +1. The collection is clustered, and the query contains a sort that is provided by the clustered + index. ```mermaid graph TD diff --git a/src/mongo/db/query/query_shape/README.md b/src/mongo/db/query/query_shape/README.md index 0c7f46918b3..f573dc6bc66 100644 --- a/src/mongo/db/query/query_shape/README.md +++ b/src/mongo/db/query/query_shape/README.md @@ -22,12 +22,12 @@ db.example.findOne({x: "string"}); While different literal _values_ result in the same shape (matching `x` for 23 vs 53), different BSON _types_ of the literal are considered distinct shapes (matching `x` for 53 vs "string"). -The concept of a query shape exists not just for the find command, but for many of the CRUD -commands (distinct, count, and aggregate). It also includes most (but not all) components of these -commands, not just the query predicate (MatchExpression). In these ways, "query" is meant more -generally.
While some components included in the query shape are shared across the different types -of commands (e.g., the "hint" field), some are unique. For example, a find command would include a -`filter` while an aggregate command would have a `pipeline`. +The concept of a query shape exists not just for the find command, but for many of the CRUD commands +(distinct, count, and aggregate). It also includes most (but not all) components of these commands, +not just the query predicate (MatchExpression). In these ways, "query" is meant more generally. +While some components included in the query shape are shared across the different types of commands +(e.g., the "hint" field), some are unique. For example, a find command would include a `filter` +while an aggregate command would have a `pipeline`. You can see which components are considered part of the query shape or not for each specific shape type in their respective "shape component" classes, whose purpose is to determine which components @@ -65,11 +65,11 @@ There are 3 different serialization options: - `kToDebugTypeString`: human readable format, type string of the literal is serialized - `{x: 5, y: "hello"}` -> `{x: "?number", y: "?string"}` - `kToRepresentativeParseableValue`: literal serialized to one canonical value for given type, which - must be parseable - `{x: 5, y: "hello"}` -> `{x: 1, y: "?"}` - An example of a query which is serialized differently due to the parseable requirement is `{x: -{$regex: "^p.*"}}`. If we serialized the pattern as if it were a normal string we would end up - with `{x: {$regex: "?"}}` however `"?"` is not a valid regex pattern, so this would fail - parsing. Instead we will serialize it this way to maintain parseability, `{x: {$regex: -"\\?"}}`, since `"\\?"` is valid regex. + must be parseable - `{x: 5, y: "hello"}` -> `{x: 1, y: "?"}` - An example of a query which is + serialized differently due to the parseable requirement is `{x: {$regex: "^p.*"}}`. 
If we + serialized the pattern as if it were a normal string we would end up with `{x: {$regex: "?"}}` + however `"?"` is not a valid regex pattern, so this would fail parsing. Instead we will serialize + it this way to maintain parseability, `{x: {$regex: "\\?"}}`, since `"\\?"` is valid regex. See [serialization_options.h](serialization_options.h) for more details. @@ -85,4 +85,5 @@ transformed from user input. -[query-stats-disambiguation]: /src/mongo/db/query/README_query_shape_disambiguation#query-stats-vs-query-shape-which-options-go-where +[query-stats-disambiguation]: + /src/mongo/db/query/README_query_shape_disambiguation#query-stats-vs-query-shape-which-options-go-where diff --git a/src/mongo/db/query/query_stats/README.md b/src/mongo/db/query/query_stats/README.md index 17ed859353b..c8a4465c245 100644 --- a/src/mongo/db/query/query_stats/README.md +++ b/src/mongo/db/query/query_stats/README.md @@ -1,9 +1,9 @@ # Query Stats This directory is the home of the infrastructure related to recording runtime query statistics for -the database. It is not to be confused with `src/mongo/db/query/compiler/stats/` which is the home of the -logic for computing and maintaining statistics about a collection or index's data distribution - for -use by the query planner. +the database. It is not to be confused with `src/mongo/db/query/compiler/stats/` which is the home +of the logic for computing and maintaining statistics about a collection or index's data +distribution - for use by the query planner. The system will collect metrics for each query execution, and the results will be aggregated in a structure called the [`QueryStatsStore`](#querystatsstore) upon completion of each successful @@ -31,8 +31,8 @@ db.example.findOne({x: 53}); then the `QueryStatsStore` should contain an entry for a single query shape which would record 2 executions and some related statistics (see [`QueryStatsEntry`](query_stats_entry.h) for details). 
-For more information on query shape and the overlap here, see -[query shape disambiguation][disambiguation] and [query shape docs][query shape]. +For more information on query shape and the overlap here, see [query shape +disambiguation][disambiguation] and [query shape docs][query shape]. The query stats store has _more_ dimensions (i.e. more granularity) to group incoming queries than just the query shape. For example, these queries would all three have the same shape but the first @@ -50,36 +50,37 @@ size will be treated separately from the example which does not specify a batch #### Engineering Considerations The dimensions considered will depend on the command, but can generally be found in the -[`Key`](key.h) interface, which will generate the query stats store keys by which -we accumulate statistics. As one example, you can find the -[`FindKey`](find_key.h) which will include all the things tracked in the -`FindCmdQueryStatsStoreKeyComponents` (including `batchSize` shown in this example). +[`Key`](key.h) interface, which will generate the query stats store keys by which we accumulate +statistics. As one example, you can find the [`FindKey`](find_key.h) which will include all the +things tracked in the `FindCmdQueryStatsStoreKeyComponents` (including `batchSize` shown in this +example). ### Query Stats Store Cache Size The size of the `QueryStatsStore` can be set by the server parameter [`internalQueryStatsCacheSize`](#server-parameters), and the partitions will be created based off -that. See [`queryStatsStoreManagerRegisterer`][partition calculation comment] for more details about how -the number of partitions and their size is determined. Each partition is an LRU cache; therefore, if -adding a new entry to the partition makes it go over its size limit, the least recently used entries -will be evicted to drop below the max size. Eviction will be tracked in the new [server status -metrics](#server-status-metrics) for queryStats. +that.
See [`queryStatsStoreManagerRegisterer`][partition calculation comment] for more details about +how the number of partitions and their size is determined. Each partition is an LRU cache; +therefore, if adding a new entry to the partition makes it go over its size limit, the least +recently used entries will be evicted to drop below the max size. Eviction will be tracked in the +new [server status metrics](#server-status-metrics) for queryStats. ## Metric Collection At a high level, when a query is run and collection of query stats is enabled, during planning we -call [`registerRequest`][register request] in which the query stats store key will be -generated based on the query's shape and the various other dimensions. The key will always be serialized -and stored on the `opDebug`. For commands that support `getMore`s, it will also be stored on the cursor, so that we can -continue to aggregate the operation's metrics until it is complete. +call [`registerRequest`][register request] in which the query stats store key will be generated +based on the query's shape and the various other dimensions. The key will always be serialized and +stored on the `opDebug`. For commands that support `getMore`s, it will also be stored on the cursor, +so that we can continue to aggregate the operation's metrics until it is complete. -Once the query execution is fully complete, [`writeQueryStats`][write query stats] will be called and -will either retrieve the entry for the key from the store if it exists and update it, or create a new one and add it to the store. -See more details in the [comments][write query stats comments]. +Once the query execution is fully complete, [`writeQueryStats`][write query stats] will be called +and will either retrieve the entry for the key from the store if it exists and update it, or create +a new one and add it to the store. See more details in the [comments][write query stats comments].
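The update-or-create step and the LRU eviction described above can be sketched roughly as follows. This is a simplified Python model, not the server's C++ code: `QueryStatsPartition`, its fields, and the string keys are hypothetical, and the limit is assumed to be a number of entries rather than a size in bytes.

```python
from collections import OrderedDict


class QueryStatsPartition:
    """Hypothetical stand-in for one LRU partition of the query stats store."""

    def __init__(self, max_entries):
        self.max_entries = max_entries  # assumption: capacity counted in entries, not bytes
        self.entries = OrderedDict()    # least recently used entry sits at the front
        self.evicted = 0                # eviction counter, as a status metric might report

    def write_query_stats(self, key, docs_returned):
        """Update the entry for `key` if it exists, otherwise create it, then enforce the limit."""
        entry = self.entries.get(key)
        if entry is None:
            entry = {"execCount": 0, "docsReturned": 0}
            self.entries[key] = entry
        entry["execCount"] += 1
        entry["docsReturned"] += docs_returned
        self.entries.move_to_end(key)   # mark as most recently used
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)
            self.evicted += 1


partition = QueryStatsPartition(max_entries=2)
for key, docs in [("find:{x:?number}", 1), ("find:{x:?number}", 3), ("find:{x:?string}", 2)]:
    partition.write_query_stats(key, docs)
```

Two executions with the same key are folded into a single entry, while a new key pushes out the least recently used entry once the partition is full.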
### Adding New Metrics -When adding a new metric to Query Stats, follow these steps to ensure the metric is valuable, performant, and maintainable. +When adding a new metric to Query Stats, follow these steps to ensure the metric is valuable, +performant, and maintainable. #### 1. Define Clear Value Proposition @@ -87,30 +88,48 @@ Before implementing, ask yourself: - What specific insight does this metric provide that isn't already available? - Is this metric actionable for users (user observability, TSEs, and perf engineers)? -- Does this metric help diagnose performance issues, understand workload patterns, or optimize query planning? +- Does this metric help diagnose performance issues, understand workload patterns, or optimize query + planning? - Can it be derived from existing metrics through simple calculations? (If yes, reconsider.) #### 2. Define the Metric in `QueryStatsEntry` -Add the metric to the appropriate section of [`QueryStatsEntry`](query_stats_entry.h) using the appropriate `AggregatedMetric` type (e.g., `AggregatedMetric` for counters or derived values like rate). Follow the existing patterns for organization if applicable (e.g., cursor stats, query execution stats, planner stats, write stats). +Add the metric to the appropriate section of [`QueryStatsEntry`](query_stats_entry.h) using the +appropriate `AggregatedMetric` type (e.g., `AggregatedMetric` for counters or derived +values like rate). Follow the existing patterns for organization if applicable (e.g., cursor stats, +query execution stats, planner stats, write stats). #### 3. Connect new Metric all the way to `OpDebug::AdditiveMetrics` -- Add the metric to [`OpDebug::AdditiveMetrics`](../../op_debug.cpp) and `OpDebug::AdditiveMetrics::add` to capture raw values during execution. +- Add the metric to [`OpDebug::AdditiveMetrics`](../../op_debug.cpp) and + `OpDebug::AdditiveMetrics::add` to capture raw values during execution. 
- Add a corresponding field to [`QueryStatsSnapshot`](query_stats.h) to transport the value. -- Update [`query_stats::captureMetrics()`](query_stats.cpp) to extract and transform the metric from `OpDebug::AdditiveMetrics` into the snapshot. +- Update [`query_stats::captureMetrics()`](query_stats.cpp) to extract and transform the metric from + `OpDebug::AdditiveMetrics` into the snapshot. If you need to propagate a metric from mongod to mongos, then define: -- In the cursor response [`cursor_response.idl`](../client_cursor/cursor_response.idl), add your metric field to the `CursorMetrics` struct and let IDL codegen generate the serialization/deserialization BSON logic. Ensure the field is optional so older clients/routers can ignore it gracefully. -- Define the shard aggregation logic for the metric in [`data_bearing_node_metrics.h`](data_bearing_node_metrics.h) under `DataBearingNodeMetrics::add()` and `DataBearingNodeMetrics::aggregateCursorMetrics()`, alongside `OpDebug::AdditiveMetrics::aggregateCursorMetrics()` and `OpDebug::getCursorMetrics()`. Use these aggregation semantics: addition for totals (e.g., `docsExamined`), maximum for maximums (e.g., `maxAcquisitionDelinquency`), OR for boolean flags (e.g., `hasSortStage`), and AND for conjunctions (e.g., `fromPlanCache`). +- In the cursor response [`cursor_response.idl`](../client_cursor/cursor_response.idl), add your + metric field to the `CursorMetrics` struct and let IDL codegen generate the + serialization/deserialization BSON logic. Ensure the field is optional so older clients/routers + can ignore it gracefully. +- Define the shard aggregation logic for the metric in + [`data_bearing_node_metrics.h`](data_bearing_node_metrics.h) under `DataBearingNodeMetrics::add()` + and `DataBearingNodeMetrics::aggregateCursorMetrics()`, alongside + `OpDebug::AdditiveMetrics::aggregateCursorMetrics()` and `OpDebug::getCursorMetrics()`. 
Use these + aggregation semantics: addition for totals (e.g., `docsExamined`), maximum for maximums (e.g., + `maxAcquisitionDelinquency`), OR for boolean flags (e.g., `hasSortStage`), and AND for + conjunctions (e.g., `fromPlanCache`). #### 4. Instrument with the Query Lifecycle -Determine where in the query execution path the metric should be set in `OpDebug::AdditiveMetrics` by getting the `OpDebug` member of `CurOp`: +Determine where in the query execution path the metric should be set in `OpDebug::AdditiveMetrics` +by getting the `OpDebug` member of `CurOp`: -- Identify the appropriate point where the metric value is available (i.e., during planning, execution, or on cursor operations). Avoid setting it multiple times. -- Metrics collection occurs in the hot path of query execution and must add negligible CPU and memory cost. +- Identify the appropriate point where the metric value is available (i.e., during planning, + execution, or on cursor operations). Avoid setting it multiple times. +- Metrics collection occurs in the hot path of query execution and must add negligible CPU and + memory cost. #### 5. Add Regression Tests @@ -118,24 +137,29 @@ Determine where in the query execution path the metric should be set in `OpDebug - Verify aggregation correctness with edge cases (zero values, overflow, boundary conditions, etc.) - Validate metrics appear correctly in `$queryStats` output. - Test in sharded clusters to verify proper flow from shards to router and correct aggregation. -- Determine which benchmarks could be impacted (i.e., query latency, throughput, resource utilization). If the metric is recorded only once during the query's lifetime, the performance impact should be minimal and further validation may not be necessary. Ensure benchmarks are run with query stats sampling enabled. +- Determine which benchmarks could be impacted (i.e., query latency, throughput, resource + utilization). 
If the metric is recorded only once during the query's lifetime, the performance + impact should be minimal and further validation may not be necessary. Ensure benchmarks are run + with query stats sampling enabled. If you have any questions, feel free to contact the `#query-integration-observability` Slack channel. ### Data-bearing Node Metrics Some metrics are only known to data-bearing nodes. When a query is selected for query stats -gathering in a sharded cluster, the router requests that the shards gather those metrics and -include them in cursor responses by setting the `includeQueryStatsMetrics` field to `true` in -requests it makes to the shards. The router then aggregates the metrics received from the shards -into its own query stats store. In executing such a query, the local shard may need to send further -queries to other (foreign) shards. In such cases, the local shard forwards the -`includeQueryStatsMetrics` field to the foreign shard(s). The local shard then aggregates the -metrics it receives into those it includes in its response. +gathering in a sharded cluster, the router requests that the shards gather those metrics and include +them in cursor responses by setting the `includeQueryStatsMetrics` field to `true` in requests it +makes to the shards. The router then aggregates the metrics received from the shards into its own +query stats store. In executing such a query, the local shard may need to send further queries to +other (foreign) shards. In such cases, the local shard forwards the `includeQueryStatsMetrics` field +to the foreign shard(s). The local shard then aggregates the metrics it receives into those it +includes in its response. ### Metrics Reference -The following table summarizes all query stats metrics. Some metrics computed on the router are rolled up from the shards, and some are computed locally. The "Router Notes" column clarifies how the metric is computed. +The following table summarizes all query stats metrics.
Some metrics computed on the router are +rolled up from the shards, and some are computed locally. The "Router Notes" column clarifies how +the metric is computed. | Metric | Description | Router Notes | | -------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------- | @@ -194,66 +218,76 @@ top-level leaves room for sub-components in a nested section and avoids ambiguit boundaries. Place a metric in a **nested subsection** when it belongs to a specific functionality or component. -When naming the subsection and metric, consider: where is it collected, and which part of the -system is it meant to help debug? +When naming the subsection and metric, consider: where is it collected, and which part of the system +is it meant to help debug? ### Rate Limiting Whether or not query stats will be recorded for a specific query execution depends on a Rate -Limiter, which limits the number of recordings based on the [server parameters](#server-parameters). The goal of the rate limiter -is to minimize impact to overall system performance through restricting traffic. Our rate limiter provides two algorithms: -Window-based policy and sample-based policy. Window-based policy limits the number of recordings per second, whereas sample-based -policy limits the fraction of queries to be recorded. If a query is run but the rate limiter decides not to record it, the query -will still execute as expected but query stats will not be updated in the query stats store. See details [here](rate_limiting.h). +Limiter, which limits the number of recordings based on the [server parameters](#server-parameters). +The goal of the rate limiter is to minimize impact to overall system performance through restricting +traffic. 
Our rate limiter provides two algorithms: Window-based policy and sample-based policy. +Window-based policy limits the number of recordings per second, whereas sample-based policy limits +the fraction of queries to be recorded. If a query is run but the rate limiter decides not to record +it, the query will still execute as expected but query stats will not be updated in the query stats +store. See details [here](rate_limiting.h). ### Explain -Non-aggregate command types take separate paths when the command is run as an explain as opposed to when -they are not run as an explain. We do not collect query stats metrics on the explain-only paths. However, aggregate -explains run through the same path as non-explains, so query stats are collected for aggregate explains. +Non-aggregate command types take separate paths when the command is run as an explain as opposed to +when they are not run as an explain. We do not collect query stats metrics on the explain-only +paths. However, aggregate explains run through the same path as non-explains, so query stats are +collected for aggregate explains. -In the aggregate case, `explain` is not included in the query shape, so an aggregation command that has `explain: true`, -vs. the same command without it will have the same query shape. However we do want to collect separate metrics -for these as they are different, so we include `explain` as a dimension in the query stats store key if present (for agg only). +In the aggregate case, `explain` is not included in the query shape, so an aggregation command that +has `explain: true`, vs. the same command without it will have the same query shape. However we do +want to collect separate metrics for these as they are different, so we include `explain` as a +dimension in the query stats store key if present (for agg only). ### Views -Queries on views are always run as an aggregation, since the view is defined as a pipeline. 
Because of this, -query stats for non-aggregate commands on views would be registered and collected as aggregates without intervention. -There are two considerations here: +Queries on views are always run as an aggregation, since the view is defined as a pipeline. Because +of this, query stats for non-aggregate commands on views would be registered and collected as +aggregates without intervention. There are two considerations here: #### 1. Registering the request -We want all commands on views to be registered as the original command type rather than as an aggregate. -We do this by making sure to call `registerRequest` before the top-level command path redirects to the aggregate -path, which sets the query stats store key on `CurOp`. This will prevent it from being regenerated as an agg. +We want all commands on views to be registered as the original command type rather than as an +aggregate. We do this by making sure to call `registerRequest` before the top-level command path +redirects to the aggregate path, which sets the query stats store key on `CurOp`. This will prevent +it from being regenerated as an agg. -However, note that there are special cases even beyond this. When a query is rate-limited in the original `registerRequest` -call, or when it is being run as an explain, we will not set the query stats store key, but we still do not want -the aggregate path to register the request. To handle this case, we set the `disableForSubqueryExecution` flag on the -`OpDebug.QueryStatsInfo` struct to indicate that this request should not be registered for query stats. +However, note that there are special cases even beyond this. When a query is rate-limited in the +original `registerRequest` call, or when it is being run as an explain, we will not set the query +stats store key, but we still do not want the aggregate path to register the request. 
To handle this +case, we set the `disableForSubqueryExecution` flag on the `OpDebug.QueryStatsInfo` struct to +indicate that this request should not be registered for query stats. #### 2. Collecting the metrics -Regardless of where the query stats store key was generated, the aggregate path will attempt to collect metrics -for any query that has a key populated on `OpDebug`. This is acceptable in many cases, but for commands that must -do post-processing after running the view aggregation pipeline (specifically, the distinct command), this results -in incorrect metrics. These commands must take care to not pass the generated query stats store key to the aggregation -path and instead collect metrics on their own after the aggregation pipeline is complete. +Regardless of where the query stats store key was generated, the aggregate path will attempt to +collect metrics for any query that has a key populated on `OpDebug`. This is acceptable in many +cases, but for commands that must do post-processing after running the view aggregation pipeline +(specifically, the distinct command), this results in incorrect metrics. These commands must take +care to not pass the generated query stats store key to the aggregation path and instead collect +metrics on their own after the aggregation pipeline is complete. ### Change Streams -Query stats also behaves a bit differently for change stream queries. For change stream collections, like normal collections, -we will still collect query stats on creation. However, an important difference is that we will actually treat each `getMore` as its own query, -and collect and update query stats for each one rather than accumulating them on the cursor and recording once execution -completes. We have a flag to determine whether the collection has a change stream, [\_queryStatsWillNeverExhaust][query stats will never exhaust], -and decide based on that whether to take the change stream approach. 
+Query stats also behaves a bit differently for change stream queries. For change stream collections, +like normal collections, we will still collect query stats on creation. However, an important +difference is that we will actually treat each `getMore` as its own query, and collect and update +query stats for each one rather than accumulating them on the cursor and recording once execution +completes. We have a flag to determine whether the collection has a change stream, +[\_queryStatsWillNeverExhaust][query stats will never exhaust], and decide based on that whether to +take the change stream approach. ## Metric Retrieval To retrieve the stats gathered in the `QueryStatsStore`, there is a new aggregation stage, `$queryStats`. This stage must be the first in a pipeline and it must be run against the admin -database. The structure of the command is as follows (note `aggregate: 1` reflecting there is no collection): +database. The structure of the command is as follows (note `aggregate: 1` reflecting there is no +collection): ```js db.adminCommand({ @@ -263,10 +297,7 @@ db.adminCommand({ $queryStats: { transformIdentifiers: { algorithm: "hmac-sha-256", - hmacKey: BinData( - 8, - "87c4082f169d3fef0eef34dc8e23458cbb457c3sf3n2", - ) /* bindata + hmacKey: BinData(8, "87c4082f169d3fef0eef34dc8e23458cbb457c3sf3n2") /* bindata subtype 8 - a new type for sensitive data */, }, }, @@ -346,8 +377,9 @@ following way: - `key`: Query Stats Key. - `keyHash`: Hash of the Query Stats Store Key representative value. Corresponds to the `key` field. -- `queryShapeHash`: Hash of the Query Shape representative value. Corresponds to the `key.queryShape` field. - This is particularly useful for cross-referencing query statistics with Persistent Query Settings. +- `queryShapeHash`: Hash of the Query Shape representative value. Corresponds to the + `key.queryShape` field. This is particularly useful for cross-referencing query statistics with + Persistent Query Settings.
- `asOf`: UTC time when $queryStats read this entry from the store. This will not return the same UTC time for each result. The data structure used for the store is partitioned, and each partition will be read at a snapshot individually. You may see up to the number of partitions in unique @@ -366,56 +398,62 @@ following way: - `metrics.totalExecMicros`: Estimated time spent computing and returning all batches, which is the same as the above for single-batch queries, as well as for change streams. - `metrics.cpuNanos`: Estimated total CPU time spent by a query operation in nanoseconds. This value - should always be greater than 0 and will not be returned on platforms other than Linux, since collecting - cpu time is only supported on Linux. + should always be greater than 0 and will not be returned on platforms other than Linux, since + collecting cpu time is only supported on Linux. - `metrics.workingTimeMillis`: Various broken down statistics for the estimated time spent executing this query, excluding time spent blocked. -- `metrics.cursor.firstResponseExecMicros`: Estimated time spent computing and returning the first batch. -- `metrics.queryExec.docsReturned`: Various broken down statistics for the number of documents returned by - observation of this query. -- `metrics.queryExec.keysExamined`: Various broken down statistics for the number of index keys examined while - executing this query, including getMores. -- `metrics.queryExec.docsExamined`: Various broken down statistics for the number of documents examined while - executing this query, including getMores. -- `metrics.queryExec.bytesRead`: Various broken down statistics for the number of bytes read from disk while - executing this query, including getMores. -- `metrics.queryExec.readTimeMicros`: Various broken down statistics for the amount of time spent reading from disk - while executing this query, including getMores. 
-- `metrics.queryExec.delinquentAcquisitions`: Number of times that an execution ticket acquisition was overdue by - a query operation, including getMores. -- `metrics.queryExec.totalAcquisitionDelinquencyMillis`: Total time in milliseconds that an execution ticket - acquisition was overdue by a query operation, including getMores. -- `metrics.queryExec.maxAcquisitionDelinquencyMillis`: Maximum time in milliseconds that an execution ticket - acquisition was overdue by a query operation, including getMores. +- `metrics.cursor.firstResponseExecMicros`: Estimated time spent computing and returning the first + batch. +- `metrics.queryExec.docsReturned`: Various broken down statistics for the number of documents + returned by observation of this query. +- `metrics.queryExec.keysExamined`: Various broken down statistics for the number of index keys + examined while executing this query, including getMores. +- `metrics.queryExec.docsExamined`: Various broken down statistics for the number of documents + examined while executing this query, including getMores. +- `metrics.queryExec.bytesRead`: Various broken down statistics for the number of bytes read from + disk while executing this query, including getMores. +- `metrics.queryExec.readTimeMicros`: Various broken down statistics for the amount of time spent + reading from disk while executing this query, including getMores. +- `metrics.queryExec.delinquentAcquisitions`: Number of times that an execution ticket acquisition + was overdue by a query operation, including getMores. +- `metrics.queryExec.totalAcquisitionDelinquencyMillis`: Total time in milliseconds that an + execution ticket acquisition was overdue by a query operation, including getMores. +- `metrics.queryExec.maxAcquisitionDelinquencyMillis`: Maximum time in milliseconds that an + execution ticket acquisition was overdue by a query operation, including getMores. - `metrics.queryExec.totalTimeQueuedMicros`: Time spent queued for execution control. 
- `metrics.queryExec.totalAdmissions`: Number of admission control events. -- `metrics.queryExec.wasLoadShed`: Aggregate counts of the number of query executions that were and were not load shed. -- `metrics.queryExec.wasDeprioritized`: Aggregate counts of the number of query executions that were and were not deprioritized. -- `metrics.queryExec.wasMarkedNonDeprioritizable`: Aggregate counts of the number of query executions that were and were not marked as non-deprioritizable. -- `metrics.queryExec.numInterruptChecksPerSec`: Number of times checkForInterrupt is called per second by a - query operation, including getMores. -- `metrics.queryExec.overdueInterruptApproxMaxMillis`: Maximum time in milliseconds that checkForInterrupt was - delayed for a sampled query operation, including getMores. +- `metrics.queryExec.wasLoadShed`: Aggregate counts of the number of query executions that were and + were not load shed. +- `metrics.queryExec.wasDeprioritized`: Aggregate counts of the number of query executions that were + and were not deprioritized. +- `metrics.queryExec.wasMarkedNonDeprioritizable`: Aggregate counts of the number of query + executions that were and were not marked as non-deprioritizable. +- `metrics.queryExec.numInterruptChecksPerSec`: Number of times checkForInterrupt is called per + second by a query operation, including getMores. +- `metrics.queryExec.overdueInterruptApproxMaxMillis`: Maximum time in milliseconds that + checkForInterrupt was delayed for a sampled query operation, including getMores. - `metrics.queryExec.peakTrackedMemBytes`: Peak memory usage for the node. - `metrics.queryExec.clusterPeakTrackedMemBytes`: Peak memory usage across the cluster. -- `metrics.queryPlanner.hasSortStage`: Aggregate counts of the number of query executions that did and did not - include a sort stage, respectively. -- `metrics.queryPlanner.usedDisk`: Aggregate counts of the number of query executions that did and did not use - disk, respectively. 
-- `metrics.queryPlanner.fromMultiPlanner`: Aggregate counts of the number of query executions that did and did - not use the multi-planner, respectively. A query is considered to have used the multi-planner - if any internal query generated as part of its execution used the multi-planner. -- `metrics.queryPlanner.fromPlanCache`: Aggregate counts of the number of query executions that did and did - not use the plan cache, respectively. A query is considered to have not used the plan cache if - any internal query generated as part of its execution did not use the plan cache. -- `metrics.queryPlanner.planningTimeMicros`: The wall-clock time in microseconds from the moment a planning - request is received to the moment the winning plan is finalized. This metric is expected to be positive - regardless of whether the plan came from (e.g. multi-planner, cost-based ranker, plan cache). -- `metrics.queryPlanner.costBasedRanker.cardinalityEstimationMethods`: Aggregate counts of the number of times a - source of query plan cost estimate was used (e.g. sampling, heuristics). The count will be 0 if the source was not used. -- `metrics.queryPlanner.costBasedRanker.nDocsSampled`: The number of documents sampled when using cost-based ranker (CBR) - with sampling method. This metric is expected to be 0 if CBR was not used to generate the plan or another CE - method was used, like histogram. +- `metrics.queryPlanner.hasSortStage`: Aggregate counts of the number of query executions that did + and did not include a sort stage, respectively. +- `metrics.queryPlanner.usedDisk`: Aggregate counts of the number of query executions that did and + did not use disk, respectively. +- `metrics.queryPlanner.fromMultiPlanner`: Aggregate counts of the number of query executions that + did and did not use the multi-planner, respectively. A query is considered to have used the + multi-planner if any internal query generated as part of its execution used the multi-planner. 
+- `metrics.queryPlanner.fromPlanCache`: Aggregate counts of the number of query executions that did + and did not use the plan cache, respectively. A query is considered to have not used the plan + cache if any internal query generated as part of its execution did not use the plan cache. +- `metrics.queryPlanner.planningTimeMicros`: The wall-clock time in microseconds from the moment a + planning request is received to the moment the winning plan is finalized. This metric is expected + to be positive regardless of whether the plan came from (e.g. multi-planner, cost-based ranker, + plan cache). +- `metrics.queryPlanner.costBasedRanker.cardinalityEstimationMethods`: Aggregate counts of the + number of times a source of query plan cost estimate was used (e.g. sampling, heuristics). The + count will be 0 if the source was not used. +- `metrics.queryPlanner.costBasedRanker.nDocsSampled`: The number of documents sampled when using + cost-based ranker (CBR) with sampling method. This metric is expected to be 0 if CBR was not used + to generate the plan or another CE method was used, like histogram. - `metrics.writes`: Contains the metrics relevant to writes. - `metrics.writes.nMatched`: The number of documents selected for update. - `metrics.writes.nUpserted`: The number of documents inserted by an upsert. @@ -467,9 +505,10 @@ following way: for write commands. - `0` - Disable recording write commands. - `1` - Enable recording write commands. - - This parameter is only effective if query stats is already enabled via `internalQueryStatsRateLimit` - or `internalQueryStatsSampleRate`. - - This parameter may become a floating point value to support percentage-based sampling in the future. + - This parameter is only effective if query stats is already enabled via + `internalQueryStatsRateLimit` or `internalQueryStatsSampleRate`. + - This parameter may become a floating point value to support percentage-based sampling in the + future. 
- `logComponentVerbosity.queryStats`: - Controls the logging behavior for query stats. See [Logging](#logging) for details. @@ -523,9 +562,15 @@ output one document per query stats key - output in the "key" field. [disambiguation]: /src/mongo/db/query/README_query_shape_disambiguation.md [query shape]: /src/mongo/db/query/query_shape/README.md -[query stats store]: https://github.com/mongodb/mongo/blob/3cc7cd2a439e25fff9dd26fb1f94057d837a06f9/src/mongo/db/query/query_stats/query_stats.h#L100-L104 -[partition calculation comment]: https://github.com/mongodb/mongo/blob/3cc7cd2a439e25fff9dd26fb1f94057d837a06f9/src/mongo/db/query/query_stats/query_stats.cpp#L173-179 -[register request]: https://github.com/mongodb/mongo/blob/3cc7cd2a439e25fff9dd26fb1f94057d837a06f9/src/mongo/db/query/query_stats/query_stats.h#L196-L199 -[write query stats]: https://github.com/mongodb/mongo/blob/3cc7cd2a439e25fff9dd26fb1f94057d837a06f9/src/mongo/db/query/query_stats/query_stats.h#L253-L258 -[write query stats comments]: https://github.com/mongodb/mongo/blob/3cc7cd2a439e25fff9dd26fb1f94057d837a06f9/src/mongo/db/query/query_stats/query_stats.h#L243-L252 -[query stats will never exhaust]: https://github.com/mongodb/mongo/blob/8be794e1983e2b24938489ad2b018b630ea9b563/src/mongo/db/clientcursor.h#L510 +[query stats store]: + https://github.com/mongodb/mongo/blob/3cc7cd2a439e25fff9dd26fb1f94057d837a06f9/src/mongo/db/query/query_stats/query_stats.h#L100-L104 +[partition calculation comment]: + https://github.com/mongodb/mongo/blob/3cc7cd2a439e25fff9dd26fb1f94057d837a06f9/src/mongo/db/query/query_stats/query_stats.cpp#L173-179 +[register request]: + https://github.com/mongodb/mongo/blob/3cc7cd2a439e25fff9dd26fb1f94057d837a06f9/src/mongo/db/query/query_stats/query_stats.h#L196-L199 +[write query stats]: + https://github.com/mongodb/mongo/blob/3cc7cd2a439e25fff9dd26fb1f94057d837a06f9/src/mongo/db/query/query_stats/query_stats.h#L253-L258 +[write query stats comments]: + 
https://github.com/mongodb/mongo/blob/3cc7cd2a439e25fff9dd26fb1f94057d837a06f9/src/mongo/db/query/query_stats/query_stats.h#L243-L252 +[query stats will never exhaust]: + https://github.com/mongodb/mongo/blob/8be794e1983e2b24938489ad2b018b630ea9b563/src/mongo/db/clientcursor.h#L510 diff --git a/src/mongo/db/query/query_tester/README.md b/src/mongo/db/query/query_tester/README.md index 42ec137ab3e..38f9f183870 100644 --- a/src/mongo/db/query/query_tester/README.md +++ b/src/mongo/db/query/query_tester/README.md @@ -2,17 +2,27 @@ ## Overview -**QueryTester** is a test harness designed to streamline E2E logic testing of MongoDB queries. It validates query results by executing them against a live MongoDB instance (e.g. `mongod`, `mongos`, or any system that implements the MongoDB wire protocol) and comparing the output to pre-defined expected results. +**QueryTester** is a test harness designed to streamline E2E logic testing of MongoDB queries. It +validates query results by executing them against a live MongoDB instance (e.g. `mongod`, `mongos`, +or any system that implements the MongoDB wire protocol) and comparing the output to pre-defined +expected results. -QueryTester is ideal for small, reproducible test cases that verify query behavior with minimal setup. This tool follows a focused paradigm, exclusively supporting queries and DML operations with configurable settings. Support for passthroughs may be added in the future. +QueryTester is ideal for small, reproducible test cases that verify query behavior with minimal +setup. This tool follows a focused paradigm, exclusively supporting queries and DML operations with +configurable settings. Support for passthroughs may be added in the future. -QueryTester does not currently support extensive setup or complex infrastructure, but it is designed to be extensible, with the potential to handle more complex environments in the future. 
The overall goal of Tester's design, however, is to validate query logic in a simple, clear, and consistent manner. +QueryTester does not currently support extensive setup or complex infrastructure, but it is designed +to be extensible, with the potential to handle more complex environments in the future. The overall +goal of Tester's design, however, is to validate query logic in a simple, clear, and consistent +manner. -Each QueryTester use case expects three files to work together: a `.test`, a `.results`, and a `.coll`. See [below](#file-types-and-formats) for templates of each. +Each QueryTester use case expects three files to work together: a `.test`, a `.results`, and a +`.coll`. See [below](#file-types-and-formats) for templates of each. ## Debugging BFs -See [Runbook: Triaging Query Correctness BFs](https://docs.google.com/document/d/1lIdwnR_pMoYEBKL8Np8X4igMqIUultegy8As_9g2R8Y/edit?tab=t.0#heading=h.13k8s02tb8j3) +See +[Runbook: Triaging Query Correctness BFs](https://docs.google.com/document/d/1lIdwnR_pMoYEBKL8Np8X4igMqIUultegy8As_9g2R8Y/edit?tab=t.0#heading=h.13k8s02tb8j3) ## Getting Started @@ -26,7 +36,8 @@ bazel build install-mongotest The tester expects a mongod/mongos to be running, and will execute tests against that process. -To run a single test for the first time, try using the following command from the root of the mongo repo: +To run a single test for the first time, try using the following command from the root of the mongo +repo: ```sh mongotest -t /tests/manual_tests/example/testA.test --drop --load --mode compare @@ -61,14 +72,12 @@ To perform other operations, consult the table below. ### .test -See `tests/manual_tests/example/testA.test`. The file format is as follows: -First line must be the testName (matching the filename without the extension). -Second line is the database to run the test against. -Third line is a list of collection files to load. 
All collections are expected to be in a `collections` directory somewhere along the path to the test file. -Fourth line is a newline noting the end of the header. -After the header, each line is a test line: -` {commandToRun}` -with each test line being followed by a newline. +See `tests/manual_tests/example/testA.test`. The file format is as follows: First line must be the +testName (matching the filename without the extension). Second line is the database to run the test +against. Third line is a list of collection files to load. All collections are expected to be in a +`collections` directory somewhere along the path to the test file. Fourth line is a newline noting +the end of the header. After the header, each line is a test line: ` {commandToRun}` with +each test line being followed by a newline. The template is as follows: @@ -100,8 +109,10 @@ The template is as follows: ### .results -See `tests/manual_tests/example/testA.results`. -These have the same format as .test files above, with the exception that each test must be followed by a line with the expected documents. These are allowed to be on multiple lines, and a result array is read as the line after the test until the next newline. +See `tests/manual_tests/example/testA.results`. These have the same format as .test files above, +with the exception that each test must be followed by a line with the expected documents. These are +allowed to be on multiple lines, and a result array is read as the line after the test until the +next newline. The template is as follows: @@ -119,13 +130,14 @@ The template is as follows: ... further tests and results ``` -Some files have a `.queryShapeHash.results` extension. These are the expected results for the queryShapeHash test type, and they only contain the queryShapeHash, as in the `tests/manual_tests/example/testQueryShapeHash.queryShapeHash.results` example. +Some files have a `.queryShapeHash.results` extension. 
These are the expected results for the +queryShapeHash test type, and they only contain the queryShapeHash, as in the +`tests/manual_tests/example/testQueryShapeHash.queryShapeHash.results` example. ### .coll -See `tests/manual_tests/example/basic.coll`. -These files are split into two sections divided by an empty line. -Above the empty line are index definitions, one per line. They can be of the form: +See `tests/manual_tests/example/basic.coll`. These files are split into two sections divided by an +empty line. Above the empty line are index definitions, one per line. They can be of the form: 1. `{}`, or 2. `{key: }`, or @@ -150,9 +162,8 @@ The template is as follows: ### Comments Whole-line inline comments can be added in any `.test`, `.results`, and `.coll` file by starting the -line with `//`. -Partial-line comments, such as `foo // comment`, are not supported, and the line will be read in its -entirety. +line with `//`. Partial-line comments, such as `foo // comment`, are not supported, and the line +will be read in its entirety. Comments in input `.test` and `.results` files will be persisted in the output as much as possible. diff --git a/src/mongo/db/query/search/README.md b/src/mongo/db/query/search/README.md index 937275d0adb..e053bb3c4cc 100644 --- a/src/mongo/db/query/search/README.md +++ b/src/mongo/db/query/search/README.md @@ -1,25 +1,45 @@ # Welcome to search - we're glad you're here! -This README serves as a landing page for all search\* related documentation to make these resources easier to discover for new engineers. As you contribute new documentation for search features, please don't forget to link it below and kindly add a reverse link to this README in your new doc. Happy knowledge sharing! +This README serves as a landing page for all search\* related documentation to make these resources +easier to discover for new engineers. 
As you contribute new documentation for search features, +please don't forget to link it below and kindly add a reverse link to this README in your new doc. +Happy knowledge sharing! -> [!NOTE] -> For the purposes of this README, search is a shorthand to refer to all mongot pipeline stages, so $search/$vectorSearch/$searchMeta. +> [!NOTE] For the purposes of this README, search is a shorthand to refer to all mongot pipeline +> stages, so $search/$vectorSearch/$searchMeta. ## Technical Details -To read about the high-level technical implementation of $search and $searchMeta, please check out [search_technical_overview.md](/src/mongo/db/query/search/search_technical_overview.md). +To read about the high-level technical implementation of $search and $searchMeta, please check out +[search_technical_overview.md](/src/mongo/db/query/search/search_technical_overview.md). -To read about a high-level overview of $vectorSearch, please check out [vectorSearch_technical_overview.md](/src/mongo/db/pipeline/search/vectorSearch_technical_overview.md) +To read about a high-level overview of $vectorSearch, please check out +[vectorSearch_technical_overview.md](/src/mongo/db/pipeline/search/vectorSearch_technical_overview.md) ## Testing -Testing search features is a bit different than other aggregation stages! You will fall into one of three categories: +Testing search features is a bit different than other aggregation stages! You will fall into one of +three categories: -1. If your search feature requires currently in-progress/incomplete changes to 10gen/mongot, you will need to test your mongod changes with mongot-mock. To find out more about writing a jstest that uses mongot-mock, please follow this [wiki](https://wiki.corp.mongodb.com/display/~zixuan.zhuang@mongodb.com/How+to+run+%24search+locally+using+Mongot+Mock). You can peruse more examples in [jstests/with_mongot/mongotmock/](/jstests/with_mongot/mongotmock/). +1. 
If your search feature requires currently in-progress/incomplete changes to 10gen/mongot, you + will need to test your mongod changes with mongot-mock. To find out more about writing a jstest + that uses mongot-mock, please follow this + [wiki](https://wiki.corp.mongodb.com/display/~zixuan.zhuang@mongodb.com/How+to+run+%24search+locally+using+Mongot+Mock). + You can peruse more examples in + [jstests/with_mongot/mongotmock/](/jstests/with_mongot/mongotmock/). -2. If your search feature is fully supported on the release mongot binary (eg the mongot required changes have been merged to 10gen/mongot and it's been released), you can write a standard jstest for your search feature. However you will need a mongot binary on your machine to run your jstest locally. To learn how to acquire a mongot binary locally and get more details on end-to-end testing search features locally and on evergreen, please checkout [mongot_testing_instructions.md](/jstests/with_mongot/e2e/mongot_testing_instructions.md). +2. If your search feature is fully supported on the release mongot binary (eg the mongot required + changes have been merged to 10gen/mongot and it's been released), you can write a standard jstest + for your search feature. However you will need a mongot binary on your machine to run your jstest + locally. To learn how to acquire a mongot binary locally and get more details on end-to-end + testing search features locally and on evergreen, please checkout + [mongot_testing_instructions.md](/jstests/with_mongot/e2e/mongot_testing_instructions.md). -3. If your search feature is supported on the latest mongot binary (from 10gen/mongot) but not on the released version, you can use tags to ensure your test only runs on the appropriate version. Mongot versions follow the format X.Y.Z, where X is the API version, Y updates for functionality changes, and Z updates for bug fixes. You can use the following tag formats to disallow tests from running on the released version of mongot. 
+3. If your search feature is supported on the latest mongot binary (from 10gen/mongot) but not on + the released version, you can use tags to ensure your test only runs on the appropriate version. + Mongot versions follow the format X.Y.Z, where X is the API version, Y updates for functionality + changes, and Z updates for bug fixes. You can use the following tag formats to disallow tests + from running on the released version of mongot. - `requires_mongot_X` for a future API version - `requires_mongot_X_Y` for feature updates @@ -27,28 +47,50 @@ Testing search features is a bit different than other aggregation stages! You wi For example: - - If the latest version is `1.39.0-67-g935bf894a` and the release version is `1.38.1`, use `requires_mongot_1_39` as the tag for your test. - - If the latest version is `1.39.3` and the release version is `1.39.2`, use `requires_mongot_1_39_3` as the tag for your test. + - If the latest version is `1.39.0-67-g935bf894a` and the release version is `1.38.1`, use + `requires_mongot_1_39` as the tag for your test. + - If the latest version is `1.39.3` and the release version is `1.39.2`, use + `requires_mongot_1_39_3` as the tag for your test. - Once the released version matches or exceeds the tag on the test, the test will also run for the released version. + Once the released version matches or exceeds the tag on the test, the test will also run for the + released version. -Regardless of the category you find yourself in, you are required to run all e2e tests defined on both 10gen/mongod and 10gen/mongot repos. To learn how to run all cross-repo e2e tests, please check out [jstests/with_mongot/cross_repo_testing_requirements.md](/jstests/with_mongot/cross_repo_testing_requirements.md). +Regardless of the category you find yourself in, you are required to run all e2e tests defined on +both 10gen/mongod and 10gen/mongot repos. 
To learn how to run all cross-repo e2e tests, please check +out +[jstests/with_mongot/cross_repo_testing_requirements.md](/jstests/with_mongot/cross_repo_testing_requirements.md). ## Hybrid Search -Hybrid Search encompasses two possible stages: `$rankFusion` and `$scoreFusion`. Both of these stages accept one or more "input pipelines" (each of which is its own valid query) that search for documents in a single collection (without modifying them). Then, the hybrid search stage combines the results from all the input pipelines into a single ordered results set, based on some ranking or scoring methodology that factors in the user-set weight (influence) of each input pipeline. $rankFusion uses the Reciprocal Rank Fusion algorithm while $scoreFusion relies on the user's custom score combination configuration/logic. See [this docs page](https://dochub.mongodb.org/core/rank-fusion) to learn more about hybrid search. +Hybrid Search encompasses two possible stages: `$rankFusion` and `$scoreFusion`. Both of these +stages accept one or more "input pipelines" (each of which is its own valid query) that search for +documents in a single collection (without modifying them). Then, the hybrid search stage combines +the results from all the input pipelines into a single ordered results set, based on some ranking or +scoring methodology that factors in the user-set weight (influence) of each input pipeline. +$rankFusion uses the Reciprocal Rank Fusion algorithm while $scoreFusion relies on the user's custom +score combination configuration/logic. See +[this docs page](https://dochub.mongodb.org/core/rank-fusion) to learn more about hybrid search. -> [!NOTE] -> Hybrid Search stages with and without mongot input pipelines can run on views but not in view definitions. For more information about how mongot stages run on views, see this page. +> [!NOTE] Hybrid Search stages with and without mongot input pipelines can run on views but not in +> view definitions. 
For more information about how mongot stages run on views, see this page. ### scoreDetails Technical Overview -The hybrid search stages ($rankFusion and $scoreFusion) allow a user to specify whether that stage's scoreDetails metadata should be set (note that the metadata is set at the document level). The two other stages that also support scoreDetails functionality are $score and $search. See the Phase 3 section below to understand the difference in scoreDetails structure between the hybrid search stages and non-hybrid search stages ($search and $score). scoreDetails, at a high level, functions like an $explain in that it provides information about how each document's score was calculated under the hood. This in turn helps the user understand the resulting order of documents. To learn more about the scoreDetails field and its subfields for hybrid search stages, please refer to either the [$rankFusion](https://dochub.mongodb.org/core/$rankFusion) or [$scoreFusion](https://dochub.mongodb.org/core/$scoreFusion) docs page. +The hybrid search stages +($rankFusion and $scoreFusion) allow a user to specify whether that stage's scoreDetails metadata should be set (note that the metadata is set at the document level). The two other stages that also support scoreDetails functionality are $score and $search. See the Phase 3 section below to understand the difference in scoreDetails structure between the hybrid search stages and non-hybrid search stages ($search +and +$score). scoreDetails, at a high level, functions like an $explain in that it provides information about how each document's score was calculated under the hood. This in turn helps the user understand the resulting order of documents. To learn more about the scoreDetails field and its subfields for hybrid search stages, please refer to either the [$rankFusion](https://dochub.mongodb.org/core/$rankFusion) +or [$scoreFusion](https://dochub.mongodb.org/core/$scoreFusion) docs page. 
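As a rough illustration of the paragraph above, the following sketch builds a `$rankFusion` stage that requests scoreDetails and then surfaces the metadata as a regular field. The pipeline names (`searchPipe`, `ratingPipe`), the index name, the field names, and the `products` collection are hypothetical and chosen only for this example; the weights are arbitrary.

```javascript
// Sketch only: the collection, index, field, and pipeline names are hypothetical.
const rankFusionStage = {
  $rankFusion: {
    input: {
      pipelines: {
        // A mongot input pipeline that also asks $search for its own scoreDetails.
        searchPipe: [
          {
            $search: {
              index: "default",
              text: { query: "espresso", path: "description" },
              scoreDetails: true,
            },
          },
          { $limit: 20 },
        ],
        // A non-mongot input pipeline; the $sort is what makes it a ranked pipeline.
        ratingPipe: [
          { $match: { category: "coffee" } },
          { $sort: { rating: -1 } },
          { $limit: 20 },
        ],
      },
    },
    // User-set weights (influence) for each input pipeline.
    combination: { weights: { searchPipe: 2, ratingPipe: 1 } },
    // Ask the hybrid search stage to set scoreDetails metadata on each document.
    scoreDetails: true,
  },
};

// scoreDetails is document-level metadata, so it is read through $meta.
const pipeline = [
  rankFusionStage,
  { $addFields: { scoreDetails: { $meta: "scoreDetails" } } },
];

// Names of the input pipelines declared on the stage.
const inputPipelineNames = Object.keys(rankFusionStage.$rankFusion.input.pipelines);
console.log(inputPipelineNames);

// In mongosh, against a deployment with a matching search index:
//   db.products.aggregate(pipeline)
```

The block only constructs the pipeline; running it requires a deployment with mongot and an index that covers the hypothetical `description` field.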
The scoreDetails field is built up in 4 phases. -1. **Phase 1**: Add scoreDetails to each input pipeline's set of resulting documents. - These scoreDetails will be added as a document field with the name **``.``\_scoreDetails** (ex: for a pipeline called **_searchPipe_**, the added field’s name would be **<`INTERNAL_FIELDS`>._searchPipe_\_scoreDetails**) where **`INTERNAL_FIELDS`** is the hybrid search stage's internal fields name. The input pipeline's scoreDetails field will be a BSONObj that generates one of the following values for scoreDetails: +1. **Phase 1**: Add scoreDetails to each input pipeline's set of resulting documents. These + scoreDetails will be added as a document field with the name + **``.``\_scoreDetails** (ex: for a pipeline called + **_searchPipe_**, the added field’s name would be + **<`INTERNAL_FIELDS`>._searchPipe_\_scoreDetails**) where **`INTERNAL_FIELDS`** is the hybrid + search stage's internal fields name. The input pipeline's scoreDetails field will be a BSONObj + that generates one of the following values for scoreDetails: | Value | Hybrid Search Stage | Description | | ---------------------------------------- | ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | @@ -56,15 +98,30 @@ The scoreDetails field is built up in 4 phases. | `{value: {$meta: "score"}, details: []}` | $rankFusion | If the input pipeline generates score metadata, but not scoreDetails, then the incoming score is set, and the details array is set to empty. 
Note that all $scoreFusion input pipelines generate score metadata so each input pipeline's score value will always be [saved under a field called `inputPipelineRawScore`](https://github.com/mongodb/mongo/blob/e781072e0060950728580cda91fa2eb9a6c67ca5/src/mongo/db/pipeline/document_source_score_fusion.cpp#L834-L836) | | `{details: []}` | $rankFusion, $scoreFusion | If the input pipeline generates no scoreDetails metadata (and no score metadata in the case of $rankFusion) | - See [$rankFusion's addInputPipelineScoreDetails function](https://github.com/mongodb/mongo/blob/93f58ecf93aa7d275f1e10a69154b846a2a907f7/src/mongo/db/pipeline/rank_fusion_pipeline_builder.cpp#L154) and [$scoreFusion's addInputPipelineScoreDetails function](https://github.com/mongodb/mongo/blob/93f58ecf93aa7d275f1e10a69154b846a2a907f7/src/mongo/db/pipeline/score_fusion_pipeline_builder.cpp#L235) for the exact implementations. + See + [$rankFusion's addInputPipelineScoreDetails function](https://github.com/mongodb/mongo/blob/93f58ecf93aa7d275f1e10a69154b846a2a907f7/src/mongo/db/pipeline/rank_fusion_pipeline_builder.cpp#L154) + and + [$scoreFusion's addInputPipelineScoreDetails function](https://github.com/mongodb/mongo/blob/93f58ecf93aa7d275f1e10a69154b846a2a907f7/src/mongo/db/pipeline/score_fusion_pipeline_builder.cpp#L235) + for the exact implementations. -2. **Phase 2**: Combine all results into a total ranked ($rankFusion) or scored ($scoreFusion) set now that all input pipelines have executed. - This consists of grouping the newly added scoreDetails fields (remember that there’s 1 for each input pipeline) across all the documents. The grouping is needed because after processing the N input pipelines, there can be up to N repeats of the same document. Each document will have its own fields and any added fields for that pipeline (ex: the **<`INTERNAL_FIELDS`>._searchPipe_\_scoreDetails** is an example of an added field specific to the **_searchPipe_** pipeline. 
Only 1 of the N repeated documents will have this field.) The result of this step is that each unique document should have a document field named **``.``\_scoreDetails**, for each input pipeline this document appeared in. +2. **Phase 2**: Combine all results into a total ranked ($rankFusion) or scored ($scoreFusion) set + now that all input pipelines have executed. This consists of grouping the newly added + scoreDetails fields (remember that there’s 1 for each input pipeline) across all the documents. + The grouping is needed because after processing the N input pipelines, there can be up to N + repeats of the same document. Each document will have its own fields and any added fields for + that pipeline (ex: the **<`INTERNAL_FIELDS`>._searchPipe_\_scoreDetails** is an example of an + added field specific to the **_searchPipe_** pipeline. Only 1 of the N repeated documents will + have this field.) The result of this step is that each unique document should have a document + field named **``.``\_scoreDetails**, for each input + pipeline this document appeared in. - See the [groupDocsByIdAcrossInputPipeline function](https://github.com/mongodb/mongo/blob/93f58ecf93aa7d275f1e10a69154b846a2a907f7/src/mongo/db/pipeline/hybrid_search_pipeline_builder.cpp#L112) for the exact implementation. + See the + [groupDocsByIdAcrossInputPipeline function](https://github.com/mongodb/mongo/blob/93f58ecf93aa7d275f1e10a69154b846a2a907f7/src/mongo/db/pipeline/hybrid_search_pipeline_builder.cpp#L112) + for the exact implementation. -3. **Phase 3** - The third phase calculates a new field called `calculatedScoreDetails` per document that combines all the input pipeline's scoreDetails into an array, with one scoreDetails entry per input pipeline. Each entry in the array contains the following scoreDetails subfields: +3. 
**Phase 3** The third phase calculates a new field called `calculatedScoreDetails` per document + that combines all the input pipeline's scoreDetails into an array, with one scoreDetails entry + per input pipeline. Each entry in the array contains the following scoreDetails subfields: | $rankFusion | $scoreFusion | Optional | Description | | ------------------- | ----------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | @@ -75,7 +132,9 @@ The scoreDetails field is built up in 4 phases. | `description` | `description` | Yes | If the input pipeline generates a `description` as part of `scoreDetails`, then `description` contains that value. | | `details` | `details` | No | `details` is either empty `[]` or contains the value of the input pipeline's `scoreDetails` assuming `scoreDetails: true` for that input pipeline (ex: $search or $score). | - The non-hybrid search stages ($search and $score) set the scoreDetails fields (`value`, `description`, and `details`) at a minimum. These fields and their values will be found under the `details` field for the given input pipeline, assuming $search/$score had scoreDetails enabled (`scoreDetails: true`). + The non-hybrid search stages + ($search and $score) set the scoreDetails fields (`value`, `description`, and `details`) at a minimum. These fields and their values will be found under the `details` field for the given input pipeline, assuming $search/$score + had scoreDetails enabled (`scoreDetails: true`). **Visual Structure of a Full scoreDetails with a $search Input Pipeline:** @@ -111,10 +170,12 @@ The scoreDetails field is built up in 4 phases. 
} ``` - See the [constructCalculatedFinalScoreDetails function](https://github.com/mongodb/mongo/blob/93f58ecf93aa7d275f1e10a69154b846a2a907f7/src/mongo/db/pipeline/hybrid_search_pipeline_builder.cpp#L71) for the exact implementation. + See the + [constructCalculatedFinalScoreDetails function](https://github.com/mongodb/mongo/blob/93f58ecf93aa7d275f1e10a69154b846a2a907f7/src/mongo/db/pipeline/hybrid_search_pipeline_builder.cpp#L71) + for the exact implementation. -4. **Phase 4** - The fourth phase simply sets the `scoreDetails` metadata which represents the final scoreDetails value that gets returned to the user in the final results. +4. **Phase 4** The fourth phase simply sets the `scoreDetails` metadata which represents the final + scoreDetails value that gets returned to the user in the final results. | $rankFusion | $scoreFusion | Description | | ------------- | ----------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | @@ -124,4 +185,8 @@ The scoreDetails field is built up in 4 phases. | | `combination: {method: "avg"}` OR `combination: {method: "custom expression", expression: {$const: "{ string: {...}}}}` | $scoreFusion: If the combination method is _"avg"_, indicate that. If a custom combination expression was specified, indicate that and output the stringified expression. | | `details` | `details` | The fully assembled calculatedScoreDetails from the previous step. 
| - See [$rankFusion's buildScoreAndMergeStages function](https://github.com/mongodb/mongo/blob/93f58ecf93aa7d275f1e10a69154b846a2a907f7/src/mongo/db/pipeline/rank_fusion_pipeline_builder.cpp#L453) and [$scoreFusion's buildScoreAndMergeStages function](https://github.com/mongodb/mongo/blob/93f58ecf93aa7d275f1e10a69154b846a2a907f7/src/mongo/db/pipeline/score_fusion_pipeline_builder.cpp#L550) for the exact implementations. + See + [$rankFusion's buildScoreAndMergeStages function](https://github.com/mongodb/mongo/blob/93f58ecf93aa7d275f1e10a69154b846a2a907f7/src/mongo/db/pipeline/rank_fusion_pipeline_builder.cpp#L453) + and + [$scoreFusion's buildScoreAndMergeStages function](https://github.com/mongodb/mongo/blob/93f58ecf93aa7d275f1e10a69154b846a2a907f7/src/mongo/db/pipeline/score_fusion_pipeline_builder.cpp#L550) + for the exact implementations. diff --git a/src/mongo/db/query/search/mongot_queries_on_views.md b/src/mongo/db/query/search/mongot_queries_on_views.md index 848ff4304e2..04ed6229ce8 100644 --- a/src/mongo/db/query/search/mongot_queries_on_views.md +++ b/src/mongo/db/query/search/mongot_queries_on_views.md @@ -2,46 +2,92 @@ ##### Definitions -- User pipeline: The aggregation pipeline provided by the user (e.g. `coll.aggregate()`). -- View pipeline/definition: The pipeline used to define the view, specifying the transformations on the underlying collection (e.g. `db.createView("viewName", "underlyingCollection", )`). -- Request pipeline: The pipeline the server ultimately executes, which may be a modified version of the user pipeline due to optimizations or view logic. +- User pipeline: The aggregation pipeline provided by the user (e.g. + `coll.aggregate()`). +- View pipeline/definition: The pipeline used to define the view, specifying the transformations on + the underlying collection (e.g. + `db.createView("viewName", "underlyingCollection", )`). 
+- Request pipeline: The pipeline the server ultimately executes, which may be a modified version of + the user pipeline due to optimizations or view logic. - Search stage: A `$search`, `$vectorSearch`, or `$searchMeta` stage. -- Effective pipeline: The full pipeline applied to a collection to generate the result set, which includes the resolved view pipeline. +- Effective pipeline: The full pipeline applied to a collection to generate the result set, which + includes the resolved view pipeline. ## Overview -Search queries on views operate differently from standard view queries. Normally, a query on a view simply prepends the view's pipeline to the user's pipeline. This approach doesn't work for search queries, because a search aggregation must begin with two internal stages: `$_internalSearchMongotRemote` and `$_internalSearchIdLookup`. The requirement for the view pipeline to be at the start of the aggregation is therefore in conflict with the requirements of a search query. +Search queries on views operate differently from standard view queries. Normally, a query on a view +simply prepends the view's pipeline to the user's pipeline. This approach doesn't work for search +queries, because a search aggregation must begin with two internal stages: +`$_internalSearchMongotRemote` and `$_internalSearchIdLookup`. The requirement for the view pipeline +to be at the start of the aggregation is therefore in conflict with the requirements of a search +query. -To resolve this, the `$_internalSearchIdLookup` stage applies the view's transformations within its own sub-pipeline. This means the view is applied after the `$_internalSearchMongotRemote` stage but before the rest of the user's pipeline. While this technically violates the rule that a view pipeline must come first, it is permitted because `$_internalSearchMongotRemote` does not modify documents; it only retrieves document IDs from `mongot`. 
+To resolve this, the `$_internalSearchIdLookup` stage applies the view's transformations within its +own sub-pipeline. This means the view is applied after the `$_internalSearchMongotRemote` stage but +before the rest of the user's pipeline. While this technically violates the rule that a view +pipeline must come first, it is permitted because `$_internalSearchMongotRemote` does not modify +documents; it only retrieves document IDs from `mongot`. -In summary, `$_internalSearchIdLookup` takes unmodified documents from the `_id` values returned by `$_internalSearchMongotRemote`, applies the view's data transforms, and passes said transformed documents through the rest of the user pipeline. +In summary, `$_internalSearchIdLookup` takes unmodified documents from the `_id` values returned by +`$_internalSearchMongotRemote`, applies the view's data transforms, and passes said transformed +documents through the rest of the user pipeline. ## Technical Details ### Non-sharded procedure -1. `mongod` receives a query on a view namespace. Since `mongod` has the views' catalog in single node environments, it resolves the view and retrieves the `effectivePipeline` needed to apply the view (see `runAggregateOnView()`). +1. `mongod` receives a query on a view namespace. Since `mongod` has the views' catalog in single + node environments, it resolves the view and retrieves the `effectivePipeline` needed to apply the + view (see `runAggregateOnView()`). 2. `mongod` recursively calls `_runAggregate()` on the resolved view. -3. In `parsePipelineAndRegisterQueryStats()`, we call `search_helpers::checkAndSetViewOnExpCtx()` before parsing the raw BSON obj vector into a `Pipeline`. As implied, this function sets `view` on `expCtx`, making the aggregation context aware that it is operating on a view. -4. When `parsePipelineAndRegisterQueryStats()` parses the raw BSON obj vector, `createFromBson()` is called on every stage. 
`createFromBson()` in each of the search stages will first check if there's a view specified on the `DocumentSource`'s spec (there won't be for non-sharded) and calls `search_helpers::getViewFromExpCtx()` to retrieve the view if not. Note that for search queries on _collections_, step 3 will not set the view on `expCtx` (as the aggregation is on a collection, not a view) and therefore `search_helpers::getViewFromExpCtx()` will return `boost::none`. This view is set on the `DocumentSource`'s spec for later use. Additionally, the call to `ResolvedViewAggExState::handleViewHelper()` from `parsePipelineAndRegisterQueryStats()` will skip the step of stitching the view pipeline due to the search stage located within the user pipeline. -5. When the search stage is desugared, the view is passed to `DocumentSourceInternalSearchIdLookup` to be applied. As for `DocumentSourceInternalMongotRemote`, the spec passed to the constructor will later be used in `mongot_cursor::getRemoteCommandRequestForSearchQuery()` when establishing the cursor and sending a request to `mongot`. +3. In `parsePipelineAndRegisterQueryStats()`, we call `search_helpers::checkAndSetViewOnExpCtx()` + before parsing the raw BSON obj vector into a `Pipeline`. As implied, this function sets `view` + on `expCtx`, making the aggregation context aware that it is operating on a view. +4. When `parsePipelineAndRegisterQueryStats()` parses the raw BSON obj vector, `createFromBson()` is + called on every stage. `createFromBson()` in each of the search stages will first check if + there's a view specified on the `DocumentSource`'s spec (there won't be for non-sharded) and + calls `search_helpers::getViewFromExpCtx()` to retrieve the view if not. Note that for search + queries on _collections_, step 3 will not set the view on `expCtx` (as the aggregation is on a + collection, not a view) and therefore `search_helpers::getViewFromExpCtx()` will return + `boost::none`. 
This view is set on the `DocumentSource`'s spec for later use. Additionally, the + call to `ResolvedViewAggExState::handleViewHelper()` from `parsePipelineAndRegisterQueryStats()` + will skip the step of stitching the view pipeline due to the search stage located within the user + pipeline. +5. When the search stage is desugared, the view is passed to `DocumentSourceInternalSearchIdLookup` + to be applied. As for `DocumentSourceInternalMongotRemote`, the spec passed to the constructor + will later be used in `mongot_cursor::getRemoteCommandRequestForSearchQuery()` when establishing + the cursor and sending a request to `mongot`. ### Sharded procedure 1. `mongos` receives the query on the requested namespace and forwards it to all shards. -2. The shards do not recognize the namespace because it is an unresolved view. They forward the request to the primary shard, which owns the view catalog and can resolve the view. -3. The primary shard throws a `CommandOnShardedViewNotSupportedOnMongod` exception back to `mongos` with the resolved view info in the exception response. +2. The shards do not recognize the namespace because it is an unresolved view. They forward the + request to the primary shard, which owns the view catalog and can resolve the view. +3. The primary shard throws a `CommandOnShardedViewNotSupportedOnMongod` exception back to `mongos` + with the resolved view info in the exception response. 4. Same as non-sharded step 3, but in `cluster_aggregate.cpp` instead of `run_aggregate.cpp`. 5. Same as non-sharded step 4. -6. When `mongos` serializes the query to perform shard targeting, it serializes the view object directly inside the search stage (see [Serialized Search Stage](#serialized-search-stage)). -7. The targeted shard performs non-sharded step 4 (sharded step 5) again, but this time we expect the view to exist on the spec object because we serialized it from `mongos`. 
This step demonstrates why we need to store the view on the spec at all as we cannot rely on the `expCtx` to contain the view in sharded environments. +6. When `mongos` serializes the query to perform shard targeting, it serializes the view object + directly inside the search stage (see [Serialized Search Stage](#serialized-search-stage)). +7. The targeted shard performs non-sharded step 4 (sharded step 5) again, but this time we expect + the view to exist on the spec object because we serialized it from `mongos`. This step + demonstrates why we need to store the view on the spec at all as we cannot rely on the `expCtx` + to contain the view in sharded environments. 8. Same as non-sharded step 5. ### Stored Source -An important caveat to note about this procedure is the case where a user adds `returnStoredSource: true` to their search query. Assuming that the index is set up appropriately to handle this field, returning `storedSource` means that `mongot` will send back the full document to the server, not just a list of `_id`s to lookup. As this implies, there is no need for `$_internalSearchIdLookup` in this situation as `mongot` will have applied the view on its end. Instead, we will just promote the fields in `$storedSource` to root (`DocumentSourceSearch::desugar()`). +An important caveat to note about this procedure is the case where a user adds +`returnStoredSource: true` to their search query. Assuming that the index is set up appropriately to +handle this field, returning `storedSource` means that `mongot` will send back the full document to +the server, not just a list of `_id`s to lookup. As this implies, there is no need for +`$_internalSearchIdLookup` in this situation as `mongot` will have applied the view on its end. +Instead, we will just promote the fields in `$storedSource` to root +(`DocumentSourceSearch::desugar()`). 
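To make the stored-source case concrete, here is a minimal sketch of a `$search` stage that opts into `returnStoredSource`, together with a tiny helper that restates the rule from the paragraph above. The helper `needsIdLookup` is hypothetical and written only for illustration; it is not the server's desugaring code, and the index and field names are assumptions.

```javascript
// Sketch only: index and field names are hypothetical.
const searchStage = {
  $search: {
    index: "default",
    text: { query: "espresso", path: "description" },
    // Ask mongot to return the stored document fields instead of only _id values.
    returnStoredSource: true,
  },
};

// Hypothetical helper restating the rule described above: the id lookup stage
// is only skipped when the query explicitly sets returnStoredSource to true.
function needsIdLookup(searchSpec) {
  return searchSpec.returnStoredSource !== true;
}

console.log(needsIdLookup(searchStage.$search)); // stored source requested
console.log(needsIdLookup({ index: "default" })); // option omitted
```

Under this reading, a query that omits the option or sets it to `false` still goes through the `$_internalSearchIdLookup` path, which is where the view's transformations are applied.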
-If a user specifies `returnStoredSource: false` or doesn't specify `returnStoredSource` at all in their query, the process above remains the same and `$_internalSearchIdLookup` will be added to the pipeline. +If a user specifies `returnStoredSource: false` or doesn't specify `returnStoredSource` at all in +their query, the process above remains the same and `$_internalSearchIdLookup` will be added to the +pipeline. ## Examples diff --git a/src/mongo/db/query/search/search_technical_overview.md b/src/mongo/db/query/search/search_technical_overview.md index 1e6f6e9c207..91056a8af81 100644 --- a/src/mongo/db/query/search/search_technical_overview.md +++ b/src/mongo/db/query/search/search_technical_overview.md @@ -1,47 +1,88 @@ # Search -This document is a work-in-progress and just provides a high-level overview of the search implementation. +This document is a work-in-progress and just provides a high-level overview of the search +implementation. -[Atlas Search](https://www.mongodb.com/docs/atlas/atlas-search/) provides integrated full-text search by running queries with the $search and $searchMeta aggregation stages. You can read about the $vectorSearch aggregation stage in [vector_search](../../pipeline/search/vectorSearch_technical_overview.md). +[Atlas Search](https://www.mongodb.com/docs/atlas/atlas-search/) provides integrated full-text +search by running queries with the $search and $searchMeta aggregation stages. You can read about +the $vectorSearch aggregation stage in +[vector_search](../../pipeline/search/vectorSearch_technical_overview.md). ## Lucene -Diving into the mechanics of search requires a brief rundown of [Apache Lucene](https://lucene.apache.org/) because it is the bedrock of MongoDB's search capabilities. MongoDB employees can read more about Lucene and mongot at [go/mongot](http://go/mongot). 
+Diving into the mechanics of search requires a brief rundown of +[Apache Lucene](https://lucene.apache.org/) because it is the bedrock of MongoDB's search +capabilities. MongoDB employees can read more about Lucene and mongot at +[go/mongot](http://go/mongot). -Apache Lucene is an open-source text search library, written in Java. Lucene allows users to store data in three primary ways: +Apache Lucene is an open-source text search library, written in Java. Lucene allows users to store +data in three primary ways: -- inverted index: maps each term (in a set of documents) to the documents in which the term appears, in which terms are the unique words/phrases and documents are the pieces of content being indexed. Inverted indexes offer great performance for matching search terms with documents. -- storedFields: stores all field values for one document together in a row-stride fashion. In retrieval, all field values are returned at once per document, so that loading the relevant information about a document is very fast. This is very useful for search features that are improved by row-oriented data access, like search highlighting. Search highlighting marks up the search terms and displays them within the best/most relevant sections of a document. -- DocValues: column-oriented fields with a document-to-value mapping built at index time. As it facilitates column based data access, it's faster for aggregating field values for counts and facets. +- inverted index: maps each term (in a set of documents) to the documents in which the term appears, + in which terms are the unique words/phrases and documents are the pieces of content being indexed. + Inverted indexes offer great performance for matching search terms with documents. +- storedFields: stores all field values for one document together in a row-stride fashion. In + retrieval, all field values are returned at once per document, so that loading the relevant + information about a document is very fast. 
This is very useful for search features that are + improved by row-oriented data access, like search highlighting. Search highlighting marks up the + search terms and displays them within the best/most relevant sections of a document. +- DocValues: column-oriented fields with a document-to-value mapping built at index time. As it + facilitates column based data access, it's faster for aggregating field values for counts and + facets. ## `mongot` -`mongot` is a MongoDB-specific process written as a wrapper around Lucene and run on Atlas. Using Lucene, `mongot` indexes MongoDB databases to provide our customers with full text search capabilities. +`mongot` is a MongoDB-specific process written as a wrapper around Lucene and run on Atlas. Using +Lucene, `mongot` indexes MongoDB databases to provide our customers with full text search +capabilities. -In the current “coupled” search architecture, one `mongot` runs alongside each `mongod` or `mongos`. Each `mongod`/`mongos` and `mongot` pair are on the same physical box/server and communicate via localhost. +In the current “coupled” search architecture, one `mongot` runs alongside each `mongod` or `mongos`. +Each `mongod`/`mongos` and `mongot` pair are on the same physical box/server and communicate via +localhost. -`mongot` replicates the data from its collocated `mongod` node using change streams and builds Lucene indexes on that replicated data. `mongot` is guaranteed to be eventually consistent with mongod. Check out [mongot_cursor](/src/mongo/db/query/search/mongot_cursor.h) for the core shared code that establishes and executes communication between `mongod` and `mongot`. +`mongot` replicates the data from its collocated `mongod` node using change streams and builds +Lucene indexes on that replicated data. `mongot` is guaranteed to be eventually consistent with +mongod. 
Check out [mongot_cursor](/src/mongo/db/query/search/mongot_cursor.h) for the core shared +code that establishes and executes communication between `mongod` and `mongot`. ## Search Indexes -In order to run search queries, the user has to create a search index. Search index commands similarly use `mongod`/`mongos` server communication protocols to communicate with a remote search index server, but with an Envoy instance that handles forwarding the command requests to Atlas servers and then eventually to the relevant Lucene/`mongot` instances. `mongot` and Envoy instances are co-located with every `mongod` server instance, and Envoy instances are co-located with `mongos` servers as well. The precise structure of the search index architecture will likely evolve in future as improvements are made to that system. +In order to run search queries, the user has to create a search index. Search index commands +similarly use `mongod`/`mongos` server communication protocols to communicate with a remote search +index server, but with an Envoy instance that handles forwarding the command requests to Atlas +servers and then eventually to the relevant Lucene/`mongot` instances. `mongot` and Envoy instances +are co-located with every `mongod` server instance, and Envoy instances are co-located with `mongos` +servers as well. The precise structure of the search index architecture will likely evolve in future +as improvements are made to that system. Search indexes can be: - Only on specified fields ("static") - All fields (“dynamic”) -`mongot` stores the indexed data exclusively, unless the customer has opted into storing entire documents (more expensive). +`mongot` stores the indexed data exclusively, unless the customer has opted into storing entire +documents (more expensive). -There are four search index metadata commands: `createSearchIndexes`, `updateSearchIndex`, `dropSearchIndex` and `listSearchIndexes`. 
These commands are present on both the `mongod` and `mongos` and are passthrough commands to a remote search index management server. The `mongod`/`mongos` is aware of the address of the remote management server via a startup setParameter `searchIndexManagementHostAndPort`. +There are four search index metadata commands: `createSearchIndexes`, `updateSearchIndex`, +`dropSearchIndex` and `listSearchIndexes`. These commands are present on both the `mongod` and +`mongos` and are passthrough commands to a remote search index management server. The +`mongod`/`mongos` is aware of the address of the remote management server via a startup setParameter +`searchIndexManagementHostAndPort`. -The four commands have security authorization action types corresponding with their names. These action types are included in the same built-in roles as the regular index commands, while `updateSearchIndex` parallels collMod. +The four commands have security authorization action types corresponding with their names. These +action types are included in the same built-in roles as the regular index commands, while +`updateSearchIndex` parallels collMod. Note: Indexes can also be managed through the Atlas UI. ## $search and $searchMeta stages -There are two text search stages in the aggregation framework (and $search is not available for find commands). [$search](https://www.mongodb.com/docs/atlas/atlas-search/query-syntax/#-search) returns the results of full-text search, and [$searchMeta](https://www.mongodb.com/docs/atlas/atlas-search/query-syntax/#-searchmeta) returns metadata about search results. When used for an aggregation, either search stage must be the first stage in the pipeline. For example: +There are two text search stages in the aggregation framework (and +$search is not available for find commands). 
[$search](https://www.mongodb.com/docs/atlas/atlas-search/query-syntax/#-search) +returns the results of full-text search, and +[$searchMeta](https://www.mongodb.com/docs/atlas/atlas-search/query-syntax/#-searchmeta) returns +metadata about search results. When used for an aggregation, either search stage must be the first +stage in the pipeline. For example: ``` db.coll.aggregate([ @@ -51,45 +92,127 @@ db.coll.aggregate([ ]); ``` -$search and $searchMeta are parsed as [DocumentSourceSearch](/src/mongo/db/pipeline/search/document_source_search.h) and [DocumentSourceSearchMeta](/src/mongo/db/pipeline/search/document_source_search_meta.h), respectively. When using the classic engine, however, DocumentSourceSearch is [desugared](https://github.com/mongodb/mongo/blob/04f19bb61aba10577658947095020f00ac1403c4/src/mongo/db/pipeline/search/document_source_search.cpp#L118) into a sequence that uses the [$\_internalSearchMongotRemote stage](/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.h) and, if the `returnStoredSource` option is false, the [$\_internalSearchIdLookup stage](/src/mongo/db/pipeline/search/document_source_internal_search_id_lookup.h). In SBE, both $search and $searchMeta are lowered directly from the original document sources. +$search and $searchMeta are parsed as [DocumentSourceSearch](/src/mongo/db/pipeline/search/document_source_search.h) and [DocumentSourceSearchMeta](/src/mongo/db/pipeline/search/document_source_search_meta.h), respectively. 
When using the classic engine, however, DocumentSourceSearch is [desugared](https://github.com/mongodb/mongo/blob/04f19bb61aba10577658947095020f00ac1403c4/src/mongo/db/pipeline/search/document_source_search.cpp#L118) into a sequence that uses the [$\_internalSearchMongotRemote +stage](/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.h) and, if the +`returnStoredSource` option is false, the +[$\_internalSearchIdLookup stage](/src/mongo/db/pipeline/search/document_source_internal_search_id_lookup.h). +In SBE, both $search and $searchMeta are lowered directly from the original document sources. -For example, the stage `{$search: {query: “chocolate”, path: “flavor”}, returnStoredSource: false}` will desugar into the two stages: `{$_internalSearchMongotRemote: {query: “chocolate”, path: “flavor”}, returnStoredSource: false}` and `{$_internalSearchIdLookup: {}}`. +For example, the stage `{$search: {query: “chocolate”, path: “flavor”}, returnStoredSource: false}` +will desugar into the two stages: +`{$_internalSearchMongotRemote: {query: “chocolate”, path: “flavor”}, returnStoredSource: false}` +and `{$_internalSearchIdLookup: {}}`. ### $\_internalSearchMongotRemote -$\_internalSearchMongotRemote is the foundational stage for all search queries, e.g., $search and $searchMeta. This stage opens a cursor on `mongot` ([here](https://github.com/mongodb/mongo/blob/e530c98e7d44878ed8164ee9167c28afc97067a7/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.cpp#L269)) and retrieves results one-at-a-time from the cursor ([here](https://github.com/mongodb/mongo/blob/e530c98e7d44878ed8164ee9167c28afc97067a7/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.cpp#L163)). +$\_internalSearchMongotRemote is the foundational stage for all search queries, e.g., $search and +$searchMeta. 
+This stage opens a cursor on `mongot` +([here](https://github.com/mongodb/mongo/blob/e530c98e7d44878ed8164ee9167c28afc97067a7/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.cpp#L269)) +and retrieves results one-at-a-time from the cursor +([here](https://github.com/mongodb/mongo/blob/e530c98e7d44878ed8164ee9167c28afc97067a7/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.cpp#L163)). -Within this stage, the underlying [TaskExecutorCursor](https://github.com/mongodb/mongo/blob/e530c98e7d44878ed8164ee9167c28afc97067a7/src/mongo/executor/task_executor_cursor.h) acts as a black box to handle dispatching commands to `mongot` only as necessary. The cursor retrieves a batch of results from `mongot`, iterates through that batch per each `getNext` call, then schedules a `getMore` request to `mongot` whenever the previous batch is exhausted. +Within this stage, the underlying +[TaskExecutorCursor](https://github.com/mongodb/mongo/blob/e530c98e7d44878ed8164ee9167c28afc97067a7/src/mongo/executor/task_executor_cursor.h) +acts as a black box to handle dispatching commands to `mongot` only as necessary. The cursor +retrieves a batch of results from `mongot`, iterates through that batch per each `getNext` call, +then schedules a `getMore` request to `mongot` whenever the previous batch is exhausted. -Each batch returned from mongot includes a batch of BSON documents and metadata about the query results. Each document contains an \_id and a relevancy score. The relevancy score indicates how well the document’s indexed values matched the user query. Metadata is a user-specified group of fields with information about the result set as a whole, mostly including counts of various groups (or facets). +Each batch returned from mongot includes a batch of BSON documents and metadata about the query +results. Each document contains an \_id and a relevancy score. 
The relevancy score indicates how +well the document’s indexed values matched the user query. Metadata is a user-specified group of +fields with information about the result set as a whole, mostly including counts of various groups +(or facets). -We try to optimize time spent communicating with and waiting on mongot by tuning the `batchSize` option on mongot requests and toggling "prefetch-ing" of GetMore requests. This batchSize-tuning and prefetch-enablement logic is based on an attempt at inferring how many documents need to be requested from mongot (the upper and lower bounds of [`DocsNeededBounds`](https://github.com/mongodb/mongo/blob/03222ee4d38696f293302d0d322b7dac2ccb1e1d/src/mongo/db/pipeline/visitors/docs_needed_bounds.h/#L39)). See [`extractDocsNeededBounds()`](https://github.com/mongodb/mongo/blob/07c5da765d36bfc2bb6bc6d9b101a90cf09e82f7/src/mongo/db/pipeline/visitors/document_source_visitor_docs_needed_bounds.h/#L94) for details on how we traverse the full user pipeline to compute those bounds. +We try to optimize time spent communicating with and waiting on mongot by tuning the `batchSize` +option on mongot requests and toggling "prefetch-ing" of GetMore requests. This batchSize-tuning and +prefetch-enablement logic is based on an attempt at inferring how many documents need to be +requested from mongot (the upper and lower bounds of +[`DocsNeededBounds`](https://github.com/mongodb/mongo/blob/03222ee4d38696f293302d0d322b7dac2ccb1e1d/src/mongo/db/pipeline/visitors/docs_needed_bounds.h/#L39)). +See +[`extractDocsNeededBounds()`](https://github.com/mongodb/mongo/blob/07c5da765d36bfc2bb6bc6d9b101a90cf09e82f7/src/mongo/db/pipeline/visitors/document_source_visitor_docs_needed_bounds.h/#L94) +for details on how we traverse the full user pipeline to compute those bounds. 
-Once the bounds are computed and stored in the document source, we follow a set of heuristics to compute a batchSize for the initial mongot request based on those bounds ([here](https://github.com/mongodb/mongo/blob/03222ee4d38696f293302d0d322b7dac2ccb1e1d/src/mongo/db/query/search/mongot_cursor.cpp/#L110)). The heuristics include applying "oversubscription" logic for non-storedSource queries, to account for the possibility that $\_internalSearchIdLookup may discard some of the documents returned by mongot. +Once the bounds are computed and stored in the document source, we follow a set of heuristics to +compute a batchSize for the initial mongot request based on those bounds +([here](https://github.com/mongodb/mongo/blob/03222ee4d38696f293302d0d322b7dac2ccb1e1d/src/mongo/db/query/search/mongot_cursor.cpp/#L110)). +The heuristics include applying "oversubscription" logic for non-storedSource queries, to account +for the possibility that $\_internalSearchIdLookup may discard some of the documents returned by +mongot. #### Mongot GetMore Strategy -We customize GetMore-related behaviors of the TaskExecutorCursor (enabling prefetching and tuning the batchSize option) with mongot-specific logic via the `MongotTaskExecutorCursorGetMoreStrategy`. +We customize GetMore-related behaviors of the TaskExecutorCursor (enabling prefetching and tuning +the batchSize option) with mongot-specific logic via the `MongotTaskExecutorCursorGetMoreStrategy`. -For example, if we know that we will need all documents from mongot in order to satisfy the query (for example, if the post-$search pipeline has a blocking stage like $sort or $group), then we'll immediately pre-fetch all GetMore requests and will follow an exponential batchSize growth strategy per batch. 
On the other hand, if the query has an extractable limit N, we attempt to retrieve all N documents in the first batch by tuning the initial batchSize; in that case, we'll never pre-fetch, and if a GetMore is actually needed, we'll tune the batchSize to try to request all still-needed documents in the next batch. See the [`MongotTaskExecutorCursorGetMoreStrategy`](https://github.com/mongodb/mongo/blob/07c5da765d36bfc2bb6bc6d9b101a90cf09e82f7/src/mongo/db/query/search/mongot_cursor_getmore_strategy.h/#L47) for all heuristics and implementation details. +For example, if we know that we will need all documents from mongot in order to satisfy the query +(for example, if the post-$search pipeline has a blocking stage like $sort or $group), then we'll +immediately pre-fetch all GetMore requests and will follow an exponential batchSize growth strategy +per batch. On the other hand, if the query has an extractable limit N, we attempt to retrieve all N +documents in the first batch by tuning the initial batchSize; in that case, we'll never pre-fetch, +and if a GetMore is actually needed, we'll tune the batchSize to try to request all still-needed +documents in the next batch. See the +[`MongotTaskExecutorCursorGetMoreStrategy`](https://github.com/mongodb/mongo/blob/07c5da765d36bfc2bb6bc6d9b101a90cf09e82f7/src/mongo/db/query/search/mongot_cursor_getmore_strategy.h/#L47) +for all heuristics and implementation details. ### $\_internalSearchIdLookup -The $\_internalSearchIdLookup stage is responsible for recreating the entire document to give to the rest of the agg pipeline (in the above example, $match and $project) and for checking to make sure the data returned is up to date with the data on `mongod`, since `mongot`’s indexed data is eventually consistent with `mongod`. 
For example, if `mongot` returned the \_id of a document that had been deleted, $\_internalSearchIdLookup is responsible for catching it: the stage won’t find a document matching that \_id, so it filters that document out. The stage will also perform shard filtering, where it ensures there are no duplicates from separate shards, and it will retrieve the most up-to-date field values. However, this stage doesn’t account for documents that had been inserted into the collection but not yet propagated to `mongot` via the $changeStream; that’s why search queries are eventually consistent but don’t guarantee strong consistency. +The $\_internalSearchIdLookup stage is responsible for recreating the entire document to give to the +rest of the agg pipeline (in the above example, $match and $project) and for checking to make sure +the data returned is up to date with the data on `mongod`, since `mongot`’s indexed data is +eventually consistent with `mongod`. For example, if `mongot` returned the \_id of a document that +had been deleted, $\_internalSearchIdLookup is responsible for catching it: the stage won’t find a +document matching that \_id, so it filters that document out. The stage will also perform shard +filtering, where it ensures there are no duplicates from separate shards, and it will retrieve the +most up-to-date field values. However, this stage doesn’t account for documents that had been +inserted into the collection but not yet propagated to `mongot` via the $changeStream; that’s why +search queries are eventually consistent but don’t guarantee strong consistency. -**Catalog Access and Shard Filtering**: Unlike most pipeline stages that just transform documents, `$_internalSearchIdLookup` must read from storage to perform \_id lookups. This requires a `CollectionAcquisition`, which holds both the collection pointer and the shard version information needed for filtering. 
Reusing the same `CollectionAcquisition` across all \_id lookups ensures a consistent snapshot of the collection and proper shard filtering. This is handled in two phases: +**Catalog Access and Shard Filtering**: Unlike most pipeline stages that just transform documents, +`$_internalSearchIdLookup` must read from storage to perform \_id lookups. This requires a +`CollectionAcquisition`, which holds both the collection pointer and the shard version information +needed for filtering. Reusing the same `CollectionAcquisition` across all \_id lookups ensures a +consistent snapshot of the collection and proper shard filtering. This is handled in two phases: -- **Setup via `bindCatalogInfo()`**: Before execution begins, [`Pipeline::bindCatalogInfo()`](https://github.com/mongodb/mongo/blob/a013280e0e5dc374f78adbc4cb68b4d190c1d9ed/src/mongo/db/pipeline/pipeline.h#L448-L450) gives each stage a chance to grab the catalog resources it needs. For $\_internalSearchIdLookup, this means receiving the `CollectionAcquisition` along with a shared [`ShardRoleTransactionResourcesStasherForPipeline`](https://github.com/mongodb/mongo/blob/a013280e0e5dc374f78adbc4cb68b4d190c1d9ed/src/mongo/db/pipeline/shard_role_transaction_resources_stasher_for_pipeline.h#L46) that preserves transaction resources across getMores. The stage bundles these into a [`DSInternalSearchIdLookUpCatalogResourceHandle`](https://github.com/mongodb/mongo/blob/a013280e0e5dc374f78adbc4cb68b4d190c1d9ed/src/mongo/db/pipeline/search/document_source_internal_search_id_lookup.h#L257-L277) to use during execution. -- **Execution via `CatalogResourceHandle`**: During `doGetNext()`, the stage calls `acquire()` on the `CatalogResourceHandle` to restore transaction resources onto the opCtx (required before accessing the `CollectionAcquisition`), sets up the \_id lookup query, then calls `release()`. 
The underlying query executor then uses the same `CollectionAcquisition` and manages its own stashing and unstashing during execution. +- **Setup via `bindCatalogInfo()`**: Before execution begins, + [`Pipeline::bindCatalogInfo()`](https://github.com/mongodb/mongo/blob/a013280e0e5dc374f78adbc4cb68b4d190c1d9ed/src/mongo/db/pipeline/pipeline.h#L448-L450) + gives each stage a chance to grab the catalog resources it needs. For $\_internalSearchIdLookup, + this means receiving the `CollectionAcquisition` along with a shared + [`ShardRoleTransactionResourcesStasherForPipeline`](https://github.com/mongodb/mongo/blob/a013280e0e5dc374f78adbc4cb68b4d190c1d9ed/src/mongo/db/pipeline/shard_role_transaction_resources_stasher_for_pipeline.h#L46) + that preserves transaction resources across getMores. The stage bundles these into a + [`DSInternalSearchIdLookUpCatalogResourceHandle`](https://github.com/mongodb/mongo/blob/a013280e0e5dc374f78adbc4cb68b4d190c1d9ed/src/mongo/db/pipeline/search/document_source_internal_search_id_lookup.h#L257-L277) + to use during execution. +- **Execution via `CatalogResourceHandle`**: During `doGetNext()`, the stage calls `acquire()` on + the `CatalogResourceHandle` to restore transaction resources onto the opCtx (required before + accessing the `CollectionAcquisition`), sets up the \_id lookup query, then calls `release()`. The + underlying query executor then uses the same `CollectionAcquisition` and manages its own stashing + and unstashing during execution. ### Explains -Like normal explain queries, search explain queries can be run with three different verbosities, "queryPlanner" which does not execute the query, and "executionStats" and "allPlansExecution" which do execute the query and output execution stats about the query. 
+Like normal explain queries, search explain queries can be run with three different verbosities, +"queryPlanner" which does not execute the query, and "executionStats" and "allPlansExecution" which +do execute the query and output execution stats about the query. -For queries with "queryPlanner" verbosity, we specify "queryPlanner" in our query to mongot, and it returns an explain object without a cursor. We directly return this object in our explain output. +For queries with "queryPlanner" verbosity, we specify "queryPlanner" in our query to mongot, and +it returns an explain object without a cursor. We directly return this object in our explain output. -For queries with "executionStats" or "allPlansExecution" verbosity levels, we follow the same path as normal search queries to establish cursor(s) on mongot. By including the explain verbosity in our query to mongot, we receive an explain object along with the usual cursor(s) containing documents. These documents are then returned to the subsequent stages of the pipeline, and the execution of the query continues. It's important to note that the merge phase of a sharded query is not executed during an explain (see [SPM-3100](https://jira.mongodb.org/browse/SPM-3100)). If a `getMore` command is issued against the cursor, mongot will return a new explain object which contains updated statistics on its execution of the query. The latest explain object is stored on the [TaskExecutorCursor](https://github.com/mongodb/mongo/blob/a71fa6a39a916983c38c23684cd23ac930ae5616/src/mongo/executor/task_executor_cursor.h#L267) as it handles the `getMore`s. 
We include the latest explain object from mongot in the explain for [$\_internalSearchMongotRemote, $searchMeta](https://github.com/mongodb/mongo/blob/a71fa6a39a916983c38c23684cd23ac930ae5616/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.cpp#L112), and [$vectorSearch](https://github.com/mongodb/mongo/blob/a71fa6a39a916983c38c23684cd23ac930ae5616/src/mongo/db/pipeline/search/document_source_vector_search.cpp#L133-L134) to output the most up to date information. +For queries with "executionStats" or "allPlansExecution" verbosity levels, we follow the same path +as normal search queries to establish cursor(s) on mongot. By including the explain verbosity in our +query to mongot, we receive an explain object along with the usual cursor(s) containing documents. +These documents are then returned to the subsequent stages of the pipeline, and the execution of the +query continues. It's important to note that the merge phase of a sharded query is not executed +during an explain (see [SPM-3100](https://jira.mongodb.org/browse/SPM-3100)). If a `getMore` command +is issued against the cursor, mongot will return a new explain object which contains updated +statistics on its execution of the query. The latest explain object is stored on the +[TaskExecutorCursor](https://github.com/mongodb/mongo/blob/a71fa6a39a916983c38c23684cd23ac930ae5616/src/mongo/executor/task_executor_cursor.h#L267) +as it handles the `getMore`s. We include the latest explain object from mongot in the explain for +[$\_internalSearchMongotRemote, $searchMeta](https://github.com/mongodb/mongo/blob/a71fa6a39a916983c38c23684cd23ac930ae5616/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.cpp#L112), +and +[$vectorSearch](https://github.com/mongodb/mongo/blob/a71fa6a39a916983c38c23684cd23ac930ae5616/src/mongo/db/pipeline/search/document_source_vector_search.cpp#L133-L134) +to output the most up to date information. ### Didn't Find What You're Looking For? 
-Visit [the landing page](/src/mongo/db/query/search/README.md) for all $search/$vectorSearch/$searchMeta related documentation for server contributors. +Visit [the landing page](/src/mongo/db/query/search/README.md) for all +$search/$vectorSearch/$searchMeta related documentation for server contributors. diff --git a/src/mongo/db/query/timeseries/README.md b/src/mongo/db/query/timeseries/README.md index a45ad8cbca6..50d72075c87 100644 --- a/src/mongo/db/query/timeseries/README.md +++ b/src/mongo/db/query/timeseries/README.md @@ -1,11 +1,13 @@ # Query Rewrites for Timeseries Collections -For a general overview about how timeseries collections are implemented, see [db/timeseries/README.md][db readme]. For sharding -specific logic, see [db/global_catalog/README_timeseries.md][catalog readme]. This document will focus on query translations and -optimizations for timeseries collections and the `$_internalUnpackBucket` aggregation stage, and assumes knowledge of timeseries -collection basics. We perform timeseries rewrites before and during optimizations. For clarity in this README and all query timeseries -resources, when we are performing pre-optimization rewrites we will use the term _translations_, and we will use the term _rewrites_ -for timeseries optimizations. +For a general overview about how timeseries collections are implemented, see +[db/timeseries/README.md][db readme]. For sharding specific logic, see +[db/global_catalog/README_timeseries.md][catalog readme]. This document will focus on query +translations and optimizations for timeseries collections and the `$_internalUnpackBucket` +aggregation stage, and assumes knowledge of timeseries collection basics. We perform timeseries +rewrites before and during optimizations. For clarity in this README and all query timeseries +resources, when we are performing pre-optimization rewrites we will use the term _translations_, and +we will use the term _rewrites_ for timeseries optimizations. 
There are two different types of timeseries collections. @@ -14,159 +16,184 @@ There are two different types of timeseries collections. ## Pre 9.0: Legacy timeseries collections -These timeseries collections have 2 namespaces that are automatically made when the collection is created. The user defined namespace -will be a view, and `system.buckets.` will store the timeseries documents in bucket document format. More details about the -buckets can be found in [db/timeseries/README.md][db readme]. +These timeseries collections have 2 namespaces that are automatically made when the collection is +created. The user defined namespace will be a view, and `system.buckets.` will store the +timeseries documents in bucket document format. More details about the buckets can be found in +[db/timeseries/README.md][db readme]. ### Queries on the view -Because the user-created timeseries collection is a view, all queries against it (`find`, `count`, `distinct` -and `aggregate`) are transformed into an aggregation request against the backing `system.buckets.` collection with the -`$_internalUnpackBucket` stage prepended to the generated pipeline. +Because the user-created timeseries collection is a view, all queries against it (`find`, `count`, +`distinct` and `aggregate`) are transformed into an aggregation request against the backing +`system.buckets.` collection with the `$_internalUnpackBucket` stage prepended to the +generated pipeline. -The entrypoint for these aggregate operations is `runAggregate()` and then `runAggregateOnView()`. If -all the validation checks pass, the view is then resolved. A resolved view will contain the original pipeline, -the namespace of the collection underlying the view, and for timeseries collections more information about -the buckets collection, such as if the buckets collection uses an extended range (dates pre 1970). 
The -resolved view is turned into a new aggregation request, where an internal aggregation stage -(`$_internalUnpackBucket`) is added as the first stage in the pipeline (see `asExpandedViewAggregation()`). -Then `runAggregate` is called again on this new request. +The entrypoint for these aggregate operations is `runAggregate()` and then `runAggregateOnView()`. +If all the validation checks pass, the view is then resolved. A resolved view will contain the +original pipeline, the namespace of the collection underlying the view, and for timeseries +collections more information about the buckets collection, such as if the buckets collection uses an +extended range (dates pre 1970). The resolved view is turned into a new aggregation request, where +an internal aggregation stage (`$_internalUnpackBucket`) is added as the first stage in the pipeline +(see `asExpandedViewAggregation()`). Then `runAggregate` is called again on this new request. ### Queries on the buckets collection -Queries directly on the **buckets collection** are executed like non timeseries collection -queries. However, prior to PM-3167 (on versions before 7.2), queries on the buckets collection are forced -to run in the classic execution engine. In 7.2+, if the queries are eligible, they will run in SBE. +Queries directly on the **buckets collection** are executed like non timeseries collection queries. +However, prior to PM-3167 (on versions before 7.2), queries on the buckets collection are forced to +run in the classic execution engine. In 7.2+, if the queries are eligible, they will run in SBE. -Similar to non timeseries collections, the `$_internalUnpackBucket` stage is not used when executing queries against the buckets -collection, because the buckets collection is not a view. We do not expect users to directly query -the buckets collection, since the buckets collection is made automatically when users create timeseries -collections. 
Also, users should query the view to take advantage of timeseries specific optimizations. +Similar to non timeseries collections, the `$_internalUnpackBucket` stage is not used when executing +queries against the buckets collection, because the buckets collection is not a view. We do not +expect users to directly query the buckets collection, since the buckets collection is made +automatically when users create timeseries collections. Also, users should query the view to take +advantage of timeseries specific optimizations. ## 9.0+: Viewless timeseries collections -Unlike viewful timeseries collections, these collections have a **single** namespace, which the user defines. Therefore, there -is no `system.buckets.` collection and there is no view to resolve when querying viewless timeseries collections. +Unlike viewful timeseries collections, these collections have a **single** namespace, which the user +defines. Therefore, there is no `system.buckets.` collection and there is no view to +resolve when querying viewless timeseries collections. ### Queries that return user documents -> [!NOTE] -> This is the expected workflow for users. +> [!NOTE] This is the expected workflow for users. -Like with queries against viewful timeseries collections, queries against viewless timeseries collections are also transformed -into aggregation requests, since we still must prepend the `$_internalUnpackBucket` aggregation stage. `find`, `count`, and `distinct` -will all check if the collection is timeseries by checking the collection catalog data (either `CollectionOrViewAcquisition` for the -shard role, or `CollectionRoutingInfo` for the router role). If the collection is a timeseries collection, we will translate the command -as an aggregation request. 
+Like with queries against viewful timeseries collections, queries against viewless timeseries +collections are also transformed into aggregation requests, since we still must prepend the +`$_internalUnpackBucket` aggregation stage. `find`, `count`, and `distinct` will all check if the +collection is timeseries by checking the collection catalog data (either +`CollectionOrViewAcquisition` for the shard role, or `CollectionRoutingInfo` for the router role). +If the collection is a timeseries collection, we will translate the command as an aggregation +request. -In the `aggregate` command, we also check if the collection is timeseries by checking the collection catalog data. During aggregation, -after any views are resolved and the pipeline is parsed and validated, we will use the collection catalog data in -[performPreOptimizationRewrites][pre op rewrites] to prepend the `$_internalUnpackBucket` stage. The pipeline will then set -`_translatedForViewlessTimeseries` to true, since the timeseries translation **can only happen once** during the lifetime of a pipeline. +In the `aggregate` command, we also check if the collection is timeseries by checking the collection +catalog data. During aggregation, after any views are resolved and the pipeline is parsed and +validated, we will use the collection catalog data in +[performPreOptimizationRewrites][pre op rewrites] to prepend the `$_internalUnpackBucket` stage. The +pipeline will then set `_translatedForViewlessTimeseries` to true, since the timeseries translation +**can only happen once** during the lifetime of a pipeline. ### Queries that return raw buckets -> [!NOTE] -> This is not the expected workflow for users. +> [!NOTE] This is not the expected workflow for users. -Unlike viewful timeseries collections, we cannot query `system.buckets.` directly because the namespace does not exist. 
-To maintain this functionality, queries directly on the buckets use the same namespace as the timeseries collection and set the `rawData` -field on the command object to `true`. Therefore, queries on `` with `rawData = true` will return the same results as +Unlike viewful timeseries collections, we cannot query `system.buckets.` directly because +the namespace does not exist. To maintain this functionality, queries directly on the buckets use +the same namespace as the timeseries collection and set the `rawData` field on the command object to +`true`. Therefore, queries on `` with `rawData = true` will return the same results as queries on `system.buckets.`. ### Considerations when working with viewless timeseries -These were important lessons taken from SPM-4217, which added aggregation support for viewless timeseries. +These were important lessons taken from SPM-4217, which added aggregation support for viewless +timeseries. -1. Detecting if a collection is timeseries now requires accessing collection catalog data. The collection catalog must be up to date, - since the correctness of timeseries queries depend on detecting that the collection is timeseries. Therefore, we recommend using the - shard role and the router role APIs to ensure the data from the catalog is up to date. Additionally, when accessing the catalog we must - consider all 3 scenarios: +1. Detecting if a collection is timeseries now requires accessing collection catalog data. The + collection catalog must be up to date, since the correctness of timeseries queries depend on + detecting that the collection is timeseries. Therefore, we recommend using the shard role and the + router role APIs to ensure the data from the catalog is up to date. Additionally, when accessing + the catalog we must consider all 3 scenarios: - - **Tracked collections**. 
If we are acting as a router, we can use `CollectionRoutingInfo` retrieved by a `RoutingContext` - (see [sharded_agg_helpers::finalizeAndMaybePreparePipelineForExecution][finalize func] for an example). - - **Untracked collections**. These are unsharded collections that live on the primary shard. The config server does not have - information about this collection, so we must contact the primary shard to check if the collection is timeseries. - - **Local collections**. If we are in a shard role or can perform a local read, we can use the local catalog. These can be - unsharded collections that live on that shard, or all collections in a non sharded cluster. + - **Tracked collections**. If we are acting as a router, we can use `CollectionRoutingInfo` + retrieved by a `RoutingContext` (see + [sharded_agg_helpers::finalizeAndMaybePreparePipelineForExecution][finalize func] for an + example). + - **Untracked collections**. These are unsharded collections that live on the primary shard. The + config server does not have information about this collection, so we must contact the primary + shard to check if the collection is timeseries. + - **Local collections**. If we are in a shard role or can perform a local read, we can use the + local catalog. These can be unsharded collections that live on that shard, or all collections + in a non sharded cluster. -2. We must consider how an aggregation stage or command should work when `rawData = true`. For example, `$out` errors if `rawData = true` - because `$out` cannot work on raw buckets ([code link][out]). +2. We must consider how an aggregation stage or command should work when `rawData = true`. For + example, `$out` errors if `rawData = true` because `$out` cannot work on raw buckets ([code + link][out]). 3. Two `StageConstraint`s are important for defining timeseries behavior: - - `canRunOnTimeseries` should be set to `false` if the stage should error when run on a timeseries collection. 
For example, - `$search` can never run on a timeseries collection ([code link][search]). - - `consumesLogicalCollectionData` should be set to `false` if the stage processes collection metadata, internally created data such as - oplog entries, or has no input. If the first stage of the pipeline sets `consumesLogicalCollectionData` to `false`, then the - `$_internalUnpackBucket` stage will not be prepended ([code link][ts translation]). Examples are `$queue` and `$collStats` - ([code link][collstats]). + - `canRunOnTimeseries` should be set to `false` if the stage should error when run on a + timeseries collection. For example, `$search` can never run on a timeseries collection ([code + link][search]). + - `consumesLogicalCollectionData` should be set to `false` if the stage processes collection + metadata, internally created data such as oplog entries, or has no input. If the first stage of + the pipeline sets `consumesLogicalCollectionData` to `false`, then the `$_internalUnpackBucket` + stage will not be prepended ([code link][ts translation]). Examples are `$queue` and + `$collStats` ([code link][collstats]). ## How collMod affects timeseries queries -Users can run a `collMod` command at anytime, which can only increase the granularity value for timeseries collections. Increasing the -granularity increases the value of `bucketMaxSpanSeconds`, which means that new buckets will span more time. Buckets already created -do not change. `bucketMaxSpanSeconds` is used to push down `$match` predicates and target shards if the shard key is on `timeField` -(which is a deprecated feature). But this value can change! A query that was already running when the `collMod` command was issued -can use an older value of `bucketMaxSpanSeconds` and miss new writes during the aggregation (see +Users can run a `collMod` command at anytime, which can only increase the granularity value for +timeseries collections. 
Increasing the granularity increases the value of `bucketMaxSpanSeconds`, +which means that new buckets will span more time. Buckets already created do not change. +`bucketMaxSpanSeconds` is used to push down `$match` predicates and target shards if the shard key +is on `timeField` (which is a deprecated feature). But this value can change! A query that was +already running when the `collMod` command was issued can use an older value of +`bucketMaxSpanSeconds` and miss new writes during the aggregation (see [bucket_unpacking_with_sort_granularity_change.js][granularity test]). -Similarly, a query can use an older value of granularity when targeting shards. If we are targeting shards with the old granularity -value, we might miss buckets made with the new granularity value, which is acceptable if the query started before the `collMod` command. +Similarly, a query can use an older value of granularity when targeting shards. If we are targeting +shards with the old granularity value, we might miss buckets made with the new granularity value, +which is acceptable if the query started before the `collMod` command. -Conversely, if the query targets shards using the new granularity value, then our query predicate would span more time (since granularity -can only increase), so we will target more shards than before. We can illustrate this with a simplified example (for more details see -[sharding timeseries README][catalog readme]): +Conversely, if the query targets shards using the new granularity value, then our query predicate +would span more time (since granularity can only increase), so we will target more shards than +before. We can illustrate this with a simplified example (for more details see [sharding timeseries +README][catalog readme]): -To ensure we capture all relevant buckets we expand our query predicate by the value of`bucketMaxSpanSeconds`. 
So if `bucketMaxSpanSeconds` -is `1 minute` and the query predicate is time equals 5:01pm, we will add and subtract `1 minute` to the query predicate and retrieve all -buckets that hold measurements between 5:00-5:02pm. If `bucketMaxSpanSeconds` increases to `1 hour`, we will add and subtract `1 hour` in -the query predicate to retrieve buckets with measurements between 4:00 - 6:00pm. This wider query predicate could require contacting more -shards. During the `$_internalUnpackBucket` stage we will filter out all measurements that don't match the predicate. +To ensure we capture all relevant buckets we expand our query predicate by the value +of`bucketMaxSpanSeconds`. So if `bucketMaxSpanSeconds` is `1 minute` and the query predicate is time +equals 5:01pm, we will add and subtract `1 minute` to the query predicate and retrieve all buckets +that hold measurements between 5:00-5:02pm. If `bucketMaxSpanSeconds` increases to `1 hour`, we will +add and subtract `1 hour` in the query predicate to retrieve buckets with measurements between +4:00 - 6:00pm. This wider query predicate could require contacting more shards. During the +`$_internalUnpackBucket` stage we will filter out all measurements that don't match the predicate. # Query optimizations for timeseries The `$_internalUnpackBucket` stage is implemented by `DocumentSourceInternalUnpackBucket` and, like -all other document sources, provides the `doOptimizeAt()` function. This function contains most of the query rewrites -and optimizations specific to timeseries. The optimizations contained in this function are listed below. Most of -them aim to limit the number of buckets that need to be unpacked to satisfy the user's query and in some cases -may remove the `$_internalBucketUnpack` stage completely. For example, removing the `$_internalBucketUnpack` -stage and rewriting a `$group` stage has been seen to increase performance by 100x. +all other document sources, provides the `doOptimizeAt()` function. 
This function contains most of +the query rewrites and optimizations specific to timeseries. The optimizations contained in this +function are listed below. Most of them aim to limit the number of buckets that need to be unpacked +to satisfy the user's query and in some cases may remove the `$_internalBucketUnpack` stage +completely. For example, removing the `$_internalBucketUnpack` stage and rewriting a `$group` stage +has been seen to increase performance by 100x. Just like all document sources, `DocumentSourceInternalUnpackBucket::doOptimizeAt()` can be called any number of times during the optimization of a pipeline. Optimizations added should handle being -called repeatedly. Additionally, `DocumentSourceInternalUnpackBucket::doOptimizeAt()` only peeks at the next -stage after `$_internalUnpackBucket`. This might prevent eligible stages from being optimized if there -are stages right after `$_internalUnpackBucket` that cannot be optimized. This is a known limitation in the -classic execution engine. +called repeatedly. Additionally, `DocumentSourceInternalUnpackBucket::doOptimizeAt()` only peeks at +the next stage after `$_internalUnpackBucket`. This might prevent eligible stages from being +optimized if there are stages right after `$_internalUnpackBucket` that cannot be optimized. This is +a known limitation in the classic execution engine. -The descriptions of the optimizations below are summaries and do not list all of the requirements for -each optimization. +The descriptions of the optimizations below are summaries and do not list all of the requirements +for each optimization. ## A quick note about the `timeField` and `metaField` Most of the query rewrites described below will rely on the `timeField` and `metaField`. The user -inputted `timeField` and `metaField` values will be different than what is stored in the buckets collection. 
-When implementing and testing query optimizations, the rewrites from the user `timeField` and `metaField` values -to the buckets collection fields must be tested. +inputted `timeField` and `metaField` values will be different than what is stored in the buckets +collection. When implementing and testing query optimizations, the rewrites from the user +`timeField` and `metaField` values to the buckets collection fields must be tested. -Let's look at an example with a timeseries collection created with these options: `{timeField: t, metaField: m}`. +Let's look at an example with a timeseries collection created with these options: +`{timeField: t, metaField: m}`. -The `timeField` will be used in the `control` object in the buckets collection. The `control.min.