mongo/buildscripts/cost_model
Max Verbinnen 0285d3c416 SERVER-125617 Enable path arrayness for calibration suite (#52748)
GitOrigin-RevId: fe0e214a00a15bb6992bda16a21a022a65fb2401
2026-04-28 19:12:15 +00:00
join_output SERVER-123922 Add execution-time input file to speed up INLJ/HJ cost model testing (#51680) 2026-04-14 09:15:28 +00:00
.gitignore SERVER-123922 Add execution-time input file to speed up INLJ/HJ cost model testing (#51680) 2026-04-14 09:15:28 +00:00
BUILD.bazel SERVER-120217 Update to using rules python (#48943) 2026-03-06 01:52:51 +00:00
calibration_settings.py SERVER-119383 Build collections and indexes for join cost model tuning (#47983) 2026-02-16 10:06:28 +00:00
ce_data_settings.py SERVER-106620 Remove remaining ABT references as well as unused code from cost model scripts (#42075) 2025-10-07 17:52:55 +00:00
ce_generate_data.py SERVER-106620 Remove remaining ABT references as well as unused code from cost model scripts (#42075) 2025-10-07 17:52:55 +00:00
common.py SERVER-106620 Remove remaining ABT references as well as unused code from cost model scripts (#42075) 2025-10-07 17:52:55 +00:00
config.py SERVER-106620 Remove remaining ABT references as well as unused code from cost model scripts (#42075) 2025-10-07 17:52:55 +00:00
cost_estimator.py SERVER-113632 Create workload for incremental filter leaf cost (#43737) 2025-11-13 16:17:32 +00:00
data_generator.py SERVER-104999 port motor to pymongo async (#36697) 2025-05-30 13:07:39 +00:00
database_instance.py SERVER-120017 Initial calibration of HJ vs INLJ (#48386) 2026-03-20 09:41:42 +00:00
execution_tree_classic.py SERVER-113632 Create workload for incremental filter leaf cost (#43737) 2025-11-13 16:17:32 +00:00
execution_tree_sbe.py SERVER-106251 Modify cost model calibration code to be able to parse classic execution trees (#37438) 2025-06-20 20:55:36 +00:00
experiment.py SERVER-115023: Fix invalid escape sequences for Python 3.13 compatibi… (#46796) 2026-01-26 15:14:01 +00:00
join_calibration_settings.py SERVER-125617 Enable path arrayness for calibration suite (#52748) 2026-04-28 19:12:15 +00:00
join_plotting.py SERVER-123922 Add execution-time input file to speed up INLJ/HJ cost model testing (#51680) 2026-04-14 09:15:28 +00:00
join_start.py SERVER-125617 Enable path arrayness for calibration suite (#52748) 2026-04-28 19:12:15 +00:00
join_workload_execution.py SERVER-123922 Add execution-time input file to speed up INLJ/HJ cost model testing (#51680) 2026-04-14 09:15:28 +00:00
mongod_manager.py SERVER-120017 Initial calibration of HJ vs INLJ (#48386) 2026-03-20 09:41:42 +00:00
OWNERS.yml SERVER-103079 Replace granular QO code owners teams with a single coarse team (#34298) 2025-04-10 06:25:19 +00:00
parameters_extractor_classic.py SERVER-113632 Create workload for incremental filter leaf cost (#43737) 2025-11-13 16:17:32 +00:00
qsn_calibrator.py SERVER-113679 Take cost of SORT spilling into account (#43934) 2025-11-13 21:23:43 +00:00
qsn_costing_parameters.py SERVER-113632 Create workload for incremental filter leaf cost (#43737) 2025-11-13 16:17:32 +00:00
query_solution_tree.py SERVER-106251 Modify cost model calibration code to be able to parse classic execution trees (#37438) 2025-06-20 20:55:36 +00:00
random_generator.py SERVER-119383 Build collections and indexes for join cost model tuning (#47983) 2026-02-16 10:06:28 +00:00
README.md SERVER-124136 Format markdown via prettier: wrap lines and use width of 100 (#52231) 2026-04-21 19:20:11 +00:00
requirements.txt SERVER-104999 port motor to pymongo async (#36697) 2025-05-30 13:07:39 +00:00
start.py SERVER-113632 Update expected COLLSCAN & FETCH filter (#43986) 2025-11-14 14:50:02 +00:00
workload_execution.py SERVER-113679 Take cost of SORT spilling into account (#43934) 2025-11-13 21:23:43 +00:00

Cost Model Calibrator

Getting Started

1) Set Up Mongod

First, prepare the MongoDB server:

  1. Activate the standard virtual environment:
source python3-venv/bin/activate
  1. Build the server with optimizations (this makes document insertion faster):
(python3-venv) bazel build --config=opt install-devcore
  1. Run a mongod instance (needed only for CBR calibration; join_start.py manages mongod's lifecycle itself):
(python3-venv) bazel-bin/install-mongod/bin/mongod --setParameter internalMeasureQueryExecutionTimeInNanoseconds=true
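
Before starting a CBR run, it can help to confirm that the running mongod was actually launched with the parameter above. The following is a minimal sketch, assuming pymongo is available in the environment and that the server reports this parameter through the standard getParameter command. The helper name is hypothetical and not part of the calibrator.

```python
# Hypothetical helper, not part of buildscripts/cost_model.
PARAM = "internalMeasureQueryExecutionTimeInNanoseconds"


def measures_in_nanoseconds(client) -> bool:
    """Return True if the connected mongod reports PARAM as enabled.

    `client` is expected to behave like pymongo.MongoClient: the check only
    uses `client.admin.command(...)` with the standard getParameter command.
    """
    reply = client.admin.command({"getParameter": 1, PARAM: 1})
    return bool(reply.get(PARAM, False))
```

With pymongo this could be called as measures_in_nanoseconds(pymongo.MongoClient("mongodb://localhost:27017")), assuming mongod listens on its default port.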

2) Set Up Cost Model Calibrator

In another terminal:

  1. Navigate to the cost model directory:
cd buildscripts/cost_model
  1. Set up a Python alias that points to the MongoDB toolchain:
alias python=/opt/mongodbtoolchain/v4/bin/python3
  1. Deactivate any existing Python environment (if needed):
deactivate
  1. Create a new virtual environment:
/opt/mongodbtoolchain/v4/bin/python3 -m venv cm
  1. Activate the new environment:
source cm/bin/activate
  1. Install required packages:
(cm) python -m pip install -r requirements.txt
  1. Run the calibrator:
  • For CBR cost model calibration:
    (cm) python start.py
    
  • For JOO cost model calibration:
    (cm) python join_start.py
    
    To skip the constant calibration (warm scan, CPU, sequential I/O, random I/O) and only run the join algorithm comparison:
    (cm) python join_start.py --join-only
    
    To iterate quickly on cost model changes, reuse pre-recorded execution times from a previous full run. This skips actual query execution and runs only queryPlanner explains to collect fresh cost estimates:
    (cm) python join_start.py --execution-times join_output/join_times_in-cache.csv join_output/join_times_exceeds-cache.csv
    

Note: For CBR calibration, the first run takes a while because it has to generate the data. Afterwards, as long as you do not modify the collections, you can comment out await generator.populate_collections() in start.py, which makes subsequent runs much faster.
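
As an alternative to commenting the call in and out by hand, the population step could be guarded by a switch. This is only a sketch of the idea, not how start.py is written: StubGenerator stands in for the real generator object, and the SKIP_POPULATE environment variable is a hypothetical name chosen for illustration.

```python
import asyncio
import os


class StubGenerator:
    """Stand-in for the real data generator; only records that it was called."""

    def __init__(self) -> None:
        self.populated = False

    async def populate_collections(self) -> None:
        self.populated = True


async def populate_unless_skipped(generator, skip: bool) -> bool:
    """Populate collections unless `skip` is set; return True if data was generated."""
    if skip:
        return False
    await generator.populate_collections()
    return True


def skip_requested(env=os.environ) -> bool:
    # SKIP_POPULATE is a hypothetical variable name chosen for this sketch.
    return env.get("SKIP_POPULATE", "") == "1"
```

In a script, the guarded call would look like await populate_unless_skipped(generator, skip_requested()), so the expensive step is skipped only when the switch is set explicitly.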

  1. When done, deactivate the environment:
(cm) deactivate
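
The join_start.py invocations listed under "Run the calibrator" differ only in their flags, so a small wrapper can assemble them. The sketch below uses only the flags shown above. The function name is hypothetical, and whether --join-only and --execution-times may be combined is an assumption that this README does not confirm.

```python
from typing import List, Sequence


def join_start_command(join_only: bool = False,
                       execution_times: Sequence[str] = ()) -> List[str]:
    """Build the argument list for a join_start.py run (hypothetical helper)."""
    cmd = ["python", "join_start.py"]
    if join_only:
        # Skip the constant calibration and run only the join comparison.
        cmd.append("--join-only")
    if execution_times:
        # Reuse pre-recorded execution times instead of executing queries.
        cmd.append("--execution-times")
        cmd.extend(execution_times)
    return cmd
```

The resulting list can be passed to subprocess.run from within the cm environment.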

Install New Packages

  1. Install the package:
(cm) python -m pip install <package_name>
  1. Update requirements.txt:
(cm) python -m pip freeze > requirements.txt
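
Running pip freeze outside the cm environment would record whatever packages the active interpreter happens to have. A minimal sketch of a guard, assuming the environment was created with venv as described above (in which case sys.prefix differs from sys.base_prefix); both function names are hypothetical:

```python
import subprocess
import sys
from typing import Optional


def in_virtualenv(prefix: Optional[str] = None,
                  base_prefix: Optional[str] = None) -> bool:
    """Return True when the interpreter prefix differs from the base prefix."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


def freeze_requirements(path: str = "requirements.txt") -> None:
    """Write `pip freeze` output to `path`, refusing to run outside a venv."""
    if not in_virtualenv():
        raise RuntimeError("activate the cm environment before freezing")
    with open(path, "w", encoding="utf-8") as out:
        subprocess.run([sys.executable, "-m", "pip", "freeze"],
                       stdout=out, check=True)
```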