Query Performance Monitor
Monitor query runtime performance and alert on regressions.
Overview
Catch query performance regressions before they blow your SLAs or warehouse budget. The query performance monitor tracks SQL execution times and alerts on runtime changes relative to a baseline, individual slow queries, or consecutive threshold breaches.
Reference scopeThis page covers MaC YAML configuration. For how query performance monitoring works, see Query Performance.
Query performance monitors have no schedule field โ they run on Monte Carlo's internal collection cycle. MaC key: query_performance. This monitor does not target a specific table โ it uses a selection array to filter which queries are monitored by database, schema, or table.
A query performance monitor requires exactly threealert_conditionsโ one each forruntime,runtime_change, andconsecutive_breach_count. You tune (or effectively disable, by setting a high threshold) each metric, but all three must be present orapplyfails withThere must be exactly 3 comparisons.
Quick Start
montecarlo:
query_performance:
- name: core_model_runtime
description: Alert when core dbt model queries slow down significantly
warehouse: prod-snowflake
selection:
- field_name: database
values:
- ANALYTICS
- field_name: schema
values:
- CORE
alert_conditions:
- metric: runtime
threshold: 300000
- metric: runtime_change
threshold: 2.0
baseline_agg_function: MEDIAN
threshold_lookback_minutes: 10080
- metric: consecutive_breach_count
threshold: 3
domains:
- my-domainConfiguration
array of objects ยท required
Each entry defines a performance metric and the threshold that triggers an alert. You must supply exactly three entries โ one each for runtime, runtime_change, and consecutive_breach_count. This schema uses additionalProperties: false โ only the fields listed below are accepted.
alert_conditions:
- metric: runtime
threshold: 300000
- metric: runtime_change
threshold: 2.0
baseline_agg_function: MEDIAN
threshold_lookback_minutes: 10080
- metric: consecutive_breach_count
threshold: 3Properties
metric โ performance metric to evaluate
enum ยท required
Accepted values: runtime ยท runtime_change ยท consecutive_breach_count (lowercase)
Each of the three values must appear exactly once across the alert_conditions array. See Available Metrics for threshold meanings and required companion fields.
metric: runtime_changethreshold โ threshold value
number ยท required
Meaning depends on the metric: multiplier for runtime_change, milliseconds for runtime, count for consecutive_breach_count. Must be greater than 0.
threshold: 2.0baseline_agg_function โ baseline aggregation function
enum ยท required, for runtime_change
Accepted values: MEDIAN ยท AVG (uppercase)
Aggregation function for the historical baseline.
baseline_agg_function: MEDIANthreshold_lookback_minutes โ lookback window in minutes
integer ยท one of threshold_lookback_minutes or threshold_lookback_data_points is required for runtime_change
Historical lookback window in minutes for computing the baseline. Mutually exclusive with threshold_lookback_data_points โ set exactly one.
threshold_lookback_minutes: 10080threshold_lookback_data_points โ lookback window in data points
integer ยท optional
Number of historical data points for the baseline. Mutually exclusive with threshold_lookback_minutes.
threshold_lookback_data_points: 100comparison_type โ comparison type
string ยท optional
Use QUERY_PERFORMANCE if set.
comparison_type: QUERY_PERFORMANCEoperator โ comparison operator
enum ยท optional
Accepted values: GT
Only GT is supported.
operator: GTarray of objects ยท optional
Filters which queries are included in performance monitoring. Each entry narrows the scope by a specific dimension. Multiple entries are combined with AND logic.
selection:
- field_name: database
values:
- ANALYTICS
- field_name: schema
values:
- COREProperties
field_name โ dimension to filter on
string ยท required
Common values: database, schema, table.
field_name: databasevalues โ allowed values for the dimension
array of strings ยท optional
Queries matching any value are included.
values:
- ANALYTICS
- PRODUCTIONoperator โ filter operator
enum ยท optional ยท default: include
Accepted values: include ยท exclude (lowercase)
Whether matching queries are included in or excluded from the monitored scope.
operator: includearray of strings (exactly one entry) ยท required, on all accounts created after January 2025
Set default_domain in montecarlo.yml to avoid repeating it on every monitor.
domains:
- my-domainstring ยท optional (yes if multiple warehouses)
Warehouse UUID or name. Overrides default_resource from montecarlo.yml.
warehouse: prod-snowflakestring ยท required
Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one โ incident history does not transfer.
name: core_model_runtime_regressionstring ยท required
Max 512 characters.
description: Detect runtime regressions in core analytics modelsstring ยท optional
Visible in the Monte Carlo UI. Not included in notifications.
notes: Owned by the analytics team. Reviewed quarterly.array of strings ยท optional
Audience names linking this monitor to channels defined in Notifications as Code.
audiences:
- data-engineering-alerts
- etl-opsarray of strings ยท optional
Separate audiences for run-failure notifications. Falls back to audiences if not set.
failure_audiences:
- oncall-alertsboolean ยท optional
notify_run_failure: trueenum ยท optional
Accepted values: P1 ยท P2 ยท P3 ยท P4 ยท P5
priority: P2array of objects ยท optional
| Property | Type | Required | Description |
|---|---|---|---|
name | string | yes | Tag key |
value | string | no | Tag value |
tags:
- name: team
value: analytics
- name: environment
value: productionenum ยท optional
Accepted values: ACCURACY ยท COMPLETENESS ยท CONSISTENCY ยท TIMELINESS ยท UNIQUENESS ยท VALIDITY
data_quality_dimension: TIMELINESSboolean ยท optional ยท default: false
Creates the monitor in a paused state. Omitting this on a later update resets to false (active) due to PUT semantics โ always include it if you want the monitor to stay in draft.
is_draft: truestring ยท optional
Include the UUID of an existing monitor to update it instead of creating a new one.
uuid: 0dae7702-0950-45c7-909c-7e183bddca19Deprecated fields
| Field | Use instead |
|---|---|
resource | warehouse |
domain | domains |
domain_uuids | domains |
labels | audiences |
notify_rule_run_failure | notify_run_failure |
Available Metrics
All three metrics must be present in every monitor (see the note above). Tune each threshold to the scope you care about.
| Metric | Description | Required Fields | Threshold Meaning |
|---|---|---|---|
runtime_change | Detects queries whose runtime has increased relative to a historical baseline. | baseline_agg_function, plus either threshold_lookback_minutes or threshold_lookback_data_points | Multiplier. A threshold of 2.0 means "alert when runtime exceeds 2x the baseline." |
runtime | Alerts when any query execution exceeds a fixed duration. | None beyond threshold | Milliseconds. A threshold of 300000 means "alert on queries running longer than 5 minutes." |
consecutive_breach_count | Alerts when a query breaches its threshold for N consecutive runs. | None beyond threshold | Count. A threshold of 3 means "alert after 3 consecutive breaches." |
Examples
Runtime change detection with median baseline
Tuned to flag queries in the CORE schema that take more than 2x their median runtime over the past week. All three metrics are configured (required); the runtime and consecutive_breach_count thresholds are set permissively so runtime_change drives the alerting.
montecarlo:
query_performance:
- name: core_model_runtime_regression
description: Detect runtime regressions in core analytics models
warehouse: prod-snowflake
selection:
- field_name: database
values:
- ANALYTICS
- field_name: schema
values:
- CORE
alert_conditions:
- metric: runtime_change
threshold: 2.0
baseline_agg_function: MEDIAN
threshold_lookback_minutes: 10080
- metric: runtime
threshold: 600000
- metric: consecutive_breach_count
threshold: 5
audiences:
- data-engineering-alerts
priority: P2
data_quality_dimension: TIMELINESS
tags:
- name: team
value: analytics
domains:
- my-domainAbsolute runtime threshold
Tuned to flag any query in the billing schema that takes longer than 10 minutes (600000 ms). The runtime_change and consecutive_breach_count conditions are still required, so they are included with permissive thresholds.
montecarlo:
query_performance:
- name: billing_slow_queries
description: Alert on billing queries exceeding 10 minutes
warehouse: prod-snowflake
selection:
- field_name: database
values:
- PRODUCTION
- field_name: schema
values:
- BILLING
alert_conditions:
- metric: runtime
threshold: 600000
- metric: runtime_change
threshold: 3.0
baseline_agg_function: MEDIAN
threshold_lookback_minutes: 10080
- metric: consecutive_breach_count
threshold: 5
priority: P3
domains:
- my-domainConsecutive breach detection
Tuned to alert after a query breaches its runtime threshold 3 times in a row, reducing noise from one-off spikes. As always, all three metrics are present.
montecarlo:
query_performance:
- name: etl_consecutive_slowdowns
description: Alert after 3 consecutive slow runs in ETL pipeline queries
warehouse: prod-snowflake
selection:
- field_name: database
values:
- RAW
alert_conditions:
- metric: consecutive_breach_count
threshold: 3
- metric: runtime
threshold: 600000
- metric: runtime_change
threshold: 2.0
baseline_agg_function: MEDIAN
threshold_lookback_minutes: 10080
audiences:
- etl-ops
priority: P2
domains:
- my-domainTroubleshooting
Scheduling and scope
- Adding a
schedulefield. Query performance monitors run on Monte Carlo's internal collection cycle. There is no schedule field. Adding one causes a validation error. - Using
tableinstead ofselection. Query performance monitors do not have atablefield. Useselectionwithfield_name: tableto target specific tables.
Alert conditions and metrics
- Supplying fewer than three
alert_conditions. A query performance monitor must define exactly three conditions โ one each forruntime,runtime_change, andconsecutive_breach_count. Any other count fails withThere must be exactly 3 comparisons. Tune thresholds permissively for metrics you don't want to drive alerts. - Using uppercase metric names or lowercase
baseline_agg_function. Metric values are lowercase (runtime,runtime_change,consecutive_breach_count);baseline_agg_functionvalues are uppercase (MEDIAN,AVG). Getting the case wrong fails validation. - Omitting
baseline_agg_functionforruntime_change. This metric requires a baseline aggregation function (MEDIANorAVG). Omitting it causes a validation error. It also requires one ofthreshold_lookback_minutesorthreshold_lookback_data_points. - Setting both
threshold_lookback_minutesandthreshold_lookback_data_points. These are mutually exclusive. Use one or the other to define the baseline window. - Passing extra fields in
alert_conditions. The alert conditions schema for query performance monitors hasadditionalProperties: false. Including a field that belongs to other monitor types (likefields) causes a validation error. Note thatoperatoris a recognized field but only acceptsGThere โ supplyingoperator: AUTOfails with an enum error, not anadditionalPropertiesone. - Setting
thresholdto 0 or negative. All query performance metrics require a threshold greater than 0.
Other fields and update semantics
- Wrong
tagsformat. Tags must be objects withnameand optionalvaluekeys. Writingtags: ["my-tag"]fails validation. - Forgetting PUT semantics on updates. When updating a monitor by including
uuid, every field you omit reverts to its default โ it is not left unchanged. Always specify the complete desired configuration.
