Release date: 2025-04-08
Version: v22.0.0-rc1

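Vitess v22.0.0 has been released with a number of major changes and optimizations. The main updates include:

  1. Deprecations and deletions

    • Deprecated some VTGate metrics and CLI flags
    • Removed support for the gh-ost and pt-osc Online DDL strategies
    • Deleted obsolete vttablet metrics and CLI flags
  2. New metrics

    • VTGate adds monitoring metrics for query execution, routing, and more
    • VTTablet adds database metrics such as table row counts and index sizes
  3. Configuration changes

    • VTOrc supports dynamic configuration updates
    • VTGate configuration keys renamed to match their flag names
  4. VTOrc improvements

    • New stalled disk recovery
    • Support for specifying key ranges in the clusters to watch
  5. New feature support

    • More efficient JSON replication
    • Support for the LAST_INSERT_ID(x) syntax
    • New maximum idle connection count setting for connection pools
    • New query log mode that records only queries resulting in errors
  6. Optimizations and improvements

    • Prepared statements support deferred optimization
    • New RPC for reading transaction state
    • Improved semi-sync monitoring
    • Improved transaction error handling
  7. Default version updates

    • Default MySQL version upgraded to 8.0.40
    • Docker images based on Debian Bookworm
  8. Other improvements

    • Improved topology read concurrency control
    • Improved VTTablet ACL reloading
    • VTAdmin upgraded to Node.js v22.13.1 LTS

This release includes 457 merged Pull Requests; the complete changelog is available on GitHub.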

Release notes (Chinese)

See the original content

Release notes (original)

Release of Vitess v22.0.0

Summary

Major Changes

Deprecations

Metrics

Component | Metric Name | Deprecation PR
vtgate | QueriesProcessed | #17727
vtgate | QueriesRouted | #17727
vtgate | QueriesProcessedByTable | #17727
vtgate | QueriesRoutedByTable | #17727

CLI Flags

Component | Flag Name | Notes | Deprecation PR
vttablet | twopc_enable | Usage of TwoPC commit will be determined by the transaction_mode set on VTGate via flag or session variable. | #17279
vtgate | grpc-send-session-in-streaming | Session will be sent as part of response on StreamExecute API call. | #17907

Deletions

Metrics

Component | Metric Name | Was Deprecated In | Deprecation PR
vttablet | QueryCacheLength | v21.0.0 | #16289
vttablet | QueryCacheSize | v21.0.0 | #16289
vttablet | QueryCacheCapacity | v21.0.0 | #16289
vttablet | QueryCacheEvictions | v21.0.0 | #16289
vttablet | QueryCacheHits | v21.0.0 | #16289
vttablet | QueryCacheMisses | v21.0.0 | #16289

CLI Flags

Component | Flag Name | Was Deprecated In | Deprecation PR
vttablet | queryserver-enable-settings-pool | v21.0.0 | #16280
vttablet | remove-sharded-auto-increment | v21.0.0 | #16860
vttablet | disable_active_reparents | v20.0.0 | #14871
vtgate, vtcombo, vtctld | healthcheck-dial-concurrency | v21.0.0 | #16378

gh-ost and pt-osc Online DDL strategies

Vitess no longer recognizes the gh-ost and pt-osc (pt-online-schema-change) Online DDL strategies. The vitess strategy is the recommended way to make schema changes at scale. mysql and direct strategies continue to be supported.

These vttablet flags have been removed:

  • --gh-ost-path
  • --pt-osc-path

Using gh-ost or pt-osc as a strategy, as shown here, now yields an error:

$ vtctldclient ApplySchema --ddl-strategy="gh-ost" ...
$ vtctldclient ApplySchema --ddl-strategy="pt-osc" ...

New Metrics

VTGate

Name | Dimensions | Description | PR
QueryExecutions | Query, Plan, Tablet | Number of queries executed. | #17727
QueryRoutes | Query, Plan, Tablet | Number of vttablets the query was executed on. | #17727
QueryExecutionsByTable | Query, Table | Queries executed at vtgate, with counts recorded per table. | #17727
VStreamsCount | Keyspace, ShardName, TabletType | Number of active vstreams. | #17858
VStreamsEventsStreamed | Keyspace, ShardName, TabletType | Number of events sent across all vstreams. | #17858
VStreamsEndedWithErrors | Keyspace, ShardName, TabletType | Number of vstreams that ended with errors. | #17858
CommitModeTimings | Mode | Timing metrics for commit (Single, Multi, TwoPC). | #16939
CommitUnresolved | N/A | Counter for failure after Prepare. | #16939

The work done in #17727 introduces new metrics for queries. As part of this work, several vtgate metrics have been deprecated; see the metrics listed under Deprecations. Here is an example of what the new metrics report:

Query: select t1.a, t2.b from t1 join t2 on t1.id = t2.id
Shards: 2
Sharding Key: id for both tables

Metrics Published:
1. QueryExecutions – {select, scatter, primary}, 1
2. QueryRoutes – {select, scatter, primary}, 2
3. QueryExecutionsByTable – {select, t1}, 1 and {select, t2}, 1
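
The counting rules in this example can be modeled in a few lines. This is a minimal sketch, not vtgate's implementation: the function name `published_metrics` and its parameters are hypothetical, and the label tuples simply mirror the dimensions listed in the table above.

```python
from collections import Counter


def published_metrics(statement_type, plan_type, tablet_type, tables, shards_hit):
    """Model the counters incremented for a single query execution.

    All names here are illustrative assumptions; only the counting rules
    come from the release notes example.
    """
    query_executions = Counter()
    query_routes = Counter()
    executions_by_table = Counter()

    labels = (statement_type, plan_type, tablet_type)
    # One query execution counts once, however many shards it touches.
    query_executions[labels] += 1
    # Routes count the number of vttablets the query was executed on.
    query_routes[labels] += shards_hit
    # Each table referenced by the query is counted once.
    for table in tables:
        executions_by_table[(statement_type, table)] += 1

    return query_executions, query_routes, executions_by_table


if __name__ == "__main__":
    executions, routes, by_table = published_metrics(
        "select", "scatter", "primary", ["t1", "t2"], shards_hit=2
    )
    print(dict(executions), dict(routes), dict(by_table))
```

The practical difference between the first two metrics is that QueryExecutions grows with traffic while QueryRoutes also grows with fan-out, so their ratio gives a rough picture of how scattered the workload is.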

VTTablet

Name | Dimensions | Description | PR
TableRows | Table | Estimated number of rows in the table. | #17570
TableClusteredIndexSize | Table | Byte size of the clustered index (i.e. row data). | #17570
IndexCardinality | Table, Index | Estimated number of unique values in the index. | #17570
IndexBytes | Table, Index | Byte size of the index. | #17570
UnresolvedTransaction | ManagerType | Current unresolved transactions. | #16939
CommitPreparedFail | FailureType | Number of prepared transactions that failed to commit. | #16939
RedoPreparedFail | FailureType | Number of prepared transactions that failed on redo. | #16939

Config File Changes

VTOrc

The configuration file for VTOrc has been updated to now support dynamic fields. The old --config parameter has been removed. The alternative is to use the --config-file parameter. The configuration can now be provided in json, yaml or any other format that viper supports.

The following fields can be changed dynamically:

  1. instance-poll-time
  2. prevent-cross-cell-failover
  3. snapshot-topology-interval
  4. reasonable-replication-lag
  5. audit-to-backend
  6. audit-to-syslog
  7. audit-purge-duration
  8. wait-replicas-timeout
  9. tolerable-replication-lag
  10. topo-information-refresh-duration
  11. recovery-poll-duration
  12. allow-emergency-reparent
  13. change-tablets-with-errant-gtid-to-drained

To upgrade to the newer version of the configuration file, first switch to using the flags in your current deployment before upgrading. Then you can switch to using the configuration file in the newer release.
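
When editing a running VTOrc's configuration file, it can help to check which of the changed keys are in the dynamic list above. The sketch below is a hypothetical helper, not part of Vitess: the function name, the sample values, and the `some-static-field` key are assumptions made for illustration. The release notes do not say how changes to other fields are handled, so the second list is only labeled "not listed as dynamic".

```python
# Field names copied from the list of dynamically changeable fields above.
DYNAMIC_FIELDS = frozenset({
    "instance-poll-time",
    "prevent-cross-cell-failover",
    "snapshot-topology-interval",
    "reasonable-replication-lag",
    "audit-to-backend",
    "audit-to-syslog",
    "audit-purge-duration",
    "wait-replicas-timeout",
    "tolerable-replication-lag",
    "topo-information-refresh-duration",
    "recovery-poll-duration",
    "allow-emergency-reparent",
    "change-tablets-with-errant-gtid-to-drained",
})


def classify_changes(old_config, new_config):
    """Split changed keys into (listed as dynamic, not listed as dynamic)."""
    all_keys = old_config.keys() | new_config.keys()
    changed = {k for k in all_keys if old_config.get(k) != new_config.get(k)}
    return sorted(changed & DYNAMIC_FIELDS), sorted(changed - DYNAMIC_FIELDS)


if __name__ == "__main__":
    # Values are made up for the example.
    old = {"instance-poll-time": "5s", "some-static-field": 1}
    new = {"instance-poll-time": "10s", "some-static-field": 2}
    print(classify_changes(old, new))
```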

VTGate

The Viper configuration keys for the following flags have been changed to match their flag names. Previously, discovery was a separate key prefix rather than part of the key name.

Flag Name | Old Configuration Key | New Configuration Key
discovery_low_replication_lag | discovery.low_replication_lag | discovery_low_replication_lag
discovery_high_replication_lag_minimum_serving | discovery.high_replication_lag_minimum_serving | discovery_high_replication_lag_minimum_serving
discovery_min_number_serving_vttablets | discovery.min_number_serving_vttablets | discovery_min_number_serving_vttablets
discovery_legacy_replication_lag_algorithm | discovery.legacy_replication_lag_algorithm | discovery_legacy_replication_lag_algorithm

To upgrade to the newer version of the configuration keys, first switch to using the flags in your current deployment before upgrading. Then you can switch to using the new configuration keys in the newer release.
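
If an existing configuration file already uses the old keys, the rename can be scripted. The following is a sketch under the assumption that the old dotted keys were stored as a nested `discovery` map in the file, which is how Viper normally resolves dotted keys; the function name and the sample values are hypothetical.

```python
# Suffixes of the four renamed keys, taken from the table above.
RENAMED_SUFFIXES = (
    "low_replication_lag",
    "high_replication_lag_minimum_serving",
    "min_number_serving_vttablets",
    "legacy_replication_lag_algorithm",
)


def migrate_discovery_keys(config):
    """Return a copy of config with discovery.<x> moved to discovery_<x>."""
    migrated = {k: v for k, v in config.items() if k != "discovery"}
    discovery = dict(config.get("discovery", {}))
    for suffix in RENAMED_SUFFIXES:
        if suffix in discovery:
            migrated["discovery_" + suffix] = discovery.pop(suffix)
    # Keep any nested keys that are not part of this rename untouched.
    if discovery:
        migrated["discovery"] = discovery
    return migrated


if __name__ == "__main__":
    old = {"discovery": {"low_replication_lag": "30s",
                         "min_number_serving_vttablets": 2}}
    print(migrate_discovery_keys(old))
```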


VTOrc

Stalled Disk Recovery

VTOrc can now identify and recover from stalled disk errors. VTTablets test whether the disk is writable and they send this information in the full status output to VTOrc. If the disk is not writable on the primary tablet, VTOrc will attempt to recover the cluster by promoting a new primary. This is useful in scenarios where the disk is stalled and the primary vttablet is unable to accept writes because of it.

To opt into this feature, the --enable-primary-disk-stalled-recovery flag has to be specified on VTOrc, and the --disk-write-dir flag has to be specified on the vttablets. The --disk-write-interval and --disk-write-timeout flags can be used to configure the polling interval and timeout respectively.

KeyRanges in --clusters_to_watch

VTOrc now supports specifying keyranges in the --clusters_to_watch flag. This means that there is no need to restart a VTOrc instance with a different flag value when you reshard a keyspace.

For example, if a VTOrc is configured to watch ks/-80, then it would watch all the shards that fall under the keyrange -80. If a reshard is performed and -80 is split into new shards -40 and 40-80, the VTOrc instance will automatically start watching the new shards without needing a restart. In the previous logic, specifying ks/-80 for the flag would mean that VTOrc would watch only 1 (or no) shard. In the new system, since we interpret -80 as a key range, it can watch multiple shards as described in the example.

Users can continue to specify exact keyranges. The new feature is backward compatible.
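
The containment check behind this behavior can be illustrated as follows. This is a simplified sketch, not VTOrc's code: the function names are hypothetical, key range bounds are assumed to be hex strings of at most 8 bytes, and an empty bound is treated as the minimum (start) or maximum (end) of the keyspace.

```python
def parse_key_range(spec):
    """Parse 'start-end' (hex) into (start_bytes, end_bytes_or_None)."""
    start, sep, end = spec.partition("-")
    if not sep:
        raise ValueError(f"not a key range: {spec!r}")
    # Pad with zero bytes so that '40' and '4000' compare as equal bounds.
    start_bytes = bytes.fromhex(start).ljust(8, b"\x00")
    end_bytes = bytes.fromhex(end).ljust(8, b"\x00") if end else None
    return start_bytes, end_bytes


def key_range_contains(watched, shard):
    """Return True if the shard's key range lies fully inside the watched one."""
    w_start, w_end = parse_key_range(watched)
    s_start, s_end = parse_key_range(shard)
    if s_start < w_start:
        return False
    if w_end is None:
        return True  # watched range is unbounded on the right
    return s_end is not None and s_end <= w_end


if __name__ == "__main__":
    for shard in ("-80", "-40", "40-80", "80-"):
        print(shard, key_range_contains("-80", shard))
```

With a watched range of -80, both -40 and 40-80 are contained, which is why a split of -80 does not require changing the flag.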


New Default Versions

MySQL 8.0.40

The default MySQL version used by our vitess/lite:latest image is going from 8.0.30 to 8.0.40. This change was introduced by #17552.

VTGate also advertises MySQL version 8.0.40 by default instead of 8.0.30 if no explicit version is set. Users can set the mysql_server_version flag to advertise the correct version.

⚠️ Upgrading to this release with vitess-operator:

If you are using the vitess-operator, because we are bumping the patch version of MySQL 8.0 from 8.0.30 to 8.0.40, you will have to upgrade manually:

  1. Add innodb_fast_shutdown=0 to your extra cnf in your YAML file.
  2. Apply this file.
  3. Wait for all the pods to be healthy.
  4. Then change your YAML file to use the new Docker Images (vitess/lite:v22.0.0).
  5. Remove innodb_fast_shutdown=0 from your extra cnf in your YAML file.
  6. Apply this file.

This is the last time this will be needed in the 8.0.x series, as starting with MySQL 8.0.35 it is possible to upgrade and downgrade between 8.0.x versions without needing to run innodb_fast_shutdown=0.

Docker vitess/lite images with Debian Bookworm

The base system now uses Debian Bookworm instead of Debian Bullseye for the vitess/lite images. This change was brought by #17552.


New Support

More Efficient JSON Replication

In #7345 we added support for --binlog-row-value-options=PARTIAL_JSON. You can read more about this feature added to MySQL 8.0 here.

If you are using MySQL 8.0 or later and using JSON columns, you can now enable this MySQL feature across your Vitess cluster(s) to lower the disk space needed for binary logs and improve the CPU and memory usage in both mysqld (standard intrashard MySQL replication) and vttablet (VReplication) without losing any capabilities or features.

LAST_INSERT_ID(x)

In #17408 and #17409, we added the ability to use LAST_INSERT_ID(x) in Vitess directly at vtgate. This improvement allows certain queries—like SELECT last_insert_id(123); or SELECT last_insert_id(count(*)) ...—to be handled without relying on MySQL for the final value.

Limitations:

  • When using LAST_INSERT_ID(x) in ordered queries (e.g., SELECT last_insert_id(col) FROM table ORDER BY foo), MySQL sets the session’s last-insert-id value according to the last row returned. Vitess does not guarantee the same behavior.

Maximum Idle Connections in the Pool

In #17443 we introduced a new configurable max-idle-count parameter for connection pools. This allows you to specify the maximum number of idle connections retained in each connection pool to optimize performance and resource efficiency.

You can control idle connection retention for the query server’s query pool, stream pool, and transaction pool with the following flags:

  • --queryserver-config-query-pool-max-idle-count: Defines the maximum number of idle connections retained in the query pool.
  • --queryserver-config-stream-pool-max-idle-count: Defines the maximum number of idle connections retained in the stream pool.
  • --queryserver-config-txpool-max-idle-count: Defines the maximum number of idle connections retained in the transaction pool.

This feature ensures that, during traffic spikes, idle connections are available for faster responses, while minimizing overhead in low-traffic periods by limiting the number of idle connections retained. It helps strike a balance between performance, efficiency, and cost.
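
The general idea of capping idle connections can be sketched with a toy pool. This is not vttablet's pool implementation: the class names, the `closed` counter, and the `FakeConnection` type are hypothetical, and the sketch does not model any special meaning a particular flag value may have in Vitess.

```python
class FakeConnection:
    """Stand-in for a real database connection (illustrative only)."""

    def __init__(self):
        self.is_closed = False

    def close(self):
        self.is_closed = True


class IdleCappedPool:
    """Toy pool that retains at most max_idle idle connections."""

    def __init__(self, max_idle, connect=FakeConnection):
        self.max_idle = max_idle
        self._connect = connect
        self._idle = []
        self.closed = 0

    def get(self):
        # Reuse an idle connection when one is available.
        return self._idle.pop() if self._idle else self._connect()

    def put(self, conn):
        # Keep the connection only while under the idle cap; otherwise close it.
        if len(self._idle) < self.max_idle:
            self._idle.append(conn)
        else:
            conn.close()
            self.closed += 1

    def idle_count(self):
        return len(self._idle)


if __name__ == "__main__":
    pool = IdleCappedPool(max_idle=2)
    burst = [pool.get() for _ in range(5)]  # traffic spike opens 5 connections
    for conn in burst:
        pool.put(conn)
    print(pool.idle_count(), pool.closed)
```

After the burst, only two connections stay warm for the next request and the rest are closed, which is the trade-off between latency and resource use described above.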

Filtering Query logs on Error

The querylog-mode setting can be configured to error to log only queries that result in errors. This option is supported in both VTGate and VTTablet.


Optimization

Prepared Statement

Prepared statements now benefit from Deferred Optimization, enabling parameter-aware query plans. Initially, a baseline plan is created at prepare-time, and on first execution, a more efficient parameter-optimized plan is generated. Subsequent executions dynamically switch between these plans based on input values, improving query performance while ensuring correctness.


RPC Changes

These are the RPC changes made in this release:

  1. The GetTransactionInfo RPC has been added to both the VtctldServer and TabletManagerClient interfaces. These RPCs let users read the state of an unresolved distributed transaction, which can help in debugging what went wrong and how to fix the problem.

Prefer not promoting a replica that is currently taking a backup

Emergency reparents now prefer not promoting replicas that are currently taking backups with a backup engine other than builtin. Note that if there’s only one suitable replica to promote, and it is taking a backup, it will still be promoted.

For planned reparents, hosts taking backups with a backup engine other than builtin are filtered out of the list of valid candidates. This means they will never get promoted, not even if there are no other candidates.

Note that behavior for builtin backups remains unchanged: a replica that is currently taking a builtin backup will never be promoted, neither by planned nor by emergency reparents.
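
The three rules above can be summarized in a small model. This sketch covers only the backup-related filtering and ordering; real candidate selection in Vitess considers other criteria (such as replication position) that are not modeled here, and the `Replica` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    alias: str
    # Name of the backup engine currently running, or None if no backup is running.
    backup_engine: Optional[str] = None


def planned_candidates(replicas: List[Replica]) -> List[Replica]:
    """Planned reparent: any replica taking a backup is filtered out."""
    return [r for r in replicas if r.backup_engine is None]


def emergency_candidates(replicas: List[Replica]) -> List[Replica]:
    """Emergency reparent: builtin backups are excluded, others are deprioritized."""
    eligible = [r for r in replicas if r.backup_engine != "builtin"]
    # Stable sort: replicas not taking a backup come first.
    return sorted(eligible, key=lambda r: r.backup_engine is not None)


if __name__ == "__main__":
    replicas = [
        Replica("zone1-101", backup_engine="xtrabackup"),
        Replica("zone1-102"),
        Replica("zone1-103", backup_engine="builtin"),
    ]
    print([r.alias for r in emergency_candidates(replicas)])
    print([r.alias for r in planned_candidates(replicas)])
```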


Semi-sync monitor in vttablet

A new component has been added to the vttablet binary to monitor the semi-sync status of primary vttablets. We’ve observed cases where a brief network disruption can cause the primary to get stuck indefinitely waiting for semi-sync ACKs. In rare scenarios, this can block reparent operations and render the primary unresponsive. More information can be found in the issues #17709 and #17749.

To address this, the new component continuously monitors the semi-sync status. If the primary becomes stuck on semi-sync ACKs, it generates writes to unblock it. If this fails, VTOrc is notified of the issue and initiates an emergency reparent operation.

The monitoring interval can be adjusted using the --semi-sync-monitor-interval flag, which defaults to 10 seconds.


Wrapped fatal transaction errors

When a query fails inside a transaction because the transaction is no longer valid (e.g. PRS, rollout, primary down, etc.), the original error is now wrapped in a VT15001 error.

For non-transactional queries that produce a VT15001, VTGate will try to roll back and clear the transaction. Any new queries on the same connection will fail with a VT09032 error until a ROLLBACK is received to acknowledge that the transaction was automatically rolled back and cleared by VTGate.

VT09032 is returned to clients to avoid applications blindly sending queries to VTGate thinking they are still in a transaction.
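
From a client's point of view, the error sequence can be pictured as a small state machine. This is a simplified model of the behavior described above, not VTGate's session handling: the class, its attributes, and the `fatal_tx_error` parameter (which simulates the transaction becoming invalid) are all hypothetical.

```python
class SessionSketch:
    """Toy model of a connection that hits a fatal transaction error."""

    def __init__(self):
        self.in_transaction = False
        self.needs_rollback_ack = False

    def execute(self, sql, fatal_tx_error=False):
        statement = sql.strip().rstrip(";").upper()

        if self.needs_rollback_ack:
            # Everything is rejected until the client acknowledges with ROLLBACK.
            if statement == "ROLLBACK":
                self.needs_rollback_ack = False
                return "OK"
            return "VT09032"

        if statement == "BEGIN":
            self.in_transaction = True
            return "OK"
        if statement in ("COMMIT", "ROLLBACK"):
            self.in_transaction = False
            return "OK"

        if self.in_transaction and fatal_tx_error:
            # The transaction is rolled back and cleared on the client's behalf.
            self.in_transaction = False
            self.needs_rollback_ack = True
            return "VT15001"
        return "OK"


if __name__ == "__main__":
    session = SessionSketch()
    session.execute("BEGIN")
    print(session.execute("UPDATE t SET a = 1", fatal_tx_error=True))
    print(session.execute("UPDATE t SET a = 2"))
    print(session.execute("ROLLBACK"))
    print(session.execute("SELECT 1"))
```

An application that retries on VT15001 should therefore issue a ROLLBACK first and then restart the whole transaction rather than resend the single failed statement.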

This change was introduced by #17669.


Minor Changes

--topo_read_concurrency behaviour changes

The --topo_read_concurrency flag was added to all components that access the topology and the provided limit is now applied separately for each global or local cell (default 32).

All topology read calls (Get, GetVersion, List and ListDir) now respect this per-cell limit. Prior to this version, a single limit was shared across all cells, and many topology calls did not respect it.
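
The difference between one shared limit and a per-cell limit can be sketched with one semaphore per cell. This is an illustration of the concept only, not the Vitess topology client: the class and method names are hypothetical, and the default of 32 mirrors the flag default mentioned above.

```python
import threading


class PerCellLimiter:
    """Applies a separate concurrency limit to reads against each cell."""

    def __init__(self, limit=32):
        self._limit = limit
        self._lock = threading.Lock()
        self._semaphores = {}

    def _semaphore(self, cell):
        # Lazily create one semaphore per cell (for example "global" or "zone1").
        with self._lock:
            if cell not in self._semaphores:
                self._semaphores[cell] = threading.BoundedSemaphore(self._limit)
            return self._semaphores[cell]

    def read(self, cell, fn):
        """Run fn while holding a slot in the given cell's limit."""
        with self._semaphore(cell):
            return fn()


if __name__ == "__main__":
    limiter = PerCellLimiter(limit=1)
    # A read against one cell does not consume capacity in another cell.
    print(limiter.read("global", lambda: limiter.read("zone1", lambda: "ok")))
```

With a single shared semaphore and a limit of 1, the nested call in the usage example would block forever; with per-cell semaphores it completes, because a saturated cell cannot starve reads against the others.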


VTTablet

CLI Flags

  • The twopc_abandon_age flag now supports values in the time.Duration format (e.g., 1s, 2m, 1h). While the flag will continue to accept float values (interpreted as seconds) for backward compatibility, float inputs are deprecated and will be removed in a future release.

  • The new --consolidator-query-waiter-cap flag sets the maximum number of clients allowed to wait on the consolidator. The default value is 0, which means unlimited waiters. Users can adjust this value based on the performance of VTTablet to avoid excessive memory usage and the risk of being OOMKilled, particularly in Kubernetes deployments.
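
To illustrate the two input forms accepted by twopc_abandon_age, here is a sketch of a parser that accepts either a bare number (the legacy form, in seconds) or a duration string. It handles only a subset of Go's time.Duration syntax (unsigned values with the units ns, us, µs, ms, s, m, h) and is not the parsing code Vitess uses; the function name is hypothetical.

```python
import re

_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def parse_abandon_age(value):
    """Return the value in seconds from '1h30m'-style or bare-number input."""
    text = value.strip()
    try:
        # Legacy form: a bare number is interpreted as seconds.
        return float(text)
    except ValueError:
        pass

    total, position = 0.0, 0
    while position < len(text):
        match = _TOKEN.match(text, position)
        if match is None:
            raise ValueError(f"invalid duration: {value!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    if position == 0:
        raise ValueError(f"invalid duration: {value!r}")
    return total


if __name__ == "__main__":
    for sample in ("1s", "2m", "1h", "30"):
        print(sample, parse_abandon_age(sample))
```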

ACL enforcement and reloading

When a tablet is started with --enforce-tableacl-config, it will exit with an error if the contents of the file are not valid. After the changes made in #17485, the tablet will no longer exit when reloading the contents of the file after receiving a SIGHUP. When the file contents are invalid on reload, the tablet will now log an error and the active in-memory ACLs remain in effect.
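
The "fail at startup, keep the old ACLs on a bad reload" pattern looks roughly like this. It is a sketch of the pattern, not vttablet's code: the class name, the `read_text` callable standing in for reading the ACL file, and the validation (JSON with a top-level `table_groups` key) are assumptions made for illustration.

```python
import json
import logging


class AclHolder:
    """Holds the active ACLs and reloads them without dropping a good config."""

    def __init__(self, read_text):
        self._read_text = read_text
        # At startup an invalid config is fatal: the exception propagates.
        self.acls = self._load()

    def _load(self):
        data = json.loads(self._read_text())
        if not isinstance(data, dict) or "table_groups" not in data:
            raise ValueError("ACL config must be an object with 'table_groups'")
        return data

    def reload(self):
        """Called on reload (for example from a SIGHUP handler)."""
        try:
            self.acls = self._load()
            return True
        except (OSError, ValueError) as err:
            # Log and keep serving with the ACLs already in memory.
            logging.error("ACL reload failed, keeping active ACLs: %s", err)
            return False


if __name__ == "__main__":
    contents = ['{"table_groups": []}']
    holder = AclHolder(lambda: contents[-1])
    contents.append("this is not valid JSON")
    print(holder.reload(), holder.acls)
```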


VTAdmin

vtadmin-web updated to node v22.13.1 (LTS)

Building vtadmin-web now requires node >= v22.13.0 (LTS). Breaking changes from v20 to v22 can be found at https://nodejs.org/en/blog/release/v22.13.0; there are no known issues that apply to VTAdmin. Full details on the node v22.13.1 release can be found at https://nodejs.org/en/blog/release/v22.13.1.


The entire changelog for this release can be found here.

The release includes 457 merged Pull Requests.

Thanks to all our contributors: @GrahamCampbell, @GuptaManan100, @L3o-pold, @akagami-harsh, @anirbanmu, @app/dependabot, @app/vitess-bot, @arthmis, @arthurschreiber, @beingnoble03, @c-r-dev, @corbantek, @dbussink, @deepthi, @derekperkins, @ejortegau, @frouioui, @garfthoffman, @gmpify, @gopoto, @harshit-gangal, @huochexizhan, @jeefy, @jwangace, @kbutz, @lmorduch, @mattlord, @mattrobenolt, @maxenglander, @mcrauwel, @mounicasruthi, @niladrix719, @notfelineit, @rafer, @rohit-nayak-ps, @rvrangel, @shailpujan88, @shanth96, @shlomi-noach, @siadat, @systay, @timvaillancourt, @twthorn, @vitess-bot, @vmg, @wiebeytec, @wukuai

Download links