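Release date: 2025-04-04
Version: v1.15.4

Dapr 1.15.4 mainly fixes the following issues:

  1. Resolves the degradation of Workflow runtime performance over time by refactoring the Scheduler connection pool logic so that jobs are not executed on stale connections;
  2. Corrects the erroneous retry of remote Actor invocations that return a 500 status code;
  3. Fixes the global Actors enabled configuration, which the sidecar injector now properly respects;
  4. Prevents a panic caused by reminder operations when an Actor is slow to start;
  5. Removes the client-side rate limiter from Sentry, resolving slow cold starts in large deployments;
  6. Allows the Service Account for the MetalBear mirrord operator in the sidecar injector;
  7. Fixes Scheduler client connection pruning, so that stale connections are now properly closed.

These fixes span Workflow performance, Actor invocation, configuration handling, and stability, improving the reliability and performance of the Dapr runtime.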

Release notes (original)

Dapr 1.15.4

This update includes bug fixes:

Fix degradation of Workflow runtime performance over time

Problem

Running a Workflow app multiple times would cause the performance of the Workflow runtime to degrade significantly over multiple runs.

Impact

Workflow applications would not complete in a timely manner.

Root cause

There was an issue whereby Scheduler client (daprd) connections were not properly pruned from the connection pool for a given Namespace's appID/actorTypes set. This would lead to jobs/actor reminders being sent to stale client connections that were no longer active. This caused Jobs to fail and enter failure policy retry loops.

Solution

Refactor the Scheduler connection pool logic to properly prune stale connections to prevent job execution occurring on stale connections and causing failure policy loops.
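
The pruning idea can be sketched in Go as follows. This is a minimal illustration and not the Scheduler's actual code: the conn, poolKey and pool types and their methods are hypothetical names, and the pool is keyed only by namespace and app ID for brevity.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for a streaming connection from a daprd
// client to the Scheduler.
type conn struct {
	id     string
	closed bool
}

// poolKey identifies the set of connections for one namespace and app ID.
type poolKey struct {
	namespace string
	appID     string
}

// pool maps each namespace/appID pair to its registered connections.
type pool struct {
	conns map[poolKey][]*conn
}

// add registers a connection under the given key.
func (p *pool) add(k poolKey, c *conn) {
	p.conns[k] = append(p.conns[k], c)
}

// prune drops closed connections and removes keys that have none left.
func (p *pool) prune() {
	for k, cs := range p.conns {
		live := cs[:0]
		for _, c := range cs {
			if !c.closed {
				live = append(live, c)
			}
		}
		if len(live) == 0 {
			delete(p.conns, k)
			continue
		}
		p.conns[k] = live
	}
}

// pick prunes first, then returns a live connection for the key, or false
// if none exists, so a job is never dispatched to a stale connection.
func (p *pool) pick(k poolKey) (*conn, bool) {
	p.prune()
	cs := p.conns[k]
	if len(cs) == 0 {
		return nil, false
	}
	return cs[0], true
}

func main() {
	p := &pool{conns: map[poolKey][]*conn{}}
	k := poolKey{namespace: "default", appID: "workflow-app"}

	stale := &conn{id: "run-1"}
	p.add(k, stale)
	stale.closed = true // the first run of the app has gone away
	p.add(k, &conn{id: "run-2"})

	c, ok := p.pick(k)
	fmt.Println(c.id, ok)
}
```

Without the prune step, pick would hand back the connection from the first run, which is the failure mode described under Root cause.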

Fix remote Actor invocation 500 retry

Problem

A cross-host actor invocation which resulted in a 500 HTTP header response code would be retried 5 times.

Impact

Services which return a 500 HTTP header response code under normal operation would respond slowly, and the service would be invoked multiple times for the same request.

Root cause

The Actor engine considered a 500 HTTP header response code to be a retriable error, rather than a successful request which returned a non-200 status code.

Solution

Remove the 500 HTTP header response code from the list of retriable errors.
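
A minimal Go sketch of the resulting behaviour, under stated assumptions: invoke, errTransport and maxAttempts are hypothetical names, and the attempt budget of 5 is taken loosely from the note above. Only transport-level errors are retried; any status code returned by the actor, including 500, is treated as a delivered response.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransport is a hypothetical error for a failure to reach the remote host.
var errTransport = errors.New("transport failure")

// maxAttempts is an assumed attempt budget for this sketch.
const maxAttempts = 5

// invoke runs call until it succeeds at the transport level or the attempt
// budget is spent. A returned status code is never a reason to retry.
func invoke(call func() (int, error)) (status, attempts int, err error) {
	for attempts = 1; attempts <= maxAttempts; attempts++ {
		status, err = call()
		if err == nil {
			return status, attempts, nil
		}
	}
	return 0, maxAttempts, err
}

func main() {
	// An actor that answers with 500 is called exactly once.
	status, attempts, err := invoke(func() (int, error) { return 500, nil })
	fmt.Println(status, attempts, err)

	// A host that cannot be reached uses up the whole attempt budget.
	_, attempts, err = invoke(func() (int, error) { return 0, errTransport })
	fmt.Println(attempts, err)
}
```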

Fix Global Actors Enabled Configuration

Problem

When global.actors.enabled was set to false via Helm, or the environment variable ACTORS_ENABLED=false was set, the Dapr sidecar would still attempt to connect to the placement service, causing readiness probe failures and repeatedly logged errors about failing to connect to placement.

Impact

Dapr sidecars would fail their readiness probes and log errors like:

Failed to connect to placement dns:///dapr-placement-server.dapr-system.svc.cluster.local:50005: failed to create placement client: rpc error: code = Unavailable desc = last resolver error: produced zero addresses

Root cause

The sidecar injector was not properly respecting the global actors enabled configuration when setting up the placement service connection.

Solution

The sidecar injector now properly respects the global.actors.enabled Helm configuration and the ACTORS_ENABLED environment variable. When either is set to false, the sidecar will not attempt to connect to the placement service, allowing it to start successfully without actor functionality.
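
The decision can be illustrated with a short Go sketch. It is not the injector's real code: actorsEnabled and sidecarArgs are hypothetical helpers, the placement address is a placeholder, and treating an unset or unparsable value as "enabled" is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// actorsEnabled parses an ACTORS_ENABLED value. Treating an empty or
// unparsable value as "enabled" is an assumption of this sketch.
func actorsEnabled(raw string) bool {
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return true
	}
	return v
}

// sidecarArgs is a hypothetical helper that builds daprd arguments. The
// placement address is only passed on when actors are enabled.
func sidecarArgs(appID, placementAddr string, actors bool) []string {
	args := []string{"--app-id", appID}
	if actors && placementAddr != "" {
		args = append(args, "--placement-host-address", placementAddr)
	}
	return args
}

func main() {
	enabled := actorsEnabled(os.Getenv("ACTORS_ENABLED"))
	// "placement.example:50005" is a placeholder address.
	fmt.Println(sidecarArgs("my-app", "placement.example:50005", enabled))
}
```

Running it with ACTORS_ENABLED=false yields arguments without a placement address, so the sidecar has nothing to connect to and its readiness no longer depends on placement.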

Prevent panic of reminder operations on slow Actor Startup

Problem

The Dapr runtime HTTP server would panic if a reminder operation timed out while an Actor was starting up.

Impact

The HTTP server would panic, causing degraded performance.

Root cause

The Dapr runtime would attempt to use the reminder service before it was initialized.

Solution

Correctly return an error indicating that the actor runtime was not ready in time for the reminder operation.

Remove client-side rate limiter from Sentry

Problem

A cold start of many Dapr deployments would take a long time, and even cause some crash loops.

Impact

A large Dapr deployment would take a disproportionately longer time than a smaller one to completely roll out, with the delay growing non-linearly with deployment size.

Root cause

The Sentry Kubernetes client was configured with a rate limiter which would be exhausted when servicing all new Dapr deployments at once, causing many clients to wait for a significant amount of time.

Solution

Remove the client-side rate limiting from the Sentry Kubernetes client.

Allow Service Account for MetalBear mirrord operator in sidecar injector

Problem

Mirrord Operator is not on the allow list of Service Accounts for the dapr sidecar injector.

Impact

Running mirrord in copy_target mode would cause the pod to initialise without the dapr container.

Root cause

Mirrord Operator is not on the allow list of Service Accounts for the dapr sidecar injector.

Solution

Add the Mirrord Operator into the allow list of Service Accounts for the dapr sidecar injector.
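
An allow-list check of this kind can be sketched in Go as below. The entries are placeholders: in particular the mirrord namespace and Service Account name are hypothetical, since the release note does not state them, and parse and shouldInject are illustrative helpers rather than the injector's API. The username layout "system:serviceaccount:<namespace>:<name>" is the standard Kubernetes form.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceAccount identifies a Kubernetes Service Account.
type serviceAccount struct {
	namespace string
	name      string
}

// allowed lists the accounts whose pod creations get a sidecar injected.
// The mirrord entry uses hypothetical placeholder names.
var allowed = map[serviceAccount]bool{
	{namespace: "kube-system", name: "replicaset-controller"}: true,
	{namespace: "mirrord", name: "mirrord-operator"}:          true,
}

// parse splits a username of the form
// "system:serviceaccount:<namespace>:<name>".
func parse(username string) (serviceAccount, bool) {
	parts := strings.Split(username, ":")
	if len(parts) != 4 || parts[0] != "system" || parts[1] != "serviceaccount" {
		return serviceAccount{}, false
	}
	return serviceAccount{namespace: parts[2], name: parts[3]}, true
}

// shouldInject reports whether a request from username is on the allow list.
func shouldInject(username string) bool {
	sa, ok := parse(username)
	return ok && allowed[sa]
}

func main() {
	fmt.Println(shouldInject("system:serviceaccount:mirrord:mirrord-operator"))
	fmt.Println(shouldInject("system:serviceaccount:default:someone-else"))
}
```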

Fix Scheduler Client connection pruning

Problem

Daprd would attempt to connect to stale Scheduler addresses.

Impact

Unnecessary network resource usage, and errors reported by service mesh sidecars.

Root cause

Daprd would not close Scheduler gRPC connections to hosts which no longer exist.

Solution

Daprd now closes connections to Scheduler hosts when they are no longer in the list of active hosts.

Download links