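Longhorn v1.8.0-rc2 Release Notes
Release date: 2025-01-02
Version: v1.8.0-rc2
This release resolves a number of issues, covering feature additions, performance work, and bug fixes. Highlights include data locality for v2 volumes, live migration for SPDK volumes, multiple backup stores, delta replica rebuilding for v2 volumes based on snapshot checksums, encrypted v2 volumes, installing the Longhorn Helm chart through helm-controller, auto salvage for v2 volumes, DR volumes for v2 volumes, and Talos support for the v2 data engine. Bug fixes address, among others, volume upgrade failures, backup store problems, data loss, and volume mount failures. On the performance side, v1.7.0 was benchmarked and future Longhorn performance was investigated. Other changes include online expansion of encrypted volumes, documentation updates, and CVE fixes. Contributors include COLDTURNIP, ChanYiLin, DamiaSan, PhanLe1010, and others.
Changes (Chinese)
See the original content below.
Changes (original)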
DON’T UPGRADE from/to any RC/Preview/Sprint releases because the operation is not supported.
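The warning above can be checked mechanically before an upgrade is attempted. The following is a minimal sketch, assuming release tags follow the `vX.Y.Z[-suffix]` shape seen in this document (for example `v1.8.0-rc2`). The helper names `is_prerelease` and `upgrade_supported` and the regular expression are illustrative and are not part of any Longhorn tooling; how Preview and Sprint releases are actually tagged is also an assumption, so the sketch simply treats any suffixed tag as unsupported.

```python
import re

# Assumed tag shape: "v1.8.0" for a stable release, "v1.8.0-rc2" for a pre-release.
_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def is_prerelease(tag: str) -> bool:
    """Return True when the tag carries a suffix such as -rc2."""
    match = _TAG.match(tag.strip())
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    return match.group(4) is not None


def upgrade_supported(current: str, target: str) -> bool:
    """Hypothetical guard: refuse upgrades from or to any suffixed release."""
    return not (is_prerelease(current) or is_prerelease(target))
```

For example, `upgrade_supported("v1.7.2", "v1.8.0-rc2")` evaluates to `False` under these assumptions, because the target tag carries the `-rc2` suffix.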
Resolved Issues in this release
Highlight
- [IMPROVEMENT] v2 volume supports data locality 9371 - @derekbit @roger-ryao
- [FEATURE] SPDK volume supports live migration 6361 - @PhanLe1010 @chriscchien @roger-ryao
- [FEATURE] Multiple backup stores support 5411 - @mantissahz @roger-ryao
- [FEATURE] Support v2 volume delta replica rebuilding based on snapshot checksum: SPDK part 5573 - @DamiaSan @roger-ryao
- [FEATURE] Support backing image for v2 volume 6341 - @yangchiu @ChanYiLin @chriscchien
- [FEATURE] Support v2 volume delta replica rebuilding based on snapshot checksum: data plane and control plane 9488 - @shuo-wu
- [FEATURE] Make CPU core configurable for v2 data engine 8835 - @yangchiu @derekbit
- [FEATURE] Support encrypted v2 volumes 7355 - @mantissahz
- [FEATURE] Support/verify installing Longhorn Helm chart using helm-controller which is built in k3s and rke2 9506 - @yangchiu @james-munson
- [FEATURE] v2 volume supports auto salvage 8430 - @c3y1huang @chriscchien
- [FEATURE] v2 volumes support DR volume 6613 - @c3y1huang @roger-ryao
- [FEATURE] Talos support for v2 data engine 7791 - @yangchiu @c3y1huang
Feature
- [FEATURE] Missing option to modify backup-task retain value from UI while editing recurring job 8810 - @houhoucoop @roger-ryao
- [FEATURE] Support Data Engine Option For Backing Image in CSI flow 10084 -
- [UI][FEATURE] Multiple backup stores support 8647 - @a110605 @roger-ryao
- [UI][FEATURE] Support V2 Backing Image in UI 9880 - @yangchiu @chriscchien @houhoucoop
- [FEATURE] Add periodic HugePages (2Mi) configuration check to ensure v2 data engine compatibility 9983 - @yangchiu @jangseon-ryu
- [FEATURE][UI] Fill in the secret and secret namespace when restoring the BackupBackingImage 9490 - @chriscchien @houhoucoop
- [FEATURE] Add kernel module check for data engine v2 9915 - @jangseon-ryu @roger-ryao
- [FEATURE]: Automatic online filesystem resize for RWX volumes. 9736 - @yangchiu @james-munson
- [FEATURE] Add recurring job label to backup metrics or empty value 9429 - @ChanYiLin @roger-ryao
- [FEATURE] Longhorn CLI supports darwin platform 9532 - @derekbit @chriscchien
Improvement
- [IMPROVEMENT] Separate the default backup target settings from default settings 10089 - @mantissahz
- [IMPROVEMENT] No need to start `tgtd` in an instance-manager pod for v2 data engine 9941 - @derekbit @roger-ryao
- [IMPROVEMENT] Configure the log level of other system and user managed components via longhorn manager setting 6702 - @james-munson @roger-ryao
- [IMPROVEMENT] Longhorn CLI should install `cryptsetup` 9315 - @mantissahz @roger-ryao
- [IMPROVEMENT] V2 Data engine avoids changing the data of a lvol when decoupling it from 9922 - @DamiaSan @roger-ryao
- [IMPROVEMENT] Collect and display disk space usage for the backing images 8757 - @ChanYiLin @roger-ryao
- [IMPROVEMENT] Check NFS versions in /etc/nfsmount.conf instead 9830 - @COLDTURNIP @roger-ryao
- [IMPROVEMENT] Change misleading error message to warning level 9916 - @yangchiu @derekbit
- [IMPROVEMENT] No need to redirect `spdk_tgt` log to `/var/log/spdk_tgt.log` 9926 - @derekbit @chriscchien
- [UI][IMPROVEMENT] Make backup wait until there is no backup being deleted and add the progress time 8750 - @a110605
- [IMPROVEMENT] Allow specify data engine version for the default storageclass during Helm installation 9584 - @shuo-wu @chriscchien
- [IMPROVEMENT] Building longhorn-manager takes long time 8744 - @derekbit @chriscchien
- [IMPROVEMENT] Reject strict-local + RWX volume creation 6735 - @COLDTURNIP @chriscchien
- [IMPROVEMENT] Restored volume or other operations should be allowed on the old and running instance-manager pods 9383 - @derekbit @chriscchien
- [IMPROVEMENT] Make backup deletion async and force backup creation to wait until there is no backup being deleted 8746 - @ChanYiLin @roger-ryao
- [IMPROVEMENT][UI] Attach table search keyword to URL on Backup pages 9974 - @chriscchien @houhoucoop
- [IMPROVEMENT] Burst iSCSI Connection Errors, and IM Pod Restarting to make LH Volume disconnection 9851 - @ChanYiLin @chriscchien
- [IMPROVEMENT] Why is it not possible to change the replica count in v2 longhorn volume? 9805 - @hookak @chriscchien
- [IMPROVEMENT][UI] Add backupBackingImage table in backup page with tabs 8956 - @chriscchien @houhoucoop
- [IMPROVEMENT] Decrease the reconnect delay of v2 volume nvme initiator 9818 - @derekbit @chriscchien
- [UI][IMPROVEMENT] Improve the volume size information on UI 8843 - @yangchiu @houhoucoop
- [IMPROVEMENT] Add dmsetup and dmcrypt utilities check in cli 8217 - @COLDTURNIP @roger-ryao
- [UI IMPROVEMENT] Collect and display disk space usage for the backing images 9325 - @chriscchien @houhoucoop
- [IMPROVEMENT] Check kernel module `dm_crypt` on host machines 9153 - @mantissahz @roger-ryao
- [IMPROVEMENT] Logging the reason why the instance manager pod is going to be deleted. 9886 - @derekbit @chriscchien
- [IMPROVEMENT] Remove isCLIAPIVersionOne related codes 7191 - @derekbit @chriscchien
- [IMPROVEMENT] Prevent Volume Resize Stuck 6633 - @c3y1huang @roger-ryao
- [IMPROVEMENT] Expose EngineImage CLIAPIVersion and ControllerAPIVersion 6531 - @chriscchien @houhoucoop
- [IMPROVEMENT] Update longhorn-share-manager’s nfs-ganesha to v6.2 9614 - @derekbit @chriscchien
- [UI][IMPROVEMENT] Display `ReUploadedDataSize` and `NewlyUploadedDataSize` of each backup on Backup page 8975 - @yangchiu @houhoucoop
- [IMPROVEMENT] Remove the default CSI component images in longhorn-manager 9580 - @yangchiu @c3y1huang
- [IMPROVEMENT] the `if-not-present` volume backup policy should also check if the latest backup is up-to-date 6027 - @c3y1huang @chriscchien
- [IMPROVEMENT] longhorn/go-common-libs should support Darwin 9487 - @Yu-Jack @chriscchien
- [IMPROVEMENT] Enable Prometheus metrics of the CSI sidecar components 7938 - @yangchiu @ChanYiLin
- [IMPROVEMENT] Talos support for environment check in longhorn manager 9558 - @yangchiu @c3y1huang
- [IMPROVEMENT] Refactor condition icon and text in edit node and disk modal 9238 - @a110605
- [IMPROVEMENT] Read-only volume monitoring check 8508 - @ChanYiLin @chriscchien
- [IMPROVEMENT] Fix contradicting node status events 7738 - @yangchiu @ejweber
- [IMPROVEMENT] Informative warning message for the failed backup/restore lock acquisition 8713 - @ChanYiLin @roger-ryao
- [IMPROVEMENT] backup backing image should store the secret and secret namespace if it is encrypted 8884 - @yangchiu @ChanYiLin
- [IMPROVEMENT] Start searching for the new available port from the last allocated port instead of starting from 0th port. 8598 - @james-munson @chriscchien
- [IMPROVEMENT] Resilience handling for the last replica timeout 8711 - @yangchiu @ejweber
- [IMPROVEMENT] update `azure-sdk-for-go` version to stable version 8965 - @mantissahz @chriscchien
- [IMPROVEMENT] `toomanysnapshots` UI element not prominent enough to prevent runaway snapshots 6560 - @a110605 @roger-ryao
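Item 8598 in the list above describes a change in search strategy for free ports: continue from the last allocated port instead of rescanning from the start of the range. A minimal sketch of that idea (often called next-fit) follows, assuming a fixed inclusive port range and a set of ports in use. The class `PortAllocator` and its methods are hypothetical and do not mirror Longhorn's instance-manager code.

```python
class PortAllocator:
    """Hypothetical next-fit allocator over an inclusive port range."""

    def __init__(self, start: int, end: int) -> None:
        if start > end:
            raise ValueError("start must not exceed end")
        self.start = start
        self.size = end - start + 1
        self.used = set()
        # Pretend the last allocation was the final port, so the first
        # search begins at the start of the range.
        self.last = end

    def allocate(self) -> int:
        # Walk the range once, beginning just after the last allocated port
        # and wrapping around at the end of the range.
        for offset in range(1, self.size + 1):
            port = self.start + (self.last - self.start + offset) % self.size
            if port not in self.used:
                self.used.add(port)
                self.last = port
                return port
        raise RuntimeError("no free port in range")

    def release(self, port: int) -> None:
        self.used.discard(port)
```

One consequence of this design is that a port released a moment ago is not handed out again immediately; the search only returns to it after wrapping around the range.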
Bug
- [BUG] Engine Upgrade to 1.7.1 fails on volumes with strict-local data locality 9389 - @yangchiu @james-munson
- [BUG] Default disk registration sets Storage Reserved to 0 instead of using storage-reserved-percentage-for-default-disk value (Engine v2 block type only) 9871 - @jangseon-ryu @chriscchien
- [BUG] Detached Volume Stuck in Attached State During Node Eviction 9781 - @c3y1huang @chriscchien
- [BUG] Test Case `test_setting` Randomly Fails 10077 - @mantissahz
- [BUG] Data Not Retained After Restoring System Backup 10058 - @mantissahz @chriscchien @roger-ryao
- [BUG][v1.8.0-rc1] Default backup target is periodically cleared 10043 - @mantissahz @roger-ryao
- [BUG] race condition of bdev map get could lead to replica being deleted from the spdk server 9953 - @derekbit @shuo-wu @roger-ryao
- [BUG] [v1.8.0-rc1] Uninstallation fail if having backing images, the instance-manager pod stuck at terminating 10044 - @ChanYiLin @chriscchien
- [BUG] v2 Volume Data Locality `Disabled` Behavior Incorrectly (Switches to `Best-Effort` After Node Drain) 9591 - @mantissahz @roger-ryao
- [BUG] [v1.8.0-rc1] v2 backing image disk download randomly stuck at File processing is not started 10075 - @ChanYiLin @chriscchien
- [BUG] Test case `test_csi_encrypted_block_volume` fails on GKE COS_CONTAINERD 10049 - @c3y1huang @chriscchien
- [BUG] Test case test_system_backup_and_restore_volume_with_backingimage fails 10057 - @mantissahz @chriscchien
- [BUG] longhorn-manager seems to crash rpm-DB on the host by continuously calling rpm -q … 10019 - @COLDTURNIP @roger-ryao
- [BUG][v1.8.x] v2 volume could get stuck in attaching/attached/detaching/detached loop after cluster restart 10033 - @derekbit @chriscchien
- [BUG] Backing Image Creation with Multiple Copies Fails in Loop 9976 - @ChanYiLin @roger-ryao
- [BUG] Webhook servers initialization blocks longhorn-manager from running 10054 - @c3y1huang @chriscchien
- [BUG] Missing `fromBackup` Parameter in API Request When Restoring Multiple Files from Backup List 10050 - @a110605 @roger-ryao
- [BUG] Unable to backup backing image through Longhorn UI 10023 - @a110605 @ChanYiLin @chriscchien
- [BUG] Unable to update volume backup target in volume detail page 10045 - @a110605 @chriscchien
- [BUG] Unable to delete backing image backup through UI 10047 - @chriscchien @houhoucoop
- [BUG] Frequent Failures When Uploading 512MB Backing Image 9975 - @ChanYiLin @roger-ryao
- [BUG] V2 volume checksum kept changing 9998 - @PhanLe1010
- [BUG][v1.8.0-rc1] Error Status When Setting BackupTarget to AWS S3 10026 - @mantissahz @roger-ryao
- [BUG] Unable to restore latest backup from backup page 10046 - @mantissahz @roger-ryao
- [BUG] Raw format backing image unable to transfer to other nodes to maintain minimum number of copies in Talos v1.8.3 9882 - @yangchiu @ChanYiLin
- [BUG][v1.8.x] Unable to add block disk after node deleted and added back 10035 - @yangchiu @derekbit
- [BUG] Engine stuck in “stopped” state, prevent volume attach 9938 - @ChanYiLin
- [TASK] Fix checksum inconsistent issue after closing the file handler 9876 - @ChanYiLin
- [BUG] Test case `test_backup_volume_list` failed: failed to find backup in `bv.backupList().data` 9987 - @yangchiu @mantissahz
- [BUG] BackupBackingImage Create failed because it failed to get the BackingImage 10020 - @ChanYiLin @chriscchien
- [BUG] CSI plugin pod keeps crashing until the backup volume appears when creating a backup via the CSI snapshotter 10008 - @mantissahz @chriscchien @roger-ryao
- [BUG] Error creating backup for v2 volume 10013 - @yangchiu @ChanYiLin
- [BUG] Backup does not appear on the backup page 9985 - @yangchiu @mantissahz
- [BUG] volume FailedMount - Input/output error 9939 - @yangchiu @PhanLe1010
- [BUG] (v2 volume) orphan longhorn device and dm device on the node when IM pod crash 9959 - @c3y1huang @chriscchien
- [BUG] Test case `Stopped replicas on deleted nodes should not be counted as healthy replicas when draining nodes` fails 9616 - @yangchiu @derekbit
- [BUG] Test case test_node_eviction_multiple_volume failed to reschedule replicas after volume detached 9857 - @yangchiu @c3y1huang
- [BUG] Fast-failover lease needs a fail-safe mechanism to clear delinquent state. 9093 - @yangchiu @james-munson
- [BUG] Fail to create v2 volume from VolumeSnapshot with Source:snapshotHandle as Backup 9965 - @derekbit @chriscchien
- [BUG] v2 volume restoration fail if the backup is created by csi-snapshotter 9962 - @derekbit @chriscchien
- [BUG] v2 volume stuck in crash loop due to replica r.ActiveChain is out of sync 9942 - @shuo-wu @roger-ryao
- [BUG] Old backups are not cleaned up after timeout 8319 - @yangchiu @mantissahz
- [BUG] CLI check preflight glosses over absence of NFS installation. 9495 - @COLDTURNIP @roger-ryao
- [BUG] Backport lvol flush implementation from `spdk:v24.01` to `spdk:v24.09` and fix memory leak 8730 - @yangchiu @DamiaSan
- [BUG] v2 volume cannot be attached after upgrading 9943 - @derekbit @roger-ryao
- [BUG] Longhorn is unable to provision volumes larger than 20TB 9221 - @derekbit @chriscchien @DamiaSan
- [BUG] Longhorn v2 volume is stuck in detaching/attaching loop forever if replica crash 9919 - @derekbit @chriscchien
- [BUG] v2 volume may enter ERROR state after deleting an instance manager containing a replica 9874 - @derekbit @roger-ryao
- [BUG] DR volume fails to reattach and faulted after node stop and start during incremental restore 9752 - @c3y1huang @roger-ryao
- [BUG] Backup progress should not add block failed to upload to successful count 9791 - @derekbit @chriscchien
- [BUG] v2 volume faulted during degraded availability restore 9852 - @c3y1huang @chriscchien
- [BUG] Longhorn volume cannot attach to new node permanently if shutdown the current attached node. 9854 - @yangchiu @PhanLe1010
- [BUG] Replica deletion fails at the end of longhorn-spdk-engine unit test 9121 - @shuo-wu @chriscchien
- [BUG] UnknowOS Message in Longhorn Node Condition on RHEL 9829 - @yangchiu @mantissahz @jangseon-ryu
- [BUG] Failed to inspect the backup backing image information if NFS backup target URL with options 9702 - @yangchiu @mantissahz
- [BUG] Longhorn keeps resetting my storageClass 9391 - @yangchiu @mantissahz
- [BUG] Pre-upgrade pod should event the reason for any failures. 9569 - @yangchiu @james-munson
- [BUG] nvme cli somehow shows the error `malloc(): unsorted double linked list corrupted` 7693 - @DamiaSan @roger-ryao
- [BUG] Host machine is somehow rebooted when running into IO timeout 7697 - @DamiaSan @roger-ryao
- [BUG] `spdk_tgt` is somehow crashed 7559 - @DamiaSan
- [BUG] longhornctl install preflight --operating-system=cos failed on COS_CONTAINERD 9664 - @c3y1huang @chriscchien
- [BUG] PV Annotation Isn’t Updated After Creating An Oversize Volume 9514 - @c3y1huang @chriscchien
- [BUG] System Restore Stuck at Pending due to Tolerations not Applied 8656 - @yangchiu @c3y1huang
- [BUG] All Backups are lost in the Backup Target if the NFS Service Disconnects and Reconnects again 9530 - @yangchiu @mantissahz
- [BUG] Instance manager missing required selector labels after manager crash 9464 - @c3y1huang @chriscchien
- [BUG] Single Replica Node Down test cases fail 9622 - @yangchiu @c3y1huang
- [BUG] Disks modal broken layout 9629 - @a110605 @roger-ryao
- [BUG] Volume creation violated numberOfReplicas spec causing an unexpected engine image reference count 8573 - @yangchiu @ChanYiLin
- [BUG] kubectl drain node is blocked by unexpected orphan engine processes 6552 - @yangchiu @PhanLe1010 @ejweber
- [BUG] Remove unnecessary restart of RWX workload. 9095 - @yangchiu @james-munson
- [BUG] Fix test case test_rwx_delete_share_manager_pod failure after changes to RWX workload restart. 9504 - @yangchiu @james-munson
- [BUG] longhorn-spdk-engine fail to build 9517 - @derekbit
- [BUG] Volume stuck in degraded 9279 - @yangchiu @PhanLe1010
- [BUG] Resize RWX PVC Fails 6050 - @james-munson @roger-ryao
- [BUG] Longhorn did not close and open encrypted volumes correctly when the service k3s-agent restarted for a while 9385 - @yangchiu @mantissahz
- [BUG] Faulted RWX volume upon creation 7802 - @yangchiu @james-munson
- [BUG] Accidentally encountered a single replica volume backup stuck at progress 17% indefinitely after a node rebooted 9168 - @ChanYiLin
- [BUG] Longhorn thinks node is unschedulable 9011 - @c3y1huang @roger-ryao
- [BUG] System Backup Fails and DR Volume Enters Attach-Detach Loop When Volume Backup Policy is Set to `Always` 9330 - @c3y1huang @roger-ryao
- [BUG] v1 volume replica rebuild fail after upgrade from v1.7.0 to v1.7.1-rc1 9331 - @yangchiu @PhanLe1010
- [BUG] LH fails silently when node has attached volumes 9183 - @yangchiu @ejweber @w13915984028
- [BUG] Fix security issues in v1.7.1 RC images 9354 - @c3y1huang
- [BUG] Some volumes stuck in “Attaching” state after upgrade to 1.7.0 9267 - @ChanYiLin @roger-ryao
- [BUG] [Backupstore] Need to close the reader after downloading files for the Azure backup store driver. 9281 - @yangchiu @mantissahz
- [BUG] v2-data-engine setting validator doesn’t take disabled nodes into account when checking hugepages 9319 - @tserong @roger-ryao
- [BUG] error logs appeared in uninstallation job 9303 - @ChanYiLin @chriscchien
- [BUG] Incorrect NFS endpoint after enable/disable storage network for RWX volume 9272 - @Vicente-Cheng @roger-ryao
- [BUG][v1.7.x-head] Test case `test_dr_volume_with_backup_block_deletion_abort_during_backup_in_progress` failed due to `failed lock *.lck type 1 acquisition` 9037 - @ChanYiLin @chriscchien
- [BUG] Test case test_support_bundle_should_not_timeout timed out 8507 - @c3y1huang
Performance
- [TASK] An Investigation for Future Longhorn performance 9552 - @PhanLe1010
- [TASK] Performance benchmark of v1.7.0 9350 - @derekbit
Misc
- [IMPROVEMENT] Encrypted volume online expansion support after k8s 1.29 9902 - @COLDTURNIP @mantissahz @chriscchien
- [DOC] Add an important note about checking items before upgrade 9440 - @james-munson @chriscchien
- [TASK] Fix CVE issues for v1.8.0 9895 - @c3y1huang
- [TASK] Verify Longhorn v1.8.0 on Talos v1.8.3 9764 - @yangchiu @roger-ryao
- [TASK] Bump nfs-ganesha to v6.3 9979 - @james-munson @chriscchien
- [TASK] No need to add `kind/test` tickets to Longhorn Sprint board 9873 - @yangchiu
- [TASK] Creation of longhorn-v24.09 branch 9700 - @DamiaSan @roger-ryao
- [DOC] Emphasize the backupstore cannot set retention policy 9795 - @derekbit @roger-ryao
- [FEATURE] Inquiry: Storage Reserved Change for Block Type Disks in Engine v2 9842 - @derekbit @jangseon-ryu @chriscchien
- [DOC] Use experimental for beta features and preview for alpha features 9794 - @derekbit
- [TASK] Install the latest grpc_health_probe at build time 9714 - @yangchiu @c3y1huang
- [TASK] Upgrade nvme-cli to v2.10.2 9739 - @derekbit
- [TASK] Fix CVE issues in support bundle 9658 - @yangchiu @c3y1huang
- [TASK] Update CSI components 9561 - @c3y1huang
- [DOC] Clarify RWX volume expansion documentation 6216 - @james-munson @roger-ryao
- [DOC] Update `Kubernetes Version` in `Best Practices` 9497 - @derekbit
- [TASK] Check Longhorn compatibility with more kinds of Linux distros 4900 - @ChanYiLin @roger-ryao
- [TASK] Down size longhorn CLI 8995 - @c3y1huang
Contributors
- @COLDTURNIP
- @ChanYiLin
- @DamiaSan
- @PhanLe1010
- @Vicente-Cheng
- @Yu-Jack
- @a110605
- @c3y1huang
- @chriscchien
- @derekbit
- @ejweber
- @hookak
- @houhoucoop
- @innobead
- @james-munson
- @jangseon-ryu
- @mantissahz
- @roger-ryao
- @shuo-wu
- @tserong
- @w13915984028
- @yangchiu