Gorse v0.3.2 release notes
Release date: 2022-02-02
Version: v0.3.2
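Gorse can now be configured through environment variables: 8 variables are available, each mapped to one configuration option (for example, GORSE_CACHE_STORE sets the cache database and GORSE_MASTER_PORT sets the master port). This release also introduces experimental IVF- and HNSW-based vector index search. Each index is enabled with an enable_xxx_index option, xxx_index_recall sets the expected recall, and xxx_index_fit_epoch caps the number of fitting epochs, balancing accuracy against performance. Fixes include batched Redis writes, support for empty JSON strings, a fix for an offset overflow in the recommend API, and an improved dashboard layout. Upgrade notes: the experimental indices consume more memory but increase throughput, and all experimental features are disabled by default, so they must be enabled and tuned manually in the configuration file. When upgrading, delete the stale cache (the item_neighbors and user_neighbors keys) with redis-cli.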
Changelog (translated from Chinese)
Features
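- Support for modifying the configuration via environment variables (#359). The following 8 environment variables are available:
| Environment variable | Configuration option | Description |
|---|---|---|
| `GORSE_CACHE_STORE` | `cache_store` | Database for caching |
| `GORSE_DATA_STORE` | `data_store` | Database for persistent data |
| `GORSE_MASTER_PORT` | `port` | Master node port |
| `GORSE_MASTER_HOST` | `host` | Master node host |
| `GORSE_MASTER_HTTP_PORT` | `http_port` | HTTP API port |
| `GORSE_MASTER_HTTP_HOST` | `http_host` | HTTP API host |
| `GORSE_MASTER_JOBS` | `n_jobs` | Number of worker jobs |
| `GORSE_SERVER_API_KEY` | `api_key` | Secret key for the RESTful APIs |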
- (Experimental) Support for IVF-based neighbor search (#363).
- (Experimental) Support for HNSW-based recommended item search (#368).
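Fixes
- Split large Redis writes into batches (#352).
- Allow empty JSON strings in relational databases (#355).
- Fix an offset overflow in the recommend API (#365).
- Improve the layout of the dashboard overview page (#369).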
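For #352, the general idea of splitting one large write into fixed-size batches can be sketched in Go as below. The `batches` helper and the batch size are hypothetical; the notes do not say what batch size Gorse uses or how each batch is sent to Redis.

```go
package main

import "fmt"

// batches splits items into consecutive chunks of at most size elements,
// so that one large write can be issued as several smaller ones.
// It panics if size is not positive.
func batches[T any](items []T, size int) [][]T {
	if size <= 0 {
		panic("batch size must be positive")
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	keys := []string{"a", "b", "c", "d", "e"}
	for _, batch := range batches(keys, 2) {
		// A real client would send each batch as its own command or pipeline.
		fmt.Println(batch) // [a b], then [c d], then [e]
	}
}
```

Bounding each request this way keeps any single command or pipeline from growing with the size of the dataset.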
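Upgrade Guide
- Configuration: IVF-based neighbor search and HNSW-based recommended item search are experimental features.
- Use `enable_xxx_index` to enable an experimental feature. `xxx_index_recall` is the expected recall of the approximate search, and `xxx_index_fit_epoch` is the number of epochs used to fit the index.
- These approximate vector search indices trade a tolerable loss of accuracy (recall) for higher throughput. The index builder tries to reach `xxx_index_recall` within `xxx_index_fit_epoch` epochs, and the build stops as soon as either the target recall is reached or the epoch limit is hit. Index-based search uses more memory than brute-force search, and building the index takes extra time, but both costs are negligible compared with the benefits.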
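```toml
# Enable approximate item neighbor search using a vector index.
enable_item_neighbor_index = false
# Minimum recall for approximate item neighbor search.
item_neighbor_index_recall = 0.8
# Maximum number of fitting epochs for the item neighbor search vector index.
item_neighbor_index_fit_epoch = 3
# Enable approximate user neighbor search using a vector index.
enable_user_neighbor_index = false
# Minimum recall for approximate user neighbor search.
user_neighbor_index_recall = 0.8
# Maximum number of fitting epochs for the user neighbor search vector index.
user_neighbor_index_fit_epoch = 3
# Enable approximate collaborative filtering recommendation using a vector index.
enable_collaborative_index = false
# Minimum recall for approximate collaborative filtering recommendation.
collaborative_index_recall = 0.9
# Maximum number of fitting epochs for the collaborative filtering vector index.
collaborative_index_fit_epoch = 3
```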
- Redis: remove the incompatible stale cache.
```bash
redis-cli KEYS "item_neighbors*" | xargs redis-cli DEL
redis-cli KEYS "user_neighbors*" | xargs redis-cli DEL
```
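The notes describe `xxx_index_recall` as the target recall but not how it is measured. A common definition, assumed here, is recall@k: the fraction of the exact top-k results (from brute-force search) that the approximate index also returns. The `recall` function below is a hypothetical Go sketch of that metric, not Gorse's implementation.

```go
package main

import "fmt"

// recall returns the fraction of the exact top-k results (from brute-force
// search) that also appear in the approximate results returned by an index.
// An empty exact list is treated as perfect recall.
func recall(exact, approx []int) float64 {
	if len(exact) == 0 {
		return 1
	}
	found := make(map[int]struct{}, len(approx))
	for _, id := range approx {
		found[id] = struct{}{}
	}
	hits := 0
	for _, id := range exact {
		if _, ok := found[id]; ok {
			hits++
		}
	}
	return float64(hits) / float64(len(exact))
}

func main() {
	exact := []int{1, 2, 3, 4, 5}      // top-5 ids from brute-force search
	approx := []int{1, 2, 3, 4, 9}     // top-5 ids returned by the index
	fmt.Println(recall(exact, approx)) // 0.8
}
```

Under this definition, an index that returns 4 of the 5 exact neighbors has recall 0.8 and would just meet the default `item_neighbor_index_recall = 0.8`.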
Changelog (original English)
Features
- Support modifying the configuration via environment variables (#359). The following 8 environment variables are available:
| Environment variable | Configuration option | Description |
|---|---|---|
| `GORSE_CACHE_STORE` | `cache_store` | Database for caching |
| `GORSE_DATA_STORE` | `data_store` | Database for persistent data |
| `GORSE_MASTER_PORT` | `port` | Master port |
| `GORSE_MASTER_HOST` | `host` | Master host |
| `GORSE_MASTER_HTTP_PORT` | `http_port` | HTTP API port |
| `GORSE_MASTER_HTTP_HOST` | `http_host` | HTTP API host |
| `GORSE_MASTER_JOBS` | `n_jobs` | Number of working jobs |
| `GORSE_SERVER_API_KEY` | `api_key` | Secret key for the RESTful APIs |
- (Experimental) Support IVF-based neighborhood search (#363).
- (Experimental) Support HNSW-based recommended item search (#368).
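The HNSW index (#368) speeds up finding recommended items. Assuming, as in matrix-factorization models, that an item's score for a user is the inner product of their embedding vectors, the exact search being approximated is the brute-force scan sketched below in Go. The `topK` function and the toy vectors are illustrative, not Gorse's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// dot returns the inner product of two equal-length vectors.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// topK scores every item against the user vector and returns the indices
// of the k highest-scoring items, best first (k is assumed non-negative).
// This O(n)-per-query scan is what an approximate index such as HNSW avoids.
func topK(user []float32, items [][]float32, k int) []int {
	ids := make([]int, len(items))
	scores := make([]float32, len(items))
	for i, v := range items {
		ids[i] = i
		scores[i] = dot(user, v)
	}
	sort.SliceStable(ids, func(a, b int) bool {
		return scores[ids[a]] > scores[ids[b]]
	})
	if k > len(ids) {
		k = len(ids)
	}
	return ids[:k]
}

func main() {
	user := []float32{1, 0.5}
	items := [][]float32{{0.1, 0.1}, {1, 1}, {0.9, 0}, {0, 2}}
	fmt.Println(topK(user, items, 2)) // [1 3]
}
```

The brute-force result also serves as the ground truth when measuring the recall of an approximate index.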
Fixes
- Split large Redis writes into batches (#352).
- Allow empty JSON strings in relational databases (#355).
- Fix an offset overflow in the recommend API (#365).
- Optimize the layout of the dashboard overview page (#369).
Upgrade Guide
- Configuration: IVF-based neighborhood search and HNSW-based recommended item search are experimental features.
- Use `enable_xxx_index` to enable an experimental feature. `xxx_index_recall` is the expected recall of the approximate search, and `xxx_index_fit_epoch` is the number of epochs used to fit the index.
- These approximate vector search indices trade tolerable accuracy (recall) for higher throughput. The index builder tries to reach `xxx_index_recall` within `xxx_index_fit_epoch` epochs, and the build stops as soon as either the target recall is reached or the epoch limit is hit. Index-based search uses more memory than brute-force search, and building the index takes additional time, but both costs are negligible compared with the benefits.
```toml
# Enable approximate item neighbor search using a vector index.
enable_item_neighbor_index = false
# Minimum recall for approximate item neighbor search.
item_neighbor_index_recall = 0.8
# Maximum number of fitting epochs for the item neighbor search vector index.
item_neighbor_index_fit_epoch = 3
# Enable approximate user neighbor search using a vector index.
enable_user_neighbor_index = false
# Minimum recall for approximate user neighbor search.
user_neighbor_index_recall = 0.8
# Maximum number of fitting epochs for the user neighbor search vector index.
user_neighbor_index_fit_epoch = 3
# Enable approximate collaborative filtering recommendation using a vector index.
enable_collaborative_index = false
# Minimum recall for approximate collaborative filtering recommendation.
collaborative_index_recall = 0.9
# Maximum number of fitting epochs for the collaborative filtering vector index.
collaborative_index_fit_epoch = 3
```
- Redis: Remove the incompatible stale cache.
```bash
redis-cli KEYS "item_neighbors*" | xargs redis-cli DEL
redis-cli KEYS "user_neighbors*" | xargs redis-cli DEL
```
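The stopping rule described in the upgrade guide (stop once `xxx_index_recall` is reached or `xxx_index_fit_epoch` epochs have run) can be sketched as a small Go loop. `fitIndex` and its `fitEpoch` callback are hypothetical stand-ins for the real index builder, and the recall values in `main` are made up for illustration.

```go
package main

import "fmt"

// fitIndex calls fitEpoch once per epoch and stops as soon as the returned
// recall reaches targetRecall or maxEpochs epochs have run, whichever comes
// first. fitEpoch is a hypothetical callback that refines the index and
// returns its recall on a validation sample.
func fitIndex(targetRecall float64, maxEpochs int, fitEpoch func(epoch int) float64) (epochs int, recall float64) {
	for epochs < maxEpochs {
		recall = fitEpoch(epochs)
		epochs++
		if recall >= targetRecall {
			break
		}
	}
	return epochs, recall
}

func main() {
	// Made-up recall values, one per epoch, for illustration only.
	simulated := []float64{0.62, 0.81, 0.93}
	step := func(epoch int) float64 { return simulated[epoch] }

	epochs, r := fitIndex(0.8, 3, step) // target met in epoch 2
	fmt.Println(epochs, r)              // 2 0.81
	epochs, r = fitIndex(0.95, 3, step) // target never met
	fmt.Println(epochs, r)              // 3 0.93
}
```

In the second call the builder gives up after `xxx_index_fit_epoch` epochs and keeps the best effort, which is how a higher recall target can cost more build time without blocking startup indefinitely.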