Release date: 2022-02-02
Version: v0.3.2

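Gorse v0.3.2 adds support for overriding configuration through environment variables; eight variables map to individual configuration items (for example, GORSE_CACHE_STORE sets the cache database and GORSE_MASTER_PORT sets the master port). It also introduces experimental IVF and HNSW vector index search, enabled with enable_xxx_index; xxx_index_recall sets the expected recall and xxx_index_fit_epoch limits the number of fit epochs, trading accuracy for performance. Fixes include batching large Redis writes, allowing empty JSON strings, correcting an offset overflow in the recommend API, and improving the dashboard layout. Upgrade notes: the experimental indices use more memory but increase throughput, and stale cache entries (keys prefixed with item_neighbors and user_neighbors) must be removed with redis-cli when upgrading. All experimental features are off by default and must be enabled and tuned in the configuration file.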

Release notes (original)

Feature

  • Support modifying configuration via environment variables (#359). Eight environment variables are available:
Environment variable     Configuration   Description
GORSE_CACHE_STORE        cache_store     database for caching
GORSE_DATA_STORE         data_store      database for persistent data
GORSE_MASTER_PORT        port            master port
GORSE_MASTER_HOST        host            master host
GORSE_MASTER_HTTP_PORT   http_port       HTTP API port
GORSE_MASTER_HTTP_HOST   http_host       HTTP API host
GORSE_MASTER_JOBS        n_jobs          number of working jobs
GORSE_SERVER_API_KEY     api_key         secret key for RESTful APIs
  • (Experimental) Support IVF-based neighborhood searching (#363).
  • (Experimental) Support HNSW-based recommended items searching (#368).
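
To make the environment variable feature concrete, here is a minimal Python sketch of how such an override layer could behave. It is not Gorse's implementation: the function name apply_env_overrides, the flat dictionary layout, and the choice of which items are parsed as integers are assumptions made for illustration. Only the variable-to-item mapping comes from the table above.

```python
import os

# Mapping taken from the table above: environment variable -> configuration item.
ENV_TO_CONFIG = {
    "GORSE_CACHE_STORE": "cache_store",
    "GORSE_DATA_STORE": "data_store",
    "GORSE_MASTER_PORT": "port",
    "GORSE_MASTER_HOST": "host",
    "GORSE_MASTER_HTTP_PORT": "http_port",
    "GORSE_MASTER_HTTP_HOST": "http_host",
    "GORSE_MASTER_JOBS": "n_jobs",
    "GORSE_SERVER_API_KEY": "api_key",
}

# Assumption for this sketch: these items hold integers, the rest hold strings.
INT_ITEMS = {"port", "http_port", "n_jobs"}


def apply_env_overrides(config, environ=None):
    """Return a copy of `config` where any set environment variable wins.

    `config` is a flat dict of values loaded from the configuration file
    (hypothetical layout). `environ` defaults to the process environment.
    """
    environ = os.environ if environ is None else environ
    merged = dict(config)
    for env_name, item in ENV_TO_CONFIG.items():
        raw = environ.get(env_name)
        if raw is None:
            continue  # variable not set: keep the value from the file
        merged[item] = int(raw) if item in INT_ITEMS else raw
    return merged


if __name__ == "__main__":
    # Illustrative values only.
    file_config = {"port": 8086, "api_key": ""}
    env = {"GORSE_MASTER_PORT": "9086", "GORSE_SERVER_API_KEY": "secret"}
    print(apply_env_overrides(file_config, env))
```

Passing the environment as a parameter keeps the sketch easy to exercise without touching the real process environment.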

Fix

  • Split large Redis writes to batches (#352).
  • Allow empty JSON string in relational databases (#355).
  • Fix offset overflow in recommend API (#365).
  • Optimize dashboard overview page layout (#369).

Upgrade Guide

  • Configuration: IVF-based neighborhood searching and HNSW-based recommended items searching are experimental features.
    • Use enable_xxx_index to enable an experimental feature.
    • xxx_index_recall is the expected recall of approximate searching.
    • xxx_index_fit_epoch is the maximum number of epochs used to fit the index.

These approximate vector searching indices trade a tolerable loss of accuracy (recall) for higher throughput. The index builder tries to reach xxx_index_recall within xxx_index_fit_epoch epochs. Building stops as soon as either xxx_index_recall is reached or xxx_index_fit_epoch epochs have run. Index-based searching costs more memory than brute-force searching, and building the index takes additional time, but these costs are negligible compared to the benefits.
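
The stopping rule described above can be sketched in a few lines of Python. This is only an illustration of the "target recall or epoch limit, whichever comes first" logic; the names fit_index and build_and_measure are hypothetical, and how Gorse actually rebuilds the index and measures recall in each epoch is not covered by these notes.

```python
def fit_index(build_and_measure, target_recall, max_epochs):
    """Run fit epochs until the recall target or the epoch limit is hit.

    `build_and_measure(epoch)` is a hypothetical callback that (re)builds
    the index for the given 1-based epoch and returns the measured recall.
    Returns (epochs_run, best_recall_seen).
    """
    epochs_run = 0
    best_recall = 0.0
    for epoch in range(1, max_epochs + 1):
        recall = build_and_measure(epoch)
        epochs_run = epoch
        best_recall = max(best_recall, recall)
        if recall >= target_recall:
            break  # recall target reached before the epoch limit
    return epochs_run, best_recall


if __name__ == "__main__":
    # Made-up recall values standing in for real measurements.
    measured = [0.6, 0.85, 0.95]
    result = fit_index(lambda epoch: measured[epoch - 1],
                       target_recall=0.8, max_epochs=3)
    print(result)
```

With this shape, a higher xxx_index_recall or a lower xxx_index_fit_epoch makes it more likely that the builder stops on the epoch limit with a recall below the target.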

# Enable approximate item neighbor searching using vector index.
enable_item_neighbor_index = false

# Minimal recall for approximate item neighbor searching.
item_neighbor_index_recall = 0.8

# Maximal number of fit epochs for approximate item neighbor searching vector index.
item_neighbor_index_fit_epoch = 3

# Enable approximate user neighbor searching using vector index.
enable_user_neighbor_index = false

# Minimal recall for approximate user neighbor searching.
user_neighbor_index_recall = 0.8

# Maximal number of fit epochs for approximate user neighbor searching vector index.
user_neighbor_index_fit_epoch = 3

# Enable approximate collaborative filtering recommendation using vector index.
enable_collaborative_index = false

# Minimal recall for approximate collaborative filtering recommendation.
collaborative_index_recall = 0.9

# Maximal number of fit epochs for the approximate collaborative filtering recommendation vector index.
collaborative_index_fit_epoch = 3
  • Redis: Remove incompatible stale cache.
redis-cli KEYS "item_neighbors*" | xargs redis-cli DEL
redis-cli KEYS "user_neighbors*" | xargs redis-cli DEL
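
On a large or busy Redis instance, KEYS walks the whole keyspace in one blocking call, and GNU xargs still runs redis-cli DEL once (with no arguments) when nothing matches. As a possible alternative, the Python sketch below removes the same keys incrementally with SCAN. The helper name delete_by_prefix and the batch size are assumptions; the client is expected to expose scan_iter(match=..., count=...) and delete(*names), as the redis-py redis.Redis client does.

```python
def delete_by_prefix(client, prefix, batch_size=500):
    """Delete every key starting with `prefix`, scanning incrementally.

    `client` is any object with redis-py style `scan_iter` and `delete`
    methods. Returns the number of keys the server reported as deleted.
    """
    deleted = 0
    batch = []
    for key in client.scan_iter(match=prefix + "*", count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            deleted += client.delete(*batch)  # one DEL per full batch
            batch = []
    if batch:
        deleted += client.delete(*batch)  # flush the remainder
    return deleted


# Usage with redis-py (requires the `redis` package and a reachable server):
#
#     import redis
#     client = redis.Redis()  # connection settings are deployment-specific
#     for prefix in ("item_neighbors", "user_neighbors"):
#         print(prefix, delete_by_prefix(client, prefix))
```

Deleting in batches keeps each DEL call small, and no DEL is issued at all when no key matches.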

Download links