Meilisearch v1.12.0 release overview
Release date: 2024-12-23
Version: v1.12.0
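Meilisearch v1.12 brings significant performance improvements and new features. Indexing is substantially faster: indexing time for large datasets is almost halved, raw document insertion is more than twice as fast, and incremental updates are more than four times as fast. Two new index settings, facetSearch and prefixSearch, let you give up some search features in exchange for faster indexing. A new /batches API endpoint lets you query the status and progress of task batches, and task objects gain a batchUid field. Other improvements include a reverse query parameter for listing tasks, a single match position for phrase searches, new Prometheus metrics, and better retrieval for Japanese documents. Fixes cover the primary key length limit, inconsistent number segmentation, and ignored stop words. All official integrations adopt the new version within 48 hours; some SDK features may lag behind, and developers are encouraged to open an issue or a PR to help fill the gaps. The release also includes dependency updates, test performance improvements, and internal tooling changes.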
Release notes (original)
Meilisearch v1.12 introduces significant indexing speed improvements, almost halving the time required to index large datasets. This release also introduces new settings to customize and potentially further increase indexing speed.
🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.
Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we’ll love you for that ❤️).
New features and updates 🔥
Improve indexing speed
Indexing time is improved across the board!
- Performance is maintained or better on smaller machines.
- On bigger machines with multiple cores and good IO, Meilisearch v1.12 is much faster than Meilisearch v1.11:
  - More than twice as fast for raw document insertion tasks.
  - More than 4x as fast for incrementally updating documents in a large database.
  - Embeddings generation is also up to 1.5x faster for some workloads.
The new indexer also makes task cancellation faster.
Done by @dureuill, @ManyTheFish, and @Kerollmops in #4900.
New index settings: use facetSearch and prefixSearch to improve indexing speed
v1.12 introduces two new index settings: facetSearch and prefixSearch.
Both settings allow you to skip parts of the indexing process. This leads to significant improvements to indexing speed, but may negatively impact search experience in some use cases.
Done by @ManyTheFish in #5091
facetSearch
Use this setting to toggle facet search:
curl \
-X PUT 'http://localhost:7700/indexes/books/settings/facet-search' \
-H 'Content-Type: application/json' \
--data-binary 'true'
The default value for facetSearch is true. When set to false, this setting disables facet search for all filterable attributes in an index.
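The same call can be prepared from a script. The sketch below is only an illustration: it builds, without sending, the PUT request from the curl example, this time with the value false. The helper name build_setting_request is hypothetical, and the host and the books index are the placeholders used above. A protected instance would most likely also need an API key header, which is left out here.

```python
import json
import urllib.request


def build_setting_request(base_url: str, index_uid: str, setting_path: str, value) -> urllib.request.Request:
    """Build (but do not send) a PUT request for one index sub-setting."""
    url = f"{base_url}/indexes/{index_uid}/settings/{setting_path}"
    body = json.dumps(value).encode("utf-8")  # Python False becomes the JSON literal false
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Disable facet search on the placeholder "books" index.
request = build_setting_request("http://localhost:7700", "books", "facet-search", False)
print(request.get_method(), request.full_url, request.data)
# To send it to a running instance: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect before it reaches a live index.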
prefixSearch
Use this setting to configure the ability to search a word by prefix on an index:
curl \
-X PUT 'http://localhost:7700/indexes/books/settings/prefix-search' \
-H 'Content-Type: application/json' \
--data-binary 'disabled'
prefixSearch accepts one of the following values:
- "indexingTime": enables prefix processing during indexing. This is the default Meilisearch behavior.
- "disabled": deactivates prefix search completely.
Disabling prefix search means the query he will no longer match the word hello. This may significantly impact search result relevancy, but speeds up the indexing process.
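To make the trade-off concrete, here is a deliberately simplified model of the behavior described above. It is not Meilisearch's matching engine: the function word_matches is hypothetical and ignores typo tolerance, ranking, and every other search feature. It only contrasts prefix matching with exact matching.

```python
def word_matches(query: str, word: str, prefix_search: str = "indexingTime") -> bool:
    """Toy model: does a single query term match a single indexed word?"""
    if prefix_search == "disabled":
        # Without prefix search, only the complete word matches.
        return query == word
    # With prefix search, any word starting with the query matches.
    return word.startswith(query)


print(word_matches("he", "hello"))                  # prefix search enabled
print(word_matches("he", "hello", "disabled"))      # prefix search disabled
print(word_matches("hello", "hello", "disabled"))   # exact words still match
```

In this model the first and third calls return True and the second returns False, which mirrors the he / hello example.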
New API route: /batches
The new /batches endpoint allows you to query information about task batches.
GET /batches returns a list of batch objects:
curl -X GET 'http://localhost:7700/batches'
This endpoint accepts the same parameters as the GET /tasks route, allowing you to narrow down which batches you want to see. Parameters used with GET /batches apply to the tasks, not the batches themselves. For example, GET /batches?uid=0 returns batches containing tasks with a taskUid of 0, not batches with a batchUid of 0.
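The difference between filtering on tasks and filtering on batches is easy to miss, so here is a small local model of it. The mapping below is invented for illustration and does not come from a real instance. The function batches_containing_task is hypothetical and only imitates the filtering rule described above.

```python
from typing import Dict, List

# Hypothetical data: batch uid -> uids of the tasks that batch contains.
BATCH_TASKS: Dict[int, List[int]] = {
    0: [1],      # e.g. a task that was scheduled ahead of task 0
    1: [0, 2],
    2: [3],
}


def batches_containing_task(batch_tasks: Dict[int, List[int]], task_uid: int) -> List[int]:
    """Return the uids of batches that contain the given task uid."""
    return sorted(uid for uid, tasks in batch_tasks.items() if task_uid in tasks)


# Filtering on uid 0 selects the batch holding task 0 (batch 1 here),
# not the batch whose own uid is 0.
print(batches_containing_task(BATCH_TASKS, 0))
```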
You may also query GET /batches/:uid to retrieve information about a single batch object:
curl -X GET 'http://localhost:7700/batches/BATCH_UID'
/batches/:uid does not accept any parameters.
Batch objects contain the following fields:
{
"uid": 160,
"progress": {
"steps": [
{
"currentStep": "processing tasks",
"finished": 0,
"total": 2
},
{
"currentStep": "indexing",
"finished": 2,
"total": 3
},
{
"currentStep": "extracting words",
"finished": 3,
"total": 13
},
{
"currentStep": "document",
"finished": 12300,
"total": 19546
}
],
"percentage": 37.986263
},
"details": {
"receivedDocuments": 19547,
"indexedDocuments": null
},
"stats": {
"totalNbTasks": 1,
"status": {
"processing": 1
},
"types": {
"documentAdditionOrUpdate": 1
},
"indexUids": {
"mieli": 1
}
},
"duration": null,
"startedAt": "2024-12-12T09:44:34.124726733Z",
"finishedAt": null
}
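The relation between progress.steps and progress.percentage is not spelled out above. One reading that reproduces the sample value treats each step as a sub-step of the one before it. The sketch below tests that reading against the numbers in the sample. It is an inference from this single example, not documented behavior. The name nested_progress is hypothetical, and every total is assumed to be greater than zero.

```python
from typing import Dict, List

# The "steps" array copied from the sample batch object above.
STEPS: List[Dict] = [
    {"currentStep": "processing tasks", "finished": 0, "total": 2},
    {"currentStep": "indexing", "finished": 2, "total": 3},
    {"currentStep": "extracting words", "finished": 3, "total": 13},
    {"currentStep": "document", "finished": 12300, "total": 19546},
]


def nested_progress(steps: List[Dict]) -> float:
    """Fraction in [0, 1], treating each step as a sub-step of the previous one."""
    fraction = 0.0
    # Walk from the innermost step outwards, folding partial progress upwards.
    for step in reversed(steps):
        fraction = (step["finished"] + fraction) / step["total"]
    return fraction


percentage = 100 * nested_progress(STEPS)
for step in STEPS:
    print(f'{step["currentStep"]}: {step["finished"]}/{step["total"]}')
print(f"overall: {percentage:.3f}%")  # about 37.986, in line with the sample's percentage
```

Written out, the computation is (0 + (2 + (3 + 12300/19546) / 13) / 3) / 2, which agrees with the reported 37.986263 to within rounding.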
Additionally, task objects now include a new field, batchUid. Use this field together with /batches/:uid to retrieve data on a specific batch.
{
"uid": 154,
"batchUid": 142,
"indexUid": "movies_test2",
"status": "succeeded",
"type": "documentAdditionOrUpdate",
"canceledBy": null,
"details": {
"receivedDocuments": 1,
"indexedDocuments": 1
},
"error": null,
"duration": "PT0.027766819S",
"enqueuedAt": "2024-12-02T14:07:34.974430765Z",
"startedAt": "2024-12-02T14:07:34.99021667Z",
"finishedAt": "2024-12-02T14:07:35.017983489Z"
}
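As a small illustration of combining the two, the sketch below derives the batch URL from a task object. The helper batch_url_for_task is hypothetical and the base URL is the placeholder from the curl examples. Returning None for a missing or null batchUid is an assumption about tasks that have not yet been assigned to a batch.

```python
import json
from typing import Optional

# Trimmed copy of the sample task object above.
TASK_JSON = """
{
  "uid": 154,
  "batchUid": 142,
  "indexUid": "movies_test2",
  "status": "succeeded",
  "type": "documentAdditionOrUpdate"
}
"""


def batch_url_for_task(task: dict, base_url: str = "http://localhost:7700") -> Optional[str]:
    """Return the /batches/:uid URL for a task, or None if it has no batch."""
    batch_uid = task.get("batchUid")
    if batch_uid is None:  # assumption: the task has not been batched yet
        return None
    return f"{base_url}/batches/{batch_uid}"


task = json.loads(TASK_JSON)
print(batch_url_for_task(task))
```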
Done by @irevoire in #5060, #5070, #5080
Other improvements
- New query parameter for GET /tasks: reverse. If reverse is set to true, tasks will be returned in reversed order, from oldest to newest tasks. Done by @irevoire in #5048
- Phrase searches with showMatchesPosition set to true give a single location for the whole phrase, by @flevi29 in #4928
- New Prometheus metrics by @PedroTurik in #5044
- When a query finds matching terms in document fields with array values, Meilisearch now includes an indices field to _matchesPosition specifying which array elements contain the matches, by @LukasKalbertodt in #5005
- ⚠️ Breaking vectorStore change: field distribution no longer contains _vectors. Its value used to be incorrect, and there is no current use case for the fixed, most likely empty, value. Done as part of #4900
- Improve error message by adding index name in #5056 by @airycanon
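For the indices field mentioned in the list above, here is a rough sketch of how a client might use it. The search hit is hand-written for illustration and its start and length values are not taken from a real response. The sketch assumes that indices is a list of integer positions into a top-level array field, and the helper matched_elements is hypothetical.

```python
from typing import List

# Hand-written, hypothetical search hit for a query such as "fiction".
HIT = {
    "title": "Example",
    "genres": ["Drama", "Science Fiction", "Fiction"],
    "_matchesPosition": {
        "genres": [
            {"start": 8, "length": 7, "indices": [1]},
            {"start": 0, "length": 7, "indices": [2]},
        ]
    },
}


def matched_elements(hit: dict, field: str) -> List[str]:
    """Return the array elements of `field` that contain at least one match."""
    values = hit[field]
    matched: List[str] = []
    for match in hit.get("_matchesPosition", {}).get(field, []):
        for index in match.get("indices", []):  # assumed: positions into `values`
            if values[index] not in matched:
                matched.append(values[index])
    return matched


print(matched_elements(HIT, "genres"))
```

Only indices is read here. The start and length values are left untouched because their exact meaning for array fields is not described above.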
Fixes 🐞
- Return appropriate error when primary key is greater than 512 bytes, by @flevi29 in #4930
- Fix issue where numbers were segmented in different ways depending on tokenizer, by @dqkqd in https://github.com/meilisearch/charabia/pull/311
- Fix pagination when embedding fails by @dureuill in https://github.com/meilisearch/meilisearch/pull/5063
- Fix issue causing Meilisearch to ignore stop words in some cases by @ManyTheFish in #5062
- Fix phrase search with attributesToSearchOn in #5062 by @ManyTheFish
Misc
- Dependencies updates
  - Update benchmarks to match the new crates subfolder by @Kerollmops in #5021
  - Fix the benchmarks by @irevoire in #5037
  - Bump Swatinem/rust-cache from 2.7.1 to 2.7.5 in #5030
  - Update charabia v0.9.2 by @ManyTheFish in #5098
  - Update mini-dashboard to v0.2.16 version by @curquiza in #5102
- CIs and tests
  - Improve performance of delete_index.rs by @DerTimonius in #4963
  - Improve performance of create_index.rs by @DerTimonius in #4962
  - Improve performance of get_documents.rs by @PedroTurik in #5025
  - Improve performance of formatted.rs by @PedroTurik in #5043
  - Fix the path used in the flaky tests CI by @Kerollmops in #5049
- Misc
  - Rollback the Meilisearch Kawaii logo by @Kerollmops in #5017
  - Add image source label to Dockerfile by @wuast94 in #4990
  - Hide code complexity into a subfolder by @Kerollmops in #5016
  - Internal tool: implement offline upgrade from v1.10 to v1.11 by @irevoire in #5034
  - Internal tool: implement offline upgrade from v1.11 to v1.12 by @ManyTheFish in #5146
  - Meilisearch is now able to retrieve Katakana words from a Hiragana query by @tats-u in https://github.com/meilisearch/charabia/pull/312
  - Improve error handling when writing into LMDB by @Kerollmops in https://github.com/meilisearch/meilisearch/pull/5089
❤️ Thanks again to our external contributors:
- Meilisearch: @airycanon, @DerTimonius, @flevi29, @LukasKalbertodt, @PedroTurik, @wuast94
- Charabia: @dqkqd @tats-u