Release date: 2024-12-23
Version: v1.12.0

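Meilisearch v1.12 brings significant performance improvements and new features. Indexing large datasets takes almost half the time it used to, raw document insertion is more than twice as fast, and incremental document updates are more than four times as fast. The new facetSearch and prefixSearch index settings let you give up some search features in exchange for faster indexing. A new /batches API endpoint lets you query task batches and track their progress, and task objects now include a batchUid field. Other improvements include a reverse query parameter for task queries, a single match position for phrase searches, new Prometheus metrics, and Hiragana queries that match Katakana words. Fixes cover the primary key length limit, inconsistent number tokenization, ignored stop words, and other issues. All official integrations are compatible with this release and are deployed between 4 and 48 hours after it becomes available, though some SDKs may not yet include every new feature; you are encouraged to open an issue or a PR for anything missing. The release also includes dependency updates, benchmark and test improvements, and internal tooling changes.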


Release notes (original English)

Meilisearch v1.12 introduces significant indexing speed improvements, almost halving the time required to index large datasets. This release also introduces new settings to customize and potentially further increase indexing speed.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.

Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we’ll love you for that ❤️).

New features and updates 🔥

Improve indexing speed

Indexing time is improved across the board!

- Performance is maintained or improved on smaller machines.
- On bigger machines with multiple cores and good IO, Meilisearch v1.12 is much faster than Meilisearch v1.11:
  - More than twice as fast for raw document insertion tasks.
  - More than four times as fast for incrementally updating documents in a large database.
  - Embedding generation is also up to 1.5 times faster for some workloads.

The new indexer also makes task cancellation faster.

Done by @dureuill, @ManyTheFish, and @Kerollmops in #4900.

New index settings: use `facetSearch` and `prefixSearch` to improve indexing speed

v1.12 introduces two new index settings: facetSearch and prefixSearch.

Both settings allow you to skip parts of the indexing process. This leads to significant improvements to indexing speed, but may negatively impact search experience in some use cases.

Done by @ManyTheFish in #5091

`facetSearch`

Use this setting to toggle facet search:

```bash
curl \
  -X PUT 'http://localhost:7700/indexes/books/settings/facet-search' \
  -H 'Content-Type: application/json' \
  --data-binary 'true'
```

The default value for facetSearch is true. When set to false, this setting disables facet search for all filterable attributes in an index.
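For scripted setups, you can build the same request in Python. The following is a minimal sketch that uses only the standard library. It assumes a local instance at http://localhost:7700 with no API key, and the helper name facet_search_request is hypothetical. The helper only builds the request object, so nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
import urllib.request

def facet_search_request(index_uid, enabled, base_url="http://localhost:7700"):
    """Build (but do not send) a PUT request that toggles facetSearch for one index."""
    # The route takes a bare JSON boolean as its body: json.dumps(False) -> "false".
    body = json.dumps(bool(enabled)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/indexes/{index_uid}/settings/facet-search",
        data=body,
        method="PUT",
    )
    req.add_header("Content-Type", "application/json")
    # Assumption: add an Authorization header here if your instance requires an API key.
    return req
```

The body is a bare JSON boolean rather than a settings object, which is why the helper serializes the value with json.dumps.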

`prefixSearch`

Use this setting to configure the ability to search a word by prefix on an index:

```bash
curl \
  -X PUT 'http://localhost:7700/indexes/books/settings/prefix-search' \
  -H 'Content-Type: application/json' \
  --data-binary 'disabled'
```

prefixSearch accepts one of the following values:

- "indexingTime": enables prefix processing during indexing. This is the default Meilisearch behavior.
- "disabled": deactivates prefix search completely.

Disabling prefix search means the query "he" will no longer match the word "hello". This may significantly impact search result relevancy, but it speeds up the indexing process.
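To make the trade-off concrete, here is a toy model of how the two prefixSearch values change whether a single query word matches a document word. It is a sketch, not Meilisearch's matching engine: it ignores typo tolerance, normalization, and ranking. The names word_matches and PREFIX_SEARCH_VALUES are hypothetical.

```python
# Accepted values for the prefixSearch setting, as listed in these notes.
PREFIX_SEARCH_VALUES = {"indexingTime", "disabled"}

def word_matches(query_word, doc_word, prefix_search="indexingTime"):
    """Toy model: does one query word match one document word under prefixSearch?"""
    if prefix_search not in PREFIX_SEARCH_VALUES:
        raise ValueError(f"unsupported prefixSearch value: {prefix_search!r}")
    if prefix_search == "disabled":
        # Without prefix data, only whole-word matches remain.
        return query_word == doc_word
    # Default behavior: the query word may be a prefix of the document word.
    return doc_word.startswith(query_word)
```

With the default value, word_matches("he", "hello") is true. With "disabled", only word_matches("hello", "hello") still matches.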

New API route: `/batches`

The new /batches endpoint allows you to query information about task batches.

GET /batches returns a list of batch objects:

```bash
curl -X GET 'http://localhost:7700/batches'
```

This endpoint accepts the same parameters as the GET /tasks route, allowing you to narrow down which batches you want to see. Parameters used with GET /batches apply to the tasks, not the batches themselves. For example, GET /batches?uid=0 returns batches containing tasks with a taskUid of 0, not batches with a batchUid of 0.
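Because these filters are GET /tasks filters, a small URL builder is enough to query batches. The following is a minimal sketch that assumes a local instance at http://localhost:7700. batches_url is a hypothetical helper, and indexUids and statuses serve as example task filters with comma-separated list values.

```python
from urllib.parse import urlencode

def batches_url(base_url="http://localhost:7700", **task_filters):
    """Build a GET /batches URL from GET /tasks-style filters.

    The filters select batches by the tasks they contain, not by the
    batches' own fields.
    """
    params = {}
    for name, value in task_filters.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # lowercase, not Python's True/False
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)  # comma-separated list values
        params[name] = value
    query = urlencode(params, safe=",")
    return f"{base_url}/batches" + (f"?{query}" if query else "")
```

For example, batches_url(indexUids=["movies"], statuses=["failed"]) targets batches that contain at least one failed task on the movies index.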

You may also query GET /batches/:uid to retrieve information about a single batch object:

```bash
curl -X GET 'http://localhost:7700/batches/BATCH_UID'
```

/batches/:uid does not accept any parameters.

Batch objects contain the following fields:

```json
{
  "uid": 160,
  "progress": {
    "steps": [
      {
        "currentStep": "processing tasks",
        "finished": 0,
        "total": 2
      },
      {
        "currentStep": "indexing",
        "finished": 2,
        "total": 3
      },
      {
        "currentStep": "extracting words",
        "finished": 3,
        "total": 13
      },
      {
        "currentStep": "document",
        "finished": 12300,
        "total": 19546
      }
    ],
    "percentage": 37.986263
  },
  "details": {
    "receivedDocuments": 19547,
    "indexedDocuments": null
  },
  "stats": {
    "totalNbTasks": 1,
    "status": {
      "processing": 1
    },
    "types": {
      "documentAdditionOrUpdate": 1
    },
    "indexUids": {
      "mieli": 1
    }
  },
  "duration": null,
  "startedAt": "2024-12-12T09:44:34.124726733Z",
  "finishedAt": null
}
```
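You can reproduce the percentage in this sample by treating each step as nested inside the step listed before it. Under that reading, the innermost fraction fills in the progress between finished and finished + 1 of its parent. This formula is a reconstruction from the sample values, not a documented contract, and overall_percentage is a hypothetical helper. In practice, rely on the percentage the server returns.

```python
def overall_percentage(steps):
    """Fold nested progress steps into one percentage (reconstructed formula).

    Assumes each step is nested inside the previous one, so the inner
    fraction is added to the parent's `finished` count before dividing.
    """
    fraction = 0.0
    # Walk from the innermost step outwards.
    for step in reversed(steps):
        fraction = (step["finished"] + fraction) / step["total"]
    return fraction * 100
```

With the sample above, overall_percentage(batch["progress"]["steps"]) gives about 37.98626, which matches the reported percentage.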

Additionally, task objects now include a new field, batchUid. Use this field together with /batches/:uid to retrieve data on a specific batch.

```json
{
  "uid": 154,
  "batchUid": 142,
  "indexUid": "movies_test2",
  "status": "succeeded",
  "type": "documentAdditionOrUpdate",
  "canceledBy": null,
  "details": {
    "receivedDocuments": 1,
    "indexedDocuments": 1
  },
  "error": null,
  "duration": "PT0.027766819S",
  "enqueuedAt": "2024-12-02T14:07:34.974430765Z",
  "startedAt": "2024-12-02T14:07:34.99021667Z",
  "finishedAt": "2024-12-02T14:07:35.017983489Z"
}
```
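A client can follow batchUid to the batch and read duration as an ISO 8601 duration. The following is a minimal sketch that assumes a local instance at http://localhost:7700. It also assumes that tasks not yet batched have a null batchUid. batch_url_for_task and duration_seconds are hypothetical helpers, and the parser only handles time-only durations such as PT0.027766819S.

```python
import re

# Time-only ISO 8601 durations: PT[hours H][minutes M][seconds S].
_DURATION_RE = re.compile(
    r"^PT(?:(?P<h>\d+(?:\.\d+)?)H)?(?:(?P<m>\d+(?:\.\d+)?)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?$"
)

def duration_seconds(duration):
    """Convert a duration such as "PT0.027766819S" to seconds; None stays None."""
    if duration is None:
        return None  # e.g. a batch or task that has not finished yet
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    parts = {k: float(v) if v else 0.0 for k, v in match.groupdict().items()}
    return parts["h"] * 3600 + parts["m"] * 60 + parts["s"]

def batch_url_for_task(task, base_url="http://localhost:7700"):
    """Return the GET /batches/:uid URL for the batch that processed `task`."""
    batch_uid = task.get("batchUid")
    if batch_uid is None:
        return None  # assumption: the task has not been assigned to a batch yet
    return f"{base_url}/batches/{batch_uid}"
```

For the task above, batch_url_for_task(task) points at batch 142, and duration_seconds(task["duration"]) returns about 0.028 seconds.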

Done by @irevoire in #5060, #5070, #5080

Other improvements

- New query parameter for GET /tasks: reverse. If reverse is set to true, tasks are returned in reverse order, from oldest to newest. Done by @irevoire in #5048
- Phrase searches with showMatchesPosition set to true return a single location for the whole phrase, by @flevi29 in #4928
- New Prometheus metrics by @PedroTurik in #5044
- When a query finds matching terms in document fields with array values, Meilisearch now adds an indices field to _matchesPosition specifying which array elements contain the matches, by @LukasKalbertodt in #5005
- ⚠️ Breaking vectorStore change: field distribution no longer contains _vectors. Its value used to be incorrect, and there is no current use case for the fixed, most likely empty, value. Done as part of #4900
- Improve error messages by adding the index name, by @airycanon in #5056
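The reverse parameter is passed in the query string. The following is a minimal sketch of building such a URL, assuming a local instance at http://localhost:7700. tasks_url is a hypothetical helper. It writes the value as lowercase true, as the notes above spell it, instead of relying on Python's str(True), which would produce True.

```python
from urllib.parse import urlencode

def tasks_url(reverse=False, limit=None, base_url="http://localhost:7700"):
    """Build a GET /tasks URL; reverse=True asks for the oldest tasks first."""
    params = {}
    if limit is not None:
        params["limit"] = limit
    if reverse:
        # Send the literal lowercase "true" rather than Python's "True".
        params["reverse"] = "true"
    query = urlencode(params)
    return f"{base_url}/tasks" + (f"?{query}" if query else "")
```

Leaving reverse unset keeps the default order, newest tasks first.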

Fixes 🐞

- Return an appropriate error when the primary key is longer than 512 bytes, by @flevi29 in #4930
- Fix an issue where numbers were segmented differently depending on the tokenizer, by @dqkqd in https://github.com/meilisearch/charabia/pull/311
- Fix pagination when embedding fails, by @dureuill in https://github.com/meilisearch/meilisearch/pull/5063
- Fix an issue causing Meilisearch to ignore stop words in some cases, by @ManyTheFish in #5062
- Fix phrase search with attributesToSearchOn, by @ManyTheFish in #5062

Misc

Dependency updates
- Update benchmarks to match the new crates subfolder by @Kerollmops in #5021
- Fix the benchmarks by @irevoire in #5037
- Bump Swatinem/rust-cache from 2.7.1 to 2.7.5 in #5030
- Update charabia to v0.9.2 by @ManyTheFish in #5098
- Update mini-dashboard to v0.2.16 by @curquiza in #5102
CI and tests
- Improve performance of delete_index.rs by @DerTimonius in #4963
- Improve performance of create_index.rs by @DerTimonius in #4962
- Improve performance of get_documents.rs by @PedroTurik in #5025
- Improve performance of formatted.rs by @PedroTurik in #5043
- Fix the path used in the flaky tests CI by @Kerollmops in #5049
Misc
- Rollback the Meilisearch Kawaii logo by @Kerollmops in #5017
- Add image source label to Dockerfile by @wuast94 in #4990
- Hide code complexity into a subfolder by @Kerollmops in #5016
- Internal tool: implement offline upgrade from v1.10 to v1.11 by @irevoire in #5034
- Internal tool: implement offline upgrade from v1.11 to v1.12 by @ManyTheFish in #5146
- Meilisearch is now able to retrieve Katakana words from a Hiragana query by @tats-u in https://github.com/meilisearch/charabia/pull/312
- Improve error handling when writing into LMDB by @Kerollmops in https://github.com/meilisearch/meilisearch/pull/5089
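The Hiragana/Katakana change lives in charabia, and these notes do not show its code. As an illustration of the general idea only: Katakana U+30A1 to U+30F6 maps onto Hiragana U+3041 to U+3096 by a fixed offset of 0x60. Folding both scripts to one form therefore lets a Hiragana query match a Katakana word. The helpers below are hypothetical and are not charabia's implementation.

```python
# Katakana U+30A1..U+30F6 line up with Hiragana U+3041..U+3096, 0x60 apart.
_KATAKANA_START, _KATAKANA_END = 0x30A1, 0x30F6
_KANA_OFFSET = 0x60

def katakana_to_hiragana(text):
    """Fold Katakana characters to Hiragana; all other characters pass through."""
    return "".join(
        chr(ord(ch) - _KANA_OFFSET) if _KATAKANA_START <= ord(ch) <= _KATAKANA_END else ch
        for ch in text
    )

def kana_equal(a, b):
    """Compare two strings while ignoring the Hiragana/Katakana distinction."""
    return katakana_to_hiragana(a) == katakana_to_hiragana(b)
```

Characters outside the mapped range, such as the prolonged sound mark ー (U+30FC), are left unchanged by this sketch.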

❤️ Thanks again to our external contributors:

Meilisearch: @airycanon, @DerTimonius, @flevi29, @LukasKalbertodt, @PedroTurik, @wuast94
Charabia: @dqkqd, @tats-u
