Release date: 2024-10-28
Version: v1.11.0

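Meilisearch v1.11 brings several notable updates. AI-powered search is optimized through binary quantization and API adjustments, laying the groundwork for stabilizing the feature. This release requires AI-powered searches to pass the hybrid.embedder parameter and switches the default OpenAI model to text-embedding-3-small. The new binary quantization option can substantially improve indexing performance for high-dimensional data, but it reduces search relevancy and cannot be reverted. Federated search gains facet distribution and stats, reported either per index or merged across indexes. The experimental STARTS WITH filter operator is added and must be enabled manually. The release also improves language support (including German and Turkish), fixes facet truncation and task cancellation issues, and refines logging and the search UI. All official integrations will support the new version within 4 to 48 hours; some SDK features may lag behind, and users are encouraged to open an issue or a PR to help.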

Release notes (original)

Meilisearch v1.11 introduces AI-powered search performance improvements thanks to binary quantization and various usage changes, all of which are steps towards a future stabilization of the feature. We have also improved federated search usage following user feedback.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.

Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we’ll love you for that ❤️).

New features and updates 🔥

Experimental - AI-powered search improvements

This release is Meilisearch’s first step towards stabilizing AI-powered search and introduces a few breaking changes to its API. Consult the PRD for full usage details.

Done by @dureuill in #4906, #4920, #4892, and #4938.

⚠️ Breaking changes

When performing AI-powered searches, hybrid.embedder is now a mandatory parameter in GET and POST /indexes/{:indexUid}/search
As a consequence, it is now mandatory to pass hybrid even for pure semantic searches
embedder is now a mandatory parameter in GET and POST /indexes/{:indexUid}/similar
Meilisearch now ignores semanticRatio and performs a pure semantic search for queries that include vector but not q
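
As a rough illustration of the breaking changes above, the request bodies could be assembled like this. The helper names and the embedder name "default" are hypothetical and not part of Meilisearch or its SDKs; only the field names (q, vector, hybrid.embedder, hybrid.semanticRatio, id, embedder) come from the API described here.

```python
from typing import Any, Optional


def hybrid_search_body(
    embedder: str,
    q: Optional[str] = None,
    vector: Optional[list[float]] = None,
    semantic_ratio: Optional[float] = None,
) -> dict[str, Any]:
    """Build a body for POST /indexes/{indexUid}/search (hypothetical helper)."""
    if not embedder:
        # v1.11: hybrid.embedder is mandatory for AI-powered searches.
        raise ValueError("hybrid.embedder is mandatory as of v1.11")
    hybrid: dict[str, Any] = {"embedder": embedder}
    if semantic_ratio is not None:
        hybrid["semanticRatio"] = semantic_ratio
    body: dict[str, Any] = {"hybrid": hybrid}
    if q is not None:
        body["q"] = q
    if vector is not None:
        # Per the notes above, a query with vector but no q is treated
        # as a pure semantic search and semanticRatio is ignored.
        body["vector"] = vector
    return body


def similar_body(document_id: str, embedder: str) -> dict[str, Any]:
    """Build a body for POST /indexes/{indexUid}/similar (hypothetical helper)."""
    if not embedder:
        # v1.11: embedder is mandatory on the /similar route.
        raise ValueError("embedder is mandatory as of v1.11")
    return {"id": document_id, "embedder": embedder}
```

Raising early on a missing embedder name mirrors the new requirement on the client side, so a request that the server would reject is never sent.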

Addition & improvements

The default model for OpenAI is now text-embedding-3-small instead of text-embedding-ada-002
This release introduces a new embedder option: documentTemplateMaxBytes. Meilisearch will truncate a document’s template text when it goes over the specified limit
Fields in documentTemplate include a new field.is_searchable property. The default document template now filters out both empty fields and fields not in the searchable attributes list:

v1.11:

{% for field in fields %}
  {% if field.is_searchable and not field.value == nil %}
    {{ field.name }}: {{ field.value }}\n
  {% endif %}
{% endfor %}

v1.10:

{% for field in fields %}
  {{ field.name }}: {{ field.value }}\n
{% endfor %}

Embedders using the v1.10 document template will continue working as before. The new default document template will only work with newly created embedders.
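
To make the filtering in the new default template concrete, here is a minimal Python approximation. It is only a sketch: it ignores Liquid's whitespace handling, the function names are hypothetical, and the byte truncation (cut at the limit, drop any partial UTF-8 character) is an assumption about how documentTemplateMaxBytes might behave, not a description of Meilisearch's implementation.

```python
from typing import Any


def render_default_template_v1_11(document: dict[str, Any], searchable: set[str]) -> str:
    """Approximate the v1.11 default document template in plain Python."""
    lines = []
    for name, value in document.items():
        # Mirrors: {% if field.is_searchable and not field.value == nil %}
        if name in searchable and value is not None:
            lines.append(f"{name}: {value}")
    return "\n".join(lines)


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes bytes of UTF-8 (assumed truncation rule)."""
    # errors="ignore" drops a multi-byte character split by the cut.
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


doc = {"title": "Batman returns", "overview": None, "internal_code": "X1"}
rendered = render_default_template_v1_11(doc, searchable={"title", "overview"})
```

In this example only the title line survives: overview is empty and internal_code is not searchable, so neither reaches the embedder.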

Vector database indexing performance improvements

v1.11 introduces a new embedder option, binaryQuantized:

curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "embedders": {
      "image2text": {
        "binaryQuantized": true
      }
    }
  }'

Enable binary quantization to convert embeddings of floating point numbers into embeddings of boolean values. This will negatively impact the relevancy of AI-powered searches but significantly improve performance in large collections with more than 100 dimensions.

In our benchmarks, this reduced the size of the database by a factor of 10 and divided the indexing time by a factor of 6 with little impact on search times.

⚠️ Warning: Enabling this feature will update all of your vectors to contain only 1s or -1s, significantly impacting relevancy.

You cannot revert this option once you enable it. Before setting binaryQuantized to true, Meilisearch recommends testing it in a smaller or duplicate index in a development environment.
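
The effect of the warning above can be illustrated with a toy quantizer. This is a sketch and not Meilisearch's implementation: the function name is hypothetical, and mapping zero to -1 is an assumption. It shows why the operation loses information and cannot be undone: different float vectors collapse to the same vector of 1s and -1s.

```python
def binary_quantize(vector: list[float]) -> list[int]:
    """Map each component to 1 or -1 based on its sign (toy sketch)."""
    # Assumption: values greater than zero map to 1, everything else to -1.
    return [1 if x > 0 else -1 for x in vector]


a = [0.12, -0.80, 0.03]
b = [0.90, -0.01, 0.45]

# Two clearly different embeddings become indistinguishable once quantized,
# so the original floats cannot be recovered from the stored values.
quantized_a = binary_quantize(a)
quantized_b = binary_quantize(b)
```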

Done by @irevoire in #4941.

Federated search improvements

This release adds two new federated search options, facetsByIndex and mergeFacets. These allow you to request a federated search for facet distributions and stats data.

To obtain facet distribution and stats for each separate index, use facetsByIndex when querying the POST /multi-search endpoint:

POST /multi-search
{
  "federation": {
    "limit": 20,
    "offset": 0,
    "facetsByIndex": {
      "movies": ["title", "id"],
      "comics": ["title"]
    }
  },
  "queries": [
    {
      "q": "Batman",
      "indexUid": "movies"
    },
    {
      "q": "Batman",
      "indexUid": "comics"
    }
  ]
}

The multi-search response will include a new field, facetsByIndex, with facet data separated per index:

{
  "hits": […],
  …
  "facetsByIndex": {
      "movies": {
        "distribution": {
          "title": {
            "Batman returns": 1
          },
          "id": {
            "42": 1
          }
        },
        "stats": {
          "id": {
            "min": 42,
            "max": 42
          }
        }
      },
     …
  }
}

To obtain facet distribution and stats for all indexes merged into a single result, use both facetsByIndex and mergeFacets when querying the POST /multi-search endpoint:

POST /multi-search
{
  "federation": {
    "limit": 20,
    "offset": 0,
    "facetsByIndex": {
      "movies": ["title", "id"],
      "comics": ["title"]
    },
    "mergeFacets": {
      "maxValuesPerFacet": 10
    }
  },
  "queries": [
    {
      "q": "Batman",
      "indexUid": "movies"
    },
    {
      "q": "Batman",
      "indexUid": "comics"
    }
  ]
}
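
Both request shapes above can be produced from one small builder. This is a sketch with a hypothetical function name; the field names (federation, facetsByIndex, mergeFacets, maxValuesPerFacet, queries) are the ones used in these release notes. Serializing through json.dumps guarantees a strictly valid JSON body.

```python
import json
from typing import Any, Optional


def federated_search_body(
    queries: list[dict[str, Any]],
    facets_by_index: dict[str, list[str]],
    merge_max_values: Optional[int] = None,
    limit: int = 20,
    offset: int = 0,
) -> dict[str, Any]:
    """Build a POST /multi-search body with federated facets (hypothetical helper)."""
    federation: dict[str, Any] = {
        "limit": limit,
        "offset": offset,
        "facetsByIndex": facets_by_index,
    }
    if merge_max_values is not None:
        # Adding mergeFacets asks for one merged distribution instead of
        # one distribution per index.
        federation["mergeFacets"] = {"maxValuesPerFacet": merge_max_values}
    return {"federation": federation, "queries": queries}


queries = [
    {"q": "Batman", "indexUid": "movies"},
    {"q": "Batman", "indexUid": "comics"},
]
facets = {"movies": ["title", "id"], "comics": ["title"]}

per_index_body = federated_search_body(queries, facets)
merged_body = federated_search_body(queries, facets, merge_max_values=10)
payload = json.dumps(merged_body)  # string to send as the request body
```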

The response includes two new fields, facetDistribution and facetStats:

{
  "hits": […],
  …
  "facetDistribution": {
    "title": {
      "Batman returns": 1,
      "Batman: the killing joke": …
    },
    "id": {
      "42": 1
    }
  },
  "facetStats": {
    "id": {
      "min": 42,
      "max": 42
    }
  }
}
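
One way to picture the relationship between the per-index and merged responses is the following client-side sketch. It rests on assumptions that the release notes do not spell out: counts for the same facet value are summed across indexes, values are kept in lexicographic order before truncating to maxValuesPerFacet, and stats take the overall min and max. The function name and the sample comics data are hypothetical.

```python
from collections import Counter
from typing import Any


def merge_facets(facets_by_index: dict[str, Any], max_values_per_facet: int) -> dict[str, Any]:
    """Merge per-index facet data into one distribution and one stats map (sketch)."""
    distribution: dict[str, Counter] = {}
    stats: dict[str, dict[str, float]] = {}
    for per_index in facets_by_index.values():
        for facet, values in per_index.get("distribution", {}).items():
            # Assumption: identical facet values from different indexes add up.
            distribution.setdefault(facet, Counter()).update(values)
        for facet, s in per_index.get("stats", {}).items():
            merged = stats.setdefault(facet, dict(s))
            merged["min"] = min(merged["min"], s["min"])
            merged["max"] = max(merged["max"], s["max"])
    truncated = {
        # Assumption: keep the first N values in lexicographic order.
        facet: {value: counts[value] for value in sorted(counts)[:max_values_per_facet]}
        for facet, counts in distribution.items()
    }
    return {"facetDistribution": truncated, "facetStats": stats}


sample = {
    "movies": {
        "distribution": {"title": {"Batman returns": 1}, "id": {"42": 1}},
        "stats": {"id": {"min": 42, "max": 42}},
    },
    # Hypothetical per-index data for the comics index.
    "comics": {"distribution": {"title": {"Batman: the killing joke": 1}}, "stats": {}},
}
merged = merge_facets(sample, max_values_per_facet=10)
```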

Done by @dureuill in #4929.

Experimental - New STARTS WITH filter operator

Enable the containsFilter experimental feature to use the STARTS WITH filter operator:

curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "containsFilter": true
  }'

Use the STARTS WITH operator when filtering:

curl \
  -X POST http://localhost:7700/indexes/movies/search \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "filter": "hero STARTS WITH spider"
  }'
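
When the prefix comes from user input, it helps to build the filter expression programmatically. The helper below is a hypothetical sketch: it wraps the value in double quotes so prefixes containing spaces stay a single value, and it assumes backslash escaping for embedded quotes, which should be checked against the filter syntax reference before relying on it.

```python
from typing import Any


def starts_with_filter(attribute: str, prefix: str) -> str:
    """Build a STARTS WITH filter expression with a quoted value (sketch)."""
    # Assumption: backslash escapes a double quote inside a quoted value.
    escaped = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return f'{attribute} STARTS WITH "{escaped}"'


def starts_with_search_body(attribute: str, prefix: str) -> dict[str, Any]:
    """Body for POST /indexes/{indexUid}/search using the experimental operator."""
    return {"filter": starts_with_filter(attribute, prefix)}


body = starts_with_search_body("hero", "spider man")
```

The request only succeeds once the containsFilter experimental feature has been enabled on the instance.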

🗣️ This is an experimental feature, and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

Done by @Kerollmops in #4939.

Other improvements

Language support and localizedAttributes settings by @ManyTheFish in #4937
- Add ISO-639-1 variants
- Convert ISO-639-1 into ISO-639-3
Add a German language tokenizer by @luflow in meilisearch/charabia#303 and in #4945
Improve Turkish language support by @tkhshtsh0917 in meilisearch/charabia#305 and in #4957
Upgrade “batch failed” log to error level in #4955 by @dureuill.
Update the search UI: remove the forced capitalized fields, by @curquiza in #4993

Fixes 🐞

⚠️ When using federated search, query.facets was silently ignored at the query level, but should not have been. It now returns the appropriate error. Use federation.facetsByIndex instead if you want facets to be applied during federated search.
Prometheus /metrics returns the route pattern instead of the real route when returning the HTTP requests total by @irevoire in #4839
Truncate values at the end of a list of facet values when the number of facet values is larger than maxValuesPerFacet. For example, setting maxValuesPerFacet to 2 could previously result in ["blue", "red", "yellow"] being truncated to ["blue", "yellow"] instead of ["blue", "red"]. By @dureuill in #4929
Improve the task cancellation when vectors are used, by @irevoire in #4971
Swedish support: the characters å, ä, ö are no longer normalized to a and o. By @ManyTheFish in #4945
Update rhai to fix an internal error when updating documents with a function (experimental) by @irevoire in #4960
Fix the bad experimental search queue size by @irevoire in #4992
Do not send empty edit document by function by @irevoire in #5001
Display vectors when no custom vectors were ever provided by @dureuill in #5008

Misc

Dependencies updates
- Security dependency upgrade: bump quinn-proto from 0.11.3 to 0.11.8 by @dependabot in #4911
CIs and tests
- Make the tests run faster by @irevoire in #4808
Documentation
- Fix broken links in README by @iornstein in #4943
Misc
- Allow Meilitool to upgrade from v1.9 to v1.10 without a dump in some conditions, by @dureuill in #4912
- Fix bench by adding embedder by @dureuill in #4954
- Revamp analytics by @irevoire in #5011

❤️ Thanks again to our external contributors:

Meilisearch: @iornstein.
Charabia: @luflow, @tkhshtsh0917.
