mirror of
https://github.com/pgvector/pgvector.git
synced 2026-06-06 05:51:21 +08:00
Improved filtering section [skip ci]
34
README.md
@@ -427,25 +427,51 @@ Note: `%` is only populated during the `loading tuples` phase
## Filtering
There are a few ways to index nearest neighbor queries with a `WHERE` clause
There are a few ways to index nearest neighbor queries with a `WHERE` clause.
```sql
SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```
Create an index on one [or more](https://www.postgresql.org/docs/current/indexes-multicolumn.html) of the `WHERE` columns for exact search
A good place to start is creating an index on the filter column. This can provide fast, exact nearest neighbor search in many cases. Postgres has a number of [index types](https://www.postgresql.org/docs/current/indexes-types.html) for this: B-tree (default), hash, GiST, SP-GiST, GIN, and BRIN.
```sql
CREATE INDEX ON items (category_id);
```
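
As a sketch of choosing among these index types: the filter in the query above is a plain equality test, so a hash index is one alternative to the default B-tree. Hash indexes only support `=` comparisons, so this assumes `category_id` is never filtered by range. `EXPLAIN` shows which plan Postgres picks.

```sql
-- Assumption: category_id is only ever compared with "="
CREATE INDEX ON items USING hash (category_id);

-- Inspect the plan chosen for the filtered nearest neighbor query
EXPLAIN SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```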
Or a [partial index](https://www.postgresql.org/docs/current/indexes-partial.html) on the vector column for approximate search
For multiple columns, consider a [multicolumn index](https://www.postgresql.org/docs/current/indexes-multicolumn.html).
```sql
CREATE INDEX ON items (store_id, category_id);
```
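
For illustration, a query that can use this index filters on both columns (the `store_id` value here is hypothetical). Postgres generally uses a multicolumn B-tree index most efficiently when the leading column (`store_id`) is constrained.

```sql
-- Hypothetical filter values; both indexed columns are constrained
SELECT * FROM items WHERE store_id = 1 AND category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```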
Exact indexes work well for conditions that match a small fraction of rows. For larger fractions, [approximate indexes](#indexing) can work better.
```sql
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
```
With approximate indexes, filtering is applied after the index is scanned (known as post-filtering). If a condition matches 10% of rows, with HNSW and the default `hnsw.ef_search` of 40, only 4 rows will match on average. For more rows, increase `hnsw.ef_search`.
```sql
SET hnsw.ef_search = 200;
```
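
The estimate above is just `hnsw.ef_search` multiplied by the fraction of matching rows. A rough sketch of that arithmetic, assuming matching rows are spread evenly among the scanned candidates, followed by one way to raise the setting for a single query with `SET LOCAL`:

```sql
-- Expected matches is roughly ef_search times the matching fraction
SELECT 40 * 0.10 AS matches_at_default, 200 * 0.10 AS matches_at_200;

-- SET LOCAL only lasts until the end of the transaction
BEGIN;
SET LOCAL hnsw.ef_search = 200;
SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
COMMIT;
```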
Starting with 0.8.0, you can enable [iterative index scans](#iterative-index-scans), which will automatically scan more of the index when needed.
```sql
SET hnsw.iterative_scan = strict_order;
```
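
A minimal sketch of an iterative scan in use, assuming pgvector 0.8.0 or later. `relaxed_order` is the other iterative mode, which allows results to be slightly out of order by distance in exchange for better recall, and `hnsw.max_scan_tuples` caps how much of the index is visited (20000 is its default).

```sql
-- Allow slightly out-of-order results in exchange for better recall
SET hnsw.iterative_scan = relaxed_order;

-- Upper bound on tuples visited before the scan stops
SET hnsw.max_scan_tuples = 20000;

SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```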
You can also create different approximate indexes for each value (or groups of values).

If filtering by a few distinct values, use [partial indexing](https://www.postgresql.org/docs/current/indexes-partial.html).
```sql
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WHERE (category_id = 123);
```
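
As a sketch, the planner can only use a partial index when the query's `WHERE` clause implies the index predicate, so the filter should match it. A group of values can also share one index (the values below are hypothetical):

```sql
-- Filter matches the predicate of the partial index above
SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

-- One partial index covering a hypothetical group of values
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WHERE (category_id IN (1, 2, 3));
```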
Use [partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html) for approximate search on many different values of the `WHERE` columns
If filtering by many different values, use [partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html).
```sql
CREATE TABLE items (embedding vector(3), category_id int) PARTITION BY LIST(category_id);
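-- Sketch of a possible continuation (partition names are hypothetical, not from the README):
-- each list partition holds the rows for the listed category values, and an
-- index created on the parent table is created on every partition.
CREATE TABLE items_category_123 PARTITION OF items FOR VALUES IN (123);
CREATE TABLE items_default PARTITION OF items DEFAULT;
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);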
```