Improved filtering section [skip ci]

Andrew Kane
2024-10-28 12:08:27 -07:00
parent c1161f8889
commit fe6ec03dac

@@ -427,25 +427,51 @@ Note: `%` is only populated during the `loading tuples` phase
## Filtering
There are a few ways to index nearest neighbor queries with a `WHERE` clause
There are a few ways to index nearest neighbor queries with a `WHERE` clause.
```sql
SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```
Create an index on one [or more](https://www.postgresql.org/docs/current/indexes-multicolumn.html) of the `WHERE` columns for exact search
A good place to start is creating an index on the filter column. This can provide fast, exact nearest neighbor search in many cases. Postgres has a number of [index types](https://www.postgresql.org/docs/current/indexes-types.html) for this: B-tree (default), hash, GiST, SP-GiST, GIN, and BRIN.
```sql
CREATE INDEX ON items (category_id);
```
Or a [partial index](https://www.postgresql.org/docs/current/indexes-partial.html) on the vector column for approximate search
For multiple columns, consider a [multicolumn index](https://www.postgresql.org/docs/current/indexes-multicolumn.html).
```sql
CREATE INDEX ON items (store_id, category_id);
```
Exact indexes work well for conditions that match a small fraction of rows. For larger fractions, [approximate indexes](#indexing) can work better.
```sql
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
```
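One way to judge which case applies is to measure the fraction of rows a condition matches. A minimal sketch, assuming the `items` table and the `category_id = 123` filter from the examples above:

```sql
-- Fraction of rows matching the filter (NULL if the table is empty)
SELECT (count(*) FILTER (WHERE category_id = 123))::float8 / NULLIF(count(*), 0) AS match_fraction
FROM items;
```

This scans the whole table, so on large tables it may be cheaper to run it once and keep the result than to recompute it per query.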
With approximate indexes, filtering is applied after the index is scanned (known as post-filtering). If a condition matches 10% of rows, then with HNSW and the default `hnsw.ef_search` of 40, the scan yields at most 40 candidates, so only about 4 rows will match on average. To get more rows, increase `hnsw.ef_search`.
```sql
SET hnsw.ef_search = 200;
```
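The expected number of matches is roughly `hnsw.ef_search` times the fraction of rows that match, so a value of 200 with a 10% filter gives about 20 rows. As a sketch, the setting can also be scoped to a single query with `SET LOCAL` inside a transaction, so other queries in the session keep the default:

```sql
BEGIN;
-- Applies only until the end of this transaction
SET LOCAL hnsw.ef_search = 200;
SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
COMMIT;
```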
Starting with 0.8.0, you can enable [iterative index scans](#iterative-index-scans), which will automatically scan more of the index when needed.
```sql
SET hnsw.iterative_scan = strict_order;
```
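As a sketch of two related settings in 0.8.0 (the iterative index scans section remains the authoritative description), ordering can be relaxed in exchange for better recall, and the amount of extra scanning can be capped:

```sql
-- Relaxed ordering: results may be slightly out of order by distance
SET hnsw.iterative_scan = relaxed_order;

-- Approximate upper bound on tuples visited before the scan stops (20000 is the default)
SET hnsw.max_scan_tuples = 20000;
```

A lower cap bounds query time for very selective filters, at the cost of possibly returning fewer rows than the `LIMIT`.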
You can also create different approximate indexes for each value (or groups of values).
If filtering by a few distinct values, use [partial indexing](https://www.postgresql.org/docs/current/indexes-partial.html).
```sql
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WHERE (category_id = 123);
```
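For a group of values, the same idea might look like the sketch below. The second category value is illustrative, and `EXPLAIN` is used only to check whether the planner picks the partial index for a given query:

```sql
-- One partial index covering a group of categories (values are illustrative)
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WHERE (category_id IN (123, 456));

-- Confirm the plan for a query whose WHERE clause matches the index predicate
EXPLAIN SELECT * FROM items WHERE category_id IN (123, 456) ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```

A partial index is only considered when the query's `WHERE` clause implies the index predicate, so queries on values outside the group will not use it.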
Use [partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html) for approximate search on many different values of the `WHERE` columns
If filtering by many different values, use [partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html).
```sql
CREATE TABLE items (embedding vector(3), category_id int) PARTITION BY LIST(category_id);
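-- Sketch (an assumption, not part of the original example): each partition is
-- created separately; the partition name items_123 is hypothetical
CREATE TABLE items_123 PARTITION OF items FOR VALUES IN (123);
-- An index created on the partitioned table is also created on each partition
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);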