diff --git a/README.md b/README.md
index 75a6cb5..a9aff94 100644
--- a/README.md
+++ b/README.md
@@ -427,25 +427,51 @@ Note: `%` is only populated during the `loading tuples` phase
 
 ## Filtering
 
-There are a few ways to index nearest neighbor queries with a `WHERE` clause
+There are a few ways to index nearest neighbor queries with a `WHERE` clause.
 
 ```sql
 SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
 ```
 
-Create an index on one [or more](https://www.postgresql.org/docs/current/indexes-multicolumn.html) of the `WHERE` columns for exact search
+A good place to start is creating an index on the filter column. This can provide fast, exact nearest neighbor search in many cases. Postgres has a number of [index types](https://www.postgresql.org/docs/current/indexes-types.html) for this: B-tree (default), hash, GiST, SP-GiST, GIN, and BRIN.
 
 ```sql
 CREATE INDEX ON items (category_id);
 ```
 
-Or a [partial index](https://www.postgresql.org/docs/current/indexes-partial.html) on the vector column for approximate search
+For multiple columns, consider a [multicolumn index](https://www.postgresql.org/docs/current/indexes-multicolumn.html).
+
+```sql
+CREATE INDEX ON items (store_id, category_id);
+```
+
+Exact indexes work well for conditions that match a small fraction of rows. For larger fractions, [approximate indexes](#indexing) can work better.
+
+```sql
+CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
+```
+
+With approximate indexes, filtering is applied after the index is scanned (known as post-filtering). If a condition matches 10% of rows, with HNSW and the default `hnsw.ef_search` of 40, only 4 rows will match on average. For more rows, increase `hnsw.ef_search`.
+
+```sql
+SET hnsw.ef_search = 200;
+```
+
+Starting with 0.8.0, you can enable [iterative index scans](#iterative-index-scans), which will automatically scan more of the index when needed.
+
+```sql
+SET hnsw.iterative_scan = strict_order;
+```
+
+You can also create different approximate indexes for each value (or groups of values).
+
+If filtering by a few distinct values, use [partial indexing](https://www.postgresql.org/docs/current/indexes-partial.html).
 
 ```sql
 CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WHERE (category_id = 123);
 ```
 
-Use [partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html) for approximate search on many different values of the `WHERE` columns
+If filtering by many different values, use [partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html).
 
 ```sql
 CREATE TABLE items (embedding vector(3), category_id int) PARTITION BY LIST(category_id);