Improved filtering section [skip ci]

2026-07-23 04:20:56 +08:00 · 2024-10-28 12:08:27 -07:00
parent c1161f8889
commit fe6ec03dac
1 changed file with 30 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -427,25 +427,51 @@ Note: `%` is only populated during the `loading tuples` phase

 ## Filtering

-There are a few ways to index nearest neighbor queries with a `WHERE` clause
+There are a few ways to index nearest neighbor queries with a `WHERE` clause.

 ```sql
 SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
 ```

-Create an index on one [or more](https://www.postgresql.org/docs/current/indexes-multicolumn.html) of the `WHERE` columns for exact search
+A good place to start is creating an index on the filter column. This can provide fast, exact nearest neighbor search in many cases. Postgres has a number of [index types](https://www.postgresql.org/docs/current/indexes-types.html) for this: B-tree (default), hash, GiST, SP-GiST, GIN, and BRIN.

 ```sql
 CREATE INDEX ON items (category_id);
 ```

-Or a [partial index](https://www.postgresql.org/docs/current/indexes-partial.html) on the vector column for approximate search
+For multiple columns, consider a [multicolumn index](https://www.postgresql.org/docs/current/indexes-multicolumn.html).
+
+```sql
+CREATE INDEX ON items (store_id, category_id);
+```
+
+Exact indexes work well for conditions that match a small fraction of rows. For larger fractions, [approximate indexes](#indexing) can work better.
+
+```sql
+CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
+```
+
+With approximate indexes, filtering is applied after the index is scanned (known as post-filtering). If a condition matches 10% of rows, with HNSW and the default `hnsw.ef_search` of 40, only 4 rows will match on average. For more rows, increase `hnsw.ef_search`.
+
+```sql
+SET hnsw.ef_search = 200;
+```
+
+Starting with 0.8.0, you can enable [iterative index scans](#iterative-index-scans), which will automatically scan more of the index when needed.
+
+```sql
+SET hnsw.iterative_scan = strict_order;
+```
+
+You can also create different approximate indexes for each value (or group of values).
+
+If filtering by a few distinct values, use [partial indexing](https://www.postgresql.org/docs/current/indexes-partial.html).

 ```sql
 CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WHERE (category_id = 123);
 ```

-Use [partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html) for approximate search on many different values of the `WHERE` columns
+If filtering by many different values, use [partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html).

 ```sql
 CREATE TABLE items (embedding vector(3), category_id int) PARTITION BY LIST(category_id);