From 49e05fb5bab37402c2b586fd7748d3cf0b8470b2 Mon Sep 17 00:00:00 2001
From: Andrew Kane
Date: Sat, 28 Sep 2024 13:13:10 -0700
Subject: [PATCH] Updated readme [skip ci]

---
 README.md | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/README.md b/README.md
index 6061068..b3f9ddf 100644
--- a/README.md
+++ b/README.md
@@ -459,6 +459,26 @@ SET hnsw.streaming = on;
 SET ivfflat.streaming = on;
 ```
 
+However, there are some important caveats.
+
+### Streaming Caveats
+
+With streaming queries, it’s possible for rows to be slightly out of order by distance. For strict ordering, use:
+
+```sql
+WITH approx_order AS MATERIALIZED (
+    SELECT *, embedding <-> '[1,2,3]' AS distance FROM items WHERE ... ORDER BY distance LIMIT 5
+) SELECT * FROM approx_order ORDER BY distance;
+```
+
+For distance filters, use a CTE and place the filter outside it.
+
+```sql
+WITH approx_order AS MATERIALIZED (
+    SELECT *, embedding <-> '[1,2,3]' AS distance FROM items WHERE ... ORDER BY distance LIMIT 5
+) SELECT * FROM approx_order WHERE distance < 0.1 ORDER BY distance;
+```
+
 ### Streaming Options
 
 Since scanning a large portion of the index is expensive, there are options to control when the scan ends.
@@ -492,24 +512,6 @@ Specify the max number of probes
 SET ivfflat.max_probes = 100;
 ```
 
-### Streaming Caveats
-
-With streaming queries, it’s possible for rows to be slightly out of order by distance. For strict ordering, use:
-
-```sql
-WITH approx_order AS MATERIALIZED (
-    SELECT *, embedding <-> '[1,2,3]' AS distance FROM items WHERE ... ORDER BY distance LIMIT 5
-) SELECT * FROM approx_order ORDER BY distance;
-```
-
-Distance filters should be placed outside the CTE for best performance.
-
-```sql
-WITH approx_order AS MATERIALIZED (
-    SELECT *, embedding <-> '[1,2,3]' AS distance FROM items WHERE ... ORDER BY distance LIMIT 5
-) SELECT * FROM approx_order WHERE distance < 0.1 ORDER BY distance;
-```
-
 ## Half-Precision Vectors
 
 *Added in 0.7.0*