Added section on subvector indexing [skip ci]

Andrew Kane
2024-04-13 18:18:12 -07:00
parent 8a4845b52e
commit 0c9ad67a1c


@@ -528,6 +528,30 @@ SELECT id, content FROM items, plainto_tsquery('hello search') query
You can use [Reciprocal Rank Fusion](https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search_rrf.py) or a [cross-encoder](https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search.py) to combine results.
## Subvector Indexing
*Unreleased*
Use expression indexing to index subvectors
```sql
CREATE INDEX ON items USING hnsw ((subvector(embedding, 1, 3)::vector(3)) vector_cosine_ops);
```
Get the nearest neighbors by cosine distance
```sql
SELECT * FROM items ORDER BY subvector(embedding, 1, 3)::vector(3) <=> subvector('[1,2,3,4,5]'::vector, 1, 3) LIMIT 5;
```
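As a rough illustration of what the `ORDER BY` expression above computes, here is a plain-Python sketch. It assumes `subvector(v, start, count)` takes a 1-based start position and an element count, as the example suggests, and that `<=>` is cosine distance. The Python helper names are hypothetical and are not part of pgvector.

```python
import math

def subvector(vec, start, count):
    """Return `count` elements of `vec`, beginning at 1-based position `start`."""
    return vec[start - 1 : start - 1 + count]

def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1, 2, 3, 4, 5]
row = [2, 4, 6, 0, 0]  # hypothetical stored embedding

# Distance over the first three dimensions only (what the subvector query orders by)
prefix_distance = cosine_distance(subvector(row, 1, 3), subvector(query, 1, 3))

# Distance over all five dimensions
full_distance = cosine_distance(row, query)
```

Here the first three dimensions of `row` point in the same direction as those of `query`, so the subvector distance is close to zero even though the full vectors differ noticeably.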
Re-rank by the full vectors for better recall
```sql
SELECT * FROM (
    SELECT * FROM items ORDER BY subvector(embedding, 1, 3)::vector(3) <=> subvector('[1,2,3,4,5]'::vector, 1, 3) LIMIT 20
) AS candidates ORDER BY embedding <=> '[1,2,3,4,5]' LIMIT 5;
```
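The two-stage pattern above (shortlist by subvector distance, then re-rank the shortlist by full-vector distance) can be sketched in plain Python to show why re-ranking helps. This is only an illustration with made-up data. The function and variable names are hypothetical, and the brute-force sort stands in for the approximate index scan.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def two_stage_search(items, query, dims, candidates, limit):
    """items maps id -> full vector; returns ids ordered by full-vector distance."""
    query_prefix = query[:dims]
    # Stage 1: shortlist by distance over the first `dims` dimensions only
    shortlist = sorted(
        items, key=lambda i: cosine_distance(items[i][:dims], query_prefix)
    )[:candidates]
    # Stage 2: re-rank the shortlist by distance over the full vectors
    return sorted(shortlist, key=lambda i: cosine_distance(items[i], query))[:limit]

items = {
    1: [1, 2, 3, -4, -5],   # matches the query prefix, poor full match
    2: [1, 2, 3, 4, 5],     # identical to the query
    3: [3, 2, 1, 4, 5],     # weaker prefix match, good full match
    4: [-1, -2, -3, 4, 5],  # opposite prefix
}
query = [1, 2, 3, 4, 5]

# Ranking by the subvector alone
prefix_only = sorted(
    items, key=lambda i: cosine_distance(items[i][:3], query[:3])
)[:2]

# Shortlist of 3 by subvector, then keep the best 2 by full vector
reranked = two_stage_search(items, query, dims=3, candidates=3, limit=2)
```

With this data, `prefix_only` keeps item 1, whose remaining dimensions point away from the query, while `reranked` replaces it with item 3. A larger shortlist gives the re-ranking step more chances to recover good matches, at the cost of more full-vector distance computations.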
## Performance
### Tuning