Commit Graph

355 Commits

Author SHA1 Message Date
Andrew Kane
67e648b13e Added another test for binary_quantize function [skip ci] 2025-08-29 01:09:18 -07:00
Andrew Kane
656b059258 Added tests for iterative index scan with empty index - resolves #679 [skip ci] 2025-02-13 16:28:14 -08:00
Andrew Kane
258215ad97 Improved test [skip ci] 2024-10-28 15:54:39 -07:00
Andrew Kane
fb87b6da91 Fixed test 2024-10-28 13:55:26 -07:00
Andrew Kane
6a30c1e824 Fixed test [skip ci] 2024-10-27 21:10:51 -07:00
Andrew Kane
f9d627c9a9 Updated default value of hnsw.scan_mem_multiplier [skip ci] 2024-10-27 21:05:04 -07:00
Andrew Kane
38f42820be Added test for hnsw.scan_mem_multiplier [skip ci] 2024-10-27 20:05:58 -07:00
Andrew Kane
857d716d9e Renamed iterative_search to iterative_scan 2024-10-27 14:02:22 -07:00
Andrew Kane
78b877bdaf Revert "Renamed iterative_search to iterative_scan"
This reverts commit 7043cce893.
2024-10-24 20:32:07 -07:00
Andrew Kane
7043cce893 Renamed iterative_search to iterative_scan 2024-10-24 20:31:43 -07:00
Andrew Kane
ac6576e53a Added hnsw.search_mem_multiplier option 2024-10-24 18:02:20 -07:00
Andrew Kane
e718eb8da4 Updated range and defaults for iterative search parameters 2024-10-21 20:38:50 -07:00
Andrew Kane
049972a4a3 Improved test output [skip ci] 2024-10-13 17:22:49 -07:00
Andrew Kane
61027645e9 Improved test output [skip ci] 2024-10-13 17:21:38 -07:00
Andrew Kane
a41b327b33 Speed up test [skip ci] 2024-10-13 17:12:12 -07:00
Andrew Kane
7f735ebd9b Added test for strict order [skip ci] 2024-10-13 17:04:03 -07:00
Andrew Kane
388e42f6e6 Fixed flaky test [skip ci] 2024-10-11 15:48:19 -07:00
Andrew Kane
d1ebb8db73 Use -1 for no limit for ivfflat.max_probes [skip ci] 2024-10-11 11:43:32 -07:00
Andrew Kane
a3a20f9816 Simplified GUC names [skip ci] 2024-10-11 11:18:01 -07:00
Andrew Kane
b26a21b848 Added regression tests for iterative search [skip ci] 2024-10-11 11:07:11 -07:00
Andrew Kane
2dc392ed6c Updated GUC names [skip ci] 2024-10-10 23:50:11 -07:00
Andrew Kane
961cb17d80 Added iterative search for HNSW [skip ci] 2024-10-10 18:14:39 -07:00
Andrew Kane
c91ed7b2c3 Added iterative search for IVFFlat [skip ci] 2024-10-10 18:12:27 -07:00
Andrew Kane
08d0340655 Improved IVFFlat vacuum test [skip ci] 2024-10-10 14:24:26 -07:00
Andrew Kane
77688b4309 Improve total cost for cost estimation (#686) 2024-10-08 12:42:03 -07:00
Andrew Kane
158d9340bc Added distance filters to cost tests [skip ci] 2024-09-28 14:50:23 -07:00
Andrew Kane
b8c27914d4 Improved test [skip ci] 2024-09-25 15:58:41 -07:00
Andrew Kane
2d85af51a8 Added test for IVFFlat costs [skip ci] 2024-09-25 15:53:34 -07:00
Andrew Kane
e0ad441306 Added test for costs [skip ci] 2024-09-25 15:49:03 -07:00
Jonathan S. Katz
2df9f24aad Update HNSW cost estimatation to utilize search and index info (#682)
Previously, the cost estimation formula for a HNSW index scan utilized
a methodology that only factored in the entry level for an HNSW scan
and the "m" index parameter, which reflects the number of tuples (or
vectors) to scan at each step of a HNSW graph traversal. While this
would bias the PostgreSQL query planner to choose an HNSW index scan
over other available paths, this could lead to potential suboptimal
index selection, for example, choosing to use a HNSW index instead of
an available B-tree index that has better selectivity.

The number of tuples scanned during HNSW graph traversal is principally
influenced by these factors:

 * The number of tuples stored in the index
 * `m` - the number of tuples that are scanned in each step of the graph
   traversal
 * `hnsw.ef_search` - which influences the total number of steps it
   takes for the scan to converge on the approximated nearest neighbors

Through testing different source models for vectors, we also observed
that the correlation of vectors in mdoels would impact this convergence.
For this first iteration, we've opted to hardcode a constant scaling
factor and set it to `0.55`, though a future commit may turn this into
a configurable parameter.

The high-level formula for estimating the cost of a HNSW index scan is
as such:

```
(entryLevel * m) + (layer0TuplesMax * layer0Selectivity)
```

where

- `(entryLevel * m)` is the lower bound of tuples to scan, as it
accounts for the graph traversal to layer 0 (L0). (L1 and above has an ef=1)
- `layer0TuplesMax` is an estimate of the maximum number of tuples to
scan at L0. This accounts for tuples that may end up being discarded due
to them already being visited. Testing shows that the number of steps
until converge is similar to the value of `hnsw.ef_search`, thus we can
estimate tuples max at `hnsw.ef_search * m * 2`
- `layer0Selectivity` - estimates the percentage of tuples that will
actually be scanned during the index traversal, multipled by the scaling
factor

In addition to the `m` build parameter and `hsnw.ef_search`, costs
estimates can be influenced by standard PostgreSQL costing parameters,
though adjusting those (e.g. `random_page_cost`) should be done with
care.

Co-authored-by: @ankane
2024-09-25 14:01:33 -07:00
Andrew Kane
8e979ed377 Do not adjust index selectivity based on probes [skip ci] 2024-09-25 13:48:24 -07:00
Andrew Kane
77b3d1f2a8 Added test for join with attribute filtering [skip ci] 2024-09-24 23:21:34 -07:00
Andrew Kane
ecd0738728 Improved test [skip ci] 2024-09-24 23:13:30 -07:00
Andrew Kane
62ffc3641c Added test for join [skip ci] 2024-09-24 23:12:27 -07:00
Andrew Kane
3fb05eb847 Added casts for arrays to sparsevec - #604
Co-authored-by: Narek Galstyan <narekg@berkeley.edu>
Co-authored-by: Di Qi <di@lantern.dev>
2024-09-19 19:17:05 -07:00
Andrew Kane
020d3edaa9 Changed warnings to errors for TAP tests [skip ci] 2024-07-27 12:28:03 -07:00
Andrew Kane
1e9e355175 Updated TAP tests to use PostgreSQL::Test packages [skip ci] 2024-07-27 12:24:15 -07:00
Andrew Kane
8684c2ba62 Updated formatting [skip ci] 2024-07-27 08:25:09 -07:00
Andrew Kane
8c5a4bfb6c Fixed failed to add index item error with sparsevec - fixes #625 2024-07-19 13:54:36 -07:00
Andrew Kane
24c8a2ff40 Fixed flaky tests [skip ci] 2024-04-29 13:54:30 -07:00
Andrew Kane
6df583a6f6 Fixed regression test for vector type 2024-04-29 13:48:04 -07:00
Andrew Kane
b52beefbc6 Added basic fuzz testing for input functions 2024-04-27 10:49:45 -07:00
Andrew Kane
6f2afb16ff Use consistent error message for sparsevec index out of bounds [skip ci] 2024-04-26 17:27:09 -07:00
Andrew Kane
1e94907179 Improved sparsevec error messages [skip ci] 2024-04-26 17:11:11 -07:00
Andrew Kane
c9fb66d54d Fixed flaky tests 2024-04-26 13:20:27 -07:00
Andrew Kane
48e68e5e42 Improved HNSW recall tests - #535 2024-04-26 13:08:48 -07:00
Andrew Kane
78d32943ac Added test for halfvec sum 2024-04-25 22:03:34 -07:00
Andrew Kane
cf494f15ac Added aggregate test for halfvec [skip ci] 2024-04-25 21:42:10 -07:00
Andrew Kane
1475c06902 Reordered TAP tests [skip ci] 2024-04-25 21:08:55 -07:00
Andrew Kane
7140a18283 Reordered TAP tests [skip ci] 2024-04-25 21:04:23 -07:00