Commit Graph

184 Commits

Author SHA1 Message Date
Andrew Kane
fb87b6da91 Fixed test 2024-10-28 13:55:26 -07:00
Andrew Kane
6a30c1e824 Fixed test [skip ci] 2024-10-27 21:10:51 -07:00
Andrew Kane
857d716d9e Renamed iterative_search to iterative_scan 2024-10-27 14:02:22 -07:00
Andrew Kane
78b877bdaf Revert "Renamed iterative_search to iterative_scan"
This reverts commit 7043cce893.
2024-10-24 20:32:07 -07:00
Andrew Kane
7043cce893 Renamed iterative_search to iterative_scan 2024-10-24 20:31:43 -07:00
Andrew Kane
ac6576e53a Added hnsw.search_mem_multiplier option 2024-10-24 18:02:20 -07:00
Andrew Kane
e718eb8da4 Updated range and defaults for iterative search parameters 2024-10-21 20:38:50 -07:00
Andrew Kane
049972a4a3 Improved test output [skip ci] 2024-10-13 17:22:49 -07:00
Andrew Kane
61027645e9 Improved test output [skip ci] 2024-10-13 17:21:38 -07:00
Andrew Kane
a41b327b33 Speed up test [skip ci] 2024-10-13 17:12:12 -07:00
Andrew Kane
7f735ebd9b Added test for strict order [skip ci] 2024-10-13 17:04:03 -07:00
Andrew Kane
388e42f6e6 Fixed flaky test [skip ci] 2024-10-11 15:48:19 -07:00
Andrew Kane
a3a20f9816 Simplified GUC names [skip ci] 2024-10-11 11:18:01 -07:00
Andrew Kane
2dc392ed6c Updated GUC names [skip ci] 2024-10-10 23:50:11 -07:00
Andrew Kane
961cb17d80 Added iterative search for HNSW [skip ci] 2024-10-10 18:14:39 -07:00
Andrew Kane
c91ed7b2c3 Added iterative search for IVFFlat [skip ci] 2024-10-10 18:12:27 -07:00
Andrew Kane
08d0340655 Improved IVFFlat vacuum test [skip ci] 2024-10-10 14:24:26 -07:00
Andrew Kane
77688b4309 Improve total cost for cost estimation (#686) 2024-10-08 12:42:03 -07:00
Andrew Kane
158d9340bc Added distance filters to cost tests [skip ci] 2024-09-28 14:50:23 -07:00
Andrew Kane
b8c27914d4 Improved test [skip ci] 2024-09-25 15:58:41 -07:00
Andrew Kane
2d85af51a8 Added test for IVFFlat costs [skip ci] 2024-09-25 15:53:34 -07:00
Andrew Kane
e0ad441306 Added test for costs [skip ci] 2024-09-25 15:49:03 -07:00
Jonathan S. Katz
2df9f24aad Update HNSW cost estimatation to utilize search and index info (#682)
Previously, the cost estimation formula for a HNSW index scan utilized
a methodology that only factored in the entry level for an HNSW scan
and the "m" index parameter, which reflects the number of tuples (or
vectors) to scan at each step of a HNSW graph traversal. While this
would bias the PostgreSQL query planner to choose an HNSW index scan
over other available paths, this could lead to potential suboptimal
index selection, for example, choosing to use a HNSW index instead of
an available B-tree index that has better selectivity.

The number of tuples scanned during HNSW graph traversal is principally
influenced by these factors:

 * The number of tuples stored in the index
 * `m` - the number of tuples that are scanned in each step of the graph
   traversal
 * `hnsw.ef_search` - which influences the total number of steps it
   takes for the scan to converge on the approximated nearest neighbors

Through testing different source models for vectors, we also observed
that the correlation of vectors in mdoels would impact this convergence.
For this first iteration, we've opted to hardcode a constant scaling
factor and set it to `0.55`, though a future commit may turn this into
a configurable parameter.

The high-level formula for estimating the cost of a HNSW index scan is
as such:

```
(entryLevel * m) + (layer0TuplesMax * layer0Selectivity)
```

where

- `(entryLevel * m)` is the lower bound of tuples to scan, as it
accounts for the graph traversal to layer 0 (L0). (L1 and above has an ef=1)
- `layer0TuplesMax` is an estimate of the maximum number of tuples to
scan at L0. This accounts for tuples that may end up being discarded due
to them already being visited. Testing shows that the number of steps
until converge is similar to the value of `hnsw.ef_search`, thus we can
estimate tuples max at `hnsw.ef_search * m * 2`
- `layer0Selectivity` - estimates the percentage of tuples that will
actually be scanned during the index traversal, multipled by the scaling
factor

In addition to the `m` build parameter and `hsnw.ef_search`, costs
estimates can be influenced by standard PostgreSQL costing parameters,
though adjusting those (e.g. `random_page_cost`) should be done with
care.

Co-authored-by: @ankane
2024-09-25 14:01:33 -07:00
Andrew Kane
8e979ed377 Do not adjust index selectivity based on probes [skip ci] 2024-09-25 13:48:24 -07:00
Andrew Kane
77b3d1f2a8 Added test for join with attribute filtering [skip ci] 2024-09-24 23:21:34 -07:00
Andrew Kane
ecd0738728 Improved test [skip ci] 2024-09-24 23:13:30 -07:00
Andrew Kane
62ffc3641c Added test for join [skip ci] 2024-09-24 23:12:27 -07:00
Andrew Kane
020d3edaa9 Changed warnings to errors for TAP tests [skip ci] 2024-07-27 12:28:03 -07:00
Andrew Kane
1e9e355175 Updated TAP tests to use PostgreSQL::Test packages [skip ci] 2024-07-27 12:24:15 -07:00
Andrew Kane
8684c2ba62 Updated formatting [skip ci] 2024-07-27 08:25:09 -07:00
Andrew Kane
8c5a4bfb6c Fixed failed to add index item error with sparsevec - fixes #625 2024-07-19 13:54:36 -07:00
Andrew Kane
24c8a2ff40 Fixed flaky tests [skip ci] 2024-04-29 13:54:30 -07:00
Andrew Kane
b52beefbc6 Added basic fuzz testing for input functions 2024-04-27 10:49:45 -07:00
Andrew Kane
c9fb66d54d Fixed flaky tests 2024-04-26 13:20:27 -07:00
Andrew Kane
48e68e5e42 Improved HNSW recall tests - #535 2024-04-26 13:08:48 -07:00
Andrew Kane
78d32943ac Added test for halfvec sum 2024-04-25 22:03:34 -07:00
Andrew Kane
cf494f15ac Added aggregate test for halfvec [skip ci] 2024-04-25 21:42:10 -07:00
Andrew Kane
1475c06902 Reordered TAP tests [skip ci] 2024-04-25 21:08:55 -07:00
Andrew Kane
7140a18283 Reordered TAP tests [skip ci] 2024-04-25 21:04:23 -07:00
Andrew Kane
7dcdaef96c Renamed TAP tests [skip ci] 2024-04-25 20:57:41 -07:00
Andrew Kane
914f9aa04a Fixed flaky test [skip ci] 2024-04-25 11:57:40 -07:00
Andrew Kane
c67dc6f9b0 Added test for bit with duplicate centers 2024-04-25 10:29:28 -07:00
Andrew Kane
c39cb25c32 Fixed flaky tests [skip ci] 2024-04-24 22:26:08 -07:00
Andrew Kane
b2f7dad8a7 Removed support for L1 distance and Jaccard distance from ivfflat due to non-optimal clustering 2024-04-22 14:11:29 -07:00
Andrew Kane
f9c071a761 Improved tests for L1 distance with halfvec 2024-04-22 13:14:45 -07:00
Andrew Kane
9f4b770db3 Added support for indexing sparsevec with L1 distance [skip ci] 2024-04-22 13:08:12 -07:00
Andrew Kane
70b299a7ff Added support for indexing halfvec with L1 distance [skip ci] 2024-04-22 13:00:59 -07:00
Andrew Kane
af9d50481d Added support for indexing L1 distance 2024-04-22 12:44:03 -07:00
Andrew Kane
77ec24641e Fixed flaky test [skip ci] 2024-04-17 00:27:37 -07:00
Andrew Kane
c361f80465 Synced recall test [skip ci] 2024-04-16 17:27:06 -07:00