Commit Graph

1576 Commits

Author SHA1 Message Date
Andrew Kane
c0f6570c4a Debug updates [skip ci] 2024-10-08 22:58:09 -07:00
Andrew Kane
1d3d0f46ac Use DEBUG1 for exceeding work_mem [skip ci] 2024-10-08 22:52:12 -07:00
Andrew Kane
caabac33b8 Fixed compilation [skip ci] 2024-09-29 21:33:07 -07:00
Andrew Kane
8b9333d468 Merge branch 'master' into hnsw-streaming 2024-09-29 15:13:51 -07:00
Andrew Kane
f2afd11257 Use sc for search candidates [skip ci] 2024-09-29 15:09:54 -07:00
Andrew Kane
cae3458329 Updated distance to use double 2024-09-29 15:06:50 -07:00
Andrew Kane
87f5e40495 Updated name [skip ci] 2024-09-29 13:46:05 -07:00
Andrew Kane
8798c1474c Updated naming [skip ci] 2024-09-29 13:45:25 -07:00
Andrew Kane
1b337ad97d Changed option to enum [skip ci] 2024-09-29 13:39:53 -07:00
Andrew Kane
0047630baf Updated readme [skip ci] 2024-09-29 10:37:07 -07:00
Andrew Kane
dc23752618 Fixed uninitialized variable [skip ci] 2024-09-28 19:18:52 -07:00
Andrew Kane
8b253359ab Improved code [skip ci] 2024-09-28 16:33:30 -07:00
Andrew Kane
8de5f55b0b Fixed assertion [skip ci] 2024-09-28 16:18:53 -07:00
Andrew Kane
351db562af Improved code [skip ci] 2024-09-28 16:07:31 -07:00
Andrew Kane
e1c2d03dba Improved cost estimation [skip ci] 2024-09-28 16:04:07 -07:00
Andrew Kane
ba8e29600b Added todo [skip ci] 2024-09-28 15:24:21 -07:00
Andrew Kane
158d9340bc Added distance filters to cost tests [skip ci] 2024-09-28 14:50:23 -07:00
Andrew Kane
49e05fb5ba Updated readme [skip ci] 2024-09-28 13:13:10 -07:00
Andrew Kane
9b42662188 Only adjust cost if scanning less than half of the tuples [skip ci] 2024-09-28 12:21:20 -07:00
Andrew Kane
7265927fd6 Updated readme [skip ci] 2024-09-28 12:07:09 -07:00
Andrew Kane
ab57217f48 Added todo [skip ci] 2024-09-28 10:07:09 -07:00
Andrew Kane
5f6e031ccc Updated changelog [skip ci] 2024-09-28 09:57:05 -07:00
Andrew Kane
5ee0471ead Updated readme [skip ci] 2024-09-28 09:23:41 -07:00
Andrew Kane
54f8d9733d Updated default Postgres version in Dockerfile [skip ci] 2024-09-27 16:19:57 -07:00
Andrew Kane
cf419f448b Updated Postgres version for Docker [skip ci] 2024-09-27 16:19:06 -07:00
Andrew Kane
8a2eebd6a4 Added note about Postgres 17 on Windows - #669 [skip ci] 2024-09-27 14:05:36 -07:00
Andrew Kane
daf9c5c743 Updated package versions in readme [skip ci] 2024-09-27 13:52:11 -07:00
Andrew Kane
2bca4e406b Restored quarterly package version for FreeBSD in readme [skip ci] 2024-09-27 13:50:57 -07:00
Andrew Kane
74020a90da Updated package versions in readme [skip ci] 2024-09-27 13:49:43 -07:00
Andrew Kane
44d8d28b40 Added note about postgresql@17 formula [skip ci] 2024-09-27 13:39:54 -07:00
Andrew Kane
1a1221f905 Merge branch 'master' into hnsw-streaming 2024-09-26 08:34:07 -07:00
Andrew Kane
54fa16e3e3 Added safety check [skip ci] 2024-09-26 08:32:44 -07:00
Andrew Kane
40c3e402c7 Removed todo [skip ci] 2024-09-25 17:29:20 -07:00
Andrew Kane
058248fdcc Improved cost code [skip ci] 2024-09-25 17:23:29 -07:00
Andrew Kane
73c5145b77 Use int for ef [skip ci] 2024-09-25 16:52:41 -07:00
Andrew Kane
ec4a23fe49 Added cost estimation [skip ci] 2024-09-25 16:45:04 -07:00
Andrew Kane
38207f5640 Merge branch 'master' into hnsw-streaming 2024-09-25 16:09:09 -07:00
Andrew Kane
46de265a24 Updated changelog [skip ci] 2024-09-25 16:03:58 -07:00
Andrew Kane
b8c27914d4 Improved test [skip ci] 2024-09-25 15:58:41 -07:00
Andrew Kane
2d85af51a8 Added test for IVFFlat costs [skip ci] 2024-09-25 15:53:34 -07:00
Andrew Kane
e0ad441306 Added test for costs [skip ci] 2024-09-25 15:49:03 -07:00
Andrew Kane
5776a4d937 Only adjust for TOAST [skip ci] 2024-09-25 15:39:56 -07:00
Andrew Kane
242a12b7d5 Added same cost adjustment to HNSW as IVFFlat since TOAST not included in seq scan cost - #682 [skip ci] 2024-09-25 15:33:57 -07:00
Andrew Kane
1370dd6e86 Removed unneeded floor and fixed comment formatting [skip ci] 2024-09-25 14:13:02 -07:00
Andrew Kane
a100dc67e5 Ran pgindent [skip ci] 2024-09-25 14:03:51 -07:00
Jonathan S. Katz
2df9f24aad Update HNSW cost estimatation to utilize search and index info (#682)
Previously, the cost estimation formula for a HNSW index scan utilized
a methodology that only factored in the entry level for an HNSW scan
and the "m" index parameter, which reflects the number of tuples (or
vectors) to scan at each step of a HNSW graph traversal. While this
would bias the PostgreSQL query planner to choose an HNSW index scan
over other available paths, this could lead to potential suboptimal
index selection, for example, choosing to use a HNSW index instead of
an available B-tree index that has better selectivity.

The number of tuples scanned during HNSW graph traversal is principally
influenced by these factors:

 * The number of tuples stored in the index
 * `m` - the number of tuples that are scanned in each step of the graph
   traversal
 * `hnsw.ef_search` - which influences the total number of steps it
   takes for the scan to converge on the approximated nearest neighbors

Through testing different source models for vectors, we also observed
that the correlation of vectors in mdoels would impact this convergence.
For this first iteration, we've opted to hardcode a constant scaling
factor and set it to `0.55`, though a future commit may turn this into
a configurable parameter.

The high-level formula for estimating the cost of a HNSW index scan is
as such:

```
(entryLevel * m) + (layer0TuplesMax * layer0Selectivity)
```

where

- `(entryLevel * m)` is the lower bound of tuples to scan, as it
accounts for the graph traversal to layer 0 (L0). (L1 and above has an ef=1)
- `layer0TuplesMax` is an estimate of the maximum number of tuples to
scan at L0. This accounts for tuples that may end up being discarded due
to them already being visited. Testing shows that the number of steps
until converge is similar to the value of `hnsw.ef_search`, thus we can
estimate tuples max at `hnsw.ef_search * m * 2`
- `layer0Selectivity` - estimates the percentage of tuples that will
actually be scanned during the index traversal, multipled by the scaling
factor

In addition to the `m` build parameter and `hsnw.ef_search`, costs
estimates can be influenced by standard PostgreSQL costing parameters,
though adjusting those (e.g. `random_page_cost`) should be done with
care.

Co-authored-by: @ankane
2024-09-25 14:01:33 -07:00
Andrew Kane
8e979ed377 Do not adjust index selectivity based on probes [skip ci] 2024-09-25 13:48:24 -07:00
Andrew Kane
4e35c6abe3 Updated readme [skip ci] 2024-09-24 23:24:48 -07:00
Andrew Kane
77b3d1f2a8 Added test for join with attribute filtering [skip ci] 2024-09-24 23:21:34 -07:00
Andrew Kane
ecd0738728 Improved test [skip ci] 2024-09-24 23:13:30 -07:00