
34 Commits

Author SHA1 Message Date
Andrew Kane
ba8e29600b Added todo [skip ci] 2024-09-28 15:24:21 -07:00
Andrew Kane
9b42662188 Only adjust cost if scanning less than half of the tuples [skip ci] 2024-09-28 12:21:20 -07:00
Andrew Kane
ab57217f48 Added todo [skip ci] 2024-09-28 10:07:09 -07:00
Andrew Kane
40c3e402c7 Removed todo [skip ci] 2024-09-25 17:29:20 -07:00
Andrew Kane
058248fdcc Improved cost code [skip ci] 2024-09-25 17:23:29 -07:00
Andrew Kane
73c5145b77 Use int for ef [skip ci] 2024-09-25 16:52:41 -07:00
Andrew Kane
ec4a23fe49 Added cost estimation [skip ci] 2024-09-25 16:45:04 -07:00
Andrew Kane
38207f5640 Merge branch 'master' into hnsw-streaming 2024-09-25 16:09:09 -07:00
Andrew Kane
5776a4d937 Only adjust for TOAST [skip ci] 2024-09-25 15:39:56 -07:00
Andrew Kane
242a12b7d5 Added same cost adjustment to HNSW as IVFFlat since TOAST not included in seq scan cost - #682 [skip ci] 2024-09-25 15:33:57 -07:00
Andrew Kane
1370dd6e86 Removed unneeded floor and fixed comment formatting [skip ci] 2024-09-25 14:13:02 -07:00
Andrew Kane
a100dc67e5 Ran pgindent [skip ci] 2024-09-25 14:03:51 -07:00
Jonathan S. Katz
2df9f24aad Update HNSW cost estimation to utilize search and index info (#682)
Previously, the cost estimation formula for an HNSW index scan only
factored in the entry level for the scan and the `m` index parameter,
which reflects the number of tuples (or vectors) to scan at each step
of an HNSW graph traversal. This biased the PostgreSQL query planner
toward choosing an HNSW index scan over other available paths, which
could lead to suboptimal index selection, for example, choosing an
HNSW index instead of an available B-tree index that has better
selectivity.

The number of tuples scanned during HNSW graph traversal is principally
influenced by these factors:

 * The number of tuples stored in the index
 * `m` - the number of tuples that are scanned in each step of the graph
   traversal
 * `hnsw.ef_search` - which influences the total number of steps it
   takes for the scan to converge on the approximated nearest neighbors

Through testing different source models for vectors, we also observed
that the correlation of vectors in models would impact this convergence.
For this first iteration, we've opted to hardcode a constant scaling
factor and set it to `0.55`, though a future commit may turn this into
a configurable parameter.

The high-level formula for estimating the cost of an HNSW index scan is
as follows:

```
(entryLevel * m) + (layer0TuplesMax * layer0Selectivity)
```

where

- `(entryLevel * m)` is the lower bound of tuples to scan, as it
accounts for the graph traversal to layer 0 (L0). (L1 and above have ef=1.)
- `layer0TuplesMax` is an estimate of the maximum number of tuples to
scan at L0. This accounts for tuples that may end up being discarded
because they were already visited. Testing shows that the number of steps
until convergence is similar to the value of `hnsw.ef_search`, thus we can
estimate tuples max at `hnsw.ef_search * m * 2`
- `layer0Selectivity` estimates the percentage of tuples that will
actually be scanned during the index traversal, multiplied by the scaling
factor
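
As a rough illustration of the formula above, here is a minimal Python
sketch of the tuple estimate. The function name and the
`scanned_fraction` argument are hypothetical: the message does not say
how the percentage of scanned tuples is derived, so the sketch takes it
as an input rather than computing it. This is not the C code from the
commit.

```python
# Scaling factor hardcoded in this commit, per the message above.
SCALING_FACTOR = 0.55


def estimate_hnsw_tuples(entry_level, m, ef_search, scanned_fraction,
                         scaling_factor=SCALING_FACTOR):
    """Estimate the number of tuples scanned by an HNSW index scan.

    scanned_fraction is a hypothetical input (0.0 to 1.0) standing in
    for the estimated percentage of layer 0 tuples actually scanned.
    """
    # Lower bound: traversal from the entry level down to layer 0.
    upper_layer_tuples = entry_level * m
    # Upper bound on tuples visited at layer 0: ef_search * m * 2.
    layer0_tuples_max = ef_search * m * 2
    # Selectivity is the scanned percentage times the scaling factor.
    layer0_selectivity = scanned_fraction * scaling_factor
    return upper_layer_tuples + layer0_tuples_max * layer0_selectivity


# Example call with made-up inputs.
estimate = estimate_hnsw_tuples(entry_level=3, m=16, ef_search=40,
                                scanned_fraction=0.5)
```

With these made-up inputs the upper layers contribute `3 * 16` tuples
and layer 0 contributes at most `40 * 16 * 2`, scaled down by the
selectivity term.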

In addition to the `m` build parameter and `hnsw.ef_search`, cost
estimates can be influenced by standard PostgreSQL costing parameters,
though adjusting those (e.g. `random_page_cost`) should be done with
care.

Co-authored-by: @ankane
2024-09-25 14:01:33 -07:00
Andrew Kane
495041e43b Added option to limit tuples [skip ci] 2024-09-22 18:10:19 -07:00
Andrew Kane
80cbd32dab Added streaming option for HNSW 2024-09-22 12:02:48 -07:00
Andrew Kane
b738ffecc1 Dropped support for Postgres 12 2024-09-19 18:13:54 -07:00
Jonathan S. Katz
05fb382031 Swap max costing values to align with upstream guidance (#658)
A feature targeted for PostgreSQL 18 (postgres/postgres@e2225346)
that makes optimizations around disabled path nodes impacted pgvector
such that PostgreSQL would choose to perform an index scan when it
should have used a different scan (e.g. `SELECT count(*) FROM table`).
Per upstream guidance[1], the recommendation is to switch to using
`get_float8_infinity()`, which achieves the same behavior in back branches,
and can be adapted to work with the new behavior introduced in PostgreSQL 18.

[1] https://www.postgresql.org/message-id/2281822.1724441531%40sss.pgh.pa.us
2024-09-19 18:01:59 -07:00
Andrew Kane
ea99957fae Added fields to IndexAmRoutine 2024-08-22 20:39:16 -07:00
Andrew Kane
61870a0244 Fixed compilation warning with MSVC and Postgres 16 - fixes #598
Co-authored-by: Xing Guo <higuoxing@gmail.com>
2024-06-16 12:09:01 -07:00
Andrew Kane
58ec5296b0 Reduced support functions for HNSW - #527 2024-04-25 13:21:24 -07:00
Andrew Kane
47d5b2896e Improved support functions for HNSW - #527 2024-04-25 13:00:40 -07:00
Andrew Kane
3eef1ff5c2 Removed type-specific code from HNSW [skip ci] 2024-04-24 14:53:45 -07:00
Andrew Kane
0da6213a60 Moved type lookup to support functions - #527 2024-04-23 13:02:47 -07:00
Andrew Kane
f14c21748b Added support function for l2_normalize [skip ci] 2024-04-22 18:36:47 -07:00
Andrew Kane
5ba62fca84 Fixed crash with shared_preload_libraries - fixes #460 2024-02-14 17:13:30 -08:00
Heikki Linnakangas
e5d1a6bdbb Include reloptions.h directly in the .c files where it's needed
There are no references to anything that's in reloptions.h in the
header files. They need to include genam.h instead, which defines
IndexScanDesc.
2024-01-23 13:02:24 +02:00
Andrew Kane
083008c21e Added validation for GUC parameters 2024-01-22 23:55:30 -08:00
Andrew Kane
a1e526ef82 Dropped support for Postgres 11 2024-01-22 23:52:54 -08:00
Andrew Kane
2d0f162bd7 Added support for in-memory parallel index builds for HNSW
Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
2024-01-22 23:19:10 -08:00
Andrew Kane
dfee5d4045 Added support for on-disk parallel index builds for HNSW 2023-11-11 19:29:45 -08:00
Andrew Kane
b1f9519689 Get info from metapage to determine cost 2023-09-03 12:31:01 -07:00
Andrew Kane
034d4acaea Removed comment [skip ci] 2023-09-02 18:23:08 -07:00
Andrew Kane
1a0d7bccc7 Updated min ef_search to 1 [skip ci] 2023-08-10 20:47:15 -07:00
Andrew Kane
51d292c93d Added HNSW index type - #181 2023-08-08 16:42:47 -07:00