
49 Commits

Author SHA1 Message Date
Andrew Kane
35ab919bf5 Switched to statically-allocated IndexAmRoutine for Postgres 19 [skip ci] 2026-01-21 16:40:35 -08:00
Andrew Kane
c711da411c Improved includes for indexes 2025-12-11 15:35:37 -08:00
Andrew Kane
3f687687ee Fixed compilation error with Postgres 19 2025-09-04 15:58:23 -07:00
Julien Rouhaud
dd3a1e9137 Use NIL for empty lists (#890)
The standard Postgres way to check whether a list is empty is to compare the
pointer to NIL rather than NULL.
2025-08-23 03:31:22 -07:00
Andrew Kane
e575866297 Revert "Fixed warnings with Postgres 18 [skip ci]"
This reverts commit 32e95a8598.
2025-04-05 12:56:00 -07:00
Andrew Kane
35f4f7fc80 Improved warning check [skip ci] 2025-04-05 12:38:30 -07:00
Andrew Kane
32e95a8598 Fixed warnings with Postgres 18 [skip ci] 2025-04-05 12:13:38 -07:00
Andrew Kane
a03dc5b7d0 Added fields to IndexAmRoutine for Postgres 18 [skip ci] 2025-04-05 11:31:57 -07:00
Andrew Kane
c530a3c490 Updated comment [skip ci] 2024-10-28 00:56:10 -07:00
Andrew Kane
305d62146e Updated comment [skip ci] 2024-10-27 21:05:32 -07:00
Andrew Kane
f9d627c9a9 Updated default value of hnsw.scan_mem_multiplier [skip ci] 2024-10-27 21:05:04 -07:00
Andrew Kane
857d716d9e Renamed iterative_search to iterative_scan 2024-10-27 14:02:22 -07:00
Andrew Kane
78b877bdaf Revert "Renamed iterative_search to iterative_scan"
This reverts commit 7043cce893.
2024-10-24 20:32:07 -07:00
Andrew Kane
7043cce893 Renamed iterative_search to iterative_scan 2024-10-24 20:31:43 -07:00
Andrew Kane
ac6576e53a Added hnsw.search_mem_multiplier option 2024-10-24 18:02:20 -07:00
Andrew Kane
1291b12090 Added Postgres 18 to CI [skip ci] 2024-10-22 00:38:19 -07:00
Andrew Kane
bfb3a45b31 Use consistent order [skip ci] 2024-10-21 21:47:03 -07:00
Andrew Kane
e718eb8da4 Updated range and defaults for iterative search parameters 2024-10-21 20:38:50 -07:00
Andrew Kane
7484625227 Added comments [skip ci] 2024-10-11 11:59:36 -07:00
Andrew Kane
d1ebb8db73 Use -1 for no limit for ivfflat.max_probes [skip ci] 2024-10-11 11:43:32 -07:00
Andrew Kane
42af8aa1d1 Updated GUC descriptions [skip ci] 2024-10-11 11:26:27 -07:00
Andrew Kane
a3a20f9816 Simplified GUC names [skip ci] 2024-10-11 11:18:01 -07:00
Andrew Kane
2dc392ed6c Updated GUC names [skip ci] 2024-10-10 23:50:11 -07:00
Andrew Kane
961cb17d80 Added iterative search for HNSW [skip ci] 2024-10-10 18:14:39 -07:00
Andrew Kane
77688b4309 Improve total cost for cost estimation (#686) 2024-10-08 12:42:03 -07:00
Andrew Kane
5776a4d937 Only adjust for TOAST [skip ci] 2024-09-25 15:39:56 -07:00
Andrew Kane
242a12b7d5 Added same cost adjustment to HNSW as IVFFlat since TOAST not included in seq scan cost - #682 [skip ci] 2024-09-25 15:33:57 -07:00
Andrew Kane
1370dd6e86 Removed unneeded floor and fixed comment formatting [skip ci] 2024-09-25 14:13:02 -07:00
Andrew Kane
a100dc67e5 Ran pgindent [skip ci] 2024-09-25 14:03:51 -07:00
Jonathan S. Katz
2df9f24aad Update HNSW cost estimation to utilize search and index info (#682)
Previously, the cost estimation formula for an HNSW index scan used
a methodology that only factored in the entry level for an HNSW scan
and the "m" index parameter, which reflects the number of tuples (or
vectors) to scan at each step of an HNSW graph traversal. While this
would bias the PostgreSQL query planner to choose an HNSW index scan
over other available paths, it could lead to suboptimal
index selection, for example, choosing an HNSW index instead of
an available B-tree index that has better selectivity.

The number of tuples scanned during HNSW graph traversal is principally
influenced by these factors:

 * The number of tuples stored in the index
 * `m` - the number of tuples that are scanned in each step of the graph
   traversal
 * `hnsw.ef_search` - which influences the total number of steps it
   takes for the scan to converge on the approximated nearest neighbors

Through testing different source models for vectors, we also observed
that the correlation of vectors in models would impact this convergence.
For this first iteration, we've opted to hardcode a constant scaling
factor and set it to `0.55`, though a future commit may turn this into
a configurable parameter.

The high-level formula for estimating the cost of a HNSW index scan is
as such:

```
(entryLevel * m) + (layer0TuplesMax * layer0Selectivity)
```

where

- `(entryLevel * m)` is the lower bound of tuples to scan, as it
accounts for the graph traversal to layer 0 (L0). (L1 and above have ef=1.)
- `layer0TuplesMax` is an estimate of the maximum number of tuples to
scan at L0. This accounts for tuples that may end up being discarded because
they were already visited. Testing shows that the number of steps
until convergence is similar to the value of `hnsw.ef_search`, thus we can
estimate tuples max at `hnsw.ef_search * m * 2`
- `layer0Selectivity` estimates the percentage of tuples that will
actually be scanned during the index traversal, multiplied by the scaling
factor
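
As a rough illustration, the formula above can be sketched in standalone C.
This is not the code from the commit: the function name
`estimate_hnsw_tuples_scanned` and the `scan_fraction` parameter are
hypothetical, and because the message does not say how the unscaled
percentage behind `layer0Selectivity` is derived, the sketch simply takes it
as an input.

```c
#include <assert.h>

/* Constant scaling factor that the commit message says is hardcoded. */
#define HNSW_COST_SCALING_FACTOR 0.55

/*
 * Hypothetical helper (not pgvector's actual function): estimate how many
 * tuples an HNSW index scan visits, following the high-level formula
 *
 *   (entryLevel * m) + (layer0TuplesMax * layer0Selectivity)
 *
 * entry_level   - level of the graph entry point
 * m             - the "m" index build parameter
 * ef_search     - current value of hnsw.ef_search
 * scan_fraction - ASSUMED input: fraction (0..1) of the layer 0 upper bound
 *                 that is actually scanned, before the scaling factor
 */
double
estimate_hnsw_tuples_scanned(int entry_level, int m, int ef_search,
                             double scan_fraction)
{
    double upper_layers;
    double layer0_tuples_max;
    double layer0_selectivity;

    assert(entry_level >= 0);
    assert(m > 0);
    assert(ef_search > 0);
    assert(scan_fraction >= 0.0 && scan_fraction <= 1.0);

    /* Lower bound: traversal from the entry level down to layer 0 */
    upper_layers = (double) entry_level * m;

    /* Upper bound at layer 0: hnsw.ef_search * m * 2 */
    layer0_tuples_max = (double) ef_search * m * 2;

    /* Share of that upper bound expected to be scanned, scaled by 0.55 */
    layer0_selectivity = scan_fraction * HNSW_COST_SCALING_FACTOR;

    return upper_layers + layer0_tuples_max * layer0_selectivity;
}
```

For example, with `entry_level = 3`, `m = 16`, `ef_search = 40` and an assumed
`scan_fraction` of `0.5`, the estimate is `48 + 1280 * 0.275 = 400` tuples.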

In addition to the `m` build parameter and `hnsw.ef_search`, cost
estimates can be influenced by standard PostgreSQL costing parameters,
though adjusting those (e.g. `random_page_cost`) should be done with
care.

Co-authored-by: @ankane
2024-09-25 14:01:33 -07:00
Andrew Kane
b738ffecc1 Dropped support for Postgres 12 2024-09-19 18:13:54 -07:00
Jonathan S. Katz
05fb382031 Swap max costing values to align with upstream guidance (#658)
A feature targeted for PostgreSQL 18 (postgres/postgres@e2225346)
that makes optimizations around disabled path nodes impacted pgvector
such that PostgreSQL would choose to perform an index scan when it
should have used a different scan (e.g. `SELECT count(*) FROM table`).
Per upstream guidance[1], the recommendation is to switch to using
`get_float8_infinity()`, which achieves the same behavior in back branches
and can be adapted to work with the new behavior introduced in PostgreSQL 18.

[1] https://www.postgresql.org/message-id/2281822.1724441531%40sss.pgh.pa.us
2024-09-19 18:01:59 -07:00
Andrew Kane
ea99957fae Added fields to IndexAmRoutine 2024-08-22 20:39:16 -07:00
Andrew Kane
61870a0244 Fixed compilation warning with MSVC and Postgres 16 - fixes #598
Co-authored-by: Xing Guo <higuoxing@gmail.com>
2024-06-16 12:09:01 -07:00
Andrew Kane
58ec5296b0 Reduced support functions for HNSW - #527 2024-04-25 13:21:24 -07:00
Andrew Kane
47d5b2896e Improved support functions for HNSW - #527 2024-04-25 13:00:40 -07:00
Andrew Kane
3eef1ff5c2 Removed type-specific code from HNSW [skip ci] 2024-04-24 14:53:45 -07:00
Andrew Kane
0da6213a60 Moved type lookup to support functions - #527 2024-04-23 13:02:47 -07:00
Andrew Kane
f14c21748b Added support function for l2_normalize [skip ci] 2024-04-22 18:36:47 -07:00
Andrew Kane
5ba62fca84 Fixed crash with shared_preload_libraries - fixes #460 2024-02-14 17:13:30 -08:00
Heikki Linnakangas
e5d1a6bdbb Include reloptions.h directly in the .c files where it's needed
There are no references to anything that's in reloptions.h in the
header files. They need to include genam.h instead, which defines
IndexScanDesc.
2024-01-23 13:02:24 +02:00
Andrew Kane
083008c21e Added validation for GUC parameters 2024-01-22 23:55:30 -08:00
Andrew Kane
a1e526ef82 Dropped support for Postgres 11 2024-01-22 23:52:54 -08:00
Andrew Kane
2d0f162bd7 Added support for in-memory parallel index builds for HNSW
Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
2024-01-22 23:19:10 -08:00
Andrew Kane
dfee5d4045 Added support for on-disk parallel index builds for HNSW 2023-11-11 19:29:45 -08:00
Andrew Kane
b1f9519689 Get info from metapage to determine cost 2023-09-03 12:31:01 -07:00
Andrew Kane
034d4acaea Removed comment [skip ci] 2023-09-02 18:23:08 -07:00
Andrew Kane
1a0d7bccc7 Updated min ef_search to 1 [skip ci] 2023-08-10 20:47:15 -07:00
Andrew Kane
51d292c93d Added HNSW index type - #181 2023-08-08 16:42:47 -07:00