pgvector

mirror of https://github.com/pgvector/pgvector.git synced 2026-07-02 02:31:16 +08:00

Author	SHA1	Message	Date
Andrew Kane	35ab919bf5	Switched to statically-allocated IndexAmRoutine for Postgres 19 [skip ci]	2026-01-21 16:40:35 -08:00
Andrew Kane	c711da411c	Improved includes for indexes	2025-12-11 15:35:37 -08:00
Andrew Kane	3f687687ee	Fixed compilation error with Postgres 19	2025-09-04 15:58:23 -07:00
Julien Rouhaud	dd3a1e9137	Use NIL for empty lists (#890) The Postgres standard way to check for list emptiness is to compare a pointer to NIL rather than NULL.	2025-08-23 03:31:22 -07:00
Andrew Kane	e575866297	Revert "Fixed warnings with Postgres 18 [skip ci]" This reverts commit `32e95a8598`.	2025-04-05 12:56:00 -07:00
Andrew Kane	35f4f7fc80	Improved warning check [skip ci]	2025-04-05 12:38:30 -07:00
Andrew Kane	32e95a8598	Fixed warnings with Postgres 18 [skip ci]	2025-04-05 12:13:38 -07:00
Andrew Kane	a03dc5b7d0	Added fields to IndexAmRoutine for Postgres 18 [skip ci]	2025-04-05 11:31:57 -07:00
Andrew Kane	c530a3c490	Updated comment [skip ci]	2024-10-28 00:56:10 -07:00
Andrew Kane	305d62146e	Updated comment [skip ci]	2024-10-27 21:05:32 -07:00
Andrew Kane	f9d627c9a9	Updated default value of hnsw.scan_mem_multiplier [skip ci]	2024-10-27 21:05:04 -07:00
Andrew Kane	857d716d9e	Renamed iterative_search to iterative_scan	2024-10-27 14:02:22 -07:00
Andrew Kane	78b877bdaf	Revert "Renamed iterative_search to iterative_scan" This reverts commit `7043cce893`.	2024-10-24 20:32:07 -07:00
Andrew Kane	7043cce893	Renamed iterative_search to iterative_scan	2024-10-24 20:31:43 -07:00
Andrew Kane	ac6576e53a	Added hnsw.search_mem_multiplier option	2024-10-24 18:02:20 -07:00
Andrew Kane	1291b12090	Added Postgres 18 to CI [skip ci]	2024-10-22 00:38:19 -07:00
Andrew Kane	bfb3a45b31	Use consistent order [skip ci]	2024-10-21 21:47:03 -07:00
Andrew Kane	e718eb8da4	Updated range and defaults for iterative search parameters	2024-10-21 20:38:50 -07:00
Andrew Kane	7484625227	Added comments [skip ci]	2024-10-11 11:59:36 -07:00
Andrew Kane	d1ebb8db73	Use -1 for no limit for ivfflat.max_probes [skip ci]	2024-10-11 11:43:32 -07:00
Andrew Kane	42af8aa1d1	Updated GUC descriptions [skip ci]	2024-10-11 11:26:27 -07:00
Andrew Kane	a3a20f9816	Simplified GUC names [skip ci]	2024-10-11 11:18:01 -07:00
Andrew Kane	2dc392ed6c	Updated GUC names [skip ci]	2024-10-10 23:50:11 -07:00
Andrew Kane	961cb17d80	Added iterative search for HNSW [skip ci]	2024-10-10 18:14:39 -07:00
Andrew Kane	77688b4309	Improve total cost for cost estimation (#686)	2024-10-08 12:42:03 -07:00
Andrew Kane	5776a4d937	Only adjust for TOAST [skip ci]	2024-09-25 15:39:56 -07:00
Andrew Kane	242a12b7d5	Added same cost adjustment to HNSW as IVFFlat since TOAST not included in seq scan cost - #682 [skip ci]	2024-09-25 15:33:57 -07:00
Andrew Kane	1370dd6e86	Removed unneeded floor and fixed comment formatting [skip ci]	2024-09-25 14:13:02 -07:00
Andrew Kane	a100dc67e5	Ran pgindent [skip ci]	2024-09-25 14:03:51 -07:00
Jonathan S. Katz	2df9f24aad	Update HNSW cost estimation to utilize search and index info (#682) Previously, the cost estimation formula for an HNSW index scan used a methodology that only factored in the entry level for an HNSW scan and the "m" index parameter, which reflects the number of tuples (or vectors) to scan at each step of an HNSW graph traversal. While this would bias the PostgreSQL query planner to choose an HNSW index scan over other available paths, it could lead to suboptimal index selection, for example, choosing an HNSW index instead of an available B-tree index that has better selectivity. The number of tuples scanned during HNSW graph traversal is principally influenced by these factors: * The number of tuples stored in the index * `m` - the number of tuples that are scanned in each step of the graph traversal * `hnsw.ef_search` - which influences the total number of steps it takes for the scan to converge on the approximated nearest neighbors Through testing different source models for vectors, we also observed that the correlation of vectors in models would impact this convergence. For this first iteration, we've opted to hardcode a constant scaling factor and set it to `0.55`, though a future commit may turn this into a configurable parameter. The high-level formula for estimating the cost of an HNSW index scan is as such: ``` (entryLevel * m) + (layer0TuplesMax * layer0Selectivity) ``` where - `(entryLevel * m)` is the lower bound of tuples to scan, as it accounts for the graph traversal to layer 0 (L0). (L1 and above have an ef=1) - `layer0TuplesMax` is an estimate of the maximum number of tuples to scan at L0. This accounts for tuples that may end up being discarded due to them already being visited. Testing shows that the number of steps until convergence is similar to the value of `hnsw.ef_search`, thus we can estimate tuples max at `hnsw.ef_search * m * 2` - `layer0Selectivity` - estimates the percentage of tuples that will actually be scanned during the index traversal, multiplied by the scaling factor In addition to the `m` build parameter and `hnsw.ef_search`, cost estimates can be influenced by standard PostgreSQL costing parameters, though adjusting those (e.g. `random_page_cost`) should be done with care. Co-authored-by: @ankane	2024-09-25 14:01:33 -07:00
Andrew Kane	b738ffecc1	Dropped support for Postgres 12	2024-09-19 18:13:54 -07:00
Jonathan S. Katz	05fb382031	Swap max costing values to align with upstream guidance (#658) A feature targeted for PostgreSQL 18 (postgres/postgres@e2225346) that makes optimizations around disabled path nodes impacted pgvector such that PostgreSQL would choose to perform an index scan when it should have used a different scan (e.g. `SELECT count(*) FROM table`). Per upstream guidance[1], the recommendation is to switch to using `get_float8_infinity()`, which achieves the same behavior in backbranches, and can be adapted to work with the new behavior introduced in PostgreSQL 18. [1] https://www.postgresql.org/message-id/2281822.1724441531%40sss.pgh.pa.us	2024-09-19 18:01:59 -07:00
Andrew Kane	ea99957fae	Added fields to IndexAmRoutine	2024-08-22 20:39:16 -07:00
Andrew Kane	61870a0244	Fixed compilation warning with MSVC and Postgres 16 - fixes #598 Co-authored-by: Xing Guo <higuoxing@gmail.com>	2024-06-16 12:09:01 -07:00
Andrew Kane	58ec5296b0	Reduced support functions for HNSW - #527	2024-04-25 13:21:24 -07:00
Andrew Kane	47d5b2896e	Improved support functions for HNSW - #527	2024-04-25 13:00:40 -07:00
Andrew Kane	3eef1ff5c2	Removed type-specific code from HNSW [skip ci]	2024-04-24 14:53:45 -07:00
Andrew Kane	0da6213a60	Moved type lookup to support functions - #527	2024-04-23 13:02:47 -07:00
Andrew Kane	f14c21748b	Added support function for l2_normalize [skip ci]	2024-04-22 18:36:47 -07:00
Andrew Kane	5ba62fca84	Fixed crash with shared_preload_libraries - fixes #460	2024-02-14 17:13:30 -08:00
Heikki Linnakangas	e5d1a6bdbb	Include reloptions.h directly in the .c files where it's needed There are no references to anything that's in reloptions.h in the header files. They need to include genam.h instead, which defines IndexScanDesc.	2024-01-23 13:02:24 +02:00
Andrew Kane	083008c21e	Added validation for GUC parameters	2024-01-22 23:55:30 -08:00
Andrew Kane	a1e526ef82	Dropped support for Postgres 11	2024-01-22 23:52:54 -08:00
Andrew Kane	2d0f162bd7	Added support for in-memory parallel index builds for HNSW Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>	2024-01-22 23:19:10 -08:00
Andrew Kane	dfee5d4045	Added support for on-disk parallel index builds for HNSW	2023-11-11 19:29:45 -08:00
Andrew Kane	b1f9519689	Get info from metapage to determine cost	2023-09-03 12:31:01 -07:00
Andrew Kane	034d4acaea	Removed comment [skip ci]	2023-09-02 18:23:08 -07:00
Andrew Kane	1a0d7bccc7	Updated min ef_search to 1 [skip ci]	2023-08-10 20:47:15 -07:00
Andrew Kane	51d292c93d	Added HNSW index type - #181	2023-08-08 16:42:47 -07:00

49 Commits
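
The high-level formula quoted in commit 2df9f24aad can be worked through with a small sketch. This is only an illustration of the arithmetic stated in that commit message, not pgvector's actual C implementation. The function name, the argument names, and the sample values are hypothetical, and `layer0_selectivity` is passed in directly because the commit message does not spell out how it is computed beyond saying that it includes the `0.55` scaling factor.

```python
def estimate_scanned_tuples(entry_level, m, ef_search, layer0_selectivity):
    """Estimate tuples scanned by an HNSW index scan (illustrative only).

    Follows the formula from the commit message:
        (entryLevel * m) + (layer0TuplesMax * layer0Selectivity)
    with layer0TuplesMax estimated as hnsw.ef_search * m * 2.
    """
    # Lower bound: traversing the upper layers down to layer 0.
    upper_layers = entry_level * m
    # Upper bound on tuples visited at layer 0, per the commit message.
    layer0_tuples_max = ef_search * m * 2
    # Scale the upper bound by the caller-supplied selectivity
    # (an assumption here; its derivation is not given in the message).
    return upper_layers + layer0_tuples_max * layer0_selectivity


# Hypothetical inputs: entry level 3, m = 16, ef_search = 40, selectivity 0.25
estimate = estimate_scanned_tuples(3, 16, 40, 0.25)
print(estimate)  # 3*16 + (40*16*2)*0.25 = 48 + 320 = 368.0
```

With these made-up inputs the layer 0 term dominates the estimate, which matches the commit message's description of `(entryLevel * m)` as only the lower bound.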