pgvector

mirror of https://github.com/pgvector/pgvector.git synced 2026-07-04 03:30:56 +08:00

Author	SHA1	Message	Date
Andrew Kane	ba8e29600b	Added todo [skip ci]	2024-09-28 15:24:21 -07:00
Andrew Kane	9b42662188	Only adjust cost if scanning less than half of the tuples [skip ci]	2024-09-28 12:21:20 -07:00
Andrew Kane	ab57217f48	Added todo [skip ci]	2024-09-28 10:07:09 -07:00
Andrew Kane	40c3e402c7	Removed todo [skip ci]	2024-09-25 17:29:20 -07:00
Andrew Kane	058248fdcc	Improved cost code [skip ci]	2024-09-25 17:23:29 -07:00
Andrew Kane	73c5145b77	Use int for ef [skip ci]	2024-09-25 16:52:41 -07:00
Andrew Kane	ec4a23fe49	Added cost estimation [skip ci]	2024-09-25 16:45:04 -07:00
Andrew Kane	38207f5640	Merge branch 'master' into hnsw-streaming	2024-09-25 16:09:09 -07:00
Andrew Kane	5776a4d937	Only adjust for TOAST [skip ci]	2024-09-25 15:39:56 -07:00
Andrew Kane	242a12b7d5	Added same cost adjustment to HNSW as IVFFlat since TOAST not included in seq scan cost - #682 [skip ci]	2024-09-25 15:33:57 -07:00
Andrew Kane	1370dd6e86	Removed unneeded floor and fixed comment formatting [skip ci]	2024-09-25 14:13:02 -07:00
Andrew Kane	a100dc67e5	Ran pgindent [skip ci]	2024-09-25 14:03:51 -07:00
Jonathan S. Katz	2df9f24aad	Update HNSW cost estimation to utilize search and index info (#682) Previously, the cost estimation formula for an HNSW index scan utilized a methodology that only factored in the entry level for an HNSW scan and the "m" index parameter, which reflects the number of tuples (or vectors) to scan at each step of an HNSW graph traversal. While this would bias the PostgreSQL query planner to choose an HNSW index scan over other available paths, this could lead to potentially suboptimal index selection, for example, choosing to use an HNSW index instead of an available B-tree index that has better selectivity. The number of tuples scanned during HNSW graph traversal is principally influenced by these factors: * The number of tuples stored in the index * `m` - the number of tuples that are scanned in each step of the graph traversal * `hnsw.ef_search` - which influences the total number of steps it takes for the scan to converge on the approximated nearest neighbors Through testing different source models for vectors, we also observed that the correlation of vectors in models would impact this convergence. For this first iteration, we've opted to hardcode a constant scaling factor and set it to `0.55`, though a future commit may turn this into a configurable parameter. The high-level formula for estimating the cost of an HNSW index scan is as follows: ``` (entryLevel * m) + (layer0TuplesMax * layer0Selectivity) ``` where - `(entryLevel * m)` is the lower bound of tuples to scan, as it accounts for the graph traversal to layer 0 (L0). (L1 and above have ef=1) - `layer0TuplesMax` is an estimate of the maximum number of tuples to scan at L0. This accounts for tuples that may end up being discarded due to them already being visited. Testing shows that the number of steps until convergence is similar to the value of `hnsw.ef_search`, thus we can estimate tuples max at `hnsw.ef_search * m * 2` - `layer0Selectivity` - estimates the percentage of tuples that will actually be scanned during the index traversal, multiplied by the scaling factor In addition to the `m` build parameter and `hnsw.ef_search`, cost estimates can be influenced by standard PostgreSQL costing parameters, though adjusting those (e.g. `random_page_cost`) should be done with care. Co-authored-by: @ankane	2024-09-25 14:01:33 -07:00
Andrew Kane	495041e43b	Added option to limit tuples [skip ci]	2024-09-22 18:10:19 -07:00
Andrew Kane	80cbd32dab	Added streaming option for HNSW	2024-09-22 12:02:48 -07:00
Andrew Kane	b738ffecc1	Dropped support for Postgres 12	2024-09-19 18:13:54 -07:00
Jonathan S. Katz	05fb382031	Swap max costing values to align with upstream guidance (#658) A feature targeted for PostgreSQL 18 (postgres/postgres@e2225346) that makes optimizations around disabled path nodes impacted pgvector such that PostgreSQL would choose to perform an index scan when it should have used a different scan (e.g. `SELECT count(*) FROM table`). Per upstream guidance[1], the recommendation is to switch to using `get_float8_infinity()`, which achieves the same behavior in back branches, and can be adapted to work with the new behavior introduced in PostgreSQL 18. [1] https://www.postgresql.org/message-id/2281822.1724441531%40sss.pgh.pa.us	2024-09-19 18:01:59 -07:00
Andrew Kane	ea99957fae	Added fields to IndexAmRoutine	2024-08-22 20:39:16 -07:00
Andrew Kane	61870a0244	Fixed compilation warning with MSVC and Postgres 16 - fixes #598 Co-authored-by: Xing Guo <higuoxing@gmail.com>	2024-06-16 12:09:01 -07:00
Andrew Kane	58ec5296b0	Reduced support functions for HNSW - #527	2024-04-25 13:21:24 -07:00
Andrew Kane	47d5b2896e	Improved support functions for HNSW - #527	2024-04-25 13:00:40 -07:00
Andrew Kane	3eef1ff5c2	Removed type-specific code from HNSW [skip ci]	2024-04-24 14:53:45 -07:00
Andrew Kane	0da6213a60	Moved type lookup to support functions - #527	2024-04-23 13:02:47 -07:00
Andrew Kane	f14c21748b	Added support function for l2_normalize [skip ci]	2024-04-22 18:36:47 -07:00
Andrew Kane	5ba62fca84	Fixed crash with shared_preload_libraries - fixes #460	2024-02-14 17:13:30 -08:00
Heikki Linnakangas	e5d1a6bdbb	Include reloptions.h directly in the .c files where it's needed There are no references to anything that's in reloptions.h in the header files. They need to include genam.h instead, which defines IndexScanDesc.	2024-01-23 13:02:24 +02:00
Andrew Kane	083008c21e	Added validation for GUC parameters	2024-01-22 23:55:30 -08:00
Andrew Kane	a1e526ef82	Dropped support for Postgres 11	2024-01-22 23:52:54 -08:00
Andrew Kane	2d0f162bd7	Added support for in-memory parallel index builds for HNSW Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>	2024-01-22 23:19:10 -08:00
Andrew Kane	dfee5d4045	Added support for on-disk parallel index builds for HNSW	2023-11-11 19:29:45 -08:00
Andrew Kane	b1f9519689	Get info from metapage to determine cost	2023-09-03 12:31:01 -07:00
Andrew Kane	034d4acaea	Removed comment [skip ci]	2023-09-02 18:23:08 -07:00
Andrew Kane	1a0d7bccc7	Updated min ef_search to 1 [skip ci]	2023-08-10 20:47:15 -07:00
Andrew Kane	51d292c93d	Added HNSW index type - #181	2023-08-08 16:42:47 -07:00

34 Commits
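
The formula quoted in commit 2df9f24aad can be sketched in a few lines to make the arithmetic concrete. This is only an illustration of the commit message, not pgvector's C implementation: the function name and the `scanned_fraction` argument are hypothetical, and how the real code derives the unscaled layer-0 fraction is not stated in the message, so it is taken here as an input.

```python
# Sketch of the tuple estimate described in commit 2df9f24aad.
# Names are illustrative; only the formula and the 0.55 constant
# come from the commit message.

SCALING_FACTOR = 0.55  # hardcoded scaling factor named in the commit message


def estimate_hnsw_tuples_scanned(entry_level, m, ef_search, scanned_fraction):
    """Return (entryLevel * m) + (layer0TuplesMax * layer0Selectivity).

    scanned_fraction is a hypothetical input: the unscaled share of the
    layer-0 maximum that is expected to be scanned, between 0 and 1.
    """
    if not 0.0 <= scanned_fraction <= 1.0:
        raise ValueError("scanned_fraction must be between 0 and 1")

    # Lower bound: one step of m tuples per layer above layer 0 (ef = 1 there).
    upper_layer_tuples = entry_level * m

    # Upper bound for layer 0, per the message: hnsw.ef_search * m * 2.
    layer0_tuples_max = ef_search * m * 2

    # Selectivity is the scanned share multiplied by the scaling factor.
    layer0_selectivity = scanned_fraction * SCALING_FACTOR

    return upper_layer_tuples + layer0_tuples_max * layer0_selectivity


if __name__ == "__main__":
    # Example inputs chosen for illustration only.
    print(estimate_hnsw_tuples_scanned(entry_level=3, m=16, ef_search=40,
                                       scanned_fraction=0.5))
```

With `scanned_fraction` set to 0 the estimate falls back to the `entryLevel * m` lower bound, which matches the role the commit message assigns to that term.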