pgvector

mirror of https://github.com/pgvector/pgvector.git synced 2026-07-22 03:57:34 +08:00

Author	SHA1	Message	Date
Andrew Kane	44d8d28b40	Added note about postgresql@17 formula [skip ci]	2024-09-27 13:39:54 -07:00
Andrew Kane	54fa16e3e3	Added safety check [skip ci]	2024-09-26 08:32:44 -07:00
Andrew Kane	46de265a24	Updated changelog [skip ci]	2024-09-25 16:03:58 -07:00
Andrew Kane	b8c27914d4	Improved test [skip ci]	2024-09-25 15:58:41 -07:00
Andrew Kane	2d85af51a8	Added test for IVFFlat costs [skip ci]	2024-09-25 15:53:34 -07:00
Andrew Kane	e0ad441306	Added test for costs [skip ci]	2024-09-25 15:49:03 -07:00
Andrew Kane	5776a4d937	Only adjust for TOAST [skip ci]	2024-09-25 15:39:56 -07:00
Andrew Kane	242a12b7d5	Added same cost adjustment to HNSW as IVFFlat since TOAST not included in seq scan cost - #682 [skip ci]	2024-09-25 15:33:57 -07:00
Andrew Kane	1370dd6e86	Removed unneeded floor and fixed comment formatting [skip ci]	2024-09-25 14:13:02 -07:00
Andrew Kane	a100dc67e5	Ran pgindent [skip ci]	2024-09-25 14:03:51 -07:00
Jonathan S. Katz	2df9f24aad	Update HNSW cost estimatation to utilize search and index info (#682 ) Previously, the cost estimation formula for a HNSW index scan utilized a methodology that only factored in the entry level for an HNSW scan and the "m" index parameter, which reflects the number of tuples (or vectors) to scan at each step of a HNSW graph traversal. While this would bias the PostgreSQL query planner to choose an HNSW index scan over other available paths, this could lead to potential suboptimal index selection, for example, choosing to use a HNSW index instead of an available B-tree index that has better selectivity. The number of tuples scanned during HNSW graph traversal is principally influenced by these factors: * The number of tuples stored in the index * `m` - the number of tuples that are scanned in each step of the graph traversal * `hnsw.ef_search` - which influences the total number of steps it takes for the scan to converge on the approximated nearest neighbors Through testing different source models for vectors, we also observed that the correlation of vectors in mdoels would impact this convergence. For this first iteration, we've opted to hardcode a constant scaling factor and set it to `0.55`, though a future commit may turn this into a configurable parameter. The high-level formula for estimating the cost of a HNSW index scan is as such: ``` (entryLevel * m) + (layer0TuplesMax * layer0Selectivity) ``` where - `(entryLevel * m)` is the lower bound of tuples to scan, as it accounts for the graph traversal to layer 0 (L0). (L1 and above has an ef=1) - `layer0TuplesMax` is an estimate of the maximum number of tuples to scan at L0. This accounts for tuples that may end up being discarded due to them already being visited. Testing shows that the number of steps until converge is similar to the value of `hnsw.ef_search`, thus we can estimate tuples max at `hnsw.ef_search * m * 2` - `layer0Selectivity` - estimates the percentage of tuples that will actually be scanned during the index traversal, multipled by the scaling factor In addition to the `m` build parameter and `hsnw.ef_search`, costs estimates can be influenced by standard PostgreSQL costing parameters, though adjusting those (e.g. `random_page_cost`) should be done with care. Co-authored-by: @ankane	2024-09-25 14:01:33 -07:00
Andrew Kane	8e979ed377	Do not adjust index selectivity based on probes [skip ci]	2024-09-25 13:48:24 -07:00
Andrew Kane	77b3d1f2a8	Added test for join with attribute filtering [skip ci]	2024-09-24 23:21:34 -07:00
Andrew Kane	ecd0738728	Improved test [skip ci]	2024-09-24 23:13:30 -07:00
Andrew Kane	62ffc3641c	Added test for join [skip ci]	2024-09-24 23:12:27 -07:00
Andrew Kane	87ac108bf7	Removed code for Postgres 12 [skip ci]	2024-09-23 15:26:31 -07:00
Andrew Kane	97cf990e0f	Free TupleDesc [skip ci]	2024-09-21 19:15:34 -07:00
Andrew Kane	55dc735e1a	Moved allocations out of GetScanItems [skip ci]	2024-09-21 19:10:25 -07:00
Andrew Kane	be4e9a9df2	Added macros for IvfflatScanList [skip ci]	2024-09-21 18:10:37 -07:00
Andrew Kane	d5e8fc96a5	Changed HnswPairingHeapNode to HnswSearchCandidate to reduce allocations and improve code	2024-09-21 12:07:44 -07:00
Andrew Kane	6d2af6d3f9	Improved code [skip ci]	2024-09-20 15:21:57 -07:00
Andrew Kane	a6ab5d07c0	Fixed CI	2024-09-19 20:50:51 -07:00
Andrew Kane	aa77346103	Improved code [skip ci]	2024-09-19 19:57:16 -07:00
Andrew Kane	b0da2d95d9	Fixed array_to_sparsevec on Windows [skip ci]	2024-09-19 19:52:16 -07:00
Andrew Kane	3fb05eb847	Added casts for arrays to sparsevec - #604 Co-authored-by: Narek Galstyan <narekg@berkeley.edu> Co-authored-by: Di Qi <di@lantern.dev>	2024-09-19 19:17:05 -07:00
Andrew Kane	b738ffecc1	Dropped support for Postgres 12	2024-09-19 18:13:54 -07:00
Heikki Linnakangas	7117513532	Add error codes to a few errors (#657 ) With elog(), you get XX000 "internal_error", which sounds scary. It's not self-evident what the right error codes for some of these errors are, but I tried to use my best judgment.	2024-09-19 18:04:23 -07:00
Andrew Kane	85d877d540	Updated changelog [skip ci]	2024-09-19 18:03:20 -07:00
Jonathan S. Katz	05fb382031	Swap max costing values to align with upstream guidance (#658 ) A feature targeted for PostgreSQL 18 (postgres/postgres@e2225346) that makes optimizations around disabled path nodes impacted pgvector such that PostgreSQL would choose to perform an index scan when it should have used a different scan (e.g. `SELECT count(*) FROM table`). Per upstream guidance[1], the recommendation is to switch to using `get_float8_infinity()`, which achieves the same behavior in backbranches, and can be adapated to work with the new behavior introduced in PostgreSQL 18. [1] https://www.postgresql.org/message-id/2281822.1724441531%40sss.pgh.pa.us	2024-09-19 18:01:59 -07:00
Andrew Kane	8e1853fbf3	Improved variable name [skip ci]	2024-09-19 15:09:40 -07:00
Andrew Kane	f9d68a061a	Simplified HnswLoadUnvisitedFromMemory [skip ci]	2024-09-19 04:39:46 -07:00
Andrew Kane	4f8ab574c9	Simplified CountElement [skip ci]	2024-09-19 04:32:38 -07:00
Andrew Kane	a15806196e	Keep scan-build happy	2024-09-19 04:02:09 -07:00
Andrew Kane	5c9429a0f8	Reduced memory usage for HNSW index scans	2024-09-19 03:27:35 -07:00
Andrew Kane	4b44d6e745	Updated changelog [skip ci]	2024-09-19 02:42:33 -07:00
Andrew Kane	16ca608f42	Updated AddToVisited to use HnswElementPtr	2024-09-19 02:41:20 -07:00
Andrew Kane	8dde14a736	Reduced memory usage for HNSW index scans Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>	2024-09-19 02:17:51 -07:00
Andrew Kane	d74d3065bc	Reduced allocations for pairing heap	2024-09-19 01:59:46 -07:00
Andrew Kane	a1b80faa67	Updated readme	2024-09-05 23:13:12 -07:00
Andrew Kane	4af5a127e0	Revert "Improved cleanup for IVFFlat index scans [skip ci]" This reverts commit `da7d3959a3`.	2024-09-02 01:52:28 -07:00
Andrew Kane	d02d71a398	Fixed CI	2024-09-02 01:44:03 -07:00
Andrew Kane	2aca04b8de	Updated links [skip ci]	2024-08-28 13:39:03 -07:00
Andrew Kane	e47984e616	Reset tuple sort for Postgres 12 [skip ci]	2024-08-24 22:10:26 -07:00
Andrew Kane	da7d3959a3	Improved cleanup for IVFFlat index scans [skip ci]	2024-08-24 21:59:44 -07:00
Andrew Kane	dadbbc3758	Renamed InitSortState to InitScanSortState [skip ci]	2024-08-24 21:53:15 -07:00
Andrew Kane	6af0a43d62	Added InitBuildSortState function [skip ci]	2024-08-24 21:50:31 -07:00
Andrew Kane	ffcb90d094	Added InitSortState function [skip ci]	2024-08-24 21:42:18 -07:00
Andrew Kane	8a312c3c8e	Added memory usage for IVFFlat index scans [skip ci]	2024-08-24 21:30:40 -07:00
Andrew Kane	5d86b177ab	Fixed -DIVFFLAT_MEMORY [skip ci]	2024-08-24 20:56:33 -07:00
Andrew Kane	ea99957fae	Added fields to IndexAmRoutine	2024-08-22 20:39:16 -07:00

1 2 3 4 5 ...

1630 Commits