pgvector

mirror of https://github.com/pgvector/pgvector.git synced 2026-07-22 03:57:34 +08:00

Author	SHA1	Message	Date
Andrew Kane	2cbd08b6c0	Moved unions and macros [skip ci]	2024-10-10 09:41:26 -07:00
Andrew Kane	fa6782985a	Added HnswQuery struct for query data	2024-10-09 23:45:47 -07:00
Andrew Kane	32ab27d72a	Added HnswSupport struct for support functions	2024-10-09 23:10:26 -07:00
Andrew Kane	064db12de7	Moved procinfo initialization for inserts [skip ci]	2024-10-09 21:59:21 -07:00
Andrew Kane	45a6eef9e0	Improved variable name [skip ci]	2024-10-09 21:52:10 -07:00
Andrew Kane	17266ed409	Use inMemory for conditionals	2024-10-09 21:49:32 -07:00
Andrew Kane	a98534e5ab	DRY HNSW procinfo	2024-10-09 21:03:18 -07:00
Andrew Kane	57c05c59a2	DRY code for forming index value	2024-10-09 20:50:17 -07:00
Andrew Kane	3126fbdb6f	Use double for distance [skip ci]	2024-10-09 17:04:25 -07:00
Andrew Kane	f4b67b078f	DRY HNSW distance calculations	2024-10-09 17:01:49 -07:00
Andrew Kane	77688b4309	Improve total cost for cost estimation (#686)	2024-10-08 12:42:03 -07:00
Andrew Kane	d5f4a0e435	Fixed memory context leak in HnswUpdateNeighborsOnDisk - fixes #692	2024-10-08 12:21:26 -07:00
Andrew Kane	57248ba128	Use separate memory context for updating neighbors, which improves performance around 10% for larger vectors	2024-09-30 11:15:27 -07:00
Andrew Kane	ff6da4fcea	Moved logic to get update neighbor on disk to separate function	2024-09-30 10:30:01 -07:00
Andrew Kane	a8b4b6675a	Moved logic to get update index to separate function	2024-09-30 10:14:52 -07:00
Andrew Kane	d148b4e61b	Fixed insert logic	2024-09-30 09:59:12 -07:00
Andrew Kane	658d74e2f6	Use Size for memory [skip ci]	2024-09-29 23:48:58 -07:00
Andrew Kane	7ba593c492	Improved SelectNeighbors signature [skip ci]	2024-09-29 23:03:02 -07:00
Andrew Kane	525e3b81e1	Improved HnswUpdateConnection parameters [skip ci]	2024-09-29 19:47:25 -07:00
Andrew Kane	8eb8cdf0f3	Moved insert-specific code to hnswinsert.c	2024-09-29 19:44:11 -07:00
Andrew Kane	4c72f91206	Improved variable name [skip ci]	2024-09-29 19:26:15 -07:00
Andrew Kane	4ac86f62a1	Improved variable names [skip ci]	2024-09-29 19:22:35 -07:00
Andrew Kane	648dd8af78	Moved LoadElementsForInsert to separate function and removed unused code path	2024-09-29 19:12:38 -07:00
Andrew Kane	ee43ee9b16	Use HnswLoadNeighborTids for inserts	2024-09-29 18:52:12 -07:00
Andrew Kane	5ce367e18b	Removed lc from HnswUpdateConnection [skip ci]	2024-09-29 18:18:42 -07:00
Andrew Kane	f371eb119b	Removed lc from SelectNeighbors [skip ci]	2024-09-29 18:14:28 -07:00
Andrew Kane	382a25aefb	Split loading neighbor TIDs into separate function [skip ci]	2024-09-29 17:20:54 -07:00
Andrew Kane	0b6214aad6	Moved HnswLoadNeighbors to hnswinsert.c [skip ci]	2024-09-29 15:49:01 -07:00
Andrew Kane	f2afd11257	Use sc for search candidates [skip ci]	2024-09-29 15:09:54 -07:00
Andrew Kane	cae3458329	Updated distance to use double	2024-09-29 15:06:50 -07:00
Andrew Kane	dc23752618	Fixed uninitialized variable [skip ci]	2024-09-28 19:18:52 -07:00
Andrew Kane	54fa16e3e3	Added safety check [skip ci]	2024-09-26 08:32:44 -07:00
Andrew Kane	5776a4d937	Only adjust for TOAST [skip ci]	2024-09-25 15:39:56 -07:00
Andrew Kane	242a12b7d5	Added same cost adjustment to HNSW as IVFFlat since TOAST not included in seq scan cost - #682 [skip ci]	2024-09-25 15:33:57 -07:00
Andrew Kane	1370dd6e86	Removed unneeded floor and fixed comment formatting [skip ci]	2024-09-25 14:13:02 -07:00
Andrew Kane	a100dc67e5	Ran pgindent [skip ci]	2024-09-25 14:03:51 -07:00
Jonathan S. Katz	2df9f24aad	Update HNSW cost estimation to utilize search and index info (#682)
Previously, the cost estimation formula for an HNSW index scan utilized a methodology that only factored in the entry level for an HNSW scan and the "m" index parameter, which reflects the number of tuples (or vectors) to scan at each step of an HNSW graph traversal. While this would bias the PostgreSQL query planner to choose an HNSW index scan over other available paths, this could lead to potentially suboptimal index selection, for example, choosing to use an HNSW index instead of an available B-tree index that has better selectivity.
The number of tuples scanned during HNSW graph traversal is principally influenced by these factors:
* The number of tuples stored in the index
* `m` - the number of tuples that are scanned in each step of the graph traversal
* `hnsw.ef_search` - which influences the total number of steps it takes for the scan to converge on the approximated nearest neighbors
Through testing different source models for vectors, we also observed that the correlation of vectors in models would impact this convergence. For this first iteration, we've opted to hardcode a constant scaling factor and set it to `0.55`, though a future commit may turn this into a configurable parameter.
The high-level formula for estimating the cost of an HNSW index scan is as follows:
```
(entryLevel * m) + (layer0TuplesMax * layer0Selectivity)
```
where
- `(entryLevel * m)` is the lower bound of tuples to scan, as it accounts for the graph traversal to layer 0 (L0). (L1 and above have an ef=1)
- `layer0TuplesMax` is an estimate of the maximum number of tuples to scan at L0. This accounts for tuples that may end up being discarded due to them already being visited. Testing shows that the number of steps until convergence is similar to the value of `hnsw.ef_search`, thus we can estimate tuples max at `hnsw.ef_search * m * 2`
- `layer0Selectivity` - estimates the percentage of tuples that will actually be scanned during the index traversal, multiplied by the scaling factor
In addition to the `m` build parameter and `hnsw.ef_search`, cost estimates can be influenced by standard PostgreSQL costing parameters, though adjusting those (e.g. `random_page_cost`) should be done with care.
Co-authored-by: @ankane	2024-09-25 14:01:33 -07:00
Andrew Kane	8e979ed377	Do not adjust index selectivity based on probes [skip ci]	2024-09-25 13:48:24 -07:00
Andrew Kane	87ac108bf7	Removed code for Postgres 12 [skip ci]	2024-09-23 15:26:31 -07:00
Andrew Kane	97cf990e0f	Free TupleDesc [skip ci]	2024-09-21 19:15:34 -07:00
Andrew Kane	55dc735e1a	Moved allocations out of GetScanItems [skip ci]	2024-09-21 19:10:25 -07:00
Andrew Kane	be4e9a9df2	Added macros for IvfflatScanList [skip ci]	2024-09-21 18:10:37 -07:00
Andrew Kane	d5e8fc96a5	Changed HnswPairingHeapNode to HnswSearchCandidate to reduce allocations and improve code	2024-09-21 12:07:44 -07:00
Andrew Kane	6d2af6d3f9	Improved code [skip ci]	2024-09-20 15:21:57 -07:00
Andrew Kane	a6ab5d07c0	Fixed CI	2024-09-19 20:50:51 -07:00
Andrew Kane	aa77346103	Improved code [skip ci]	2024-09-19 19:57:16 -07:00
Andrew Kane	b0da2d95d9	Fixed array_to_sparsevec on Windows [skip ci]	2024-09-19 19:52:16 -07:00
Andrew Kane	3fb05eb847	Added casts for arrays to sparsevec - #604 Co-authored-by: Narek Galstyan <narekg@berkeley.edu> Co-authored-by: Di Qi <di@lantern.dev>	2024-09-19 19:17:05 -07:00
Andrew Kane	b738ffecc1	Dropped support for Postgres 12	2024-09-19 18:13:54 -07:00
Heikki Linnakangas	7117513532	Add error codes to a few errors (#657) With elog(), you get XX000 "internal_error", which sounds scary. It's not self-evident what the right error codes for some of these errors are, but I tried to use my best judgment.	2024-09-19 18:04:23 -07:00
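
The estimate described in commit 2df9f24aad can be sketched in a few lines. This is a minimal illustration in Python, not pgvector's C implementation: the function name is hypothetical, `entry_level` is taken as an input rather than derived from the index, and `scanned_fraction` is an assumed caller-supplied stand-in for the "percentage of tuples that will actually be scanned", which the commit message does not spell out. Only the overall formula, the `hnsw.ef_search * m * 2` bound, and the `0.55` scaling factor come from the commit message.

```python
# Illustrative sketch of the estimate described in commit 2df9f24aad.
# Not pgvector's actual C code; names and inputs here are hypothetical.

SCALING_FACTOR = 0.55  # hardcoded constant named in the commit message


def estimate_hnsw_tuples_scanned(entry_level, m, ef_search, scanned_fraction):
    """Return (entryLevel * m) + (layer0TuplesMax * layer0Selectivity)."""
    # Lower bound: descent from the entry level down to layer 0 (ef = 1 above L0).
    lower_bound = entry_level * m
    # Maximum tuples at layer 0, per the commit message: hnsw.ef_search * m * 2.
    layer0_tuples_max = ef_search * m * 2
    # Assumption: selectivity is a caller-supplied fraction times the scaling factor.
    layer0_selectivity = scanned_fraction * SCALING_FACTOR
    return lower_bound + layer0_tuples_max * layer0_selectivity


if __name__ == "__main__":
    print(estimate_hnsw_tuples_scanned(entry_level=3, m=16, ef_search=40,
                                       scanned_fraction=0.5))
```

With `m = 16` and `hnsw.ef_search = 40`, the layer-0 maximum is 1280 tuples, so the selectivity term dominates the estimate whenever the scanned fraction is not close to zero.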

796 Commits