Skip to content

(diversity) Refactor attribute diversity to use id trait#1186

Draft
narendatha wants to merge 66 commits into
mainfrom
u/narendatha/attribute_diversity_refactor
Draft

(diversity) Refactor attribute diversity to use id trait#1186
narendatha wants to merge 66 commits into
mainfrom
u/narendatha/attribute_diversity_refactor

Conversation

@narendatha

Copy link
Copy Markdown
Contributor

No description provided.

Mark Hildebrand and others added 30 commits April 28, 2026 14:47
Replace kind()-based string equality checks with explicit is_match()
and get() phase-shape helpers on plugin structs. This avoids fragile
ordering assumptions and makes each plugin responsible for recognising
its own phase shape.
Co-authored-by: Copilot <copilot@github.com>
…plugins

# Conflicts:
#	diskann-benchmark/src/backend/index/benchmarks.rs
#	diskann-benchmark/src/backend/index/product.rs
#	diskann-benchmark/src/backend/index/scalar.rs
#	diskann-benchmark/src/backend/index/search/plugins.rs
#	diskann-benchmark/src/backend/index/spherical.rs
#	diskann-benchmark/src/inputs/graph_index.rs
#	diskann-benchmark/src/inputs/mod.rs
Co-authored-by: Copilot <copilot@github.com>
…ojection issue

This commit explores approaches to wire real candidate vectors into async determinant-diversity post-processing.

Current state: IN COMPILATION ERROR (intentional for analysis)

Attempted approaches:
1. Initial shim-trait FullPrecisionVectorAccessor with async get_full_precision_vector()
   - Resulted in 'implementation not general enough' at search_with() call

2. Removed explicit for<'a> post_processor::DeterminantDiversity bound
   - Still fails - the constraint is inherent in search_with() signature itself

Root cause analysis:
- search_with() requires: PP: for<'a> SearchPostProcess<S::SearchAccessor<'a>, T, O>
- This means post-processor must work for ANY accessor lifetime 'a
- But query = queries.row(query_idx) is borrowed for specific loop iteration lifetime
- These are fundamentally incompatible - a borrowed value can't satisfy for<'a> generically

Compiler errors (3 total):
- 'not general enough': implementation needed for or<'a> but found specific '0
- 'does not live long enough': queries lifetime too short for 'static requirement

Files modified:
- diskann-benchmark/src/backend/index/benchmarks.rs:
  * Removed explicit for<'a> post_processor::DeterminantDiversity constraint
  * Narrowed plugin impl to FullPrecisionProvider<f32>

- diskann-benchmark/src/backend/index/post_processor/determinant_diversity.rs:
  * Added shim trait FullPrecisionVectorAccessor
  * Async method get_full_precision_vector(&mut self, id) -> impl Future<...>

Next steps to investigate:
- Move determinant-diversity outside search_with() as post-processing reranking
- This avoids HRTB entirely by applying after candidates are returned
- Benchmark impact: measure recall/QPS with external reranking vs baseline

Related context:
- Disk index determinant-diversity works correctly (uses real vectors, shows 51-53% QPS cost)
- Shared algorithm fixed (distance-to-similarity scoring direction)
- Branch already merged with origin/main
Co-authored-by: Copilot <copilot@github.com>
…educe duplication

- Use for<'a, 'b> SearchStrategy bound (user-provided fix) to break HRTB lifetime
  projection issue in the search_with post-processor constraint
- Wire FullPrecisionVectorAccessor shim trait so async det-div post-processor fetches
  real candidate vectors instead of placeholder distances
- Populate QPS/latency metrics in async det-div benchmark path (previously all 'missing')
- Extract run_topk_timed helper to eliminate ~100 lines of duplicated loop/timing/recall
  machinery from DeterminantDiversity::run
- Update async-determinant-diversity.json example tag (async-index-build -> graph-index-build)
- Fix clippy::manual_async_fn in FullPrecisionVectorAccessor shim trait
- Use for<'a, 'b> SearchStrategy bound to resolve HRTB lifetime mismatch
- Wire FullPrecisionVectorAccessor shim trait for async det-div to fetch real vectors
- Implement DeterminantDiversity post-processor for async graph-index path
- Extract run_topk_timed helper to eliminate ~100 lines of code duplication
- Wire post_processor parameter to disk-index search pipeline
- Update search parameter handling and result counting for post-processed results
- Add TopkPostProcessor input type and necessary imports
- Populate QPS/latency metrics in async det-div benchmark path
Co-authored-by: Copilot <copilot@github.com>
- Add 4th type parameter PP to KNN with default of ()
- Add with_postprocessor() constructor for post-processor support
- Keep new() for standard KNN without post-processing
- Add accessor methods for index, queries, strategy, post_processor
- Update Search impl to work with generic KNN<DP, T, S, ()>
- Maintains backward compatibility - all existing code continues to work
- Provides foundation for DeterminantDiversity plugin refactoring
narendatha added 25 commits May 18, 2026 20:21
…plugins

# Conflicts:
#	diskann-benchmark/src/backend/index/search/knn.rs
#	diskann-benchmark/src/inputs/graph_index.rs
#	diskann-benchmark/src/inputs/mod.rs
…rrors

Indented blocks in /// comments are interpreted as Rust doctests by rustdoc; the formula lines (e.g. 'alpha_i = ...', 'similarity(d) = ...') were being compiled and failed in CI. Wrap them in 	ext fences so they remain prose.
…plugins

# Conflicts:
#	diskann-disk/src/search/provider/disk_provider.rs
…plugins

# Conflicts:
#	diskann-benchmark/src/inputs/graph_index.rs
…rix-based interface

- Updated diskann-providers determinant_diversity to accept MutMatrixView instead of Vec<(Id, f32, Vec<f32>)>
- Maintains identical algorithmic behavior and recall across all test datasets (SIFT Small, Enron)
- Improves memory layout efficiency with contiguous distance/ID arrays
- Updated all call sites: diskann-benchmark inmem and diskann-disk providers
- Verified with comprehensive A/B benchmarks:
  * Recall parity: 100% match vs archived baselines
  * SIFT Small (256 vectors): -45% QPS (small dataset regression)
  * Enron (46,750 vectors): -5% to +13% QPS (larger dataset improvement)
- All tests passing, no panics or runtime errors
Replaces the previous post-processor sub-enum pattern with a dedicated
TopkDeterminantDiversity variant on the SearchPhase enum, following the
established pattern for Range, BetaFilter, and MultiHopFilter variants.

- Add TopkDeterminantDiversityPhase struct with validate() that reuses
  DeterminantDiversityParams::new for power/eta validation
- Add TopkDeterminantDiversity variants to SearchPhase and SearchPhaseKind
- Wire plugins::DeterminantDiversity to use the new variant
- Remove now-redundant post_processor field from TopkSearchPhase
- Update example config (async-determinant-diversity.json) to new format
- DiskSearchPhase intentionally unchanged (separate code path)

Addresses PR #1011 review comment r3390077824.
…r dispatch

Replace the two Search impls on KNN (one for () default, one for explicit PP) and the Option<PP> + expect(...) panic with a single AsPostProcessor trait. Defaulted resolves to the strategy's default post-processor; Forwarded<PP> carries an explicit one. KNN::new returns KNN<.., Defaulted>; KNN::with_postprocessor returns KNN<.., Forwarded<PP>>. The ConfiguredPostProcessor marker trait is removed.
…bility

Remove the brittle hardcoded checks in JSON-input validation that bailed when topk-determinant-diversity was paired with non-Float32 / PQ / SQ / spherical / streaming graph builds. The plugin registry already provides the same compatibility guarantee at run time (only Float32 FullPrecision registers DeterminantDiversity), and produces a generic 'Unsupported search phase' error listing the kinds the backend actually supports.
Adds test_disk_search_determinant_diversity exercising the new SearchPostProcessorKind::DeterminantDiversity path through DiskIndexSearcher::search end-to-end. The test asserts: result count matches k, all selected ids are a subset of the L=20 candidate pool, no duplicates, top-1 matches the nearest neighbor (greedy invariant), pure-greedy (eta=0) preserves the candidate-pool subset invariant, and the vector_filter is honored by DeterminantDiversityAndFilter.
…plugins

# Conflicts:
#	diskann-benchmark/src/backend/index/benchmarks.rs
#	diskann-benchmark/src/inputs/graph_index.rs
…plugins

# Conflicts:
#	diskann-benchmark/src/inputs/graph_index.rs
…plugins

# Conflicts:
#	diskann-benchmark/src/disk_index/search.rs
#	diskann-benchmark/src/index/benchmarks.rs
#	diskann-benchmark/src/index/search/plugins.rs
…plugins

# Conflicts:
#	diskann-benchmark-core/src/search/graph/knn.rs
…rait

Replace the AttributeValueProvider trait with a new DiverseId trait that exposes the attribute directly from the id. DiverseNeighborQueue is now generic over I: DiverseId and stores ids directly, dropping the Arc provider field. DiverseSearchParams becomes non-generic and the Diverse search adds a DP::InternalId: DiverseId bound. Unit tests rewritten around a TestId type.
@narendatha narendatha changed the base branch from main to u/narendatha/det_div_plugins June 18, 2026 15:12
@narendatha narendatha force-pushed the u/narendatha/attribute_diversity_refactor branch from b06180f to d4930c1 Compare June 18, 2026 15:55
Base automatically changed from u/narendatha/det_div_plugins to main June 20, 2026 09:42
…e_diversity_refactor

# Conflicts:
#	diskann-benchmark/src/disk_index/search.rs
#	diskann-benchmark/src/index/benchmarks.rs
#	diskann-disk/src/search/provider/disk_provider.rs
#	diskann-providers/src/model/graph/provider/async_/inmem/mod.rs
#	diskann-providers/src/model/graph/provider/determinant_diversity.rs
#	diskann-tools/src/utils/search_disk_index.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants