hfe_knn/CLAUDE.md
2025-08-27 13:13:20 +08:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This project implements a K-Nearest Neighbors (KNN) algorithm using Fully Homomorphic Encryption (FHE) in Rust. The implementation uses TFHE-rs for cryptographic operations and operates on a 10-dimensional synthetic dataset with 100 training points.

Key Architecture

  • Homomorphic Encryption: Uses TFHE-rs library for fully homomorphic encryption operations
  • Data Processing: Synthetic dataset features are scaled by 10× and stored as integers, which preserves one decimal place of precision
  • KNN Implementation: Complete implementation with multiple algorithms:
    • euclidean_distance(): Optimized distance calculation using precomputed squares
    • perform_knn_selection(): Supports selection sort, bitonic sort, and heap-based selection
    • encrypted_bitonic_sort(): Parallel bitonic sort with power-of-2 padding
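
As a plaintext illustration of the "precomputed squares" idea, the squared distance can be expanded as ||q - p||^2 = ||q||^2 + ||p||^2 - 2(q · p), so the squared sums are computed once per point and reused. The sketch below is an assumption about how such an optimization could look, not the repository's euclidean_distance(); the helper names are hypothetical and it works on plain i32 values rather than TFHE-rs ciphertexts. It returns the squared distance, which orders neighbors the same way as the true distance.

```rust
/// Sum of squared coordinates. Can be computed once per point and cached.
fn squared_norm(v: &[i32]) -> i32 {
    v.iter().map(|x| x * x).sum()
}

/// Dot product of two vectors of the same dimension.
fn dot(a: &[i32], b: &[i32]) -> i32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Squared Euclidean distance using cached squared norms:
/// ||q - p||^2 = ||q||^2 + ||p||^2 - 2 * (q . p)
fn squared_distance(query: &[i32], query_sq: i32, point: &[i32], point_sq: i32) -> i32 {
    query_sq + point_sq - 2 * dot(query, point)
}
```

With the norms cached, each query-to-point distance needs only one dot product plus two additions, instead of a subtraction and a squaring per dimension.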

Build and Development Commands

# Build the project
cargo build

# Run with different algorithms
cargo run --bin enc                              # Default selection sort
cargo run --bin enc -- --algorithm=bitonic       # Bitonic sort (fastest for large datasets)
cargo run --bin enc -- --algorithm=heap          # Heap-based selection
cargo run --bin enc -- --debug                   # Debug mode with plaintext verification
cargo run --bin plain                            # Plaintext version for comparison

# Development commands - ALWAYS use cargo check for verification
cargo check    # Use this for code verification, NOT cargo run
cargo test
cargo fmt
cargo clippy

Data Structure

The project processes a synthetic 10-dimensional dataset using these key data structures:

  • EncryptedQuery: Query point with precomputed values for optimization
  • EncryptedPoint: Training data points with precomputed squared sums
  • EncryptedNeighbor: Distance and index pairs for KNN results
  • Custom deserializer converts float values to scaled integers (×10) for FHE compatibility
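
The ×10 conversion can be sketched in plain Rust as follows. This is a minimal illustration under stated assumptions, not the project's deserializer: the function names are hypothetical, rounding to the nearest integer is assumed (the original may truncate instead), and the bounds check assumes the scaled value must fit a 14-bit signed integer.

```rust
/// Inclusive bounds of a 14-bit signed integer (assumed target width).
const INT14_MIN: f64 = -8192.0;
const INT14_MAX: f64 = 8191.0;

/// Scale a decimal feature by 10 and round to the nearest integer.
/// Returns None when the input is NaN or the result does not fit in 14 bits.
fn scale_feature(value: f64) -> Option<i16> {
    let scaled = (value * 10.0).round();
    if scaled.is_nan() || scaled < INT14_MIN || scaled > INT14_MAX {
        return None;
    }
    Some(scaled as i16)
}

/// Scale every feature of a point; fails if any single feature is out of range.
fn scale_point(features: &[f64]) -> Option<Vec<i16>> {
    features.iter().map(|&f| scale_feature(f)).collect()
}
```

Returning Option makes an out-of-range feature visible at load time rather than letting it wrap silently later.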

Dataset

  • Training Data: dataset/train.jsonl contains one query point and 100 training points, each 10-dimensional
  • Results: dataset/answer.jsonl and dataset/answer1.jsonl contain KNN classification results in JSON format

Important Technical Notes

  • FheInt14 Range: Valid range is -8192 to 8191 (that is, -2^13 to 2^13 - 1). Values outside this range (such as i16::MAX = 32767) will overflow
  • Bitonic Sort: Requires up=true for ascending order so that the smallest distances come first. Passing up=false sorts in descending order and puts the largest distances first, which is wrong for KNN
  • Performance: Bitonic sort is fastest for larger datasets due to parallel processing, but requires power-of-2 padding
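
The padding and direction notes above can be seen in a plaintext bitonic sort. The sketch below is illustrative only and is not the repository's encrypted_bitonic_sort(): it sorts plain u32 distances, the function names are hypothetical, and padding with a maximum-value sentinel is an assumption about how the power-of-2 requirement might be met. An encrypted version could not branch on a comparison result and would need an encrypted select in place of the `if`.

```rust
/// Pad with a sentinel so the length becomes a power of two.
fn pad_to_power_of_two(values: &mut Vec<u32>, sentinel: u32) {
    let target = values.len().next_power_of_two();
    values.resize(target, sentinel);
}

/// Order the pair (i, j): ascending when `up` is true, descending otherwise.
fn compare_swap(values: &mut [u32], i: usize, j: usize, up: bool) {
    if (values[i] > values[j]) == up {
        values.swap(i, j);
    }
}

/// Merge a bitonic sequence into a sorted one in the given direction.
fn bitonic_merge(values: &mut [u32], up: bool) {
    let n = values.len();
    if n <= 1 {
        return;
    }
    let half = n / 2;
    // The compare-swaps in this loop are independent of each other,
    // which is what makes the network easy to parallelize.
    for i in 0..half {
        compare_swap(values, i, i + half, up);
    }
    let (lo, hi) = values.split_at_mut(half);
    bitonic_merge(lo, up);
    bitonic_merge(hi, up);
}

/// Sort a slice whose length is a power of two.
fn bitonic_sort(values: &mut [u32], up: bool) {
    let n = values.len();
    if n <= 1 {
        return;
    }
    let half = n / 2;
    {
        // Build a bitonic sequence: first half ascending, second half descending.
        let (lo, hi) = values.split_at_mut(half);
        bitonic_sort(lo, true);
        bitonic_sort(hi, false);
    }
    bitonic_merge(values, up);
}

/// Return the k smallest distances, padding with u32::MAX before sorting.
fn k_smallest(distances: &[u32], k: usize) -> Vec<u32> {
    let mut padded = distances.to_vec();
    pad_to_power_of_two(&mut padded, u32::MAX);
    bitonic_sort(&mut padded, true); // up = true: smallest distances first
    padded.truncate(k.min(distances.len()));
    padded
}
```

Because the sentinel is the largest possible value, the padding entries sort to the end in ascending order and never displace a real neighbor from the first k positions.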

Git Workflow Instructions

IMPORTANT: When the user asks to "write commit" or "帮我写commit" (Chinese for "help me write a commit"):

  • Do NOT add any files to staging area
  • User has already staged the files they want to commit
  • Only create the commit with appropriate message for the staged changes