hfe_knn/CLAUDE.md
2025-08-27 13:13:20 +08:00

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This project implements a K-Nearest Neighbors (KNN) algorithm using Fully Homomorphic Encryption (FHE) in Rust. The implementation uses TFHE-rs for cryptographic operations and operates on a 10-dimensional synthetic dataset with 100 training points.
## Key Architecture
- **Homomorphic Encryption**: Uses TFHE-rs library for fully homomorphic encryption operations
- **Data Processing**: Synthetic dataset features are multiplied by 10 and stored as integers, which preserves one decimal place of precision
- **KNN Implementation**: Complete implementation with multiple algorithms:
  - `euclidean_distance()`: Optimized distance calculation using precomputed squares
  - `perform_knn_selection()`: Supports selection sort, bitonic sort, and heap-based selection
  - `encrypted_bitonic_sort()`: Parallel bitonic sort with power-of-2 padding
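
The "precomputed squares" optimization presumably relies on the identity ‖q − p‖² = ‖q‖² + ‖p‖² − 2(q · p), so each squared norm is computed once per point rather than once per pair. The following is a minimal plaintext sketch of that idea, not the repository's encrypted code; `squared_norm` and `squared_distance` are hypothetical helper names.

```rust
/// Plaintext sketch: sum of squared coordinates, computed once per point.
/// Hypothetical helper, not part of the repository's API.
fn squared_norm(v: &[i32]) -> i32 {
    v.iter().map(|x| x * x).sum()
}

/// Squared Euclidean distance from precomputed squared norms:
/// ||q - p||^2 = ||q||^2 + ||p||^2 - 2 * (q . p)
/// Hypothetical helper, not part of the repository's API.
fn squared_distance(q: &[i32], q_sq: i32, p: &[i32], p_sq: i32) -> i32 {
    assert_eq!(q.len(), p.len(), "dimension mismatch");
    // Only the dot product has to be recomputed for every (query, point) pair.
    let dot: i32 = q.iter().zip(p).map(|(a, b)| a * b).sum();
    q_sq + p_sq - 2 * dot
}
```

The square root is skipped because ordering by squared distance gives the same neighbors as ordering by distance.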
## Build and Development Commands
```bash
# Build the project
cargo build
# Run with different algorithms
cargo run --bin enc # Default selection sort
cargo run --bin enc -- --algorithm=bitonic # Bitonic sort (fastest for large datasets)
cargo run --bin enc -- --algorithm=heap # Heap-based selection
cargo run --bin enc -- --debug # Debug mode with plaintext verification
cargo run --bin plain # Plaintext version for comparison
# Development commands - ALWAYS use cargo check for verification
cargo check # Use this for code verification, NOT cargo run
cargo test
cargo fmt
cargo clippy
```
## Data Structure
The project processes a synthetic 10-dimensional dataset using these key data structures:
- `EncryptedQuery`: Query point with precomputed values for optimization
- `EncryptedPoint`: Training data points with precomputed squared sums
- `EncryptedNeighbor`: Distance and index pairs for KNN results
- Custom deserializer converts float values to scaled integers (×10) for FHE compatibility
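
The float-to-integer conversion can be sketched as below. This is an illustration under assumptions, not the repository's deserializer: `scale_feature` and the constants are hypothetical names, and rounding to the nearest integer (rather than truncating) is an assumption. The range check reflects the `FheInt14` limits of -8192 to 8191.

```rust
/// Hypothetical constants for this sketch.
const SCALE: f64 = 10.0;
const FHE_INT14_MIN: i16 = -8192; // -2^13
const FHE_INT14_MAX: i16 = 8191; // 2^13 - 1

/// Scale a float feature by 10 and round to the nearest integer.
/// Returns `None` if the input is not finite or the scaled value
/// does not fit in a 14-bit signed integer.
fn scale_feature(x: f64) -> Option<i16> {
    let scaled = (x * SCALE).round();
    if !scaled.is_finite() {
        return None;
    }
    if scaled < FHE_INT14_MIN as f64 || scaled > FHE_INT14_MAX as f64 {
        return None;
    }
    // Safe cast: the value is an integer within the checked range.
    Some(scaled as i16)
}
```

Returning `Option` makes an out-of-range feature an explicit error at load time instead of a silent overflow after encryption.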
## Dataset
- **Training Data**: `dataset/train.jsonl` containing one query point and 100 10-dimensional training points
- **Results**: `dataset/answer.jsonl` and `dataset/answer1.jsonl` contain KNN classification results in JSON format
## Important Technical Notes
- **FheInt14 Range**: `FheInt14` is a 14-bit signed integer, so its valid range is -8192 to 8191 (-2^13 to 2^13 - 1). Values outside this range (such as i16::MAX = 32767) will overflow
- **Bitonic Sort**: Requires `up=true` for ascending order to get smallest distances first. Using `false` gives largest distances (wrong for KNN)
- **Performance**: Bitonic sort is fastest for larger datasets due to parallel processing, but requires power-of-2 padding
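
The padding and sort-direction notes above can be illustrated with a plaintext bitonic sort. This is a sketch under assumptions, not the repository's `encrypted_bitonic_sort()`: all function names here are hypothetical, the data is unencrypted `u32`, and `u32::MAX` is used as the padding sentinel. In an encrypted version the sentinel would have to fit the ciphertext type's range, and the compare-and-swap would be done homomorphically.

```rust
/// Swap a[i] and a[j] if they are out of order for the requested direction.
/// `up == true` means ascending order.
fn compare_and_swap(a: &mut [u32], i: usize, j: usize, up: bool) {
    if (a[i] > a[j]) == up {
        a.swap(i, j);
    }
}

/// Merge a bitonic sequence of length n (a power of 2) starting at `lo`.
fn bitonic_merge(a: &mut [u32], lo: usize, n: usize, up: bool) {
    if n > 1 {
        let half = n / 2;
        // These comparisons are independent of each other, which is
        // what makes the algorithm easy to parallelize.
        for i in lo..lo + half {
            compare_and_swap(a, i, i + half, up);
        }
        bitonic_merge(a, lo, half, up);
        bitonic_merge(a, lo + half, half, up);
    }
}

/// Sort n elements (n a power of 2) starting at `lo`.
fn bitonic_sort(a: &mut [u32], lo: usize, n: usize, up: bool) {
    if n > 1 {
        let half = n / 2;
        bitonic_sort(a, lo, half, true); // ascending half
        bitonic_sort(a, lo + half, half, false); // descending half
        bitonic_merge(a, lo, n, up);
    }
}

/// Pad to the next power of 2 with a maximal sentinel, sort ascending
/// (`up = true`), and keep the k smallest real distances.
fn k_smallest(distances: &[u32], k: usize) -> Vec<u32> {
    let mut padded = distances.to_vec();
    padded.resize(distances.len().next_power_of_two(), u32::MAX);
    let n = padded.len();
    bitonic_sort(&mut padded, 0, n, true);
    padded.truncate(k.min(distances.len()));
    padded
}
```

With `up = true` the sentinels sink to the end and the nearest neighbors come first; with `up = false` the sentinels and the largest distances would lead the output instead.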
## Git Workflow Instructions
**IMPORTANT**: When the user asks to "write commit" or "帮我写commit" ("help me write a commit"):
- Do NOT add any files to staging area
- User has already staged the files they want to commit
- Only create the commit with appropriate message for the staged changes