Commit Graph

5 Commits

Author SHA1 Message Date
Steve Yegge
c34b93fa1a Fix bd-160: Implement JSONL integrity validation and prevent export deduplication data loss
## Problem
Export deduplication feature broke when JSONL and export_hashes diverged
(e.g., after git pull/reset). This caused exports to skip issues that
weren't actually in the file, leading to silent data loss.

## Solution
1. JSONL integrity validation before every export
   - Store JSONL file hash after export
   - Validate hash before export, clear export_hashes if mismatch
   - Automatically recovers from git operations changing JSONL

2. Clear export_hashes on all imports
   - Prevents stale hashes from causing future export failures
   - Import operations invalidate export_hashes state

3. Add Storage interface methods:
   - GetJSONLFileHash/SetJSONLFileHash for integrity tracking
   - ClearAllExportHashes for recovery

## Tests Added
- TestJSONLIntegrityValidation: Unit tests for validation logic
- TestImportClearsExportHashes: Verifies imports clear hashes
- TestExportIntegrityAfterJSONLTruncation: Simulates git reset (would have caught bd-160)
- TestExportIntegrityAfterJSONLDeletion: Tests recovery from file deletion
- TestMultipleExportsStayConsistent: Tests repeated export integrity

## Follow-up
Created bd-179 epic for remaining integration test gaps (multi-repo sync,
daemon auto-sync, corruption recovery tests).

Closes bd-160
2025-10-29 21:57:15 -07:00
Steve Yegge
298d559407 Remove unreachable utility functions (bd-224, bd-214)
- Remove duplicate computeIssueContentHash from sqlite/hash.go
- Remove FileUsed() from internal/config/config.go
- Remove verifyIssueOpen() test helper from git_sync_test.go
- Remove unimplemented SummarizeTier2 and all tier2 infrastructure from haiku.go

Removes ~120 LOC of dead code identified by deadcode analyzer.

Amp-Thread-ID: https://ampcode.com/threads/T-5f150c35-8d67-4dae-bb92-a7b5887d649d
Co-authored-by: Amp <amp@ampcode.com>
2025-10-27 20:45:59 -07:00
Steve Yegge
571a596641 Fix export test failures with ClearAllExportHashes for test isolation
Amp-Thread-ID: https://ampcode.com/threads/T-5335d274-44e1-4811-b63f-15c52ea3394f
Co-authored-by: Amp <amp@ampcode.com>
2025-10-26 23:06:03 -07:00
Steve Yegge
a02729ea57 Complete bd-164: Fix timestamp-only export deduplication
- Add export_hashes table to track last exported content state
- Implement GetExportHash/SetExportHash in storage interface
- Update shouldSkipExport to use export_hashes instead of dirty_issues
- Call SetExportHash after successful export
- Clean up dirty_issues table (remove content_hash column)
- Simplify MarkIssueDirty functions (no longer compute hashes)
- Update markIssuesDirtyTx signature (remove store parameter)

Testing:
- Timestamp-only updates are skipped during export ✓
- Real content changes trigger export ✓
- export_hashes table populated correctly ✓

Fixes bd-159, bd-164
2025-10-26 20:35:37 -07:00
Steve Yegge
a898df6915 WIP: bd-164 timestamp-only export deduplication (~80% complete)
Implemented content hash-based deduplication to skip exports when only
timestamps changed. Core logic complete, needs export_hashes table wiring.

Completed:
- Added computeIssueContentHash() excluding timestamps
- Created shouldSkipExport() logic
- Updated export loop to skip timestamp-only changes
- Added hash.go with content hashing
- Extended Storage interface

Remaining:
- Complete export_hashes table migration
- Add SetExportHash/GetExportHash to interface
- Revert content_hash from dirty_issues approach
- Wire up hash persistence in export
- Testing

See bd-164 notes for details.

Amp-Thread-ID: https://ampcode.com/threads/T-d70657d1-4433-4f7e-b10a-3fccf8bf17fb
Co-authored-by: Amp <amp@ampcode.com>
2025-10-26 20:29:10 -07:00