feat(sync): pull-first sync with 3-way merge (#918)

* feat(sync): implement pull-first synchronization strategy

- Add --pull-first flag and logic to sync command
- Introduce 3-way merge stub for issue synchronization
- Add concurrent edit tests for the pull-first flow

Ensures local changes are reconciled with remote updates before pushing to prevent data loss.

* feat(sync): implement 3-way merge and state tracking

- Implement 3-way merge algorithm for issue synchronization
- Add base state storage to track changes between syncs
- Add comprehensive tests for merge logic and persistence

Ensures data consistency and prevents data loss during concurrent
issue updates.

* feat(sync): implement field-level conflict merging

- Implement field-level merge logic for issue conflicts
- Add unit tests for field-level merge strategies

Reduces manual intervention by automatically resolving overlapping updates at the field level.

* refactor(sync): simplify sync flow by removing ZFC checks

The previous sync implementation relied on Zero-False-Convergence (ZFC)
staleness checks which are redundant following the transition to
structural 3-way merging. This legacy logic added complexity and
maintenance overhead without providing additional safety.

This commit introduces a streamlined sync pipeline:
- Remove ZFC staleness validation from primary sync flow
- Update safety documentation to reflect current merge strategy
- Eliminate deprecated unit tests associated with ZFC logic

These changes reduce codebase complexity while maintaining data
integrity through the robust structural 3-way merge implementation.

* feat(sync): default to pull-first sync workflow

- Set pull-first as the primary synchronization workflow
- Refactor core sync logic for better maintainability
- Update concurrent edit tests to validate 3-way merge logic

Reduces merge conflicts by ensuring local state is current before pushing changes.

* refactor(sync): clean up lint issues in merge code

- Remove unused error return from MergeIssues (never returned error)
- Use _ prefix for unused _base parameter in mergeFieldLevel
- Update callers to not expect error from MergeIssues
- Keep nolint:gosec for trusted internal file path

* test(sync): add mode compatibility and upgrade safety tests

Add tests addressing Steve's PR #918 review concerns:

- TestSyncBranchModeWithPullFirst: Verifies sync-branch config
  storage and git branch creation work with pull-first
- TestExternalBeadsDirWithPullFirst: Verifies external BEADS_DIR
  detection and pullFromExternalBeadsRepo
- TestUpgradeFromOldSync: Validates upgrade safety when
  sync_base.jsonl doesn't exist (first sync after upgrade)
- TestMergeIssuesWithBaseState: Comprehensive 3-way merge cases
- TestLabelUnionMerge: Verifies labels use union (no data loss)

Key upgrade behavior validated:
- base=nil (no sync_base.jsonl) safely handles all cases
- Local-only issues kept (StrategyLocal)
- Remote-only issues kept (StrategyRemote)
- Overlapping issues merged (LWW scalars, union labels)

* fix(sync): report line numbers for malformed JSON

Problem:
- JSON decoding errors when loading sync base state lacked line numbers
- Difficult to identify location of syntax errors in large state files

Solution:
- Include line number reporting in JSON decoder errors during state loading
- Add regression tests for malformed sync base file scenarios

Impact:
- Users receive actionable feedback for corrupted state files
- Faster troubleshooting of manual configuration errors

* fix(sync): warn on large clock skew during sync

Problem:
- Unsynchronized clocks between systems could lead to silent merge errors
- No mechanism existed to alert users of significant timestamp drift

Solution:
- Implement clock skew detection during sync merge
- Log a warning when large timestamp differences are found
- Add comprehensive unit tests for skew reporting

Impact:
- Users are alerted to potential synchronization risks
- Easier debugging of time-related merge issues

* fix(sync): defer state update until remote push succeeds

Problem:
- Base state updated before confirming remote push completion
- Failed pushes resulted in inconsistent local state tracking

Solution:
- Defer base state update until after the remote push succeeds

Impact:
- Ensures local state accurately reflects remote repository status
- Prevents state desynchronization during network or push failures

* fix(sync): prevent concurrent sync operations

Problem:
- Multiple sync processes could run simultaneously
- Overlapping operations risk data corruption and race conditions

Solution:
- Implement file-based locking using gofrs/flock
- Add integration tests to verify locking behavior

Impact:
- Guarantees execution of a single sync process at a time
- Eliminates potential for data inconsistency during sync

* docs: document sync architecture and merge model

- Detail the 3-way merge model logic
- Describe the core synchronization architecture principles

* fix(lint): explicitly ignore lock.Unlock return value

errcheck linter flagged bare defer lock.Unlock() calls. Wrap in
anonymous function with explicit _ assignment to acknowledge
intentional ignore of unlock errors during cleanup.

* fix(lint): add sync_merge.go to G304 exclusions

The loadBaseState and saveBaseState functions use file paths derived
from trusted internal sources (beadsDir parameter from config). Add
to existing G304 exclusion list for safe JSONL file operations.

* feat(sync): integrate sync-branch into pull-first flow

When sync.branch is configured, doPullFirstSync now:
- Calls PullFromSyncBranch before merge
- Calls CommitToSyncBranch after export

This ensures sync-branch mode uses the correct branch for
pull/push operations.

* test(sync): add E2E tests for sync-branch and external BEADS_DIR

Adds comprehensive end-to-end tests:
- TestSyncBranchE2E: verifies pull→merge→commit flow with remote changes
- TestExternalBeadsDirE2E: verifies sync with separate beads repository
- TestExternalBeadsDirDetection: edge cases for repo detection
- TestCommitToExternalBeadsRepo: commit handling

* refactor(sync): remove unused rollbackJSONLFromGit

Function was defined but never called. Pull-first flow saves base
state after successful push, making this safety net unnecessary.

* test(sync): add export-only mode E2E test

Add TestExportOnlySync to cover --no-pull flag which was the only
untested sync mode. This completes full mode coverage:

- Normal (pull-first): sync_test.go, sync_merge_test.go
- Sync-branch: sync_modes_test.go:TestSyncBranchE2E (PR#918)
- External BEADS_DIR: sync_external_test.go (PR#918)
- From-main: sync_branch_priority_test.go
- Local-only: sync_local_only_test.go
- Export-only: sync_modes_test.go:TestExportOnlySync (this commit)

Refs: #911

* docs(sync): add sync modes reference section

Document all 6 sync modes with triggers, flows, and use cases.
Include mode selection decision tree and test coverage matrix.

Co-authored-by: Claude <noreply@anthropic.com>

* test(sync): upgrade sync-branch E2E tests to bare repo

- Replace mocked repository with real bare repo setup
- Implement multi-machine simulation in sync tests
- Refactor test logic to handle distributed states

Coverage: sync-branch end-to-end scenarios

* test(sync): add daemon sync-branch E2E tests

- Implement E2E tests for daemon sync-branch flow
- Add test cases for force-overwrite scenarios

Coverage: daemon sync-branch workflow in cmd/bd

* docs(sync): document sync-branch paths and E2E architecture

- Describe sync-branch CLI and Daemon execution flow
- Document the end-to-end test architecture

* build(nix): update vendorHash for gofrs/flock dependency

The new gofrs/flock dependency, added for file-based sync locking,
changes the Go module checksum.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Peter Chanthamynavong
2026-01-07 21:27:20 -08:00
committed by GitHub
parent e0b613d5b1
commit 1561374c04
14 changed files with 4188 additions and 925 deletions

docs/SYNC.md Normal file

@@ -0,0 +1,308 @@
# Sync Architecture
This document explains the design decisions behind `bd sync` - why it works the way it does, and the problems each design choice solves.
> **Looking for something else?**
> - Command usage: [commands/sync.md](/commands/sync.md) (Reference)
> - Troubleshooting: [website/docs/recovery/sync-failures.md](/website/docs/recovery/sync-failures.md) (How-To)
> - Deletion behavior: [docs/DELETIONS.md](/docs/DELETIONS.md) (Explanation)
## Why Pull-First?
The core problem: if you export local state before seeing what's on the remote, you commit to a snapshot that may conflict with changes you haven't seen yet. In an export-first flow, changes that arrive during the pull are imported into the database but never make it back into the already-exported JSONL, so the next push silently drops them.
Pull-first sync solves this by reversing the order:
```
Machine A: Create bd-43, sync
↳ Load local state (bd-43 in memory)
↳ Pull (bd-42 edit arrives in JSONL)
↳ Merge local + remote
↳ Export merged state
↳ Push (contains both bd-43 AND bd-42 edit)
```
By loading local state into memory before pulling, we can perform a proper merge that preserves both sets of changes.
## The 3-Way Merge Model
Beads uses 3-way merge - the same algorithm Git uses for merging branches. The reason: it distinguishes between "unchanged" and "deleted".
With 2-way merge (just comparing local vs remote), you cannot tell if an issue is missing because:
- It was deleted locally
- It was deleted remotely
- It never existed in one copy
3-way merge adds a **base state** - the snapshot from the last successful sync:
```
       Base (last sync)
              |
       +------+------+
       |             |
     Local         Remote
   (your DB)     (git pull)
       |             |
       +------+------+
              |
           Merged
```
This enables precise conflict detection:
| Base | Local | Remote | Result | Reason |
|------|-------|--------|--------|--------|
| A | A | A | A | No changes |
| A | A | B | B | Only remote changed |
| A | B | A | B | Only local changed |
| A | B | B | B | Both made same change |
| A | B | C | **merge** | True conflict |
| A | - | A | **deleted** | Local deleted, remote unchanged |
| A | A | - | **deleted** | Remote deleted, local unchanged |
| A | B | - | B | Local changed after remote deleted |
| A | - | B | B | Remote changed after local deleted |
The last two rows show why 3-way merge prevents accidental data loss: if one side deleted while the other modified, we keep the modification.
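The table above can be read as a small decision function. The following is a minimal sketch, assuming each issue version is reduced to a single comparable string and that a nil pointer means "absent from that state". The names `Outcome`, `decide`, `same`, and `ptr` are hypothetical and are not taken from the beads source.

```go
package main

import "fmt"

// Outcome names which row of the table applied. The names are illustrative.
type Outcome string

const (
	Unchanged  Outcome = "unchanged"
	TakeLocal  Outcome = "local"
	TakeRemote Outcome = "remote"
	Conflict   Outcome = "conflict" // both sides changed differently
	Deleted    Outcome = "deleted"
)

// ptr is a small helper for building optional values in examples.
func ptr(s string) *string { return &s }

// same reports whether two optional values are equal; nil means "absent".
func same(a, b *string) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// decide classifies one issue from its base, local, and remote versions.
func decide(base, local, remote *string) Outcome {
	switch {
	case same(local, remote):
		// Both sides agree: untouched, identical edits, or deleted on both.
		if local == nil {
			return Deleted
		}
		if same(base, local) {
			return Unchanged
		}
		return TakeLocal
	case same(base, local):
		// Only remote changed; a missing remote means it was deleted there.
		if remote == nil {
			return Deleted
		}
		return TakeRemote
	case same(base, remote):
		// Only local changed; a missing local means it was deleted here.
		if local == nil {
			return Deleted
		}
		return TakeLocal
	case local == nil:
		return TakeRemote // remote modified what local deleted: keep the edit
	case remote == nil:
		return TakeLocal // local modified what remote deleted: keep the edit
	default:
		return Conflict // hand off to a field-level merge
	}
}

func main() {
	a, b := ptr("A"), ptr("B")
	fmt.Println(decide(a, b, nil)) // prints: local
	fmt.Println(decide(a, nil, a)) // prints: deleted
}
```

With a nil base (no previous sync), this sketch keeps local-only and remote-only issues and reports a conflict only when both sides hold different versions.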
## Sync Flow
```
bd sync

1. Pull    -->  2. Merge    -->  3. Export       -->  4. Push
   Remote          3-way            JSONL                Remote
     |               |                |                    |
     v               v                v                    v
   Fetch           Compare all      Write merged         Commit +
   issues.jsonl    three states     to issues.jsonl      push
```
Step-by-step:
1. **Load local state** - Read all issues from SQLite database into memory
2. **Load base state** - Read `sync_base.jsonl` (last successful sync snapshot)
3. **Pull** - Fetch and merge remote git changes
4. **Load remote state** - Parse `issues.jsonl` after pull
5. **3-way merge** - Compare base vs local vs remote for each issue
6. **Import** - Write merged result to database
7. **Export** - Write database to JSONL (ensures DB is source of truth)
8. **Commit & Push** - Commit changes and push to remote
9. **Update base** - Save current state as base for next sync
## Why Different Merge Strategies?
Not all fields should merge the same way. Consider labels: if Machine A adds "urgent" and Machine B adds "blocked", the merged result should have both labels - not pick one or the other.
Beads uses field-specific merge strategies:
| Field Type | Strategy | Why This Strategy? |
|------------|----------|-------------------|
| Scalars (title, status, priority) | LWW | Only one value possible; most recent wins |
| Labels | Union | Multiple valid; keep all (no data loss) |
| Dependencies | Union | Links should not disappear silently |
| Comments | Append | Chronological; dedup by ID prevents duplicates |
**LWW (Last-Write-Wins)** uses the `updated_at` timestamp to determine which value wins. On timestamp tie, remote wins (arbitrary but deterministic).
**Union** combines both sets. If local has `["urgent"]` and remote has `["blocked"]`, the result is `["blocked", "urgent"]` (sorted for determinism).
**Append** collects all comments from both sides, deduplicating by comment ID. This ensures conversations are never lost.
## Why "Zombie" Issues?
When merging, there is an edge case: what happens when one machine deletes an issue while another modifies it?
```
Machine A: Delete bd-42 → sync
Machine B: (offline) → Edit bd-42 → sync
Pull reveals bd-42 was deleted, but local has edits
```
Beads follows the principle of **no silent data loss**. If local has meaningful changes to an issue that remote deleted, the local changes win. The issue "resurrects" - it comes back from the dead.
This is intentional: losing someone's work without warning is worse than keeping a deleted issue. The user can always delete it again if needed.
However, if the local copy is unchanged from base (meaning the user on this machine never touched it since last sync), the deletion propagates normally.
## Concurrency Protection
What happens if you run `bd sync` twice simultaneously? Without protection, both processes could:
1. Load the same base state
2. Pull at different times (seeing different remote states)
3. Merge differently
4. Overwrite each other's exports
5. Push conflicting commits
Beads uses an **exclusive file lock** (`.beads/.sync.lock`) to serialize sync operations:
```go
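lock := flock.New(lockPath)
locked, err := lock.TryLock()
if err != nil {
	// The lock file could not be opened or locked at all.
	return fmt.Errorf("acquiring sync lock: %w", err)
}
if !locked {
	return fmt.Errorf("another sync is in progress")
}
// Unlock errors are intentionally ignored during cleanup.
defer func() { _ = lock.Unlock() }()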
```
The lock is non-blocking - if another sync is running, the second sync fails immediately with a clear error rather than waiting indefinitely.
The lock file is not git-tracked (it only matters on the local machine).
## Clock Skew Considerations
LWW relies on timestamps, which introduces a vulnerability: what if machine clocks disagree?
```
Machine A (clock correct): Edit bd-42 at 10:00:00
Machine B (clock +1 hour): Edit bd-42 at "10:30:00" (actually 09:30:00)
Machine B wins despite editing earlier
```
Beads cannot fully solve clock skew (distributed systems limitation), but it mitigates the risk:
1. **24-hour warning threshold** - If two timestamps differ by more than 24 hours, a warning is emitted. This catches grossly misconfigured clocks.
2. **Union for collections** - Labels and dependencies use union merge, which is immune to clock skew (both values kept).
3. **Append for comments** - Comments are sorted by `created_at` but never lost due to clock skew.
For maximum reliability, ensure machine clocks are synchronized via NTP.
## Files Reference
| File | Purpose |
|------|---------|
| `.beads/issues.jsonl` | Current state (git-tracked) |
| `.beads/sync_base.jsonl` | Last-synced state (git-tracked) |
| `.beads/.sync.lock` | Concurrency guard (not tracked) |
| `.beads/beads.db` | SQLite database (not tracked) |
The JSONL files are the source of truth for git. The database is derived from JSONL on each machine.
## Sync Modes
Beads supports several sync modes for different use cases:
| Mode | Trigger | Flow | Use Case |
|------|---------|------|----------|
| **Normal** | Default `bd sync` | Pull → Merge → Export → Push | Standard multi-machine sync |
| **Sync-branch** | `sync.branch` config | Separate git branch for beads files | Isolated beads history |
| **External** | `BEADS_DIR` env | Separate repo for beads | Shared team database |
| **From-main** | `sync.from_main` config | Clone beads from main branch | Feature branch workflow |
| **Local-only** | No git remote | Export only (no push) | Single-machine usage |
| **Export-only** | `--no-pull` flag | Export → Push (skip pull/merge) | Force local state to remote |
### Mode Selection Logic
```
sync:
├─ --no-pull flag?
│ └─ Yes → Export-only (skip pull/merge)
├─ No remote configured?
│ └─ Yes → Local-only (export only)
├─ BEADS_DIR or external .beads?
│ └─ Yes → External repo mode
├─ sync.branch configured?
│ └─ Yes → Sync-branch mode
├─ sync.from_main configured?
│ └─ Yes → From-main mode
└─ Normal pull-first sync
```
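The decision tree amounts to a first-match-wins check in the order shown. The following sketch assumes the relevant inputs have already been gathered into a struct. The `syncInputs` type, its fields, and the `Mode` constants are hypothetical names for illustration, not the identifiers used in `cmd/bd`.

```go
package main

import "fmt"

// Mode names one of the six sync modes from the table above.
type Mode string

const (
	ModeNormal     Mode = "normal"
	ModeSyncBranch Mode = "sync-branch"
	ModeExternal   Mode = "external"
	ModeFromMain   Mode = "from-main"
	ModeLocalOnly  Mode = "local-only"
	ModeExportOnly Mode = "export-only"
)

// syncInputs is a hypothetical summary of flags, config, and environment.
type syncInputs struct {
	NoPull           bool   // --no-pull flag
	HasRemote        bool   // a git remote is configured
	ExternalBeadsDir bool   // BEADS_DIR or an external .beads was detected
	SyncBranch       string // sync.branch config value, empty if unset
	FromMain         bool   // sync.from_main config
}

// selectMode applies the checks in the same order as the decision tree;
// the first matching condition wins.
func selectMode(in syncInputs) Mode {
	switch {
	case in.NoPull:
		return ModeExportOnly
	case !in.HasRemote:
		return ModeLocalOnly
	case in.ExternalBeadsDir:
		return ModeExternal
	case in.SyncBranch != "":
		return ModeSyncBranch
	case in.FromMain:
		return ModeFromMain
	default:
		return ModeNormal
	}
}

func main() {
	in := syncInputs{HasRemote: true, SyncBranch: "beads-sync"}
	fmt.Println(selectMode(in)) // prints: sync-branch
}
```

Because the checks are ordered, `--no-pull` takes precedence over every other setting in this sketch, matching the top branch of the tree.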
### Test Coverage
Each mode has E2E tests in `cmd/bd/`:
| Mode | Test File |
|------|-----------|
| Normal | `sync_test.go`, `sync_merge_test.go` |
| Sync-branch | `sync_modes_test.go` |
| External | `sync_external_test.go` |
| From-main | `sync_branch_priority_test.go` |
| Local-only | `sync_local_only_test.go` |
| Export-only | `sync_modes_test.go` |
| Sync-branch (CLI E2E) | `syncbranch_e2e_test.go` |
| Sync-branch (Daemon E2E) | `daemon_sync_branch_e2e_test.go` |
## Sync Paths: CLI vs Daemon
Sync-branch mode has two distinct code paths that must be tested independently:
```
bd sync (CLI)                      Daemon (background)
      │                                   │
      ▼                                   ▼
Force close daemon                 daemon_sync_branch.go
(prevent stale conn)               syncBranchCommitAndPush()
      │                                   │
      ▼                                   ▼
syncbranch.CommitToSyncBranch      Direct database + git
syncbranch.PullFromSyncBranch      with forceOverwrite flag
```
### Why Two Paths?
SQLite connections become stale when the daemon holds them while the CLI operates on the same database. The CLI path forces daemon closure before sync to prevent connection corruption. The daemon path operates directly since it owns the connection.
### Test Isolation Strategy
Each E2E test requires proper isolation to prevent interference:
| Variable | Purpose |
|----------|---------|
| `BEADS_NO_DAEMON=1` | Prevent daemon auto-start (set in TestMain) |
| `BEADS_DIR=<clone>/.beads` | Isolate database per clone |
### E2E Test Architecture: Bare Repo Pattern
E2E tests use a bare repository as a local "remote" to enable real git operations:
```
        ┌─────────────┐
        │  bare.git   │  ← Local "remote"
        └──────┬──────┘
        ┌──────┴──────┐
        ▼             ▼
    Machine A     Machine B
     (clone)       (clone)
        │             │
        │ bd-1        │ bd-2
        │ push        │ push (wins)
        │             │
        │◄────────────┤ divergence
        │ 3-way merge │
        ▼             │
  [bd-1, bd-2]        │
```
| Aspect | update-ref (old) | bare repo (new) |
|--------|------------------|-----------------|
| Push testing | Simulated | Real |
| Fetch testing | Fake refs | Real |
| Divergence | Cannot test | Non-fast-forward |
### E2E Test Coverage Matrix
| Test | Path | What It Tests |
|------|------|---------------|
| TestSyncBranchE2E | CLI | syncbranch.CommitToSyncBranch / syncbranch.PullFromSyncBranch |
| TestDaemonSyncBranchE2E | Daemon | syncBranchCommitAndPush/Pull |
| TestDaemonSyncBranchForceOverwrite | Daemon | forceOverwrite delete propagation |
## Historical Context
The pull-first sync design was introduced in PR #918 to fix issue #911 (data loss during concurrent edits). The original export-first design was simpler but could not handle the "edit during sync" scenario correctly.
The 3-way merge algorithm borrows concepts from:
- Git's merge strategy (base state concept)
- CRDT research (union for sets, LWW for scalars)
- Tombstone patterns (deletion tracking with TTL)
## See Also
- [DELETIONS.md](DELETIONS.md) - Tombstone behavior and deletion tracking
- [GIT_INTEGRATION.md](GIT_INTEGRATION.md) - How beads integrates with git
- [DAEMON.md](DAEMON.md) - Automatic sync via daemon
- [ARCHITECTURE.md](ARCHITECTURE.md) - Overall system architecture