Implement side-by-side diffs #5925

ChrisPenner · 2025-10-08T23:59:42Z

Overview

This represents a full overhaul of our diffing algorithm. The goal is to keep the two sides of a diff aligned even when lines are added or removed.

The old algorithm was entirely token-oriented and didn't special-case lines at all, so if a bunch of lines were deleted on one side, the other side's diff would get out of alignment.

The new algorithm inserts spacers where necessary to keep similar chunks in line.

Here's an example diff using the new algorithm.
Each line is marked with -/+ when changed, spacer lines are marked with ////, and changed tokens within changed lines are highlighted by wrapping them in {}. (Don't worry, Simon will make the rendered version prettier.)

=====left====
unchangedDefinition = 1
myDefinition : Nat -> Text -> Text
myDefinition n txt =
  -- Check for non-positive n
  if n == 0 then txt
  else Text.drop n txt

====right====
unchangedDefinition = 1
myNewDefinition : Int -> Text -> Text
myNewDefinition k input =
  -- Check for non-positive n
  if n <= 0
    then txt
  else
    Text.take n input

====diff====
  unchangedDefinition = 1                |   unchangedDefinition = 1
- {myDefinition} : {Nat} -> Text -> Text | + {myNewDefinition} : {Int} -> Text -> Text
- {myDefinition} {n} {txt} =             | + {myNewDefinition} {k} {input} =
    -- Check for non-positive n          |     -- Check for non-positive n
-   if n {==} 0 then txt                 | +   if n {<=} 0
-   else {Text.drop} n {txt}             | +  { }{ }{ }then txt
////                                     | +   else
////                                     | +   { }{ }{Text.take}{ }n {input}

Implementation notes

At a high level, the algorithm:

1. First runs the diff algorithm using entire lines as tokens. This gives us a list of equal lines, with chunks of changed lines in between.
2. Maps over the changed chunks and runs a token diff on each of those. This behaves similarly to the old algorithm.
3. Computes the difference between the number of lines on the lhs and rhs sides of the diff, and adds padding to fill out the shorter side.

This gives us two lists of lines, one for the lhs and one for the rhs. Each line is tagged with whether it's been changed or not. Lines with changes include annotations on each token indicating whether that particular token exists on the other side. This includes annotations for cases where the name is the same but the hash changed, and vice versa.
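For anyone who wants to poke at the idea outside the codebase, here's a minimal sketch of the line-level pass and the padding pass, using only `base`. It is not the code in this PR: the `Line` and `Chunk` types, the function names, and the plain LCS table are all assumptions made for illustration, and the per-chunk token diff (the second step) is left out entirely.

```haskell
-- A row in one column of the side-by-side view (hypothetical type).
data Line = Unchanged String | Changed String | Spacer
  deriving (Eq, Show)

-- A run of lines shared by both sides, or a block that differs (hypothetical type).
data Chunk = Same [String] | Differ [String] [String]
  deriving (Eq, Show)

-- lcsTable xs ys !! i !! j is the LCS length of (drop i xs) and (drop j ys).
lcsTable :: Eq a => [a] -> [a] -> [[Int]]
lcsTable xs ys = foldr addRow [replicate (length ys + 1) 0] xs
  where
    addRow x rows = newRow : rows
      where
        below = head rows
        -- Build the row right to left; the last column is always 0.
        newRow = foldr step [0] (zip3 ys below (tail below))
        step (y, down, diag) acc
          | x == y = diag + 1 : acc
          | otherwise = max down (head acc) : acc

-- Line-level pass: diff whole lines and group the result into chunks.
lineChunks :: [String] -> [String] -> [Chunk]
lineChunks xs0 ys0 = go xs0 ys0 (lcsTable xs0 ys0)
  where
    go [] [] _ = []
    go xs [] _ = [Differ xs []]
    go [] ys _ = [Differ [] ys]
    go (x : xs) (y : ys) rows
      | x == y = same x (go xs ys (map tail (tail rows)))
      | down >= right = onLeft x (go xs (y : ys) (tail rows))
      | otherwise = onRight y (go (x : xs) ys (map tail rows))
      where
        down = head (rows !! 1) -- LCS length if we drop the left line
        right = head rows !! 1 -- LCS length if we drop the right line
    same x (Same ls : rest) = Same (x : ls) : rest
    same x rest = Same [x] : rest
    onLeft x (Differ ls rs : rest) = Differ (x : ls) rs : rest
    onLeft x rest = Differ [x] [] : rest
    onRight y (Differ ls rs : rest) = Differ ls (y : rs) : rest
    onRight y rest = Differ [] [y] : rest

-- Padding pass: fill out the shorter side of each changed block with spacers.
align :: [Chunk] -> ([Line], [Line])
align = foldMap render
  where
    render (Same ls) = (map Unchanged ls, map Unchanged ls)
    render (Differ ls rs) = (pad (map Changed ls), pad (map Changed rs))
      where
        height = max (length ls) (length rs)
        pad col = col ++ replicate (height - length col) Spacer

sideBySide :: String -> String -> ([Line], [Line])
sideBySide l r = align (lineChunks (lines l) (lines r))
```

In this sketch the spacers always go at the bottom of the shorter side of a changed block, which matches the example diff above where the `////` rows trail the left-hand side.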

This replaces the old diff algorithm entirely. That algorithm is currently unused in UCM, but deploying this to Share will need matching updates from Simon.

Test coverage

I added property tests that generate random diffs and assert that both sides of the diff always have matching line counts (which confirms that spacers are being added).
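To make the property concrete, here's a rough sketch of the kind of check involved. It is not the test code from this PR: `padBlock` is a hypothetical stand-in for the aligner, and instead of random generation with a property-testing library it exhaustively enumerates small inputs so that it only needs `base`.

```haskell
import Control.Monad (replicateM)

-- A row in one column: real content or a spacer (hypothetical type).
data Line = Content String | Spacer
  deriving (Eq, Show)

-- Hypothetical stand-in for the aligner under test: pads one changed block.
padBlock :: [String] -> [String] -> ([Line], [Line])
padBlock ls rs = (pad ls, pad rs)
  where
    height = max (length ls) (length rs)
    pad col = map Content col ++ replicate (height - length col) Spacer

-- The property: both columns have the same height, and stripping the
-- spacers gives back exactly the original lines on each side.
propSameHeight :: [String] -> [String] -> Bool
propSameHeight ls rs =
  length left == length right
    && [s | Content s <- left] == ls
    && [s | Content s <- right] == rs
  where
    (left, right) = padBlock ls rs

-- Every list of at most n lines drawn from a two-line alphabet.
smallInputs :: Int -> [[String]]
smallInputs n = concatMap (\k -> replicateM k ["a", "b"]) [0 .. n]

-- Check the property on every pair of small inputs.
checkAll :: Bool
checkAll = and [propSameHeight ls rs | ls <- smallInputs 3, rs <- smallInputs 3]
```

The second half of the property (no content lost) isn't mentioned above, but it's a cheap extra guard against an aligner that equalizes heights by dropping lines.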

I also added explicit test cases using a text rendering of diffs.

There are also transcript tests to show the output JSON, but those aren't good for evaluating how effective the diff is.

Loose ends

There's still another pass we can do on this to be smarter about where within a change block we put spacers. For example, VS Code's diff view interleaves spacers within the change block to line up similar words.

Pretty sure I can iterate on this to get the same, but this is already a significant enough change that it's worth it to ship.

ChrisPenner · 2025-10-09T00:11:43Z

unison-src/transcripts/idempotent/definition-diff-api.md

@@ -127,187 +127,436 @@ GET /api/projects/scratch/diff/terms?oldBranchRef=main&newBranchRef=new&oldTerm=
 RESPONSE:
  {
      "diff": {


@hojberg note changes to the diff response.

Now the diff has the shape {left: <lines>, right: <lines>}, where each side is an array of lines.

Each line is one of:

{kind: changed, value: [<difftokens>]}

{kind: unchanged, value: [<difftokens>]}

{kind: spacer}

The tokens themselves are similar to before, but for now each token is split out into its own object rather than being grouped into a "both" entry that holds a list of tokens.
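As a reading aid, the shape described above could be modelled roughly like this. This is only a sketch: the type and function names are made up for illustration, and `tok` stands in for the diff-token objects, whose exact fields aren't spelled out in this comment.

```haskell
-- One entry in a side's array of lines (hypothetical names throughout).
data DiffLine tok
  = ChangedLine [tok]
  | UnchangedLine [tok]
  | SpacerLine
  deriving (Eq, Show)

-- The whole diff: one array of lines per side.
data SideBySideDiff tok = SideBySideDiff
  { left :: [DiffLine tok]
  , right :: [DiffLine tok]
  }
  deriving (Eq, Show)

-- The value of the "kind" field for a line.
lineKind :: DiffLine tok -> String
lineKind (ChangedLine _) = "changed"
lineKind (UnchangedLine _) = "unchanged"
lineKind SpacerLine = "spacer"

-- Pair up rows for rendering. This relies on both sides having the same
-- number of lines, which is what the spacers are for.
rows :: SideBySideDiff tok -> [(DiffLine tok, DiffLine tok)]
rows d = zip (left d) (right d)
```

Since spacers keep the two arrays the same length, a renderer can walk them in lockstep without any extra alignment logic of its own.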

We may actually want to look at more efficient serializations since I know Mitchell had noticed the diff sizes were getting really big.

I'm also looking at IDs which associate tokens on the lhs with their rhs counterparts when they match; that way you could hover a token on the lhs and have it light up the matching token on the other side, if we want :)

hojberg · 2025-10-09T20:11:52Z

@ChrisPenner how might I try this out with Share locally?

ChrisPenner · 2025-10-14T16:47:27Z

> @ChrisPenner how might I try this out with Share locally?

@hojberg Here's a branch for ya!
unisoncomputing/share-api#155

ChrisPenner · 2025-10-16T20:31:55Z

@aryairani this is already up on Share and seems to be working well :)

Go ahead and merge unless you've got any concerns 👍🏼

aryairani · 2025-10-16T20:33:40Z

unison-share-api/src/Unison/Server/Backend/DefinitionDiff.hs

+-- diffSyntaxText :: SyntaxText -> SyntaxText -> [SemanticSyntaxDiff Syntax.Element]
+-- diffSyntaxText (AnnotatedText fromST) (AnnotatedText toST) =
+--   diffSegments syntaxElementDiffEq fromST toST
+--     & expandSpecialCases specialCaseAnnotations


still want this?

aryairani · 2025-10-16T20:35:01Z

unison-share-api/tests/Unison/Test/Server/Backend/DefinitionDiff.hs

+-- Helpers for testing semantic diffs on plaintext
+
+-- simpleLeft :: Text
+-- simpleLeft = "one word\ntwo words\nthree words"
+
+-- simpleRight :: Text
+-- simpleRight = "one word\ndifferent words\nthree words"
+
+-- complexLeft :: Text
+-- complexLeft = "one word\ntwo words\nthree words"
+
+-- complexRight :: Text
+-- complexRight = "one word\nmulti-line\ndifference\nshould add spacers\nthree words"


aryairani · 2025-10-16T20:37:07Z

unison-share-api/unison-share-api.cabal

 cabal-version: 1.12

-- This file has been generated from package.yaml by hpack version 0.36.0.
+-- This file has been generated from package.yaml by hpack version 0.38.1.


i have a draft to have CI use the latest 'stack' but I haven't had a chance to test it. fingers crossed though

ChrisPenner force-pushed the cp/side-by-side-diff-2 branch from 7cc0cc4 to fa3e015 Compare October 9, 2025 00:09

hojberg mentioned this pull request Oct 10, 2025

Render diff on a line by line basis unisoncomputing/share-ui#118

Merged

ChrisPenner and others added 9 commits October 14, 2025 09:51

Linewise Diff Experimentation

5835fa2

Checkpoint

fba59b8

Working line-by-line diff with test harness

67784df

Add property tests for diffs

90eb986

Replace old Diff types with the new ones

86dc66e

Update transcripts

ea31c49

automatically run ormolu

f921822

Fix up tests

9e0044f

automatically run ormolu

49adb61

ChrisPenner force-pushed the cp/side-by-side-diff-2 branch from ddba7a2 to 49adb61 Compare October 14, 2025 16:52

ChrisPenner added 3 commits October 15, 2025 10:52

Fix snake case

86e9aef

Update transcripts

28bd62a

Aggregate similar diffTags

b3418a8

ChrisPenner marked this pull request as ready for review October 16, 2025 20:30

ChrisPenner requested a review from aryairani October 16, 2025 20:31

aryairani approved these changes Oct 16, 2025

aryairani merged commit fe78e65 into trunk Oct 17, 2025
32 checks passed

aryairani deleted the cp/side-by-side-diff-2 branch October 17, 2025 03:33
