Skip to content

Conversation

itaigilo
Copy link
Contributor

Change Description

Allowing an extension for the merge conflict resolution logic.

Will be used by the (Enterprise) compaction conflicts resolver.

@itaigilo itaigilo requested review from nopcoder, guy-har and a team September 24, 2025 16:11
@itaigilo itaigilo added exclude-changelog PR description should not be included in next release changelog minor-change Used for PRs that don't require issue attached labels Sep 24, 2025
@itaigilo itaigilo requested review from guy-har and removed request for guy-har September 26, 2025 14:49
Copy link
Contributor

@nopcoder nopcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some comment, do not block. we should try and keep the current code work the same out of the box and enable customization without changing how we construct our types.

Comment on lines 160 to 162
conflictResolvers := catalogfactory.BuildConflictResolvers(blockStore)

c, err := catalog.New(ctx, catalogConfig, conflictResolvers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conflict resolver should be option in the calalog.Config, it was made in order not to pass N args to configure the catalog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As our dear friend Claude likes to say -
"You're absolutely right!"

Updated.

Comment on lines 137 to 145
type EntryConflictResolver interface {
// FilterByPath returns true if the path should be considered for conflict resolution.
FilterByPath(path string) bool

// ResolveConflict resolves conflicts between two DBEntry values.
// It returns the resolved value, or nil if the conflict cannot be resolved automatically.
// Assuming the source and dest have the same key (path).
ResolveConflict(ctx context.Context, oCtx graveler.ObjectContext, strategy graveler.MergeStrategy, srcValue, destValue *DBEntry) (*DBEntry, error)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to enable the catalog users to have access to the default conflict resolver - the current behavior and enable middleware like interface building for the resolve conflict.

It seems that we added FilterByPath just to understand in the graveler level if we configured conflict resolver. If we configure in graveler level a default handler and override it only when we override the handler in the catalog level, it should prevent any extra call to the catalog if we do not provide any implementation in the graveler level.

Following generated code an an example of how to produce a catalog conflict resolver middleware like: (we may not require all the code it is just how http middleware enables you interface/function the same functionality)

type ConflictHandler func(
	ctx context.Context,
	oCtx graveler.ObjectContext,
	strategy graveler.MergeStrategy,
	srcValue, destValue *DBEntry,
) (*DBEntry, error)

// Middleware wraps a handler and returns a new handler.
type ConflictMiddleware func(next ConflictHandler) ConflictHandler

// Chain applies middlewares (left-to-right) around a base handler.
func Chain(base ConflictHandler, mws ...ConflictMiddleware) ConflictHandler {
	h := base
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// Interface form, if you prefer objects:
type ConflictResolver interface {
	ResolveConflict(
		ctx context.Context,
		oCtx graveler.ObjectContext,
		strategy graveler.MergeStrategy,
		srcValue, destValue *DBEntry,
	) (*DBEntry, error)
}

// Adapters between interface and function types.
func HandlerFromResolver(r ConflictResolver) ConflictHandler {
	return r.ResolveConflict
}

func ResolverFromHandler(h ConflictHandler) ConflictResolver {
	return conflictResolverFunc(h)
}

type conflictResolverFunc ConflictHandler

func (f conflictResolverFunc) ResolveConflict(
	ctx context.Context,
	oCtx graveler.ObjectContext,
	strategy graveler.MergeStrategy,
	srcValue, destValue *DBEntry,
) (*DBEntry, error) {
	return ConflictHandler(f)(ctx, oCtx, strategy, srcValue, destValue)
}

Comment on lines +3247 to +3249
type ConflictResolverWrapper struct {
ConflictResolver EntryConflictResolver
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we implement middleware/handler like interface at the graveler level - we don't need to create a struct to point to our handler.

}

func NewCommittedManager(m map[graveler.StorageID]MetaRangeManager, r map[graveler.StorageID]RangeManager, p Params) graveler.CommittedManager {
func NewCommittedManager(m map[graveler.StorageID]MetaRangeManager, r map[graveler.StorageID]RangeManager, crs []graveler.ConflictResolver, p Params) graveler.CommittedManager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest that the catalog/graveler/committed will be created and work as the default behavior when created and optionally configured using a method to change the behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it would simplify some of the code. Also, if we have that functionality, we can change the module we created to accept the committed manager and enrich it if wanted

Comment on lines 131 to 135
// ObjectContext holds context information for reading an object
type ObjectContext struct {
StorageID string
StorageNamespace string
}
Copy link
Contributor

@guy-har guy-har Sep 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object context is a bit confusing, maybe repositoryContext is more suitable
(It was hard for me to understand why is it OK for the merger that processes many objects to hold the objectContext

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I agree with this -
RepoContext sounds too specific to me, and doesn't relate to the fact that it has what we need for reading an object.
It's also described in the comment...

Comment on lines 319 to 322
oCtx := graveler.ObjectContext{
StorageID: string(mctx.storageID),
StorageNamespace: string(mctx.ns),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this strings?
also nit:
merge context can hold the ObjectContext directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the string conversion to the consumer side,

And merge context is private, and isn't passed to Merge().

}

func NewCommittedManager(m map[graveler.StorageID]MetaRangeManager, r map[graveler.StorageID]RangeManager, p Params) graveler.CommittedManager {
func NewCommittedManager(m map[graveler.StorageID]MetaRangeManager, r map[graveler.StorageID]RangeManager, crs []graveler.ConflictResolver, p Params) graveler.CommittedManager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it would simplify some of the code. Also, if we have that functionality, we can change the module we created to accept the committed manager and enrich it if wanted

Comment on lines 138 to 139
// FilterByPath returns true if the path should be considered for conflict resolution.
FilterByPath(path string) bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? can't Resolve conflict return nil, nil or something if it's not a path it can resolve?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that EntryConflictResolver.ResolveConflict() is only called by ConflictResolverWrapper.ResolveConflict().
In order to avoid calling newCatalogEntryFromValueRecord() for each conflicted files (table or not),
I added this filter.

Gotta admit that having only path as a param seems too specific to me -
If you can think of a better option, lmk.

Comment on lines +8 to +10
func BuildConflictResolvers(block block.Adapter) []graveler.ConflictResolver {
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we return the StrategyConflictResolver here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's the default behavior when the conflict resolvers are nil.
I prefer relying on this default, rather then referring to committed.* here plus maintaining the default in two different places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, my only concern is that if someone decides to add something here in the future he will need to know that he needs to add the StrategyConflictResolver in the chain, otherwise it isn't called anymore

metaRangeManagers := make(map[graveler.StorageID]committed.MetaRangeManager)
metaRangeManagers[tst.storageID] = metaRangeManager
committedManager := committed.NewCommittedManager(metaRangeManagers, rangeManagers, params)
resolvers := append(tst.conflictResolvers, &committed.StrategyConflictResolver{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we adding the StrategyConflictResolver? This is a bit surprising, I wouldn't expect this when calling runMergeTests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, old code.
Updated.

@itaigilo itaigilo requested a review from guy-har September 29, 2025 10:06
Copy link
Contributor

@guy-har guy-har left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, nice work
Added some suggestions, non of them are blockers

Comment on lines +8 to +10
func BuildConflictResolvers(block block.Adapter) []graveler.ConflictResolver {
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, my only concern is that if someone decides to add something here in the future he will need to know that he needs to add the StrategyConflictResolver in the chain, otherwise it isn't called anymore

Comment on lines 3284 to 3286
if resolvedDBEntry == nil || err != nil {
return nil, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
prefer this in two separate blocks
That way, it's clear that there are 4 cases (regardless of error)
maybe even a switch case here

	switch  {
	case resolvedDBEntry == nil:
		// Not a conflict the catalog should resolve
		returnValue = nil
	case resolvedDBEntry.Path == srcDBEntry.Path:
		// Resolved to src entry - return src value
		returnValue = srcValue
	case resolvedDBEntry.Path == destDBEntry.Path:
		// Resolved to dest entry - return dest value
		returnValue = destValue
	default:
		// Encode resolved entry to value
		resolvedEntry := newEntryFromCatalogEntry(*resolvedDBEntry)
		value, err := EntryToValue(resolvedEntry)
		if err != nil {
			return nil, fmt.Errorf("encode resolved entry: %w", err)
		}
		returnValue = &graveler.ValueRecord{
			Key:   srcValue.Key, // srcValue and destValue keys are the same
			Value: value,
		}
	}
	return returnValue, nil

// MergeStrategy changes from dest or source are automatically overridden in case of a conflict
type MergeStrategy int

// ConflictResolver is used to resolve conflicts during a merge operation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion, but I think it's worth mentioning that the resolution is at the object level

Suggested change
// ConflictResolver is used to resolve conflicts during a merge operation
// ConflictResolver is used to resolve conflicting objects during a merge operation

@itaigilo itaigilo merged commit 08bff00 into master Sep 29, 2025
40 checks passed
@itaigilo itaigilo deleted the feature/extend-merge-conflicts-resolution-1 branch September 29, 2025 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude-changelog PR description should not be included in next release changelog minor-change Used for PRs that don't require issue attached
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants