-
Notifications
You must be signed in to change notification settings - Fork 398
Extend merge conflicts resolution [v2] #9552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend merge conflicts resolution [v2] #9552
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some comment, do not block. we should try and keep the current code work the same out of the box and enable customization without changing how we construct our types.
cmd/lakefs/cmd/run.go
Outdated
conflictResolvers := catalogfactory.BuildConflictResolvers(blockStore) | ||
|
||
c, err := catalog.New(ctx, catalogConfig, conflictResolvers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conflict resolver should be option in the calalog.Config, it was made in order not to pass N args to configure the catalog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As our dear friend Claude likes to say -
"You're absolutely right!"
Updated.
type EntryConflictResolver interface { | ||
// FilterByPath returns true if the path should be considered for conflict resolution. | ||
FilterByPath(path string) bool | ||
|
||
// ResolveConflict resolves conflicts between two DBEntry values. | ||
// It returns the resolved value, or nil if the conflict cannot be resolved automatically. | ||
// Assuming the source and dest have the same key (path). | ||
ResolveConflict(ctx context.Context, oCtx graveler.ObjectContext, strategy graveler.MergeStrategy, srcValue, destValue *DBEntry) (*DBEntry, error) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to enable the catalog users to have access to the default conflict resolver - the current behavior and enable middleware like interface building for the resolve conflict.
It seems that we added FilterByPath
just to understand in the graveler level if we configured conflict resolver. If we configure in graveler level a default handler and override it only when we override the handler in the catalog level, it should prevent any extra call to the catalog if we do not provide any implementation in the graveler level.
Following generated code an an example of how to produce a catalog conflict resolver middleware like: (we may not require all the code it is just how http middleware enables you interface/function the same functionality)
type ConflictHandler func(
ctx context.Context,
oCtx graveler.ObjectContext,
strategy graveler.MergeStrategy,
srcValue, destValue *DBEntry,
) (*DBEntry, error)
// Middleware wraps a handler and returns a new handler.
type ConflictMiddleware func(next ConflictHandler) ConflictHandler
// Chain applies middlewares (left-to-right) around a base handler.
func Chain(base ConflictHandler, mws ...ConflictMiddleware) ConflictHandler {
h := base
for i := len(mws) - 1; i >= 0; i-- {
h = mws[i](h)
}
return h
}
// Interface form, if you prefer objects:
type ConflictResolver interface {
ResolveConflict(
ctx context.Context,
oCtx graveler.ObjectContext,
strategy graveler.MergeStrategy,
srcValue, destValue *DBEntry,
) (*DBEntry, error)
}
// Adapters between interface and function types.
func HandlerFromResolver(r ConflictResolver) ConflictHandler {
return r.ResolveConflict
}
func ResolverFromHandler(h ConflictHandler) ConflictResolver {
return conflictResolverFunc(h)
}
type conflictResolverFunc ConflictHandler
func (f conflictResolverFunc) ResolveConflict(
ctx context.Context,
oCtx graveler.ObjectContext,
strategy graveler.MergeStrategy,
srcValue, destValue *DBEntry,
) (*DBEntry, error) {
return ConflictHandler(f)(ctx, oCtx, strategy, srcValue, destValue)
}
type ConflictResolverWrapper struct { | ||
ConflictResolver EntryConflictResolver | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we implement middleware/handler like interface at the graveler level - we don't need to create a struct to point to our handler.
} | ||
|
||
func NewCommittedManager(m map[graveler.StorageID]MetaRangeManager, r map[graveler.StorageID]RangeManager, p Params) graveler.CommittedManager { | ||
func NewCommittedManager(m map[graveler.StorageID]MetaRangeManager, r map[graveler.StorageID]RangeManager, crs []graveler.ConflictResolver, p Params) graveler.CommittedManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest that the catalog/graveler/committed will be created and work as the default behavior when created and optionally configured using a method to change the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, it would simplify some of the code. Also, if we have that functionality, we can change the module we created to accept the committed manager and enrich it if wanted
pkg/graveler/graveler.go
Outdated
// ObjectContext holds context information for reading an object | ||
type ObjectContext struct { | ||
StorageID string | ||
StorageNamespace string | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Object context is a bit confusing, maybe repositoryContext is more suitable
(It was hard for me to understand why is it OK for the merger that processes many objects to hold the objectContext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I agree with this -
RepoContext sounds too specific to me, and doesn't relate to the fact that it has what we need for reading an object.
It's also described in the comment...
pkg/graveler/committed/manager.go
Outdated
oCtx := graveler.ObjectContext{ | ||
StorageID: string(mctx.storageID), | ||
StorageNamespace: string(mctx.ns), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this strings?
also nit:
merge context can hold the ObjectContext directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the string conversion to the consumer side,
And merge context is private, and isn't passed to Merge()
.
} | ||
|
||
func NewCommittedManager(m map[graveler.StorageID]MetaRangeManager, r map[graveler.StorageID]RangeManager, p Params) graveler.CommittedManager { | ||
func NewCommittedManager(m map[graveler.StorageID]MetaRangeManager, r map[graveler.StorageID]RangeManager, crs []graveler.ConflictResolver, p Params) graveler.CommittedManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, it would simplify some of the code. Also, if we have that functionality, we can change the module we created to accept the committed manager and enrich it if wanted
// FilterByPath returns true if the path should be considered for conflict resolution. | ||
FilterByPath(path string) bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? can't Resolve conflict return nil, nil
or something if it's not a path it can resolve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that EntryConflictResolver.ResolveConflict()
is only called by ConflictResolverWrapper.ResolveConflict()
.
In order to avoid calling newCatalogEntryFromValueRecord()
for each conflicted files (table or not),
I added this filter.
Gotta admit that having only path
as a param seems too specific to me -
If you can think of a better option, lmk.
func BuildConflictResolvers(block block.Adapter) []graveler.ConflictResolver { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we return the StrategyConflictResolver here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's the default behavior when the conflict resolvers are nil.
I prefer relying on this default, rather then referring to committed.*
here plus maintaining the default in two different places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, my only concern is that if someone decides to add something here in the future he will need to know that he needs to add the StrategyConflictResolver
in the chain, otherwise it isn't called anymore
pkg/graveler/committed/merge_test.go
Outdated
metaRangeManagers := make(map[graveler.StorageID]committed.MetaRangeManager) | ||
metaRangeManagers[tst.storageID] = metaRangeManager | ||
committedManager := committed.NewCommittedManager(metaRangeManagers, rangeManagers, params) | ||
resolvers := append(tst.conflictResolvers, &committed.StrategyConflictResolver{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we adding the StrategyConflictResolver
? This is a bit surprising, I wouldn't expect this when calling runMergeTests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, old code.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, nice work
Added some suggestions, non of them are blockers
func BuildConflictResolvers(block block.Adapter) []graveler.ConflictResolver { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, my only concern is that if someone decides to add something here in the future he will need to know that he needs to add the StrategyConflictResolver
in the chain, otherwise it isn't called anymore
pkg/catalog/catalog.go
Outdated
if resolvedDBEntry == nil || err != nil { | ||
return nil, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
prefer this in two separate blocks
That way, it's clear that there are 4 cases (regardless of error)
maybe even a switch case here
switch {
case resolvedDBEntry == nil:
// Not a conflict the catalog should resolve
returnValue = nil
case resolvedDBEntry.Path == srcDBEntry.Path:
// Resolved to src entry - return src value
returnValue = srcValue
case resolvedDBEntry.Path == destDBEntry.Path:
// Resolved to dest entry - return dest value
returnValue = destValue
default:
// Encode resolved entry to value
resolvedEntry := newEntryFromCatalogEntry(*resolvedDBEntry)
value, err := EntryToValue(resolvedEntry)
if err != nil {
return nil, fmt.Errorf("encode resolved entry: %w", err)
}
returnValue = &graveler.ValueRecord{
Key: srcValue.Key, // srcValue and destValue keys are the same
Value: value,
}
}
return returnValue, nil
pkg/graveler/graveler.go
Outdated
// MergeStrategy changes from dest or source are automatically overridden in case of a conflict | ||
type MergeStrategy int | ||
|
||
// ConflictResolver is used to resolve conflicts during a merge operation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion, but I think it's worth mentioning that the resolution is at the object level
// ConflictResolver is used to resolve conflicts during a merge operation | |
// ConflictResolver is used to resolve conflicting objects during a merge operation |
Change Description
Allowing an extension for the merge conflict resolution logic.
Will be used by the (Enterprise) compaction conflicts resolver.