[WIP] Update LoraConfig for KaSA implementation #2698
Thank you for resuming your work on KaSA.
Implementation-wise, we need to take a different approach. Right now, KaSA is just added to the normal LoRA code, but we only want to activate it if the user opts in. Therefore, it should be implemented in a separate class, something like KasaVariant, in peft/tuners/lora/variants.py. Please check how DoRA is implemented and use a similar approach, as I have detailed in my previous comment. If anything is unclear, feel free to ask.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
gentle ping @nsbg
Thank you for your alert! I spent some time looking over the KaSA paper and code to get ready for more serious work, but it does seem pretty difficult 🥲 My goal is to upload code that's ready for review before the end of September, so I'm going to try even harder. Right now, I'm stuck at the 'Extend LoRA variant resolution' stage you mentioned. Honestly, this seems like the most important part, but it's hard for me to figure out where to start, specifically which file and class I should work on first. Could you help me with this?
That's great to see, thanks for picking this back up.
You're already on the right track, you added Next about resolving the variants. As a first step, let's revert the changes you made to Then let's look at these lines in peft/src/peft/tuners/lora/layer.py Lines 636 to 642 in a3197b1
Here we need to extend the functionality to add KaSA. The updated method could be something like:
def resolve_lora_variant(self, *, use_dora: bool, use_kasa: bool, **kwargs) -> Optional[LoraVariant]:
    if use_dora and use_kasa:
        raise ValueError("Cannot use DoRA and KaSA at the same time, please choose only one.")
    variant = None
    if use_dora:
        from .variants import DoraLinearVariant
        variant = DoraLinearVariant()
    elif use_kasa:
        ...
    return variant
Does that make sense? Similarly, we'd have to update the I would suggest that you work on this as a next step, then we'll see what else needs to be done.
wow I really appreciate your sincere feedback. I'll read your advice carefully and then move forward 🤗
@BenjaminBossan I modified the code in the files below based on what you explained. Please give me feedback if there are parts that still need fixing, and then we can discuss the next steps.
1. variants.py
2. layer.py
Thanks for integrating my feedback. I gave this another review and noted the next few changes that are necessary. Please check my comments.
Apart from this, the branch is now encountering merge conflicts. Could you please bring your fork up-to-date with the remote and then merge with, or rebase on, the latest main branch from PEFT? If you have questions on how to resolve the merge conflicts, don't hesitate to ask.
Furthermore, please always run make style on your changes before pushing to make our linter happy.
More of a note for myself: Since KaSA updates the base weights of the model, we will have to take extra care to ensure that it works correctly when saving and loading the adapter.
src/peft/tuners/lora/layer.py
Outdated
"""
return None
if use_dora and use_kasa:
Let's undo the changes in this method body and return None. Since this KaSA layer is implemented for Linear only, add the logic to lora.Linear.resolve_lora_variant instead.
Also, we should update the resolve_lora_variant methods of the other layer types like lora.Embedding.resolve_lora_variant to accept the use_kasa argument but raise an error if it's True. Otherwise, users may add it to non-supported layers and not notice that it doesn't actually do anything there.
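A minimal sketch of that layout, using stand-in classes rather than the real PEFT ones. Only the resolve_lora_variant method name and the use_dora / use_kasa arguments come from this thread; every class name ending in "Sketch" is hypothetical, and DoRA handling for the embedding layer is left out for brevity.

```python
from typing import Optional


class DoraLinearVariantSketch:
    """Hypothetical stand-in for the existing DoRA variant."""


class KasaLinearVariantSketch:
    """Hypothetical stand-in for the KaSA variant proposed in this PR."""


class LoraLayerSketch:
    """Stand-in for the shared base layer: it resolves no variant itself."""

    def resolve_lora_variant(self, *, use_dora: bool, use_kasa: bool, **kwargs) -> Optional[object]:
        return None


class LinearSketch(LoraLayerSketch):
    """Stand-in for lora.Linear, the only layer type that supports KaSA here."""

    def resolve_lora_variant(self, *, use_dora: bool, use_kasa: bool, **kwargs) -> Optional[object]:
        if use_dora and use_kasa:
            raise ValueError("Cannot use DoRA and KaSA at the same time, please choose only one.")
        if use_dora:
            return DoraLinearVariantSketch()
        if use_kasa:
            return KasaLinearVariantSketch()
        return None


class EmbeddingSketch(LoraLayerSketch):
    """Stand-in for a layer type without KaSA support: fail loudly on opt-in."""

    def resolve_lora_variant(self, *, use_dora: bool, use_kasa: bool, **kwargs) -> Optional[object]:
        if use_kasa:
            raise ValueError(f"KaSA is not supported for {type(self).__name__}; set use_kasa=False.")
        # DoRA handling for this layer type is omitted from the sketch.
        return None
```

Raising in the unsupported layers turns a silent no-op into an immediate configuration error, which is the behaviour asked for above.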
src/peft/tuners/lora/layer.py
Outdated
############ kasa #############
self.lora_diag[adapter_name] = nn.Parameter(torch.randn(r), requires_grad=True)

weight = self.get_base_layer().weight
dtype = weight.dtype
svd_rank = self.in_features - r
weight = weight.to(torch.float32)
U, S, Vh = torch.linalg.svd(weight.data, full_matrices=False)
U_principle, S_principle, Vh_principle = U[:, :svd_rank], S[:svd_rank], Vh[:svd_rank, :]
self.get_base_layer().weight.data = (U_principle @ torch.diag(S_principle) @ Vh_principle).to(dtype)

#########################
All of this can be removed, since it's part of KasaLinearVariant.init, right?
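For readers following along, the SVD step in the hunk above keeps the leading svd_rank singular components of the base weight and drops the rest. A small sketch on a random matrix; the helper name truncate_weight and the shapes are assumptions for illustration, not PEFT API.

```python
import torch


def truncate_weight(weight: torch.Tensor, r: int) -> torch.Tensor:
    """Rebuild weight from its leading (in_features - r) singular components."""
    dtype = weight.dtype
    # Run the SVD in float32, as the reviewed code does.
    U, S, Vh = torch.linalg.svd(weight.to(torch.float32), full_matrices=False)
    # Same rank rule as the hunk: in_features is the second weight dimension.
    svd_rank = weight.shape[1] - r
    truncated = U[:, :svd_rank] @ torch.diag(S[:svd_rank]) @ Vh[:svd_rank, :]
    return truncated.to(dtype)


torch.manual_seed(0)
weight = torch.randn(6, 4)  # (out_features, in_features)
truncated = truncate_weight(weight, r=1)
rank_after = torch.linalg.matrix_rank(truncated).item()
```

Note that S has min(out_features, in_features) entries, so this rule only drops exactly r components when out_features >= in_features.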
# initialize lora_diag
module.lora_diag[adapter_name] = nn.Parameter(torch.randn(module.r[adapter_name]), requires_grad=True)

# SVD
Let's add a reference here, so that we know the origin:
# see https://github.com/juyongjiang/KaSA/blob/f85e88c22d0fa4cb8ab2923d7c2bf1bbec152da3/peft/src/peft/tuners/lora/layer.py#L132
# initialize lora_diag
module.lora_diag[adapter_name] = nn.Parameter(torch.randn(module.r[adapter_name]), requires_grad=True)
# see https://github.com/juyongjiang/KaSA/blob/f85e88c22d0fa4cb8ab2923d7c2bf1bbec152da3/peft/src/peft/tuners/lora/layer.py#L132
# SVD
I put it in here, how is it?
@staticmethod
def merge_safe(module: Linear, active_adapter: str, orig_weight: torch.Tensor) -> torch.Tensor:
    delta_weight = module.get_delta_weight(active_adapter)
    return orig_weight + delta_weight

@staticmethod
def merge_unsafe(module: Linear, active_adapter: str, orig_weight: torch.Tensor) -> None:
    delta_weight = module.get_delta_weight(active_adapter)
    orig_weight.data += delta_weight

@staticmethod
def unmerge(module: Linear, active_adapter: str, orig_weight: torch.Tensor) -> torch.Tensor:
    delta_weight = module.get_delta_weight(active_adapter)
    return orig_weight - delta_weight
KaSA should have an influence on the merged weights, should it not?
Although this PR is closed, it seems I've incorporated everything else except for this comment (of course, you'd have to look at the code). Could you explain this question in more detail?
src/peft/tuners/lora/variants.py
Outdated
x = dropout(x)

# KaSA calculation
lora_output = lora_B(torch.einsum('ijk,kl->ijl', lora_A(x), diag)) * scaling
Again, let's add a reference:
# see https://github.com/juyongjiang/KaSA/blob/f85e88c22d0fa4cb8ab2923d7c2bf1bbec152da3/peft/src/peft/tuners/lora/layer.py#L602C21-L602C110
# KaSA calculation
# see https://github.com/juyongjiang/KaSA/blob/f85e88c22d0fa4cb8ab2923d7c2bf1bbec152da3/peft/src/peft/tuners/lora/layer.py#L602C21-L602C110
lora_output = lora_B(torch.einsum('ijk,kl->ijl', lora_A(x), diag)) * scaling
return result + lora_output
I inserted this near where the actual calculation logic begins, rather than just in an empty space. I think this is a bit better.
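A quick check of what the einsum in this line computes, assuming diag is the square matrix torch.diag(lora_diag) as in the delta weight snippet quoted elsewhere in this thread. The tensors below are random stand-ins, not PEFT objects.

```python
import torch

torch.manual_seed(0)
r = 4
h = torch.randn(2, 3, r)      # stands in for lora_A(x): (batch, seq, r)
lora_diag = torch.randn(r)    # stands in for the learned diagonal vector
diag = torch.diag(lora_diag)  # (r, r) diagonal matrix

via_einsum = torch.einsum("ijk,kl->ijl", h, diag)
via_matmul = h @ diag         # batched matrix product, same contraction
via_scaling = h * lora_diag   # broadcast over the last dimension

same_as_matmul = torch.allclose(via_einsum, via_matmul, atol=1e-6)
same_as_scaling = torch.allclose(via_einsum, via_scaling, atol=1e-6)
```

Under that assumption the einsum is just a per-rank scaling of the lora_A output. The 'ijk' subscripts also mean this form expects a 3-dimensional input.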
@BenjaminBossan oh I didn't mean to close the branch, but it seems to have closed while I was merging with the main branch. I guess I'll have to open a new PR, right? 😰 +) when I tried to sync with the main branch, I ended up discarding all my commits, so did that cause it to close?
I don't know what happened, but I could re-open the PR and there are some changes visible. Can you double check that everything looks as expected? If for some reason it's not what is expected, you can create a new PR and push your local branch.
I usually handle merges in the terminal, and I suspect the pull request was closed because I accidentally wiped the commit history while using the 'Sync fork' feature on GitHub. I'll be more careful in the future. Thanks for reopening it. I'll review the changes and open a new PR if needed. Sorry to keep bothering you with this.
No worries. If the diff on this PR looks good, let me know and I'll do a review. Only open a new PR if for some reason, the code here does not correspond to what it should be.
@BenjaminBossan I checked layer.py/variants.py and the KasaLinearVariant class in variants.py had been removed. I added it again and updated the files based on your minor feedback, so I think we can continue the discussion in this PR. BTW I ran
No, let's not push any changes to unrelated files. If
I referred to your explanation and added the use_kasa parameter to the
The logic for raising errors in each layer hasn’t been applied yet, but I committed first to check whether adding the parameter in this way matches what you meant. Excluding the Linear class, it seems like an error should be raised when use_kasa is true in the other classes. However, I might be mistaken, so please feel free to give me feedback anytime. Also, I noticed there’s no part that calls KasaLinearVariant. Should this be called inside the linear class? I’m a bit confused about this part.
Thanks for the updates, we're making good progress here.
The logic for raising errors in each layer hasn’t been applied yet, but I committed first to check whether adding the parameter in this way matches what you meant. Excluding the Linear class, it seems like an error should be raised when use_kasa is true in the other classes.
Nice, this looks correct, please raise the error in the unsupported layers as indicated.
Also, I noticed there’s no part that calls KasaLinearVariant—should this be called inside the linear class? I’m a bit confused about this part.
Don't worry, it is being called. E.g. in lora.Linear.forward we have this code:
peft/src/peft/tuners/lora/layer.py
Lines 806 to 816 in 190f987
if active_adapter not in self.lora_variant:  # vanilla LoRA
    result = result + lora_B(lora_A(dropout(x))) * scaling
else:
    result = self.lora_variant[active_adapter].forward(
        self,
        active_adapter=active_adapter,
        x=x,
        result=result,
        **variant_kwargs,
        **kwargs,
    )
So if the KaSA variant is found, KasaLinearVariant.forward will be used here. Same for the other methods.
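The dispatch pattern in that snippet can be sketched without PEFT at all. LayerSketch and ScaledVariantSketch below are hypothetical stand-ins, with plain numbers in place of tensors; only the lora_variant lookup and the forward call shape mirror the quoted code.

```python
class ScaledVariantSketch:
    """Hypothetical variant that replaces the vanilla update with a scaled one."""

    @staticmethod
    def forward(layer, *, active_adapter, x, result, **kwargs):
        return result + 10 * layer.update(x)


class LayerSketch:
    def __init__(self):
        self.active_adapters = ["default", "other"]
        # Only adapters that opted in to a variant get an entry here.
        self.lora_variant = {"other": ScaledVariantSketch()}

    def update(self, x):
        # Stands in for lora_B(lora_A(dropout(x))) * scaling.
        return x

    def forward(self, x):
        result = 0  # stands in for the base layer output
        for active_adapter in self.active_adapters:
            if active_adapter not in self.lora_variant:  # vanilla LoRA
                result = result + self.update(x)
            else:
                result = self.lora_variant[active_adapter].forward(
                    self, active_adapter=active_adapter, x=x, result=result
                )
        return result


output = LayerSketch().forward(1)
```

The variant class never has to be called by name from the layer: registering an instance under the adapter name is enough for its forward to be picked up.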
src/peft/tuners/lora/variants.py
Outdated
U, S, Vh = torch.linalg.svd(weight.data, full_matrices=False)
U_principle, S_principle, Vh_principle = U[:, :svd_rank], S[:svd_rank], Vh[:svd_rank, :]
module.get_base_layer().weight.data = (U_principle @ torch.diag(S_principle) @ Vh_principle).to(dtype)
Let's add a new @staticmethod called _get_delta_weight here. This method should implement the KaSA delta weight logic from above:
diag = torch.diag(module.lora_diag[adapter])
output_tensor = transpose(weight_B @ diag @ weight_A, module.fan_in_fan_out) * module.scaling[adapter]
Then, the merge_safe, merge_unsafe, and unmerge methods below can call KasaLinearVariant._get_delta_weight(...).
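A sketch of why the diagonal has to appear in the delta weight, using plain tensors instead of a PEFT module. The function kasa_delta_weight and all shapes are assumptions for illustration; the formula itself is the one quoted above, and the base weight is treated as already truncated.

```python
import torch


def kasa_delta_weight(weight_A, weight_B, lora_diag, scaling, fan_in_fan_out=False):
    """Compute B @ diag(lora_diag) @ A, optionally transposed, then scaled."""
    delta = weight_B @ torch.diag(lora_diag) @ weight_A
    if fan_in_fan_out:
        delta = delta.T
    return delta * scaling


torch.manual_seed(0)
in_features, out_features, r, scaling = 5, 3, 2, 0.5
base_weight = torch.randn(out_features, in_features)
weight_A = torch.randn(r, in_features)
weight_B = torch.randn(out_features, r)
lora_diag = torch.randn(r)
x = torch.randn(4, in_features)

delta = kasa_delta_weight(weight_A, weight_B, lora_diag, scaling)
merged = base_weight + delta  # what a merge_safe built on this delta would return
unmerged = merged - delta     # what the matching unmerge would return

# Unmerged forward pass: base output plus the KaSA update with the diagonal.
expected = x @ base_weight.T + ((x @ weight_A.T) * lora_diag) @ weight_B.T * scaling
merged_matches = torch.allclose(x @ merged.T, expected, atol=1e-5)
roundtrip_ok = torch.allclose(unmerged, base_weight, atol=1e-5)
```

If the merge methods used a delta without the diagonal, the merged layer would no longer reproduce the unmerged forward pass, which is the concern behind the earlier question about merged weights.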
cc @BenjaminBossan
I was delayed in updating the code because I was focusing on company work, but now I'm planning to resume the project in earnest. If I have any questions about implementing the code, may I continue to ask you?
I apologize for opening a new pull request, as the previous one was closed 🥲 Thank you for your understanding.