[NUTCH-2856] Implement a protocol-smb plugin based on hierynomus/smbj #826
base: master
Conversation
Draft version of a protocol-smb plugin. Lots of todo comments still, but it seems to work.
Moving this to DRAFT status and acknowledging the PR. Thank you, @HiranChaudhuri.
Hi @HiranChaudhuri, I added quite a few comments for your consideration. Thanks for submitting this PR 👍
Please ping me once you're ready and we can go for round #2 of peer review.
Further out, I think we could implement some testing for this protocol plugin. We could use Testcontainers and essentially spin up a local Samba server using @nddipiazza's smbj-inttest image. We can come back to this once the PR has evolved a bit.
filePattern="${hadoop.log.dir}/$${date:yyyy-MM}/nutch-%d{yyyy-MM-dd}.log.gz">
<PatternLayout pattern="%d %p %c{1.} [%t] %m%n" />
<!--<PatternLayout pattern="%d %p %c{1.} [%t] %m%n" />-->
<PatternLayout pattern="%d %p %c [%t] %m%n" />
What's the reason for this change? Does this print the logger name in full?
Yes, it prints the logger name in full. Without that, it is not entirely clear to me where a message is coming from.
We can revert the change before merge. For the time being I need it on my side.
OK
@HiranChaudhuri please revert. Thank you
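For readers unfamiliar with the two PatternLayout variants discussed in this thread: in Log4j 2, `%c{1.}` abbreviates every package segment of the logger name to its first character and keeps the final segment whole, while a bare `%c` prints the full name. The sketch below only imitates that one abbreviation rule in plain Java so the difference is visible. The class and method names (`LoggerNameDemo`, `abbreviate`) are made up for illustration and are not part of Log4j or of this PR.

```java
import java.util.StringJoiner;

public class LoggerNameDemo {

    // Imitates the effect of the Log4j 2 pattern %c{1.}: each package segment
    // is cut to its first character, the last segment (class name) stays whole.
    // This is a hand-written approximation, not Log4j's own implementation.
    static String abbreviate(String loggerName) {
        String[] parts = loggerName.split("\\.");
        StringJoiner out = new StringJoiner(".");
        for (int i = 0; i < parts.length; i++) {
            boolean last = (i == parts.length - 1);
            if (last || parts[i].isEmpty()) {
                out.add(parts[i]);
            } else {
                out.add(parts[i].substring(0, 1));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "org.apache.nutch.protocol.smb.Smb";
        System.out.println("%c{1.} -> " + abbreviate(name)); // abbreviated form
        System.out.println("%c     -> " + name);             // full logger name
    }
}
```

With the abbreviated form, loggers in different plugins can look very similar in the log, which matches the motivation given above for switching to `%c` during development.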
src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/Smb.java (4 review threads, outdated, resolved)
src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/SmbURLConnection.java (1 review thread, resolved)
Improve error handling
Rename class as requested
Added license header
Improve url parsing
added robots.txt
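The URL parsing code itself is not quoted in this conversation, so the following is only a minimal sketch of how an `smb://` URL could be split into credentials, host, share and path using `java.net.URI` from the standard library. Everything here is an assumption made for illustration: the class `SmbUrlSketch`, the holder `SmbTarget`, the example URL, and the convention that the first path segment is the share name. It is not the logic of `Smb.java` or `URLAuthentication.java`.

```java
import java.net.URI;

public class SmbUrlSketch {

    // Hypothetical holder for the parts of an smb:// URL (not a class from the PR).
    static final class SmbTarget {
        final String user, password, host, share, path;

        SmbTarget(String user, String password, String host, String share, String path) {
            this.user = user;
            this.password = password;
            this.host = host;
            this.share = share;
            this.path = path;
        }
    }

    // Splits smb://user:password@host/share/some/path into its components.
    // Assumption: the first path segment names the share.
    static SmbTarget parse(String url) {
        URI uri = URI.create(url);
        if (!"smb".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not an smb URL: " + url);
        }
        String user = null;
        String password = null;
        String userInfo = uri.getUserInfo();
        if (userInfo != null) {
            int colon = userInfo.indexOf(':');
            user = colon < 0 ? userInfo : userInfo.substring(0, colon);
            password = colon < 0 ? null : userInfo.substring(colon + 1);
        }
        String fullPath = uri.getPath() == null ? "" : uri.getPath();
        String trimmed = fullPath.startsWith("/") ? fullPath.substring(1) : fullPath;
        int slash = trimmed.indexOf('/');
        String share = slash < 0 ? trimmed : trimmed.substring(0, slash);
        String path = slash < 0 ? "" : trimmed.substring(slash + 1);
        return new SmbTarget(user, password, uri.getHost(), share, path);
    }

    public static void main(String[] args) {
        SmbTarget t = parse("smb://alice:secret@fileserver/docs/reports/2021.pdf");
        System.out.println(t.host + " / " + t.share + " / " + t.path);
    }
}
```

`java.net.URI` is used here because it accepts arbitrary schemes, whereas `java.net.URL` rejects a protocol for which no stream handler is registered.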
Thanks for all of the responses, @HiranChaudhuri. Please see my updated suggestions.
Hi @HiranChaudhuri now that you've activated the plugin
Please let me know your thoughts on this.
The container looks good. I have no clue about the @Rule annotation and am interested to see how this all fits together.
Trivial header requests. Thanks
src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/URLAuthentication.java (1 review thread, resolved)
@HiranChaudhuri
Thanks for coming back to this PR, and thanks for the contribution.
Please see my comments. I am happy to work with you to get this PR into the Nutch codebase.
<dependencies>
<dependency org="com.hierynomus" name="smbj" rev="0.13.0"/>
<!--
These dependencies are either contained in smbj (transitive) or
We can just remove these comments.
@Before
public void setUp() {
  LOG.warn("setUp()");
  Assert.fail();
These tests cannot be committed in a failing state. They will destabilize the CI builds.
Previously I suggested using Testcontainers.
If Docker is not available on the host machine when the tests run, we could use the @Testcontainers(disabledWithoutDocker = true) annotation so that the tests are disabled instead of failing.
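To illustrate the idea behind the suggestion above (run the container-based SMB tests only where Docker is reachable, otherwise skip them rather than fail), here is a rough standard-library sketch. It is not how Testcontainers performs its check, and it is not code from this PR: the class `DockerProbe`, its methods, the 10-second timeout, and the choice of `docker info` as the probe command are all assumptions made for illustration. Note also that `@Testcontainers` comes from the Testcontainers JUnit 5 (Jupiter) integration, while the quoted test appears to use JUnit 4 style annotations (`@Before`, `Assert.fail()`), so a JUnit 4 test might instead guard itself with an assumption based on a check like this one.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class DockerProbe {

    // Runs a command and reports whether it finished with exit code 0.
    // A missing binary, a timeout, or an interruption all count as "not available".
    static boolean commandSucceeds(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!p.waitFor(10, TimeUnit.SECONDS)) {
                p.destroyForcibly(); // probe hung, treat as unavailable
                return false;
            }
            return p.exitValue() == 0;
        } catch (IOException e) {
            return false; // binary not found or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Assumption: "docker info" exits non-zero when no daemon can be reached.
    static boolean dockerAvailable() {
        return commandSucceeds("docker", "info");
    }

    public static void main(String[] args) {
        System.out.println(dockerAvailable()
                ? "Docker reachable: SMB container tests can run"
                : "Docker not reachable: SMB container tests should be skipped");
    }
}
```

Either way, the point of the review comment stands: a test that unconditionally calls `Assert.fail()` in its setup should not be merged, whereas a test that is skipped when its prerequisites are missing keeps the CI build stable.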