Skip to content

Conversation

HiranChaudhuri
Copy link
Contributor

[NUTCH-2856] Implement a protocol-smb plugin based on hierynomus/smbj

Draft version of a protocol-smb plugin. Lots of todo comments still,
but it seems to work.

Draft version of a protocol-smb plugin. Lots of todo comments still,
but it seems to work.
@lewismc lewismc marked this pull request as draft October 3, 2024 04:10
@lewismc
Copy link
Member

lewismc commented Oct 3, 2024

Moving this to DRAFT status and acknowledging the PR @HiranChaudhuri thank you.
I will try to perform a full review this coming week... thank you.

Copy link
Member

@lewismc lewismc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @HiranChaudhuri I added quite a few comments for your consideration. Thanks for submitting this PR 👍
Please ping me once your ready and we can go for round # 2 of peer review.

Further out, I think we could implement some testing for this protocol plugin. We could use testcontainers and essentially spin up a local Samba server using @nddipiazza 's smbj-inttest image. We can come back to this one the PR has evolved a bit.

filePattern="${hadoop.log.dir}/$${date:yyyy-MM}/nutch-%d{yyyy-MM-dd}.log.gz">
<PatternLayout pattern="%d %p %c{1.} [%t] %m%n" />
<!--<PatternLayout pattern="%d %p %c{1.} [%t] %m%n" />-->
<PatternLayout pattern="%d %p %c [%t] %m%n" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this change? Does this print the logger name in full?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does print in full as I am otherwise not entirely clear where a message is coming from.
We can revert the change before merge - for the time being on my side I need it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HiranChaudhuri please revert. Thank you

Improve error handling
Rename class as requested
Added license header
Improve url parsing
added robots.txt
@HiranChaudhuri HiranChaudhuri requested a review from lewismc October 6, 2024 15:59
Copy link
Member

@lewismc lewismc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all of the responses @HiranChaudhuri please see my updated suggestions.

filePattern="${hadoop.log.dir}/$${date:yyyy-MM}/nutch-%d{yyyy-MM-dd}.log.gz">
<PatternLayout pattern="%d %p %c{1.} [%t] %m%n" />
<!--<PatternLayout pattern="%d %p %c{1.} [%t] %m%n" />-->
<PatternLayout pattern="%d %p %c [%t] %m%n" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@lewismc
Copy link
Member

lewismc commented Oct 16, 2024

Hi @HiranChaudhuri now that you've activated the plugin test target, I suggest that we implement the unit tests using the test-containers utility. An example can be found at https://java.testcontainers.org/quickstart/junit_4_quickstart/.
For example we could use

@Rule
public GenericContainer redis = new GenericContainer(DockerImageName.parse("ndipiazza/smbj-inttest:1.1"))
    .withExposedPorts(445);

Please let me know your thoughts on this.

@HiranChaudhuri
Copy link
Contributor Author

The container looks good. I have no clue about the @rule annotation and am interested to see how this gets combined together.

Copy link
Member

@lewismc lewismc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial header requests. Thanks

@HiranChaudhuri HiranChaudhuri marked this pull request as ready for review October 18, 2024 19:40
Copy link
Member

@lewismc lewismc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HiranChaudhuri
Thanks for coming back to this PR, and thanks for the contribution.
Please see my comments, I am happy to work with you in a bid to get this PR into the Nutch codebase.

filePattern="${hadoop.log.dir}/$${date:yyyy-MM}/nutch-%d{yyyy-MM-dd}.log.gz">
<PatternLayout pattern="%d %p %c{1.} [%t] %m%n" />
<!--<PatternLayout pattern="%d %p %c{1.} [%t] %m%n" />-->
<PatternLayout pattern="%d %p %c [%t] %m%n" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HiranChaudhuri please revert. Thank you

<dependencies>
<dependency org="com.hierynomus" name="smbj" rev="0.13.0"/>
<!--
These dependencies are either contained in smbj (transitive) or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just remove these comments.

@Before
public void setUp() {
LOG.warn("setUp()");
Assert.fail();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests cannot be committed in the failing state. Thy will destabilize the CI builds.
Previously I suggested using test containers.
If docker is not available on the host machine when tests are being run then we could use the @Testcontainers(disabledWithoutDocker = true) syntax.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants