-
Notifications
You must be signed in to change notification settings - Fork 45
Better emphasis #77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Better emphasis #77
Conversation
I was getting failures where URLs with underscores were getting <em> blocks thrown in them, creating invalid HTML.
Most don't pass (and are disabled), but should. Unfortunately, adding a word boundary (\b) to the strong regex works for these tests, but somehow the full parser then breaks. The regex that I believed should work is added as a comment, for anyone wishing to make things work going forward.
// match. | ||
|
||
std::string text = "some ___text testing it__ out"; | ||
std::string expected = "some <strong>_text testing it</strong> out"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the CommonMark Spec, there is this example:
markdown "___foo__\n"
html "<p>_<strong>foo</strong></p>\n"
example 456
Also GitHub spec: https://github.github.com/gfm/#example-465
So I guess, the _
in the expected string has to come before the strong tag.
void Parse(std::string& line) override | ||
{ | ||
// Modifed from previous version, with help from | ||
// https://stackoverflow.com/questions/61346949/regex-for-markdown-emphasis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to put this comment into your commit message instead and remove it from here, because as a comment it could become outdated at some point.
// it then passes all the 'disabled' tests in the 'strong parser' | ||
// test, but then it fails general parsing. For some reason, | ||
// "__text__" translates "<i></i>text<i></i>" even though there | ||
// are no word boundaries at the correct places. It's weird! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The strong parser is usually handled before the emphasized one: https://github.com/progsource/maddy/blob/master/include/maddy/parser.h#L195
This is so, that double _
or *
can be easier determined as strong. Therefor I guess, that is why you only get italic tags, if you do not first run it through the strong parser.
Thank you for your contribution and sorry for the delay in me checking the PR. |
Description of the change
Don't replace internal underscores with emphasis tags. This was particularly important for URLs, which are often littered with them, and none of them should instead have
<em>
in them; that's invalid HTML.The same change should be made for strong tags (like this) but the same fix there broke overall parsing. Stubs provided; maybe someone can figure out what went wrong.
Checklist for contributor
tools/format.py
)