Better emphasis #77

luciansmith · 2025-04-30T23:27:57Z

Description of the change

Don't replace internal underscores with emphasis tags. This is particularly important for URLs, which are often littered with underscores; none of them should end up with emphasis tags inside them, since that produces invalid HTML.

The same change should be made for strong tags (like this) but the same fix there broke overall parsing. Stubs provided; maybe someone can figure out what went wrong.
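
To illustrate the idea, here is a minimal sketch and not the regex from this PR. It assumes a simplified emphasis pattern, and the function names `emphasize_naive` and `emphasize_word_bounded` are hypothetical. Because `_` counts as a word character, `\b` only matches next to an underscore when the character on its other side is a non-word character (or the string edge), so underscores inside a word or URL path are left alone.

```cpp
#include <regex>
#include <string>

// Hypothetical simplified pattern: any pair of underscores becomes emphasis,
// including underscores in the middle of a word or URL.
std::string emphasize_naive(const std::string& line)
{
  static const std::regex re(R"(_([^_]*)_)");
  return std::regex_replace(line, re, "<em>$1</em>");
}

// Hypothetical word-bounded pattern: the opening underscore must follow a
// non-word character (or the start of the line) and the closing underscore
// must precede one (or the end of the line).
std::string emphasize_word_bounded(const std::string& line)
{
  static const std::regex re(R"(\b_([^_]+)_\b)");
  return std::regex_replace(line, re, "<em>$1</em>");
}
```

With these two helpers, `emphasize_naive` rewrites `some_long_path` inside a URL, while `emphasize_word_bounded` leaves the URL untouched and still converts `some _text_ here`.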

Checklist for contributor

if you want to be mentioned in the AUTHORS file, you added yourself
added an entry to CHANGELOG.md at "Upcoming"
if any Markdown definition changed, you updated the definition docs
your C++ code change is accommodated with a unit/integration test where it makes sense
your code meets the code format style (clang-format) of the project (tools/format.py)

I was getting failures where URLs with underscores were getting emphasis blocks thrown in them, creating invalid HTML.

Most of the new strong-parser tests don't pass (and are disabled), but should. Unfortunately, adding a word boundary (\b) to the strong regex makes these tests pass, but the full parser then somehow breaks. The regex that I believed should work is included as a comment for anyone who wants to get this working in the future.

progsource · 2025-07-01T23:12:58Z

tests/maddy/test_maddy_strongparser.cpp

+  // match.
+
+  std::string text = "some ___text testing it__ out";
+  std::string expected = "some <strong>_text testing it</strong> out";


In the CommonMark Spec, there is this example:

markdown: "___foo__\n", html: "<p>_<strong>foo</strong></p>\n", example: 456

Also GitHub spec: https://github.github.com/gfm/#example-465

So I guess the _ in the expected string has to come before the strong tag.
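
As a rough sketch of that expectation (assuming a simplified strong pattern, not maddy's actual regex, and a hypothetical helper name `strong_simplified`): with three opening underscores and two closing ones, the leftmost pair that can start a match is the second and third underscore, so the first underscore stays as a literal in front of the strong tag.

```cpp
#include <regex>
#include <string>

// Hypothetical simplified strong pattern: two underscores, content without
// underscores, two underscores. With std::regex_replace the match for
// "___text__" starts at the second underscore, because a match starting at
// the first one would need non-underscore content immediately after "__".
std::string strong_simplified(const std::string& line)
{
  static const std::regex re(R"(__([^_]+)__)");
  return std::regex_replace(line, re, "<strong>$1</strong>");
}
```

Under this assumption, the test input from the diff would produce `some _<strong>text testing it</strong> out`, which lines up with the spec example quoted above.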

progsource · 2025-07-01T23:19:00Z

include/maddy/emphasizedparser.h

  void Parse(std::string& line) override
  {
+    // Modified from previous version, with help from
+    // https://stackoverflow.com/questions/61346949/regex-for-markdown-emphasis


I would suggest putting this comment into your commit message instead and removing it from here, because as a code comment it could become outdated at some point.

progsource · 2025-07-01T23:25:07Z

include/maddy/strongparser.h

+    // it then passes all the 'disabled' tests in the 'strong parser'
+    // test, but then it fails general parsing.  For some reason,
+    // "__text__" translates "<i></i>text<i></i>" even though there
+    // are no word boundaries at the correct places.  It's weird!


The strong parser is usually handled before the emphasized one: https://github.com/progsource/maddy/blob/master/include/maddy/parser.h#L195
This is so that double _ or * can more easily be identified as strong. Therefore, I guess that is why you only get italic tags if you do not first run the text through the strong parser.
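
A small sketch of why the order matters, using hypothetical simplified patterns rather than the regexes in maddy (the names `apply_strong`, `apply_emphasis`, and `parse_line` are made up for illustration): if the emphasis pass runs on its own, each `__` is seen as an empty emphasis pair; if the strong pass runs first, the double underscores are already consumed.

```cpp
#include <regex>
#include <string>

// Hypothetical simplified strong pass: "__text__" becomes a strong tag.
std::string apply_strong(const std::string& line)
{
  static const std::regex re(R"(__([^_]+)__)");
  return std::regex_replace(line, re, "<strong>$1</strong>");
}

// Hypothetical simplified emphasis pass: "_text_" becomes an em tag.
// The content may be empty, so a bare "__" also matches.
std::string apply_emphasis(const std::string& line)
{
  static const std::regex re(R"(_([^_]*)_)");
  return std::regex_replace(line, re, "<em>$1</em>");
}

// Strong first, emphasis second, mirroring the order described above.
std::string parse_line(const std::string& line)
{
  return apply_emphasis(apply_strong(line));
}
```

Calling `apply_emphasis("__text__")` directly yields empty emphasis tags around `text`, while `parse_line("__text__")` yields a single strong tag. A unit test that exercises only one parser sees the first behavior, which may explain the difference between the isolated test and the full parser.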

progsource · 2025-07-01T23:27:10Z

Thank you for your contribution and sorry for the delay in me checking the PR.

luciansmith added 6 commits April 30, 2025 14:56

Improve the emphasis regex.

9292d3c

I was getting failures where URLs with underscores were getting emphasis blocks thrown in them, creating invalid HTML.

Clang fixes; changelog.

c89a983

More clang fixes.

1a12c2c

A clang fix turned out to break stuff!

8e2c44c

Fixed double negative phrasing.

30d6cf9

progsource reviewed Jul 1, 2025
