Markdown-to-Markdown renderer #4

lhayhurst · 2017-10-04T18:49:22Z

Hi, great project! I chose it over the alternatives because I want to render the parsed Markdown back into Markdown. Is there a simple pass-through renderer that will render a document back to its original input form? (My larger use case is that I want to edit nodes in the AST to make programmatic improvements to user-entered Markdown.) Cheers!

miyuchina · 2017-10-05T23:15:51Z

Thanks for the interest! Unfortunately rendering back to Markdown does require implementing a complete renderer, as the original syntax information is lost in the parsed AST.

Such a renderer is certainly planned for mistletoe, though it does require a bit of work. If you're interested at all in implementing this feature yourself, feel free to open a pull request and we'll see how it goes. Otherwise, it would be a planned feature for the next release.

lhayhurst · 2017-10-06T17:42:34Z

Thanks for the reply! Cool, that is what I thought. My friend (@dgroo) and I are going to take a shot at writing the Markdown renderer (starting from the HTML one), but we're both a little busy right now, so if this is something you are hoping to get done quickly, please let me know :-)

miyuchina · 2018-01-13T17:18:42Z

I'm going to add a "help-wanted" tag to this issue, since I don't think I'll get around to this anytime soon. If you're interested in this feature, add your thumbs-up to @lhayhurst's topmost comment. Comment below if you're in a pinch!

For potential contributors, take a look at mistletoe.html_renderer module. It would serve as a good example for writing your own renderer classes, and you will find most token attributes there.

Also a reminder to branch off your changes from the dev branch, not the master branch!

nickovs · 2018-06-20T20:06:43Z

Has any progress been made on this? I too need a MarkDown renderer for Mistletoe. If there's work in progress then I would be happy to take a look at using that as a starting point and see if I can build something.

miyuchina · 2018-06-20T21:15:55Z

Thank you @nickovs for taking this task on yourself! I think the main difficulty is working through all the edge cases that a Markdown document can contain, and this is partly why I've been putting this issue off. For example:

**_foo_**

... should be parsed as:

<strong><em>foo</em></strong>

But using a naive implementation, e.g.,

def render_strong(self, token):
    return '**{}**'.format(self.render_inner(token))

def render_emphasis(self, token):
    return '*{}*'.format(self.render_inner(token))

... we would have the output:

***foo***

... which gets parsed as:

<em><strong>foo</strong></em>

And things get trickier when we have escape characters, which influence the parsing process, but in some cases are not reflected in the abstract syntax tree.

I have some thoughts on how to get around this, but it would require some additional work apart from implementing a renderer. What are your thoughts, and what do you think would be your use case for such a renderer?

Edit and thank you @huettenhain!

nickovs · 2018-06-20T21:35:19Z

I've been taking a look at this just now since I have active need for it at work. The use case that I have is that we manage a bunch of processes internally using Markdown wiki pages; some of these pages are generated by humans and some by machine. I need to be able to have code that can add, modify and/or delete content in the sections in the middle of the pages and ideally I'd like to be able to do this in a structured way. I can extract the content but at the moment I can't regenerate the content after editing it.

As for thoughts about how to do this, I think that the key piece that is missing is for the renderer for a given token to be able to look back up the stack at the tokens above. This would be fairly easy to do just by having BaseRenderer.render() push the token being rendered onto a stack before it makes the call through the render_map and pop it back off afterwards. Doing this would be useful to improve the rendering of nested strong and emphasis and also might make some cases like tables a little easier to keep looking nice.
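A minimal sketch of the stack idea, assuming a toy token type rather than mistletoe's real classes. `Token`, `StackRenderer` and `parent_kind` are hypothetical names invented for illustration; only the push-before-dispatch and pop-afterwards pattern comes from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    """Hypothetical stand-in for a span token (not mistletoe's class)."""
    kind: str                                   # "strong", "emphasis" or "text"
    children: List["Token"] = field(default_factory=list)
    content: str = ""


class StackRenderer:
    """Toy renderer that tracks the chain of tokens being rendered."""

    def __init__(self):
        self.stack = []
        self.render_map = {
            "strong": self.render_strong,
            "emphasis": self.render_emphasis,
            "text": self.render_text,
        }

    def render(self, token):
        self.stack.append(token)                # push before dispatching
        try:
            return self.render_map[token.kind](token)
        finally:
            self.stack.pop()                    # pop once the token is done

    def render_inner(self, token):
        return "".join(self.render(child) for child in token.children)

    def parent_kind(self):
        # stack[-1] is the token being rendered, stack[-2] is its parent.
        return self.stack[-2].kind if len(self.stack) > 1 else None

    def render_strong(self, token):
        delim = "__" if self.parent_kind() == "emphasis" else "**"
        return delim + self.render_inner(token) + delim

    def render_emphasis(self, token):
        # Switch to "_" directly inside strong, so we never emit "***foo***".
        delim = "_" if self.parent_kind() == "strong" else "*"
        return delim + self.render_inner(token) + delim

    def render_text(self, token):
        return token.content


tree = Token("strong", [Token("emphasis", [Token("text", content="foo")])])
print(StackRenderer().render(tree))             # **_foo_**
```

Alternating delimiters this way only addresses the nested strong/emphasis case from the example above; underscores inside words follow different rules, so it is not a general solution.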

lhayhurst · 2018-06-20T21:46:33Z

Hi, thank you for picking this up! I've been knee-deep in job-work recently and unable to complete the task :-(

nickovs · 2018-06-20T22:11:20Z

@miyuchina Since you mentioned that this was already planned as a feature for mistletoe, when I send you a pull request would you like me to put this into the mistletoe directory or the contrib directory? It seems to me that it should be core functionality for the library, which would suggest the former.

miyuchina · 2018-06-20T22:38:40Z

@nickovs Yes, go ahead and put it in the mistletoe directory! I like the idea, but for now, if you do end up implementing this, is it okay if you only override the render function in your new renderer? Don't worry too much about writing tests, they can come later.

I'm thinking about adding location information to each token, e.g., a Paragraph knows it has lines 3-6 of the original document, and an Emphasis knows it's characters 12-20. This would potentially help with features like incremental compilation. For implementing MarkdownRenderer, there's a simpler (and faster?) way that allows us to avoid handling edge cases one by one:

if we see an unmodified token, copy the relevant text region from the original document;
if we see a modified token, render according to the new render method.

But adding location information to tokens needs quite a bit of work, so if you want to go through with your method, feel free!
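To make the two rules above concrete, here is a rough sketch under heavy simplifications: blocks are just paragraphs split on blank lines, and `Block`, `parse_paragraphs` and `render_document` are hypothetical names, not part of mistletoe. Unmodified blocks are copied from the source by character offset, and modified ones go through a render callback.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical block token that remembers its source location."""
    start: int                 # offset of the first character in the source
    end: int                   # offset one past the last character
    text: str                  # parsed content, free to edit
    modified: bool = False


def parse_paragraphs(source):
    """Split on blank lines and record each paragraph's character span."""
    blocks, pos = [], 0
    for chunk in source.split("\n\n"):
        blocks.append(Block(pos, pos + len(chunk), chunk))
        pos += len(chunk) + 2              # skip the "\n\n" separator
    return blocks


def render_document(source, blocks, render_block):
    """Copy untouched blocks verbatim; re-render only the modified ones."""
    parts = []
    for block in blocks:
        if block.modified:
            parts.append(render_block(block))
        else:
            parts.append(source[block.start:block.end])
    return "\n\n".join(parts)


def normalize(block):
    # Stand-in for a real render method: collapses runs of whitespace.
    return " ".join(block.text.split())


source = "# Title\n\nfirst   paragraph\n\nsecond   paragraph"
blocks = parse_paragraphs(source)
blocks[2].text = "second   paragraph, edited"
blocks[2].modified = True
print(render_document(source, blocks, normalize))
```

In this run the odd spacing in the untouched paragraph survives because it is copied from the source, while the edited paragraph is normalized by the render callback.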

nickovs · 2018-06-20T23:14:30Z

OK. I have a naive version working for the documents that I care about. I will get it to a state where parsing the test samples and parsing my rendered versions of them produce the same result, and then I'll send it to you.

miyuchina · 2018-06-22T19:54:01Z

@nickovs no rush of course, but I'd love to include your Markdown renderer in version 0.7.1, which I plan to release this coming weekend. Do you think it can be finished before then, or do you think we should give it more time?

nickovs · 2018-06-25T21:54:31Z

It looks like I missed the 0.7.1 release window! What I have is somewhat untested but works for my purposes. I’ll send you a PR of what I’ve got when I get back to my computer and you can give me your comments.

Sent with GitHawk

gruns · 2018-07-04T13:14:39Z

> I'm thinking about adding location information to each token, e.g.,
> a Paragraph knows it has lines 3-6 of the original document, and an
> Emphasis knows it's characters 12-20.

This information is required, in some capacity, to preserve tokens
with ambiguous Markdown representations, like headers, emphasis, list
item prefixes, etc. Without it, there's no way to preserve the
input's character choice. E.g. mistletoe can't know whether to render
the input **Strong** as **Strong** (correct) or __Strong__
(incorrect).

@nickovs Any progress on your PR? And how does your implementation
handle the above situation?

miyuchina · 2018-07-20T15:15:43Z

Sorry for the late reply, I've been busy with other commitments for the past half month. Hopefully in the next week or so I can squeeze in some time to work on this feature.

I already have two commits on a local branch implementing location information. There are tricky cases, and I still need to think about how they fit together in the Markdown renderer. This is just to say that I'm working on it, and will keep posting updates to this thread.

Jyhess · 2019-03-28T14:07:27Z

Hi, any news on this feature?
Like @nickovs, we are documenting our project with Markdown, and we need a parser to extract or add some information. Mistletoe is great for parsing, with a data tree that is easy to manipulate (thanks for this work). We just need a way to write the modified structure back out.
I don't have time yet to write it by myself, but I can test it and provide feedback.

matthubb · 2021-03-24T20:21:18Z

2 years later bump?

This is the most promising thread I could find for a Markdown -> AST -> Markdown solution, but nothing published so far?

chrisjsewell · 2021-09-18T20:37:52Z

Heya, just to note https://github.com/executablebooks/markdown-it-py provides a markdown -> markdown render via https://github.com/executablebooks/mdformat

pbodnar · 2021-09-18T21:33:15Z

@chrisjsewell, that looks promising, thanks for the tip. 👍 I think it would help you if you mentioned this, or how to use different renderers (which ones?) generally, somewhere at the top of your docs for markdown-it-py. I've searched through them quickly and I couldn't find much info on that topic.

chrisjsewell · 2021-09-18T21:36:57Z

Yeh no worries it's on the todo list 😅 executablebooks/markdown-it-py#10 (comment)

pbodnar · 2022-06-24T19:09:25Z

A brief summary and feedback after some time:

> I'm thinking about adding location information to each token, e.g.,
> a Paragraph knows it has lines 3-6 of the original document, and an
> Emphasis knows it's characters 12-20.

> This information is required, in some capacity, to preserve tokens with ambiguous Markdown representations, like headers, emphasis, list item prefixes, etc. Without it, there's no way to preserve the input's character choice. E.g. mistletoe can't know whether to render the input **Strong** as **Strong** (correct) or __Strong__ (incorrect).

So far the use cases presented here, like this one, seem NOT to need any location information? Instead, it should be sufficient (or even required) to know what enclosing characters were used in the input for a given token (which should be relatively easy to do). OTOH location information (BTW a feature freshly requested in #144) would be useful if we wanted to keep the original text 100% untouched (which might be quite a challenge)? Please let me know if I have overlooked anything here.

> @nickovs Any progress on your PR? And how does your implementation handle the above situation?

Unfortunately, it looks like there are no branches or PRs available yet. So we would either have to start from scratch or take inspiration from other projects. ;)

anderskaplan · 2022-08-09T20:03:53Z

I'd like to see this too! In particular, to get as close as possible to a bit-perfect roundtrip. The use case would be to use it for translation.

I'd be happy to contribute this. Can't make any promises as to when it will be finished, but I've done some research and I think it should be possible.

The approach would be to add the necessary information (e.g., if '_' or '*' was used for emphasis) to the tokens, and then create a new renderer class.
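A rough sketch of that approach, assuming a deliberately simplistic inline parser: flat, non-nested spans matched with a regular expression, nothing like a full CommonMark parser. `Span`, `parse_inline` and `render_inline` are hypothetical names for illustration, not mistletoe's API. Each token records the delimiter used in the input, and the renderer writes that same delimiter back out.

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical inline token that remembers its input delimiter."""
    kind: str                  # "strong", "emphasis" or "text"
    content: str
    delimiter: str = ""        # "**", "__", "*", "_" or "" for plain text


# Toy pattern: an opening delimiter, lazy content, the same closing delimiter.
_SPAN = re.compile(r"(\*\*|__|\*|_)(.+?)\1")


def parse_inline(text):
    spans, pos = [], 0
    for match in _SPAN.finditer(text):
        if match.start() > pos:
            spans.append(Span("text", text[pos:match.start()]))
        delim = match.group(1)
        kind = "strong" if len(delim) == 2 else "emphasis"
        spans.append(Span(kind, match.group(2), delim))
        pos = match.end()
    if pos < len(text):
        spans.append(Span("text", text[pos:]))
    return spans


def render_inline(spans):
    # Reuse whichever delimiter the author originally typed.
    return "".join(s.delimiter + s.content + s.delimiter for s in spans)


source = "__Strong__ and *em* and **bold** text"
spans = parse_inline(source)
spans[0].content = "Edited"
print(render_inline(spans))    # __Edited__ and *em* and **bold** text
```

Because the delimiter travels with the token, editing the content of a span leaves the author's choice of `__` versus `**` untouched.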

huguesdevimeux · 2022-09-10T14:06:53Z

Hello,

Sorry, I'm late to the party. I'm working on this feature (no promise at all) for a personal project, and this thread is the closest one I could find on AST → MD, in python.

For reference, such a renderer has already been written in JS here by @DamonOehlman. Most of the logic can be found here.
That being said, the issue @miyuchina mentioned is seemingly not fixed by this renderer.

I will give implementing this a try.

pbodnar · 2022-09-11T08:28:58Z

@huguesdevimeux, thanks for your contribution to this topic.

Just be aware that @anderskaplan is currently probably working on this as well, while also greatly helping us fix many other things "on the way", so I'm not sure how far he actually got with this one (no published branch for this yet?)

> For reference, such a renderer has already been written in JS here by @DamonOehlman. Most of the logic can be found here.
> That being said, the issue @miyuchina mentioned is seemingly not fixed by this renderer.

Just checked, I can confirm the linked JS renderer does seem like the basic "naive" implementation, i.e. not considering types of headings or strong texts from the original markdown text. As suggested by me and confirmed by @anderskaplan just above, these cases shouldn't be that difficult to cover by extending the AST, not sure about the rest - but I still think we don't need to keep all the original formatting...

anderskaplan · 2022-09-12T10:05:27Z

@huguesdevimeux just so you know, I will soon put up a PR for this. I've got it working for everything except tables. As I wrote above, I'm aiming for a near-perfect roundtrip. Some whitespace will be lost, that's inevitable, but apart from that the rendered document should look just like the input. As it happens, this approach solves the problem that @miyuchina mentioned above!

But, the PR builds on top of some other PR's, so those will have to go in first.

I can publish a draft PR if you'd like to see it, and maybe try it out. Probably sometime later this week.

huguesdevimeux · 2022-09-12T13:21:16Z

> @huguesdevimeux just so you know, I will soon put up a PR for this. I've got it working for everything except tables. As I wrote above, I'm aiming for a near-perfect roundtrip. Some whitespace will be lost, that's inevitable, but apart from that the rendered document should look just like the input. As it happens, this approach solves the problem that @miyuchina mentioned above!
>
> But, the PR builds on top of some other PR's, so those will have to go in first.
>
> I can publish a draft PR if you'd like to see it, and maybe try it out. Probably sometime later this week.

Ok, then, perfect. I'm curious to see what you did, though :).

anderskaplan · 2022-09-17T12:59:57Z

@huguesdevimeux hi, I've just created a draft PR for this. Please check it out and let me know how it works for you!

mikez · 2022-11-18T09:12:57Z

+1 on rendering back to Markdown. :)

For my use case, it would be useful if the location of references and footnotes were preserved in the ast.
Why: Sometimes, there may be two different lists of footnotes: a notes section [^a], [^b], [^c], ... and a references section [^1], [^2], [^3], akin to how Wikipedia has it.


anderskaplan · 2023-03-18T09:11:17Z

Removed the draft status on the PR now.

pbodnar · 2023-06-10T09:23:56Z

@ALL, the PR has been merged into the master branch and it will be available in the coming release. 🎉 Testing and feedback are welcome. :)

lhayhurst · 2023-06-10T09:31:12Z

(OP here). Amazing! Incredible fortitude seeing this 6.5 year old ticket through to completion. 🥳

mikez · 2023-06-10T10:23:14Z

@anderskaplan @pbodnar
🎉 Tested and works as expected. :)

Minor remark

Consider this markdown text:

lorem[^a] ipsum[^b].

## Notes

[^a]: dolor
[^b]: sit amet

When trying to traverse the ast, I was confused why [^a] turns into a LinkReferenceDefinition, but [^b] is turned into a RawText and merged with "ipsum" to ipsum[^b].

pbodnar · 2023-06-13T19:11:22Z

@mikez, thanks for your feedback. :)

Regarding your remark, maybe you could file an issue describing the problem in more detail? Note that mistletoe still doesn't support "classical footnotes" as given in your example - see #47.

mikez · 2023-06-13T20:18:32Z

@pbodnar Thank you for the clarification. Markdown Extra and MultiMarkdown have footnotes, but CommonMark and GitHub Flavored Markdown (GFM) do not at this time. You follow CommonMark, so now I understand why my example can have unpredictable behavior.

pbodnar · 2023-12-17T16:07:37Z

Regarding competing, ready-made markdown renderers like this MarkdownRenderer, I've just found out that the markdown-it-py project actually also has one: they have it in a separate Python package called mdformat which can be used on its own, or together with the MarkdownIt API as described here. It would be interesting to compare the 2 renderers...

UPDATE: I've just realized the existence of mdformat was already mentioned above. :)

miyuchina added the feature label Oct 5, 2017

miyuchina changed the title ~~Rendering in Markdown~~ Markdown-to-Markdown renderer Oct 6, 2017

miyuchina added the help wanted label Jan 13, 2018

pbodnar mentioned this issue Jun 24, 2022

question: how to render AST back to a markdown string #148

Closed

This was referenced Jun 26, 2022

Configureable markdown formatting hukkin/mdformat#331

Open

Markdown Renderer executablebooks/markdown-it-py#10

Open

anderskaplan mentioned this issue Sep 17, 2022

Markdown renderer #162

Merged

anderskaplan added a commit to anderskaplan/mistletoe that referenced this issue Jan 2, 2023

Moved the Markdown renderer from contrib to core, according to discus…

b8620af

…sion in issue miyuchina#4.

kim0 mentioned this issue Feb 15, 2023

Arbitrary release notes via regex brave/brave-core#17226

Merged

25 tasks

SomeoneSerge mentioned this issue Mar 14, 2023

Question how to modify ast and write back out as markdown #178

Closed

pbodnar linked a pull request Jun 10, 2023 that will close this issue

Markdown renderer #162

Merged

pbodnar added this to the 1.1.0 milestone Jun 10, 2023

pbodnar closed this as completed Jun 10, 2023
