Markdown-to-Markdown renderer #4
Thanks for the interest! Unfortunately rendering back to Markdown does require implementing a complete renderer, as the original syntax information is lost in the parsed AST. Such a renderer is certainly planned for mistletoe, though it does require a bit of work. If you're interested at all in implementing this feature yourself, feel free to open a pull request and we'll see how it goes. Otherwise, it would be a planned feature for the next release. |
Thanks for the reply! Cool, that is what I thought. My friend ( @dgroo) and I are going to take a shot at writing the MarkDown renderer (starting from the HTML one), but we're both a little busy right now, so if this is something you are hoping to get done quickly, please let me know :-) |
I'm going to add a "help-wanted" tag to this issue, since I don't think I'd be getting around to this anytime soon. If you're interested in this feature, add your thumbs-up to @lhayhurst 's topmost comment. Comment below if you're in a pinch! For potential contributors, take a look at Also a reminder to branch off your changes from the dev branch, not the master branch! |
Has any progress been made on this? I too need a MarkDown renderer for Mistletoe. If there's work in progress then I would be happy to take a look at using that as a starting point and see if I can build something. |
Thank you @nickovs for taking this task on yourself! I think the main difficulty is working through all the edge cases that a Markdown document can contain, and this is partly why I've been putting this issue off. For example:

    **_foo_**

... should be parsed as:

    <strong><em>foo</em></strong>

But using a naive implementation, e.g.,

    def render_strong(self, token):
        return '**{}**'.format(self.render_inner(token))

    def render_emphasis(self, token):
        return '*{}*'.format(self.render_inner(token))

... we would have the output:

    ***foo***

... which gets parsed as:

    <em><strong>foo</strong></em>

And things get trickier when we have escape characters, which influence the parsing process, but in some cases are not reflected in the abstract syntax tree. I have some thoughts on how to get around this, but it would require some additional work apart from implementing a renderer. What are your thoughts, and what do you think would be your use case for such a renderer? Edit and thank you @huettenhain! |
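The nesting problem described in the comment above can be reproduced without mistletoe. The following is only a minimal sketch: `RawText`, `Emphasis`, `Strong` and `render_naive` are hypothetical stand-ins made up for illustration, not mistletoe's token classes or renderer methods.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical toy tokens; they only mimic the shape of a parsed AST.
@dataclass
class RawText:
    content: str


@dataclass
class Emphasis:
    children: List["Node"] = field(default_factory=list)


@dataclass
class Strong:
    children: List["Node"] = field(default_factory=list)


Node = Union[RawText, Emphasis, Strong]


def render_naive(node):
    """Render a toy node, always using '*' as the delimiter character."""
    if isinstance(node, RawText):
        return node.content
    inner = "".join(render_naive(child) for child in node.children)
    if isinstance(node, Strong):
        return "**{}**".format(inner)
    return "*{}*".format(inner)


# Toy tree for the input **_foo_**: Strong -> Emphasis -> RawText
tree = Strong([Emphasis([RawText("foo")])])
naive_output = render_naive(tree)
print(naive_output)  # ***foo***
```

The tree no longer records that the inner emphasis was written with `_`, so the two delimiter runs merge into `***`, which is the output the comment says parses differently from the input.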
I've been taking a look at this just now since I have active need for it at work. The use case that I have is that we manage a bunch of processes internally using Markdown wiki pages; some of these pages are generated by humans and some by machine. I need to be able to have code that can add, modify and/or delete content in the sections in the middle of the pages and ideally I'd like to be able to do this in a structured way. I can extract the content but at the moment I can't regenerate the content after editing it. As for thoughts about how to do this, I think that the key piece that is missing is for the renderer for a given token to be able to look back up the stack at the tokens above. This would be fairly easy to do just by having |
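The idea of letting a renderer look back up the stack at enclosing tokens might be sketched roughly as follows. This is an assumption about what such a design could look like, not the code that was proposed: `Node` and `StackAwareRenderer` are hypothetical names, and the rule of alternating `*` and `_` is just one possible way to use the stack.

```python
class Node:
    """Hypothetical toy token; not mistletoe's token class."""

    def __init__(self, kind, children=(), content=""):
        self.kind = kind  # "text", "emphasis" or "strong"
        self.children = list(children)
        self.content = content


class StackAwareRenderer:
    """Toy renderer that tracks the enclosing nodes while rendering."""

    def __init__(self):
        # (node, delimiter character) pairs for the ancestors of the
        # node currently being rendered, outermost first.
        self.stack = []

    def render(self, node):
        if node.kind == "text":
            return node.content
        # Look at the direct parent: if it used '*', switch to '_' so
        # that adjacent delimiter runs cannot merge.
        parent_char = self.stack[-1][1] if self.stack else None
        char = "_" if parent_char == "*" else "*"
        self.stack.append((node, char))
        try:
            inner = "".join(self.render(child) for child in node.children)
        finally:
            self.stack.pop()
        delimiter = char * (2 if node.kind == "strong" else 1)
        return delimiter + inner + delimiter


strong_em = Node("strong", [Node("emphasis", [Node("text", content="foo")])])
stack_output = StackAwareRenderer().render(strong_em)
print(stack_output)  # **_foo_**
```

One known limitation of this sketch: CommonMark does not allow `_` to open or close emphasis inside a word, so a real renderer would need more context than the parent's delimiter alone.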
Hi, thank you for picking this up! I've been knee-deep in job-work recently and unable to complete the task :-( |
@miyuchina Since you mentioned that this was already planned as a feature for mistletoe, when I send you a pull request would you like me to put this into the |
@nickovs Yes, go ahead and put it in the I'm thinking about adding location information to each token, e.g., a
But adding location information to tokens needs quite a bit of work, so if you want to go through with your method, feel free! |
OK. I have a naive version working for the documents that I care about. I will get it to a state where a parse of the samples in the tests and parses of my rendered versions of the first pass look the same, and then I'll send it to you. |
@nickovs no rush of course, but I'd love to include your Markdown renderer in version 0.7.1, which I plan to release this coming weekend. Do you think it can be finished before then, or do you think we should give it more time? |
It looks like I missed the 0.7.1 release window! What I have is somewhat untested but works for my purposes. I’ll send you a PR of what I’ve got when I get back to my computer and you can give me your comments. Sent with GitHawk |
This information is required, in some capacity, to preserve tokens @nickovs Any progress on your PR? And how does your implementation |
Sorry for the late reply, I've been busy with other commitments for the past half month. Hopefully in the next week or so I can squeeze in some time to work on this feature. I already have two commits on a local branch implementing location information. There are tricky cases, and I still need to think about how they fit together in the Markdown renderer. This is just to say that I'm working on it, and will keep posting updates to this thread. |
Hi, any news on this feature? |
2 years later bump? This is the most promising thread I could find for a |
Heya, just to note https://github.com/executablebooks/markdown-it-py provides a markdown -> markdown render via https://github.com/executablebooks/mdformat |
@chrisjsewell, that looks promising, thanks for the tip. 👍 I think it would help you if you mentioned this, or how to use different renderers (which ones?) generally, somewhere at the top of your docs for markdown-it-py. I've searched through them quickly and I couldn't find much info on that topic. |
Yeh no worries it's on the todo list 😅 executablebooks/markdown-it-py#10 (comment) |
A brief summary and feedback after some time:
So far the use cases presented here, like this one, seem NOT to need any location information? Instead, it should be sufficient (or even required) to know what enclosing characters were used in the input for a given token (which should be relatively easy to do). OTOH location information (BTW a feature freshly requested in #144) would be useful if we wanted to keep the original text 100% untouched (which might be quite a challenge)? Please let me know if I have overlooked anything here.
Unfortunately, it looks like there are no branches or PRs available yet. So we would either have to start from scratch, or to inspire from other projects. ;) |
I'd like to see this too! In particular, to get as close as possible to a bit-perfect roundtrip. The use case would be to use it for translation. I'd be happy to contribute this. Can't make any promises as to when it will be finished, but I've done some research and I think it should be possible. The approach would be to add the necessary information (e.g., if '_' or '*' was used for emphasis) to the tokens, and then create a new renderer class. |
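The approach outlined above, recording on each token which delimiter character was used and letting a new renderer read it back, could look something like this. It is a sketch under loud assumptions: `Span`, `parse_inline` and `render` are hypothetical names, and the toy parser only handles text that is wrapped as a whole in emphasis or strong delimiters. It is nowhere near a CommonMark parser.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Span:
    """Hypothetical token that remembers its original delimiter character."""
    kind: str        # "strong" or "emphasis"
    delimiter: str   # "*" or "_", as written in the input
    children: List[Union["Span", str]] = field(default_factory=list)


def parse_inline(text):
    """Toy parser: only recognizes text fully wrapped in matching delimiters."""
    for kind, width in (("strong", 2), ("emphasis", 1)):
        for char in "*_":
            delim = char * width
            if (len(text) > 2 * width
                    and text.startswith(delim)
                    and text.endswith(delim)):
                inner = parse_inline(text[width:-width])
                return Span(kind, char, [inner])
    return text  # plain text is represented as a bare string


def render(node):
    """Render back to Markdown using the delimiter stored on each token."""
    if isinstance(node, str):
        return node
    width = 2 if node.kind == "strong" else 1
    delim = node.delimiter * width
    return delim + "".join(render(child) for child in node.children) + delim


source = "**_foo_**"
roundtrip = render(parse_inline(source))
print(roundtrip == source)  # True
```

Because the delimiter travels with the token, the renderer does not have to guess, and inputs such as `**_foo_**` and `__*foo*__` come back unchanged in this toy setting.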
Hello, sorry, I'm late to the party. I'm working on this feature (no promises at all) for a personal project, and this thread is the closest one I could find on AST → MD in Python. For reference, such a renderer has already been coded in JS here by @DamonOehlman. Most of the logic can be found here. I will give implementing this a try. |
@huguesdevimeux, thanks for your contribution to this topic. Just be aware that @anderskaplan is currently probably working on this as well, while also greatly helping us fix many other things "on the way", so I'm not sure how far he actually got with this one (no published branch for this yet?)
Just checked, I can confirm the linked JS renderer does seem like the basic "naive" implementation, i.e. not considering types of headings or strong texts from the original markdown text. As suggested by me and confirmed by @anderskaplan just above, these cases shouldn't be that difficult to cover by extending the AST, not sure about the rest - but I still think we don't need to keep all the original formatting... |
@huguesdevimeux just so you know, I will soon put up a PR for this. I've got it working for everything except tables. As I wrote above, I'm aiming for a near-perfect roundtrip. Some whitespace will be lost, that's inevitable, but apart from that the rendered document should look just like the input. As it happens, this approach solves the problem that @miyuchina mentioned above! But the PR builds on top of some other PRs, so those will have to go in first. I can publish a draft PR if you'd like to see it, and maybe try it out. Probably sometime later this week. |
Ok, then, perfect. I'm curious to see what you did, though :). |
@huguesdevimeux hi, I've just created a draft PR for this. Please check it out and let me know how it works for you! |
+1 on rendering back to Markdown. :) For my use case, it would be useful if the location of references and footnotes were preserved in the ast. |
Removed the draft status on the PR now. |
@ALL, the PR has been merged into the master branch and it will be available in the coming release. 🎉 Testing and feedback are welcome. :) |
(OP here). Amazing! Incredible fortitude seeing this 6.5 year old ticket through to completion. 🥳 |
@anderskaplan @pbodnar Minor remark: consider this markdown text:
When trying to traverse the ast, I was confused why |
@pbodnar Thank you for the clarification. Markdown Extra and MultiMarkdown have footnotes, but CommonMark and GitHub Flavored Markdown (GFM) do not at this time. You follow CommonMark, so now I understand why my example can have unpredictable behavior. |
Regarding competing, ready-made markdown renderers like this UPDATE: I've just realized the existence of mdformat was already mentioned above. :) |
Hi, great project! I selected it over the alternatives because I want to render the Markdown back into Markdown. Is there a simple pass type Renderer that will render it back to its original input form? (My larger use case is that I want to edit nodes in the AST to make some programmatic improvements to user-entered Markdown.) Cheers!