# Txtmark - Java markdown processor
Copyright (C) 2011-2015 René Jeschke <rene_jeschke@yahoo.de>  
See LICENSE.txt for licensing information.

***

### Txtmark is yet another markdown processor for the JVM.

*   It is easy to use:

        String result = txtmark.Processor.process("This is ***TXTMARK***");

*   It is fast (see below)  
    ... *well, it is the fastest markdown processor on the JVM right now.*  
    (This might be outdated, but txtmark is still flippin' fast.)

*   It does not depend on other libraries, so putting `txtmark.jar` on the
    classpath is sufficient to use Txtmark in your project.

For an in-depth explanation of markdown have a look at the original [Markdown Syntax].

***

### Maven repository

Txtmark is available on [maven central](http://search.maven.org/#search|ga|1|txtmark).

***

### Txtmark extensions

To enable Txtmark's extended markdown parsing you can use the $PROFILE$ mechanism:

    [$PROFILE$]: extended

This seemed to me the easiest and safest way to enable different behaviours.
Just put this line into your Txtmark file like you would a reference link.

#### Behavior changes when using `[$PROFILE$]: extended`

*   Lists and code blocks end a paragraph

    In normal markdown the following:

        This is a paragraph
        * and this is not a list

    Will produce:

        <p>This is a paragraph
        * and this is not a list</p>

    When using Txtmark extensions this changes to:

        <p>This is a paragraph</p>
        <ul>
        <li>and this is not a list</li>
        </ul>

*   Text anchors

    Headlines and list items may receive an ID which
    you can refer to using links.

        ## Headline with ID ## {#headid}

        Another headline with ID {#headid2}
        ------------------------

        * List with ID {#listid}

        Links: [Foo] (#headid)

    This will produce:

        <h2 id="headid">Headline with ID</h2>
        <h2 id="headid2">Another headline with ID</h2>
        <ul>
        <li id="listid">List with ID</li>
        </ul>
        <p>Links: <a href="#headid">Foo</a></p>

    The ID _must_ be the last thing on the first line.

    All spaces before `{#` get removed, so you can't
    use an ID and a manual line break in the same line.

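The anchor rule can be illustrated with a few lines of Java. This is only a minimal sketch of the described behaviour (ID last on the line, spaces before `{#` dropped), not Txtmark's actual parser; the class `AnchorSketch` and its methods are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorSketch {
    // Matches a trailing "{#some-id}" together with the spaces in front of it.
    private static final Pattern TRAILING_ID =
            Pattern.compile("\\s*\\{#([^\\s}]+)\\}\\s*$");

    /** Returns the ID at the end of the line, or null if there is none. */
    static String extractId(String line) {
        Matcher m = TRAILING_ID.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the line without the ID marker and the spaces before it. */
    static String stripId(String line) {
        return TRAILING_ID.matcher(line).replaceFirst("");
    }

    public static void main(String[] args) {
        String line = "* List with ID {#listid}";
        System.out.println(extractId(line)); // listid
        System.out.println(stripId(line));   // * List with ID
    }
}
```

Because the pattern swallows all whitespace in front of `{#`, two trailing spaces meant as a manual line break would disappear as well, which is the limitation mentioned above.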
*   Auto HTML entities

    * `(C)` becomes `&copy;` - &copy;
    * `(R)` becomes `&reg;` - &reg;
    * `(TM)` becomes `&trade;` - &trade;
    * `--` becomes `&ndash;` - &ndash;
    * `---` becomes `&mdash;` - &mdash;
    * `...` becomes `&hellip;` - &hellip;
    * `<<` becomes `&laquo;` - &laquo;
    * `>>` becomes `&raquo;` - &raquo;
    * `"Hello"` becomes `&ldquo;Hello&rdquo;` - &ldquo;Hello&rdquo;

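To give an idea of what such a replacement pass looks like, here is a small sketch that uses the standard HTML entity names for these characters. It is an illustration only, not how Txtmark does it internally; `EntitySketch` is a hypothetical name, and the quote rule is left out because it needs to tell opening from closing quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class EntitySketch {
    // Insertion order matters: "---" has to be tried before "--".
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("(C)", "&copy;");
        ENTITIES.put("(R)", "&reg;");
        ENTITIES.put("(TM)", "&trade;");
        ENTITIES.put("---", "&mdash;");
        ENTITIES.put("--", "&ndash;");
        ENTITIES.put("...", "&hellip;");
        ENTITIES.put("<<", "&laquo;");
        ENTITIES.put(">>", "&raquo;");
    }

    /** Replaces every known character sequence with its HTML entity. */
    static String replaceEntities(String text) {
        String result = text;
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: Txtmark&trade; &mdash; fast &ndash; really&hellip;
        System.out.println(replaceEntities("Txtmark(TM) --- fast -- really..."));
    }
}
```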
*   Underscores (Emphasis)

    Underscores in the middle of a word don't result in emphasis.

        Con_cat_this

    normally produces this:

        Con<em>cat</em>this

    With Txtmark extensions the underscores stay as they are.

*   Superscript

    You can use `^` to mark a span as superscript.

        2^2^ = 4

    turns into

        2<sup>2</sup> = 4

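The superscript rule boils down to wrapping whatever sits between a pair of `^` in `<sup>` tags. A rough sketch with a regular expression, assuming no escaped `\^` inside the text (Txtmark's real scanner is not regex based as far as this README tells; `SuperscriptSketch` is a made-up name):

```java
final class SuperscriptSketch {
    /** Wraps every span enclosed in a pair of '^' characters in <sup> tags. */
    static String superscript(String text) {
        // [^^]+ means "one or more characters that are not a caret".
        return text.replaceAll("\\^([^^]+)\\^", "<sup>$1</sup>");
    }

    public static void main(String[] args) {
        System.out.println(superscript("2^2^ = 4")); // 2<sup>2</sup> = 4
    }
}
```

A single, unpaired `^` is left untouched by this sketch.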
*   Abbreviations

    Abbreviations are defined like reference links, but with a `*`
    in place of the link. A definition must fit on a single line.

        [Git]: * "Fast distributed revision control system"

    and used like this

        This is [Git]!

    which will produce

        This is <abbr title="Fast distributed revision control system">Git</abbr>!

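How a definition line and its uses fit together can be sketched like this. It is a simplified illustration and not Txtmark's code: the class `AbbreviationSketch` is hypothetical, it handles exactly one definition, and it does not HTML-escape the title.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AbbreviationSketch {
    // A definition line looks like: [Name]: * "Title"
    private static final Pattern DEFINITION =
            Pattern.compile("^\\[([^\\]]+)\\]:\\s*\\*\\s*\"(.*)\"\\s*$");

    /** Expands every use of the abbreviation defined on definitionLine. */
    static String expand(String definitionLine, String text) {
        Matcher m = DEFINITION.matcher(definitionLine);
        if (!m.matches()) {
            return text; // not an abbreviation definition, leave the text alone
        }
        String name = m.group(1);
        String title = m.group(2);
        String abbr = "<abbr title=\"" + title + "\">" + name + "</abbr>";
        return text.replace("[" + name + "]", abbr);
    }

    public static void main(String[] args) {
        String def = "[Git]: * \"Fast distributed revision control system\"";
        System.out.println(expand(def, "This is [Git]!"));
    }
}
```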
*   Fenced code blocks

        ```
        This is code!
        ```

        ~~~
        Another code block
        ~~~

        ~~~
        You can also mix flavours
        ```

    Fenced code block delimiter lines start with at least three `` ` `` or `~` characters.

    It is possible to add meta data to the beginning line. Everything trailing after `` ` `` or `~` is then considered meta data. These are all valid meta lines:

        ```python
        ~ ~ ~ ~ ~java
        ``` ``` ``` this is even more meta

    The meta information that you provide here can be used with a `BlockEmitter` to include e.g. syntax highlighted code blocks. Here's an example:

        public class CodeBlockEmitter implements BlockEmitter
        {
            private static void append(StringBuilder out, List<String> lines)
            {
                out.append("<pre class=\"pre_no_hl\">");
                for (final String l : lines)
                {
                    Utils.escapedAdd(out, l);
                    out.append('\n');
                }
                out.append("</pre>");
            }

            @Override
            public void emitBlock(StringBuilder out, List<String> lines, String meta)
            {
                if (Strings.isEmpty(meta))
                {
                    append(out, lines);
                }
                else
                {
                    try
                    {
                        // Utils#highlight(...) is not included with txtmark, its sole purpose
                        // is to show what the meta can be used for
                        out.append(Utils.highlight(lines, meta));
                        out.append('\n');
                    }
                    catch (final IOException e)
                    {
                        // Ignore or do something, still, pump out the lines
                        append(out, lines);
                    }
                }
            }
        }

    You can then set the `BlockEmitter` in the txtmark `Configuration` using `Configuration.Builder#setCodeBlockEmitter(BlockEmitter emitter)`.

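The meta-line rule for fenced code blocks can be sketched in a few lines of Java. This is only an illustration derived from the three valid meta lines listed in the fenced code blocks item, not Txtmark's actual parser; the class `FenceSketch` and the method `fenceMeta` are made-up names, and tolerating spaces between the fence characters is an assumption based on those examples.

```java
final class FenceSketch {
    /**
     * Returns the meta text of a fence delimiter line, an empty string if the
     * line is a fence without meta, or null if the line is not a fence at all.
     */
    static String fenceMeta(String line) {
        String s = line.trim();
        if (s.isEmpty()) {
            return null;
        }
        char fence = s.charAt(0);
        if (fence != '`' && fence != '~') {
            return null;
        }
        int count = 0;
        int pos = 0;
        // Skip the run of fence characters; spaces between them are tolerated.
        while (pos < s.length() && (s.charAt(pos) == fence || s.charAt(pos) == ' ')) {
            if (s.charAt(pos) == fence) {
                count++;
            }
            pos++;
        }
        // At least three fence characters are required.
        return count >= 3 ? s.substring(pos).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(fenceMeta("```python"));     // python
        System.out.println(fenceMeta("~ ~ ~ ~ ~java")); // java
    }
}
```

The returned string is what a `BlockEmitter` would receive as its `meta` argument in the example above.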
***

### Markdown conformity

Txtmark passes all tests inside [MarkdownTest\_1.0\_2007-05-09](http://daringfireball.net/projects/downloads/MarkdownTest_1.0_2007-05-09.tgz)
except for two:

1.  **Images.text**

    Fails because Txtmark doesn't produce empty 'title' image attributes.  
    (IMHO: Images ... OK)

2.  **Literal quotes in titles.text**

    What the frell ... this test will continue to FAIL.  
    Sorry, but using unescaped `"` in a title which should be surrounded
    by `"` is unacceptable for me ;)

    Change:

        Foo [bar](/url/ "Title with "quotes" inside").
        [bar]: /url/ "Title with "quotes" inside"

    to:

        Foo [bar](/url/ "Title with \"quotes\" inside").
        [bar]: /url/ "Title with \"quotes\" inside"

    and Txtmark will produce the correct result.  
    (IMHO: Literal quotes in titles ... OK)

***

### Where Txtmark is not like Markdown

*   Txtmark does not produce empty `title` attributes in link and image tags.

*   An unescaped `"` in a link title starting with `"` is not recognized and results
    in unexpected behaviour.

*   Due to a different list parsing approach some things get interpreted differently:

        * List
        > Quote

    will produce when processed with Markdown:

        <p><ul>
        <li>List</p>

        <blockquote>
        <p>Quote</li>
        </ul></p>
        </blockquote>

    and this when processed with Txtmark:

        <ul>
        <li>List<blockquote><p>Quote</p>
        </blockquote>
        </li>
        </ul>

    Another one:

        * List
        ====

    will produce when processed with Markdown:

        <h1>* List</h1>

    and this when processed with Txtmark:

        <ul>
        <li><h1>List</h1>
        </li>
        </ul>

*   List of escapable characters:

        \ [ ] ( ) { } #
        " ' . < > + - _
        ! ` ^

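If you generate Txtmark input from arbitrary strings, the list of escapable characters can be turned into a small helper that puts a backslash in front of each of them. This is a sketch of my own (the class `EscapeSketch` is not part of Txtmark), and it escapes more eagerly than strictly needed, since many of these characters are only special in certain positions.

```java
final class EscapeSketch {
    // The escapable characters from the list above.
    private static final String ESCAPABLE = "\\[](){}#\"'.<>+-_!`^";

    /** Puts a backslash in front of every escapable character. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (ESCAPABLE.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("snake_case [x]")); // snake\_case \[x\]
    }
}
```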
***

### Performance comparison of markdown processors for the JVM

**Remarks:** These benchmarks are too old to be of any value. I leave them here as a reference, though.

Based on [this benchmark suite](http://henkelmann.eu/2011/01/10/performance_comparison_of_markdown_processor_for_the_jvm).

Excerpt from the original post concerning this benchmark suite:

> Most of these tests are of course unrealistic: Who would write a
> text where each word is a link? Yet they serve an important use:
> It makes it possible for the developer to pinpoint the parts of
> the parser where there is most room for improvement. Also, it
> explains why certain texts might render much faster in one
> Processor than in another.

Benchmark system:

* Ubuntu Linux 10.04 32 Bit
* Intel(R) Core(TM) 2 Duo T7500 @ 2.2GHz
* Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
* Java HotSpot(TM) Server VM (build 19.1-b02, mixed mode)

<table>
<tr><th>Test</th><th colspan="2">Actuarius</th><th colspan="2">PegDown</th><th colspan="2">Knockoff</th><th colspan="2">Txtmark</th></tr>
<tr><td></td><td>1st Run (ms)</td><td>2nd Run (ms)</td><td>1st Run (ms)</td><td>2nd Run (ms)</td><td>1st Run (ms)</td><td>2nd Run (ms)</td><td>1st Run (ms)</td><td>2nd Run (ms)</td></tr>
<tr><td>Plain Paragraphs</td><td>1127</td><td>577</td><td>1273</td><td>1037</td><td>740</td><td>400</td><td>157</td><td>64</td></tr>
<tr><td>Every Word Emphasized</td><td>1562</td><td>1001</td><td>1523</td><td>1513</td><td>13982</td><td>13221</td><td>54</td><td>46</td></tr>
<tr><td>Every Word Strong</td><td>1125</td><td>997</td><td>1115</td><td>1114</td><td>9543</td><td>9647</td><td>44</td><td>41</td></tr>
<tr><td>Every Word Inline Code</td><td>382</td><td>277</td><td>1058</td><td>1052</td><td>9116</td><td>9074</td><td>51</td><td>39</td></tr>
<tr><td>Every Word a Fast Link</td><td>2257</td><td>1600</td><td>537</td><td>531</td><td>3980</td><td>3410</td><td>109</td><td>55</td></tr>
<tr><td>Every Word Consisting of Special XML Chars</td><td>4045</td><td>4270</td><td>2985</td><td>3044</td><td>312</td><td>377</td><td>778</td><td>775</td></tr>
<tr><td>Every Word wrapped in manual HTML tags</td><td>3334</td><td>2919</td><td>901</td><td>896</td><td>3863</td><td>3736</td><td>73</td><td>62</td></tr>
<tr><td>Every Line with a manual line break</td><td>510</td><td>588</td><td>1445</td><td>1440</td><td>1527</td><td>1130</td><td>56</td><td>56</td></tr>
<tr><td>Every word with a full link</td><td>452</td><td>246</td><td>1045</td><td>996</td><td>1884</td><td>1819</td><td>86</td><td>55</td></tr>
<tr><td>Every word with a full image</td><td>268</td><td>150</td><td>1140</td><td>1132</td><td>1985</td><td>1908</td><td>38</td><td>36</td></tr>
<tr><td>Every word with a reference link</td><td>9847</td><td>9082</td><td>18956</td><td>18719</td><td>121136</td><td>115416</td><td>1525</td><td>1380</td></tr>
<tr><td>Every block a quote</td><td>445</td><td>206</td><td>1312</td><td>1301</td><td>478</td><td>457</td><td>50</td><td>45</td></tr>
<tr><td>Every block a codeblock</td><td>70</td><td>87</td><td>373</td><td>376</td><td>161</td><td>175</td><td>60</td><td>22</td></tr>
<tr><td>Every block a list</td><td>920</td><td>912</td><td>1720</td><td>1725</td><td>622</td><td>651</td><td>55</td><td>55</td></tr>
<tr><td>All tests together</td><td>3281</td><td>2885</td><td>5184</td><td>5196</td><td>10130</td><td>10460</td><td>206</td><td>196</td></tr>
</table>

##### Benchmarked versions:
[Actuarius] version: 0.2  
[PegDown] version: 0.8.5.4  
[Knockoff] version: 0.7.3-15

***

### Mentioned/related projects

[Markdown] is Copyright (C) 2004 by John Gruber  
[SmartyPants] is Copyright (C) 2003 by John Gruber  
[Actuarius] is Copyright (C) 2010 by Christoph Henkelmann  
[Knockoff] is Copyright (C) 2009-2011 by Tristan Juricek  
[PegDown] is Copyright (C) 2010 by Mathias Doenitz  
[PHP Markdown & Extra] is Copyright (C) 2009 Michel Fortin

***

[Markdown Syntax]: http://daringfireball.net/projects/markdown/syntax/ "daringfireball.net"
[Markdown]: http://daringfireball.net/projects/markdown/
[SmartyPants]: http://daringfireball.net/projects/smartypants/
[Actuarius]: http://henkelmann.eu/projects/actuarius/
[Knockoff]: http://tristanhunt.com/projects/knockoff/
[PegDown]: https://github.com/sirthias/pegdown/
[PHP Markdown & Extra]: http://michelf.com/projects/php-markdown/
[Apache Ant(TM)]: http://ant.apache.org/

[repo]: https://github.com/rjeschke/txtmark/ "Txtmark at GitHub.com"
[tar]: https://github.com/rjeschke/txtmark/tarball/master "branch: master"
[zip]: https://github.com/rjeschke/txtmark/zipball/master "branch: master"

[$PROFILE$]: extended "Txtmark processing information."

Project link: <https://github.com/rjeschke/txtmark>