Txtmark - Java markdown processor
Copyright (C) 2011 René Jeschke rene_jeschke@yahoo.de
See LICENSE.txt for licensing information.
Txtmark is yet another markdown processor for the JVM.
- It is easy to use:

        String result = txtmark.Processor.process("This is ***TXTMARK***");

- It is fast (see below) ... well, it is the fastest markdown processor on the JVM right now.
- It does not depend on other libraries, so adding `txtmark.jar` to the classpath is sufficient to use Txtmark in your project.
For an in-depth explanation of the markdown syntax have a look at daringfireball.net.
Build instructions
- Install Apache Ant(TM)
- Do `ant release` and you will find everything you need inside the `release` folder.
Where Txtmark is not like Markdown
- Txtmark does not produce empty `title` attributes in link and image tags.
- Unescaped `"` in link titles starting with `"` are not recognized and result in unexpected behaviour.
- Due to a different list parsing approach some things get interpreted differently:

        * List
        > Quote

    will produce when processed with Markdown:

        <p><ul>
        <li>List</p>
        <blockquote>
        <p>Quote</li>
        </ul></p>
        </blockquote>

    and this when produced with Txtmark:

        <ul>
        <li>List<blockquote><p>Quote</p>
        </blockquote>
        </li>
        </ul>

    Another one:

        * List
        ====

    will produce when processed with Markdown:

        <h1>* List</h1>

    and this when produced with Txtmark:

        <ul>
        <li><h1>List</h1>
        </li>
        </ul>
Txtmark extensions
To enable Txtmark's extended markdown parsing you can use the PROFILE mechanism:
[$PROFILE$]: extended
This seemed to me the easiest and safest way to enable different behaviours. (All other markdown processors will ignore this line.)
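A minimal sketch of how the profile line could be added programmatically before processing. The class and method names (`ExtendedProfile`, `withExtendedProfile`) are hypothetical, and placing the line at the very top of the text is an assumption; the README does not say where the line has to appear.

```java
final class ExtendedProfile {
    /** The marker line quoted in the text above. */
    static final String PROFILE_LINE = "[$PROFILE$]: extended";

    /** Prepends the marker line, separated from the text by a blank line (assumed placement). */
    static String withExtendedProfile(String markdown) {
        return PROFILE_LINE + "\n\n" + markdown;
    }

    public static void main(String[] args) {
        String input = withExtendedProfile("This is a paragraph\n* and this is a list");
        System.out.println(input);
        // With txtmark.jar on the classpath, the text would then be processed as usual:
        // String html = txtmark.Processor.process(input);
    }
}
```

Because the switch lives in the document text itself, the same input can still be handed to other markdown processors unchanged.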
Behavior changes when using [$PROFILE$]: extended
- Lists and code blocks end a paragraph (inspired by Actuarius)

    In normal markdown the following:

        This is a paragraph
        * and this is not a list

    will produce:

        <p>This is a paragraph
        * and this is not a list</p>

    When using Txtmark extensions this changes to:

        <p>This is a paragraph</p>
        <ul>
        <li>and this is not a list</li>
        </ul>

- Auto HTML entities:
    - `(C)` becomes `&copy;` - ©
    - `(R)` becomes `&reg;` - ®
    - `(TM)` becomes `&trade;` - ™
    - `--` becomes `&mdash;` - —
    - `...` becomes `&hellip;` - …
    - `<<` becomes `&laquo;` - «
    - `>>` becomes `&raquo;` - »
    - `"Hello"` becomes `&ldquo;Hello&rdquo;` - “Hello”
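The substitutions listed above can be illustrated with a small stand-alone sketch. This is not Txtmark's implementation: the class `AutoEntitiesSketch` is hypothetical, it assumes the standard HTML entity names for the listed characters, and it ignores cases a real processor has to handle, such as code spans and raw HTML.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AutoEntitiesSketch {
    // Literal character sequences and the entities they are replaced with.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("(C)", "&copy;");
        ENTITIES.put("(R)", "&reg;");
        ENTITIES.put("(TM)", "&trade;");
        ENTITIES.put("--", "&mdash;");
        ENTITIES.put("...", "&hellip;");
        ENTITIES.put("<<", "&laquo;");
        ENTITIES.put(">>", "&raquo;");
    }

    static String replaceEntities(String text) {
        String result = text;
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            // String.replace treats both arguments literally (no regex).
            result = result.replace(e.getKey(), e.getValue());
        }
        // A pair of straight double quotes becomes typographic quotes.
        return result.replaceAll("\"([^\"]*)\"", "&ldquo;$1&rdquo;");
    }

    public static void main(String[] args) {
        System.out.println(replaceEntities("Txtmark(TM) -- \"fast\" ... (C) 2011"));
        // prints: Txtmark&trade; &mdash; &ldquo;fast&rdquo; &hellip; &copy; 2011
    }
}
```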
Markdown conformity
Txtmark passes all tests inside MarkdownTest_1.0_2007-05-09 except for two:

- Images.text

    Fails because Txtmark doesn't produce empty 'title' image attributes.
    (IMHO: Images ... OK)

- Literal quotes in titles.text

    What the frell ... this test will continue to FAIL.
    Sorry, but using unescaped `"` in a title which should be surrounded by `"` is unacceptable for me ;)

    Change:

        Foo [bar](/url/ "Title with "quotes" inside").
        [bar]: /url/ "Title with "quotes" inside"

    to:

        Foo [bar](/url/ "Title with \"quotes\" inside").
        [bar]: /url/ "Title with \"quotes\" inside"

    and Txtmark will produce the correct result.
    (IMHO: Literal quotes in titles ... OK)
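When link titles are generated by a program, the escaping described above can be applied before the markdown is written. The following is a minimal sketch; `TitleEscaper` and its methods are hypothetical helpers, not part of Txtmark, and they assume the title is passed in raw (not already escaped).

```java
final class TitleEscaper {
    /** Puts a backslash in front of every double quote in a raw link title. */
    static String escapeTitle(String title) {
        return title.replace("\"", "\\\"");
    }

    /** Builds an inline link whose title is surrounded by unescaped quotes. */
    static String inlineLink(String text, String url, String title) {
        return "[" + text + "](" + url + " \"" + escapeTitle(title) + "\")";
    }

    public static void main(String[] args) {
        System.out.println(inlineLink("bar", "/url/", "Title with \"quotes\" inside"));
        // prints: [bar](/url/ "Title with \"quotes\" inside")
    }
}
```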
Performance comparison of markdown processors for the JVM
Based on this benchmark suite.
Excerpt from the original post concerning this benchmark suite:
Most of these tests are of course unrealistic: Who would write a text where each word is a link? Yet they serve an important use: It makes it possible for the developer to pinpoint the parts of the parser where there is most room for improvement. Also, it explains why certain texts might render much faster in one Processor than in another.
Benchmark system:
- Ubuntu Linux 10.04 32 Bit
- Intel(R) Core(TM) 2 Duo T7500 @ 2.2GHz
- Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
- Java HotSpot(TM) Server VM (build 19.1-b02, mixed mode)
| Test | Actuarius 1st Run (ms) | Actuarius 2nd Run (ms) | PegDown 1st Run (ms) | PegDown 2nd Run (ms) | Knockoff 1st Run (ms) | Knockoff 2nd Run (ms) | Txtmark 1st Run (ms) | Txtmark 2nd Run (ms) |
|---|---|---|---|---|---|---|---|---|
| Plain Paragraphs | 1127 | 577 | 1273 | 1037 | 740 | 400 | 157 | 64 |
| Every Word Emphasized | 1562 | 1001 | 1523 | 1513 | 13982 | 13221 | 54 | 46 |
| Every Word Strong | 1125 | 997 | 1115 | 1114 | 9543 | 9647 | 44 | 41 |
| Every Word Inline Code | 382 | 277 | 1058 | 1052 | 9116 | 9074 | 51 | 39 |
| Every Word a Fast Link | 2257 | 1600 | 537 | 531 | 3980 | 3410 | 109 | 55 |
| Every Word Consisting of Special XML Chars | 4045 | 4270 | 2985 | 3044 | 312 | 377 | 778 | 775 |
| Every Word wrapped in manual HTML tags | 3334 | 2919 | 901 | 896 | 3863 | 3736 | 73 | 62 |
| Every Line with a manual line break | 510 | 588 | 1445 | 1440 | 1527 | 1130 | 56 | 56 |
| Every word with a full link | 452 | 246 | 1045 | 996 | 1884 | 1819 | 86 | 55 |
| Every word with a full image | 268 | 150 | 1140 | 1132 | 1985 | 1908 | 38 | 36 |
| Every word with a reference link | 9847 | 9082 | 18956 | 18719 | 121136 | 115416 | 1525 | 1380 |
| Every block a quote | 445 | 206 | 1312 | 1301 | 478 | 457 | 50 | 45 |
| Every block a codeblock | 70 | 87 | 373 | 376 | 161 | 175 | 60 | 22 |
| Every block a list | 920 | 912 | 1720 | 1725 | 622 | 651 | 55 | 55 |
| All tests together | 3281 | 2885 | 5184 | 5196 | 10130 | 10460 | 206 | 196 |
Benchmarked versions:
Actuarius version: 0.2
PegDown version: 0.8.5.4
Knockoff version: 0.7.3-15
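To put the table into perspective, the relative timings can be computed from the "All tests together" row. This sketch only does arithmetic on the second-run figures listed above; the class name `SpeedupSketch` is hypothetical and nothing here re-runs the benchmark.

```java
import java.util.Locale;

final class SpeedupSketch {
    /** How many times longer the other processor took than Txtmark. */
    static double speedup(int otherMillis, int txtmarkMillis) {
        return (double) otherMillis / txtmarkMillis;
    }

    public static void main(String[] args) {
        int txtmark = 196; // "All tests together", 2nd run, in ms
        String[] names = {"Actuarius", "PegDown", "Knockoff"};
        int[] millis = {2885, 5196, 10460}; // same row, 2nd run, in ms
        for (int i = 0; i < names.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%s: %.1fx",
                    names[i], speedup(millis[i], txtmark)));
        }
        // prints:
        // Actuarius: 14.7x
        // PegDown: 26.5x
        // Knockoff: 53.4x
    }
}
```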
Mentioned/related projects:
Markdown is Copyright (C) 2004 by John Gruber
Actuarius is Copyright (C) 2010 by Christoph Henkelmann
Knockoff is Copyright (C) 2009-2011 by Tristan Juricek
PegDown is Copyright (C) 2010 by Mathias Doenitz
Project link: https://github.com/rjeschke/txtmark