Txtmark - Java markdown processor

Txtmark is yet another markdown processor for the JVM.

It is easy to use:

String result = txtmark.Processor.process("This is ***TXTMARK***");

It is fast (see below)
... well, it is the fastest markdown processor on the JVM right now.
It does not depend on other libraries, so adding txtmark.jar to the classpath is sufficient to use Txtmark in your project.

For an in-depth explanation of the markdown syntax have a look at daringfireball.net.

Where Txtmark is not like Markdown

Txtmark does not produce empty title attributes in link and image tags.
Unescaped " in link titles starting with " are not recognized and result in unexpected behaviour.
Due to a different list parsing approach some things get interpreted differently:
```
* List
> Quote
```
will produce when processed with Markdown:
```
<p><ul>
<li>List</p>

<blockquote>
 <p>Quote</li>
</ul></p>
</blockquote>
```
and this when processed with Txtmark:
```
<ul>
<li>List<blockquote><p>Quote</p>
</blockquote>
</li>
</ul>
```
Another one:
```
* List
====
```
will produce when processed with Markdown:
```
<h1>* List</h1>
```
and this when processed with Txtmark:
```
<ul>
<li><h1>List</h1>
</li>
</ul>
```

Txtmark extensions

To enable Txtmark's extended markdown parsing you can use the PROFILE mechanism:

[$PROFILE$]: extended

This seemed to me the easiest and safest way to enable different behaviours. (All other markdown processors will ignore this line.)
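As a minimal sketch, the profile line can also be added programmatically before the text is handed to the processor. The class `ProfileHeader` and the method `withExtendedProfile` are hypothetical helpers, not part of Txtmark; only the `[$PROFILE$]: extended` line itself comes from the description above. Putting the line at the very top is an assumption made for simplicity.

```java
final class ProfileHeader {
    // The profile line described above.
    static final String EXTENDED_PROFILE = "[$PROFILE$]: extended";

    // Returns the markdown text with the profile line prepended,
    // unless the text already starts with it.
    static String withExtendedProfile(String markdown) {
        if (markdown.startsWith(EXTENDED_PROFILE)) {
            return markdown;
        }
        return EXTENDED_PROFILE + "\n\n" + markdown;
    }

    public static void main(String[] args) {
        String text = "This is a paragraph\n* and this is a list item";
        // The returned string is what would be passed on to the processor.
        System.out.println(withExtendedProfile(text));
    }
}
```

The result of `withExtendedProfile` would then be passed to `txtmark.Processor.process` in place of the raw text.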

Behavior changes when using `[$PROFILE$]: extended`

Lists and code blocks end a paragraph (inspired by Actuarius)

In normal markdown the following:

This is a paragraph
* and this is not a list

will produce:

<p>This is a paragraph
* and this is not a list</p>

When using Txtmark extensions this changes to:

<p>This is a paragraph</p>
<ul>
<li>and this is not a list</li>
</ul>

More to come ...

Markdown conformity

Txtmark passes all tests inside MarkdownTest_1.0_2007-05-09 except for two:

Images.text

Fails because Txtmark doesn't produce empty 'title' image attributes.
(IMHO: Images ... OK)

Literal quotes in titles.text

What the frell ... this test will continue to FAIL.
Sorry, but using unescaped " in a title which should be surrounded by " is unacceptable for me ;)

Change:
```
Foo [bar](/url/ "Title with "quotes" inside").
[bar]: /url/ "Title with "quotes" inside"
```
to:
```
Foo [bar](/url/ "Title with \"quotes\" inside").
[bar]: /url/ "Title with \"quotes\" inside"
```
and Txtmark will produce the correct result.
(IMHO: Literal quotes in titles ... OK)
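If link titles are generated by code, the escaping shown above can be applied automatically. The following is only a sketch: `TitleEscaper`, `escapeQuotes` and `inlineLink` are hypothetical names and not part of Txtmark. It escapes every `"` in a title that is not already escaped with a backslash.

```java
final class TitleEscaper {
    // Escapes each double quote that is not already preceded by a backslash.
    static String escapeQuotes(String title) {
        StringBuilder out = new StringBuilder(title.length() + 8);
        for (int i = 0; i < title.length(); i++) {
            char c = title.charAt(i);
            if (c == '\\' && i + 1 < title.length()) {
                // Copy an existing escape sequence unchanged.
                out.append(c).append(title.charAt(++i));
            } else if (c == '"') {
                out.append("\\\"");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Builds an inline markdown link with a quoted, escaped title.
    static String inlineLink(String text, String url, String title) {
        return "[" + text + "](" + url + " \"" + escapeQuotes(title) + "\")";
    }

    public static void main(String[] args) {
        System.out.println(inlineLink("bar", "/url/", "Title with \"quotes\" inside"));
    }
}
```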

Performance comparison of markdown processors for the JVM

Based on this benchmark suite.

| Test | Actuarius 1st run (ms) | Actuarius 2nd run (ms) | PegDown 1st run (ms) | PegDown 2nd run (ms) | Knockoff 1st run (ms) | Knockoff 2nd run (ms) | Txtmark 1st run (ms) | Txtmark 2nd run (ms) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Plain Paragraphs | 887 | 461 | 2455 | 2236 | 764 | 568 | 89 | 47 |
| Every Word Emphasized | 2220 | 2077 | 3411 | 3406 | 30503 | 30514 | 72 | 66 |
| Every Word Strong | 2384 | 2270 | 2456 | 2466 | 23639 | 23577 | 62 | 57 |
| Every Word Inline Code | 824 | 804 | 2337 | 2237 | 23506 | 23622 | 54 | 55 |
| Every Word a Fast Link | 3942 | 3738 | 1164 | 1159 | 8621 | 8595 | 89 | 68 |
| Every Word Consisting of Special XML Chars | 9393 | 9312 | 7544 | 7314 | 801 | 608 | 3587 | 3614 |
| Every Word wrapped in manual HTML tags | 6843 | 6828 | 1850 | 1859 | 8699 | 8692 | 1169 | 1154 |
| Every Line with a manual line break | 859 | 724 | 2968 | 2946 | 2171 | 1990 | 58 | 56 |
| Every word with a full link | 528 | 501 | 2252 | 2280 | 3513 | 3512 | 66 | 60 |
| Every word with a full image | 395 | 374 | 2463 | 2569 | 3757 | 3726 | 56 | 55 |
| Every word with a reference link | 19208 | 19035 | 39183 | 38710 | 243450 | 244943 | 1826 | 1798 |
| Every block a quote | 465 | 449 | 2687 | 2684 | 978 | 977 | 48 | 48 |
| Every block a codeblock | 151 | 134 | 597 | 601 | 270 | 262 | 36 | 27 |
| Every block a list | 1209 | 1106 | 3448 | 3432 | 1411 | 1368 | 52 | 60 |
| All tests together | 6062 | 6042 | 11556 | 11589 | 19827 | 19637 | 452 | 448 |
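The benchmark suite's own harness is not reproduced here. As a rough, generic sketch of how a first and a second run could be timed on the JVM, the following measures two consecutive calls to the same processor function. `TwoRunTimer` and its methods are hypothetical, and the upper-casing lambda is only a stand-in for a real call such as `txtmark.Processor.process`.

```java
import java.util.function.UnaryOperator;

final class TwoRunTimer {
    // Runs the processor once on the input and returns the elapsed time in milliseconds.
    static long timeOnceMillis(UnaryOperator<String> processor, String input) {
        long start = System.nanoTime();
        processor.apply(input);
        return (System.nanoTime() - start) / 1_000_000L;
    }

    // Times two consecutive runs; the second run typically benefits from JIT warm-up.
    static long[] timeTwoRuns(UnaryOperator<String> processor, String input) {
        long first = timeOnceMillis(processor, input);
        long second = timeOnceMillis(processor, input);
        return new long[] { first, second };
    }

    public static void main(String[] args) {
        // Stand-in processor (assumption); replace with a real markdown processor call.
        UnaryOperator<String> standIn = s -> s.toUpperCase();
        long[] runs = timeTwoRuns(standIn, "This is ***TXTMARK***");
        System.out.println("1st run (ms): " + runs[0] + ", 2nd run (ms): " + runs[1]);
    }
}
```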

Q: Why is Txtmark so slow when it comes to XML entities?
A: Because Txtmark does some sanity checks on XML entities to make sure it outputs valid XML. For example:
```
&cutie;
```
will produce (when processed with Markdown and most other markdown processors):
```
&cutie;
```
and
```
&amp;cutie;
```
when processed with Txtmark.
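To illustrate the kind of check involved, here is a small sketch that escapes every `&` that does not start a recognized entity. It is not Txtmark's actual implementation: the set of entities Txtmark accepts is not listed here, so the sketch assumes only the five predefined XML entities plus numeric character references. `EntityCheck` and `escapeUnknownEntities` are hypothetical names.

```java
import java.util.regex.Pattern;

final class EntityCheck {
    // Matches an '&' that is NOT followed by a recognized entity name or
    // numeric reference and a closing ';' (assumed entity set, see lead-in).
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#[xX][0-9a-fA-F]+);)");

    // Replaces every unrecognized '&' with '&amp;' and leaves valid entities untouched.
    static String escapeUnknownEntities(String input) {
        return BARE_AMPERSAND.matcher(input).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeUnknownEntities("&cutie;"));    // prints "&amp;cutie;"
        System.out.println(escapeUnknownEntities("&lt;b&gt;"));  // prints "&lt;b&gt;"
    }
}
```

Scanning every ampersand in this way adds work per character, which is consistent with the slower numbers in the "Special XML Chars" test.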

Benchmarked versions:
Actuarius version: 0.2
PegDown version: 0.8.5.4
Knockoff version: 0.7.3-15

Markdown is copyright (c) 2004 by John Gruber
Actuarius is copyright (c) 2010 by Christoph Henkelmann
Knockoff is copyright (c) 2009-2011 by Tristan Juricek
PegDown is copyright (c) 2010 by Mathias Doenitz

Project link: https://github.com/rjeschke/txtmark
