txtmark - Java markdown processor

Copyright (C) 2011 René Jeschke rene_jeschke@yahoo.de
See LICENSE.txt for licensing information.


txtmark is yet another markdown processor for the JVM.

  • It is easy to use:

    String result = txtmark.Processor.process("This is ***TXTMARK***");
    
  • It is fast (see below)
    ... well, in the benchmarks below it is the fastest markdown processor on the JVM right now.

This is a release candidate (RC), tagged v0.5.

For an in-depth explanation of the markdown syntax, have a look at daringfireball.net.

Where Txtmark is not like Markdown


  • Txtmark does not produce empty title attributes in link and image tags.

  • Unescaped " characters inside link titles that are enclosed in " are not recognized and result in unexpected behaviour.

  • Due to a different list parsing approach, some constructs are interpreted differently:

    * List
    > Quote
    

    will produce the following when processed with Markdown:

    <p><ul>
    <li>List</p>
    
    <blockquote>
     <p>Quote</li>
    </ul></p>
    </blockquote>
    

    and this when processed with Txtmark:

    <ul>
    <li>List<blockquote><p>Quote</p>
    </blockquote>
    </li>
    </ul>
    

    Another one:

    * List
    ====
    

    will produce the following when processed with Markdown:

    <h1>* List</h1>
    

    and this when processed with Txtmark:

    <ul>
    <li><h1>List</h1>
    </li>
    </ul>
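
The first difference above is that Txtmark never emits an empty title attribute. That behaviour comes down to a single check made when the tag is written. Below is a minimal sketch of that rule in plain Java. The class ImageTag and its methods are hypothetical and are not part of Txtmark's API.

```java
// Hypothetical sketch, not Txtmark's API: write the title attribute
// only when there is a non-empty title.
final class ImageTag {

    // Escapes the characters that are unsafe inside a double-quoted attribute value.
    static String escapeAttr(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds an <img> tag; a null or empty title produces no title attribute at all.
    static String render(String src, String alt, String title) {
        StringBuilder sb = new StringBuilder("<img src=\"")
                .append(escapeAttr(src))
                .append("\" alt=\"")
                .append(escapeAttr(alt))
                .append('"');
        if (title != null && !title.isEmpty()) {
            sb.append(" title=\"").append(escapeAttr(title)).append('"');
        }
        return sb.append(" />").toString();
    }

    public static void main(String[] args) {
        System.out.println(render("/path/to/img.jpg", "Alt text", ""));      // <img src="/path/to/img.jpg" alt="Alt text" />
        System.out.println(render("/path/to/img.jpg", "Alt text", "Title")); // <img src="/path/to/img.jpg" alt="Alt text" title="Title" />
    }
}
```

The same check would apply to link tags. Only the tag name and the attribute holding the URL change.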
    

Txtmark extensions


To enable Txtmark's extended markdown parsing, you can use the PROFILE mechanism:

[$PROFILE$]: extended

This seemed to me the easiest and safest way to enable different behaviours. (All other markdown processors will ignore this line, because it has the form of a link reference definition, which is never rendered.)
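
Because the profile line has the shape of a link reference definition, it can be recognised with a simple line match. Below is a minimal sketch. It assumes the profile line stands on its own line with at most three leading spaces. The class ProfileDetector is hypothetical, and Txtmark's own detection rules are not described in this README.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Txtmark's implementation: find a line of the form
// "[$PROFILE$]: <name>" and return the profile name.
final class ProfileDetector {

    // Up to three leading spaces (assumption, as for a link reference definition),
    // the literal "[$PROFILE$]:", then a single profile name.
    private static final Pattern PROFILE_LINE =
            Pattern.compile("^ {0,3}\\[\\$PROFILE\\$\\]:\\s*(\\S+)\\s*$");

    static Optional<String> detect(String markdown) {
        for (String line : markdown.split("\\R")) { // split on any line break
            Matcher m = PROFILE_LINE.matcher(line);
            if (m.matches()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String doc = "[$PROFILE$]: extended\n\nThis is a paragraph\n* and this is a list";
        System.out.println(detect(doc).orElse("default"));          // extended
        System.out.println(detect("plain text").orElse("default")); // default
    }
}
```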

Behaviour changes when using [$PROFILE$]: extended

  • Lists and code blocks end a paragraph (inspired by [Actuarius])

    In standard markdown, the following:

    This is a paragraph
    * and this is not a list
    

    will produce:

    <p>This is a paragraph
    * and this is not a list</p>
    

    When using Txtmark extensions this changes to:

    <p>This is a paragraph</p>
    <ul>
    <li>and this is not a list</li>
    </ul>
    
  • More to come ...
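
The first extended rule is a check on each line that follows the start of a paragraph. The sketch below is a simplified model, not Txtmark's parser. It only considers blank lines, list markers (*, +, - or 1.) and indented code lines. The class ParagraphEnd is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, simplified model (not Txtmark's parser) of where a paragraph ends.
final class ParagraphEnd {

    // "*", "+", "-" or "1." list markers, indented by at most three spaces, followed by a space.
    private static final Pattern LIST_ITEM = Pattern.compile(" {0,3}(?:[*+-]|\\d+\\.) .*");

    // Indented code: a leading tab or at least four spaces.
    private static final Pattern CODE_LINE = Pattern.compile("(?:\\t| {4}).*");

    // Collects the lines of the paragraph that starts at lines.get(0).
    // Standard mode stops only at a blank line; extended mode also stops
    // at a list item or an indented code line.
    static List<String> paragraph(List<String> lines, boolean extended) {
        List<String> para = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                break;
            }
            boolean startsBlock = LIST_ITEM.matcher(line).matches()
                    || CODE_LINE.matcher(line).matches();
            if (extended && !para.isEmpty() && startsBlock) {
                break;
            }
            para.add(line);
        }
        return para;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("This is a paragraph", "* and this is not a list");
        System.out.println(paragraph(input, false)); // [This is a paragraph, * and this is not a list]
        System.out.println(paragraph(input, true));  // [This is a paragraph]
    }
}
```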

Markdown conformity


Txtmark passes all tests inside MarkdownTest_1.0_2007-05-09 except two:

  1. Images.text

    Fails because Txtmark doesn't produce empty 'title' image attributes.
    (IMHO: Images ... OK)

  2. Literal quotes in titles.text

    What the frell ... this test will continue to FAIL.
    Sorry, but using unescaped " inside a title that is itself enclosed in " is unacceptable to me ;)

    Change:

    Foo [bar](/url/ "Title with "quotes" inside").
    [bar]: /url/ "Title with "quotes" inside"
    

    to:

    Foo [bar](/url/ "Title with \"quotes\" inside").
    [bar]: /url/ "Title with \"quotes\" inside"
    

    and Txtmark will produce the correct result.
    (IMHO: Literal quotes in titles ... OK)
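
The escaped form works because the title ends at the first " that is not preceded by a backslash. Below is a minimal sketch of that reading rule. It assumes that only \" and \\ count as escapes inside a title. The class TitleParser is hypothetical, not Txtmark code.

```java
// Hypothetical sketch, not Txtmark's code: read a link title that starts at a
// '"' and ends at the first '"' that is not escaped with a backslash.
final class TitleParser {

    // Returns the unescaped title, or null if no closing quote is found.
    static String parse(String s, int openQuote) {
        if (openQuote >= s.length() || s.charAt(openQuote) != '"') {
            throw new IllegalArgumentException("title must start with '\"'");
        }
        StringBuilder title = new StringBuilder();
        for (int i = openQuote + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            char next = i + 1 < s.length() ? s.charAt(i + 1) : '\0';
            if (c == '\\' && (next == '"' || next == '\\')) {
                title.append(next); // keep the escaped character, drop the backslash
                i++;
            } else if (c == '"') {
                return title.toString(); // first unescaped quote closes the title
            } else {
                title.append(c);
            }
        }
        return null; // unterminated title
    }

    public static void main(String[] args) {
        String escaped = "\"Title with \\\"quotes\\\" inside\"";
        String unescaped = "\"Title with \"quotes\" inside\"";
        System.out.println("[" + parse(escaped, 0) + "]");   // [Title with "quotes" inside]
        System.out.println("[" + parse(unescaped, 0) + "]"); // [Title with ]
    }
}
```

The second call shows why the unescaped version cannot work. The title closes at the quote before "quotes", and the rest of the line is left over.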

Performance comparison of markdown processors for the JVM


Based on this benchmark suite.
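
The benchmark suite itself is not reproduced here. The sketch below is a rough illustration of what a "1st run / 2nd run" pair means on the JVM. It times the same workload twice in one process, and the second run often benefits from JIT warm-up (an assumption, not a statement about how this suite works). TwoRuns and its placeholder workload are hypothetical.

```java
import java.util.Collections;
import java.util.function.Supplier;

// Hypothetical timing sketch, not the benchmark suite used for the table:
// time the same workload twice in one JVM and report both results in ms.
final class TwoRuns {

    // Written after each run so the JIT cannot discard the loop as dead code.
    private static volatile int blackhole;

    static long timeMillis(Supplier<String> task, int iterations) {
        int acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += task.get().length();
        }
        long elapsedNanos = System.nanoTime() - start;
        blackhole = acc;
        return elapsedNanos / 1_000_000;
    }

    // Returns {1st run, 2nd run}; the 2nd run typically executes already JIT-compiled code.
    static long[] firstAndSecond(Supplier<String> task, int iterations) {
        long first = timeMillis(task, iterations);
        long second = timeMillis(task, iterations);
        return new long[] { first, second };
    }

    public static void main(String[] args) {
        String input = String.join("", Collections.nCopies(1000, "Every *word* emphasized "));
        // Placeholder workload (assumption): replace with a call to the processor under test.
        long[] t = firstAndSecond(() -> input.replace("*", ""), 200);
        System.out.println("1st run: " + t[0] + " ms, 2nd run: " + t[1] + " ms");
    }
}
```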

All times in milliseconds.

Test                                               Actuarius         PegDown        Knockoff         Txtmark
                                                 1st     2nd     1st     2nd     1st     2nd     1st     2nd
Plain Paragraphs                                 887     461    2455    2236     764     568      89      47
Every Word Emphasized                           2220    2077    3411    3406   30503   30514      72      66
Every Word Strong                               2384    2270    2456    2466   23639   23577      62      57
Every Word Inline Code                           824     804    2337    2237   23506   23622      54      55
Every Word a Fast Link                          3942    3738    1164    1159    8621    8595      89      68
Every Word Consisting of Special XML Chars      9393    9312    7544    7314   80160   83587      36      14
Every Word wrapped in manual HTML tags          6843    6828    1850    1859    8699    8692    1169    1154
Every Line with a manual line break              859     724    2968    2946    2171    1990      58      56
Every word with a full link                      528     501    2252    2280    3513    3512      66      60
Every word with a full image                     395     374    2463    2569    3757    3726      56      55
Every word with a reference link               19208   19035   39183   38710  243450  244943    1826    1798
Every block a quote                              465     449    2687    2684     978     977      48      48
Every block a codeblock                          151     134     597     601     270     262      36      27
Every block a list                              1209    1106    3448    3432    1411    1368      52      60
All tests together                              6062    6042   11556   11589   19827   19637     452     448
  • Q: Why is Txtmark so slow when it comes to XML entities?

  • A: Because Txtmark does some sanity checks on XML entities to make sure it outputs valid XML. For example:

    &cutie;
    

    will produce (when processed with Markdown and most other markdown processors):

    &cutie;
    

    and

    &amp;cutie;
    

    when processed with Txtmark.
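
The sanity check described in the answer above can be sketched as follows. An & is kept only when it starts a recognised entity, and it is escaped to &amp; otherwise. The accepted names (here just the five predefined XML entities) and the class EntityCheck are assumptions. This README does not list which entities Txtmark accepts.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Txtmark's implementation: keep '&' only when it starts
// a recognised entity, otherwise escape it as "&amp;".
final class EntityCheck {

    // Illustrative subset of accepted names (assumption): the five predefined XML entities.
    private static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("amp", "lt", "gt", "quot", "apos"));

    // Named (&name;), decimal (&#123;) or hexadecimal (&#x1F;) entity reference.
    private static final Pattern ENTITY =
            Pattern.compile("&(?:([A-Za-z][A-Za-z0-9]*)|#[0-9]+|#[xX][0-9A-Fa-f]+);");

    static String escapeAmpersands(String text) {
        StringBuilder out = new StringBuilder(text.length());
        Matcher m = ENTITY.matcher(text);
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '&' && m.region(i, text.length()).lookingAt()
                    && (m.group(1) == null || KNOWN.contains(m.group(1)))) {
                out.append(m.group()); // numeric or known named entity: copy unchanged
                i = m.end();
            } else if (c == '&') {
                out.append("&amp;");   // stray '&' or unknown name such as "&cutie;"
                i++;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAmpersands("&cutie;"));          // &amp;cutie;
        System.out.println(escapeAmpersands("&lt; &#169; AT&T")); // &lt; &#169; AT&amp;T
    }
}
```

The per-character scan is one plausible reason such a check costs time on entity-heavy input. Every & needs a lookahead match, where a pass-through processor simply copies it.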

Tested versions:
[Actuarius] version: 0.2
[PegDown] version: 0.8.5.4
[Knockoff] version: 0.7.3-15


[Markdown] is copyright (c) 2004 by John Gruber
[Markdown]: http://daringfireball.net/projects/markdown/
[Actuarius] is copyright (c) 2010 by Christoph Henkelmann
[Actuarius]: http://henkelmann.eu/projects/actuarius/
[Knockoff] is copyright (c) 2009-2011 by Tristan Juricek
[Knockoff]: http://tristanhunt.com/projects/knockoff/
[PegDown] is copyright (c) 2010 by Mathias Doenitz
[PegDown]: https://github.com/sirthias/pegdown


Project link: https://github.com/rjeschke/txtmark
