[dev] Possible replacement for Text_Filter_BBCode

Sun Mar 16 23:07:32 UTC 2008

Quoting Chuck Hagenbuch <chuck at horde.org>:

> Aside from XHTML (a laudable goal, but one that could be taken care of
> with tidy also if that's what you need), does this add any features or
> fix bugs compared to the current parser?
>
> To be honest this is a pretty little-used part of Horde, so to add a
> much more complicated parser here, we'd need a pretty big reason.

Good point. Actually, there is little gain for simple texts.
The parser comes with pretty good recognition of free-standing links. Also,
we've had a solid record of smileys being parsed within links, resulting
in inaccessible links. Especially :) and :D proved to be a problem. Since this
parser knows which parts of the text should be inspected for smileys, this
doesn't happen.
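
To illustrate the idea, here is a rough sketch in Python (not the actual
parser code). The names TAG, SMILEYS, NO_SMILEY_TAGS and replace_smileys are
made up for this example, and the "(smile)"/"(grin)" placeholders stand in for
whatever markup a real filter would emit. Smileys are only replaced in text
that is not inside a [url] or [code] tag, and tags themselves are copied
verbatim, so a URL given as a tag parameter stays untouched as well:

```python
import re

# Matches a single opening or closing tag, optionally with "=parameter".
TAG = re.compile(r"\[(/?)([A-Za-z]+)(?:=[^\]]*)?\]")

# Hypothetical smiley table; the replacements are placeholders.
SMILEYS = {":)": "(smile)", ":D": "(grin)"}

# Tags whose content must not be inspected for smileys (an assumption).
NO_SMILEY_TAGS = {"url", "code"}


def _smile(chunk):
    """Replace every known smiley in a piece of plain text."""
    for smiley, replacement in SMILEYS.items():
        chunk = chunk.replace(smiley, replacement)
    return chunk


def replace_smileys(text):
    """Replace smileys everywhere except inside NO_SMILEY_TAGS."""
    out = []
    depth = 0  # how many smiley-free tags are currently open
    pos = 0
    for m in TAG.finditer(text):
        chunk = text[pos:m.start()]
        out.append(_smile(chunk) if depth == 0 else chunk)
        out.append(m.group(0))  # the tag itself is never altered
        if m.group(2).lower() in NO_SMILEY_TAGS:
            depth = max(depth - 1, 0) if m.group(1) else depth + 1
        pos = m.end()
    tail = text[pos:]
    out.append(_smile(tail) if depth == 0 else tail)
    return "".join(out)


print(replace_smileys("hi :) [url]http://example.org/a:D[/url] :D"))
```

A blind str_replace-style pass over the whole string would have mangled the
":D" inside the link as well.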

One problem I came across with preg_replace-based parsers is that there is a
limit on how far a tag can span. I have to admit, it takes a few thousand
characters, but it happens. This parser doesn't have this issue, as its
matches do not span from an opening tag to its closing tag. Also, large matches
are very, very expensive to process. Just try it out yourself: take a
[url=...] tag and toss in a few kB of characters before closing it. This parser
doesn't suffer from this problem.

Then, since the parser is designed to be easily extensible, it is comparatively
very flexible. The parser itself doesn't really know the tags and only makes
the following assumptions:
  - Tag names are case-insensitive. All of the following examples
    are treated the same:
     [b]foo[/b], [b]foo[/B], [B]foo[/b], [B]foo[/B]
  - Every tag starts with [ and ends with ]
  - Every tag can have an optional parameter
Everything else - including text transformations - is handled by the assigned
classes within the parse tree. See the [code] tag for a good example.
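
As a minimal sketch of these three assumptions (again in Python and not the
parser's real code; TAG and tokenize are hypothetical names, and restricting
tag names to letters is my simplification here), a tokenizer could look like
this. Each match covers exactly one tag, so no match ever spans the tag's
content:

```python
import re

# "[" + optional "/" + tag name + optional "=parameter" + "]"
TAG = re.compile(r"\[(/?)([A-Za-z]+)(?:=([^\]]*))?\]")


def tokenize(text):
    """Split text into ('text', s), ('open', name, param), ('close', name)."""
    tokens = []
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            tokens.append(("text", text[pos:m.start()]))
        closing, name, param = m.groups()
        name = name.lower()  # tag names are case-insensitive
        if closing:
            tokens.append(("close", name))
        else:
            tokens.append(("open", name, param))  # param is None if absent
        pos = m.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


print(tokenize("[B]foo[/b]"))
print(tokenize("[url=http://example.org]link[/URL]"))
```

The tokenizer knows nothing about what [b] or [url] mean; building the tree
and transforming the text would be left to whatever classes are assigned to
the tag names.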

So, to conclude: this parser is overkill for simple operations. However,
it is the best solution so far for complex tasks such as resolving heavily
nested tags etc. (apart from the PECL bbcode extension).

So long,
   Stephan