One-pass parser

Our current wikitext parser goes through many passes, with many regexps and a few explode()/implode()s. Not only is it kinda slow, it's also prone to horrible, frightening bugs when different levels interfere with each other (such as the URL-in-URL bugs).
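
To see the kind of interference meant here, consider a hypothetical two-pass fragment in PHP (not the actual parser code; the URLs and regexps are made up for illustration): the first pass turns bracketed external links into HTML anchors, and the second pass linkifies bare URLs, but the second regexp also matches the URL that the first pass already put inside an href attribute, mangling the output.

<?php
// Hypothetical illustration only -- not the actual MediaWiki parser code.

$text = "See [http://example.com/foo the foo page] and visit http://example.com/bar";

// Pass 1: bracketed external links -> HTML anchors.
$text = preg_replace(
    '!\[(http://[^ \]]+) ([^\]]+)\]!',
    '<a href="$1">$2</a>',
    $text
);

// Pass 2: bare URLs -> HTML anchors. This regexp also matches the URL that
// pass 1 already placed inside href="...", so that href attribute ends up
// with an <a> tag nested inside it -- the URL-in-URL style of bug.
$text = preg_replace(
    '!http://[^ "<]+!',
    '<a href="$0">$0</a>',
    $text
);

echo $text, "\n";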

One pass, or at least fewer passes, would be good.

Magnus has, according to rumour, written such a thing for WINOR. Could it be adapted? (It doesn't appear to handle nested italics & bold properly, and it may have other problems. It also doesn't touch HTML yet, but that's really a separate step.)

Yes, I did :-)
Actually, I was surprised myself how well it worked, considering it was a fast hack. Well, I guess writing in C++ is different from writing in PHP after all. I do have "multiple passes", however; the nowiki tags are parsed in and out in additional steps. Also, the whole text is broken into lines and patched together again. Another step would have to be added for HTML proofreading, but is that really our job?
Would it make sense to call a C++-compiled parser from PHP? Or should I try to rewrite it in PHP? I'd prefer to write a Phase IV in C++, though. Magnus Manske 19:14 17 Mar 2003 (UTC)
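
For reference, the nowiki strip/restore step mentioned above usually looks something like the following; this is a minimal sketch in PHP (the placeholder-marker scheme is an assumption, not Magnus's actual C++ code). The literal sections are swapped out for unique markers before the markup passes run, so nothing inside them can be mistaken for markup, and swapped back in at the end.

<?php
// Sketch of a nowiki strip/restore pass (assumed marker scheme).

// Replace <nowiki>...</nowiki> sections with unique markers before parsing.
function stripNowiki( $text, &$stash ) {
    $stash = array();
    return preg_replace_callback(
        '!<nowiki>(.*?)</nowiki>!s',
        function ( $m ) use ( &$stash ) {
            $marker = "\x07NOWIKI" . count( $stash ) . "\x07";
            $stash[$marker] = $m[1];   // real code would also HTML-escape this
            return $marker;
        },
        $text
    );
}

// Put the stashed literal text back once all markup passes are done.
function restoreNowiki( $text, $stash ) {
    return strtr( $text, $stash );
}

$stash = array();
$text  = stripNowiki( "''italic'' but <nowiki>''not italic''</nowiki>", $stash );
$text  = preg_replace( "!''(.*?)''!", '<i>$1</i>', $text );   // some markup pass
echo restoreNowiki( $text, $stash ), "\n";   // <i>italic</i> but ''not italic''
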
Sure, Wiki:AlternateHardAndSoftLayers. I don't know offhand how to get PHP and C++ to talk to each other nicely, but I'm sure there's a way... --Brion VIBBER 19:38 17 Mar 2003 (UTC)
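
One way that is known to work is to run the compiled parser as an external program and pipe the wikitext through its standard input/output with proc_open(); writing a proper PHP extension in C/C++ would avoid the per-page process overhead but is considerably more work. A rough sketch (the ./wiki2html binary name is hypothetical):

<?php
// Rough sketch of calling an external C++-compiled parser from PHP by piping
// wikitext through its stdin/stdout. The "./wiki2html" binary is made up.

function parseExternally( $wikitext ) {
    $descriptors = array(
        0 => array( 'pipe', 'r' ),   // child reads wikitext from stdin
        1 => array( 'pipe', 'w' ),   // child writes HTML to stdout
        2 => array( 'pipe', 'w' ),   // child writes errors to stderr
    );

    $proc = proc_open( './wiki2html', $descriptors, $pipes );
    if ( !is_resource( $proc ) ) {
        return false;                // couldn't start the parser
    }

    fwrite( $pipes[0], $wikitext );
    fclose( $pipes[0] );             // signal end of input

    $html = stream_get_contents( $pipes[1] );
    fclose( $pipes[1] );
    fclose( $pipes[2] );
    proc_close( $proc );

    return $html;
}

echo parseExternally( "'''Hello''', [[world]]!" );

Writing all the input and then reading all the output is fine for ordinary page sizes; a production version would probably want to keep the parser process alive between requests, or interleave reading and writing, rather than fork a new process for every page.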