One-pass parser

Revision as of 13:55, 25 February 2004

Our current wikitext parser goes through many passes, with many regexps and a few explode()/implode()s. Not only is it kinda slow, it's prone to horrible frightening bugs when different levels interfere with each other (such as the URL-in-URL bugs).

One, or at least fewer, passes would be good.
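
To make the contrast concrete, here is a minimal sketch of what "one pass" can mean for one small corner of the syntax: a single left-to-right scan that turns ''italic'' and '''bold''' quotes into HTML, carrying the open/closed state in two flags instead of re-matching the whole text with regexps in separate passes. This is illustrative C++ only, not MediaWiki code; nesting corner cases and the rest of the syntax are ignored.

    #include <iostream>
    #include <string>

    // Sketch only: convert ''italic'' and '''bold''' in one scan over the text.
    std::string parseQuotes(const std::string &src) {
        std::string out;
        bool italic = false, bold = false;
        for (std::size_t i = 0; i < src.size(); ) {
            if (src[i] != '\'') { out += src[i++]; continue; }
            std::size_t run = 0;                       // count consecutive apostrophes
            while (i + run < src.size() && src[i + run] == '\'') ++run;
            if (run == 2) {                            // '' toggles italic
                out += italic ? "</i>" : "<i>";
                italic = !italic;
            } else if (run == 3) {                     // ''' toggles bold
                out += bold ? "</b>" : "<b>";
                bold = !bold;
            } else {
                out.append(run, '\'');                 // other runs stay literal in this sketch
            }
            i += run;
        }
        if (italic) out += "</i>";                     // close anything left open
        if (bold)   out += "</b>";
        return out;
    }

    int main() {
        std::cout << parseQuotes("This is '''bold''' and ''italic'' text.") << "\n";
    }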

Magnus has, according to rumour, written such a thing for WINOR. Could it be adapted? (It doesn't appear to handle nested italics & bold properly; may have other problems. Also doesn't touch HTML yet, but that's a separate step really.)

Yes, I did :-)
Actually, I was surprised myself how well it worked, considering it was a fast hack. Well, I guess writing in C++ is different from writing in PHP after all. I do have "multiple passes", however; the nowiki tags are parsed in and out in additional steps. Also, the whole text is broken into lines and patched together again. Another step would have to be added for HTML proofreading, but is that really our job?
Would it make sense to call a C++-compiled parser from PHP? Or should I try and rewrite it in PHP? I'd prefer to write a Phase IV in C++, though. Magnus Manske 19:14 17 Mar 2003 (UTC)
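
The "parsed in and out in additional steps" handling of nowiki can be pictured as a strip-and-restore wrapper around the main parse: before parsing, each <nowiki>...</nowiki> section is swapped for a placeholder token, and afterwards the saved text is put back, so the main pass never sees markup it must not touch. The sketch below only illustrates that idea; the placeholder format and function names are made up and are not Magnus's actual code.

    #include <iostream>
    #include <string>
    #include <vector>

    // Sketch only: replace each <nowiki> section with a placeholder before the
    // main pass, and put the saved text back afterwards.
    static const std::string OPEN  = "<nowiki>";
    static const std::string CLOSE = "</nowiki>";

    std::string stripNowiki(const std::string &src, std::vector<std::string> &saved) {
        std::string out;
        std::size_t pos = 0;
        while (pos < src.size()) {
            std::size_t start = src.find(OPEN, pos);
            if (start == std::string::npos) { out += src.substr(pos); break; }
            std::size_t end = src.find(CLOSE, start + OPEN.size());
            if (end == std::string::npos) { out += src.substr(pos); break; }
            out += src.substr(pos, start - pos);
            out += '\x01' + std::to_string(saved.size()) + '\x01';   // placeholder token
            saved.push_back(src.substr(start + OPEN.size(), end - start - OPEN.size()));
            pos = end + CLOSE.size();
        }
        return out;
    }

    std::string restoreNowiki(std::string text, const std::vector<std::string> &saved) {
        for (std::size_t i = 0; i < saved.size(); ++i) {
            std::string marker = '\x01' + std::to_string(i) + '\x01';
            std::size_t at = text.find(marker);
            if (at != std::string::npos) text.replace(at, marker.size(), saved[i]);
        }
        return text;
    }

    int main() {
        std::vector<std::string> saved;
        std::string s = stripNowiki("a ''b'' <nowiki>''c''</nowiki> d", saved);
        // ... the main parsing pass would run on s here ...
        std::cout << restoreNowiki(s, saved) << "\n";
    }
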
Sure, Wiki:AlternateHardAndSoftLayers. I don't know offhand how to get php and c++ to talk to each other nicely, but I'm sure there's a way... --Brion VIBBER 19:38 17 Mar 2003 (UTC)
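
One plausible way to get them talking, assuming spawning an external process per parse is acceptable: build the C++ parser as a small filter that reads wikitext on stdin and writes HTML on stdout, and have PHP drive it with popen() or proc_open(). The skeleton below shows only the C++ side and is a made-up interface, not anything that exists in MediaWiki today.

    #include <iostream>
    #include <sstream>
    #include <string>

    // Stand-in for the real parser (Magnus's code, or the one-pass scanner
    // sketched above); here it just echoes its input unchanged.
    static std::string parseWikitext(const std::string &src) {
        return src;
    }

    int main() {
        std::ostringstream buf;
        buf << std::cin.rdbuf();                // read the whole article from stdin
        std::cout << parseWikitext(buf.str());  // write the rendered HTML to stdout
        return 0;
    }

Whether the per-request process-spawn overhead is acceptable, compared with an in-process extension, is the sort of thing that would need measuring.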

Having written many parsers before, I am poking around and exploring the idea of creating a one- or two-pass parser for the wiki syntax.

  • Unlikely, but I wonder if anybody has ported LEX and YACC (Yet Another Compiler Compiler) to PHP? Lex is a tokenizer; YACC takes a context-free grammar and parses the source using callbacks to process each language element. See The Lex & Yacc Page (http://dinosaur.compilertools.net/). The wiki language is not entirely context-free, but there may be workarounds for this.
  • My son found this link about someone's experience creating a parser in PHP: http://www.phppatterns.com/index.php/article/view/65/1/11
  • If some ambitious developer, who knows the full wiki syntax really well, could define the grammar, it would serve as a first step to implementing this project. From a Unix shell, do a man rcsfile to see an example of such a grammar (a toy illustration of how such a grammar maps onto parser code follows after this note). Even if a one-pass parser does not come to fruition, such a description would be an important piece of documentation, and could be used as an aid in ensuring that the current scheme is debugged.

    (Aside: The last thing these grammars are good for is helping a human understand a language. Likewise, if not written very carefully, the code to do a one-pass parse, a state machine in software, can be quite difficult to understand and maintain. It should be very fast.)

NickP 13:55, 25 Feb 2004 (UTC)
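
As a rough illustration of the "define the grammar first" suggestion above, here is a toy grammar for internal links only, written in the BNF-like style of the rcsfile(5) man page, together with a tiny recursive-descent parser whose functions mirror the productions. The productions and names are invented for illustration and do not describe the real, full wiki syntax.

    // Toy grammar (illustration only):
    //     link     ::= "[[" pagename ( "|" label )? "]]"
    //     pagename ::= any characters except "|" and "]"
    //     label    ::= any characters except "]"
    #include <iostream>
    #include <string>

    struct Parser {
        const std::string &src;
        std::size_t pos;

        bool lookingAt(const std::string &s) const {
            return src.compare(pos, s.size(), s) == 0;
        }

        // pagename / label: read characters up to one of the stop characters
        std::string readUntil(const std::string &stops) {
            std::string out;
            while (pos < src.size() && stops.find(src[pos]) == std::string::npos)
                out += src[pos++];
            return out;
        }

        // link ::= "[[" pagename ( "|" label )? "]]"
        std::string parseLink() {
            pos += 2;                                  // consume "[["
            std::string page = readUntil("|]");
            std::string label = page;
            if (pos < src.size() && src[pos] == '|') {
                ++pos;                                 // consume "|"
                label = readUntil("]");
            }
            if (lookingAt("]]")) pos += 2;             // consume "]]"
            return "<a href=\"/wiki/" + page + "\">" + label + "</a>";
        }

        std::string parse() {
            std::string out;
            while (pos < src.size()) {
                if (lookingAt("[[")) out += parseLink();
                else out += src[pos++];
            }
            return out;
        }
    };

    int main() {
        std::string text = "See [[One-pass parser|this page]] for details.";
        Parser p{text, 0};
        std::cout << p.parse() << "\n";
    }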