i’m no lisp programmer. Though i’ve used some variant of emacs as my primary editor since 1996, i’ve never bothered to learn lisp (the linga franca of emacs). But today i got a bit more interested in it…
First we need to briefly introduce pegc, a PEG (Parsing Expression Grammar) parser generator library for C (it’s not a code generator). A few months ago i spent some weeks working on pegc, mainly just to see how far i could stretch/push/mold that model using only the C language (it’s been demonstrated many times over in C++ and other high-level languages, but AFAIK pegc is the first C library of its kind). It was an interesting problem with interesting implications and future uses, but not something i really needed “right now.” So, in short, pegc is a beta piece of software which i wrote but which i’ve never actually used (only in pegc test code!), so i haven’t gotten around to figuring out what needs to be refined/changed. So it surprised me when i got an email about pegc today.
With that in mind…
A couple hours ago i got a mail from Zajcev Evgeny (a.k.a. “lg”), a member of the SXEmacs project, where he revealed something pretty impressive. As part of the SXEmacs project he’s integrated pegc with lisp, such that people can write parser generators in lisp, and those parser generators will use pegc for the back end. (His intention is to use it to implement a syntax highlighter.) As an example he says:
yeah, i’ve done this on Emacs Lisp level, you can specify either raw
ruleset like this:(setq pp (pegc-intern-ruleset '((sentence (and (opt article) subject verb (and (opt preposition) (opt article) (or object (and object preposition))))) (article (and spaces (or (str "the") (str "the two") (str "a")))) (preposition (and spaces (or (str "on") (str "at") (str "to") (str "with")))) (subject (and spaces (or (str "man") (str "men") (str "dog") (str "dogs") (str "cat")))) (verb (and spaces (or (str "sat") (str "saw") (str "shot") (str "gave")))) (object (and spaces (or (str "cannon") (str "hat") (str "mat")))) )))
and then use it like:
(pegc-parse (pegc-create-parser “the cat sat on the mat”) pp)
which assumes first rule in ruleset as start symbol
Or otherwise you can use user friendly expression like this:(setq pp #<peg sentence - article? subject verb (preposition? article? (object / object preposition object)) article - spaces ("the" / "the two" / "a") preposition - spaces ("on" / "at" / "to" / "with") subject - spaces ("man" / "men" / "dog" / "dogs" / "cat") verb - spaces ("sat" / "saw" / "shot" / "gave") object - spaces ("cannon" / "hat" / "mat")>
which produces the same interned ruleset as previous raw
Pretty cool, in my humble opinion. i was extremely impressed with what he’d accomplished, but also extremely surprised that pegc could do the things he’s doing. The exchange went like this:
Stephan Beal wrote:
> Holy cow! You got all that working with the existing pegc code??? And
> it works!?!?
yes, actually pegc is quite solid and extensible. I did not modified
a bit in pegc code to implement this. I just created two levels of
abstractions - 1) low level - is direct FFI to pegc code and 2) high
level - is written to omit low level details and just do what user
Just what a software developer wants to hear! :-D
When i started pegc, i never envisioned the possibility of using it as the back-end for parsers in any language other than C. Now that the proof of concept is out there, the possibilities would seem to be pretty limitless. We could theoretically use SWIG to generate bindings for just about any scripting language.
Just what i needed - yet another interesting project to divide my time amongst!
Back to hacking!
—– stephan beal, 23 Dec 2008