Treetop Parser

I’ve been doing some work with Coco/R for Ruby lately. I understand that Coco/R is a classic. I understand that in some languages, it might be great. But the Ruby implementation of it is a half-finished, hacked-together piece of crap.

Seriously. It’s rubbish.

So, anyway, I’ve spent a couple of days at the office trying to get it to parse some fairly complicated strings of text. An example might look something like this:

Some Title Name #5 Vol. 01 Subtitle:Subsubtitle (AAA123456) (More Extra Data) TYLER CVR A

That in itself would not be hard… it’s the fact that they’re all horribly different. Is data consistency really that freaking hard?! ...But that’s a whole other digression.

Anyway… getting to the point. At RubyConf, Nathan Sobo of Pivotal Labs introduced Treetop, a parser generator written in Ruby. It’s built on parsing expression grammars rather than the separate lexer and context-free grammar that traditional compiler tools use. I haven’t looked into the gory details much, so I can’t really comment.

What I can comment on, however, is the fact that it works really well. I decided to toy around with it a bit before I switch my project at work over to it. So I watched the screencast Nathan put together and got to work…

After maybe 30 minutes of hacking on it, I have what I believe to be a pretty decent CSS parser. It lacks some things at the moment, specifically application… But whatever. Here it is, for your amusement:

grammar Css
  rule stylesheet
    whitespace rule_set* whitespace
  end

  rule rule_set
    whitespace selector+ whitespace '{' whitespace instruction* whitespace '}'
  end

  rule selector
    selector_key whitespace
  end

  rule selector_key
    [a-zA-Z#.:]+
  end

  rule instruction
    instruction_key whitespace ':' whitespace instruction_value ';' whitespace
  end

  rule instruction_key
    [a-z-]+
  end

  rule instruction_value
    [a-z]+
  end

  rule whitespace
    [\s]*
  end
end

I’m sure it can be done more elegantly and more solidly… but for a first pass in 30 minutes, I’m pleased. What strikes me most of all is how easy it is. Easy to get set up, easy to write grammar files for, and easy to use.

It took me significantly longer than this just to get Coco/R running…
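
For anyone who wants to poke at what the grammar above accepts without installing anything, here is a rough hand-written stand-in using StringScanner from Ruby’s standard library. It is only a sketch: one method per grammar rule, each rewinding on failure. It is not Treetop’s generated code, and the names CssRecognizer and css_like? are made up for illustration. As far as I can tell it accepts the same inputs as the grammar, but treat that as an assumption.

```ruby
require 'strscan'

# Hand-rolled stand-in for the Css grammar: one method per rule.
# Hypothetical names; this is NOT what Treetop generates.
class CssRecognizer
  def initialize(text)
    @s = StringScanner.new(text)
  end

  # stylesheet: whitespace rule_set* whitespace (must consume all input)
  def stylesheet?
    whitespace
    zero_or_more { rule_set }
    whitespace
    @s.eos?
  end

  private

  # Run the block; rewind to the starting position if it fails.
  def attempt
    start = @s.pos
    return true if yield
    @s.pos = start
    false
  end

  # Repeat the block until it fails; always succeeds, like `*`.
  def zero_or_more
    nil while yield
    true
  end

  # Like `+`: the block must succeed at least once.
  def one_or_more(&block)
    block.call && zero_or_more(&block)
  end

  def whitespace
    @s.scan(/\s*/) # may match nothing, so this never fails
    true
  end

  def rule_set
    attempt do
      whitespace && one_or_more { selector } && whitespace && @s.scan(/\{/) &&
        whitespace && zero_or_more { instruction } && whitespace && @s.scan(/\}/)
    end
  end

  def selector
    attempt { @s.scan(/[a-zA-Z#.:]+/) && whitespace }
  end

  def instruction
    attempt do
      @s.scan(/[a-z-]+/) && whitespace && @s.scan(/:/) && whitespace &&
        @s.scan(/[a-z]+/) && @s.scan(/;/) && whitespace
    end
  end
end

def css_like?(text)
  CssRecognizer.new(text).stylesheet?
end

puts css_like?("body { color: red; }")  # true
puts css_like?("h1 { color: red; }")    # false: selector_key has no digits
puts css_like?("body { color: #fff; }") # false: instruction_value is [a-z]+ only
```

The two failing cases show where the first-pass grammar is still thin: selectors can’t contain digits, and values can only be lowercase words. With Treetop itself, and assuming the grammar is saved as css.treetop, the equivalent check should be roughly Treetop.load('css') followed by CssParser.new.parse(text), which returns nil when the input doesn’t match.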
