Treetop Introductory Tutorial Part 5 of 10 -- A Match made in Patterns
Let’s take another look at the email list sample that we want our program to understand:
"Jena L. Dovie" <jdovie_qs@agora.bungi.com>, <marleen_df@acg-aos.com>; Charmain Lashunda <c.lashunda_mc@promero.com>; "Traci Shauna" <traci_shaunaxp@cs.com>
We might as well start at the beginning, now that we’ve matched the quote. Let’s look at the first email name:
"Jena L. Dovie"
If I asked you, “where is the actual email name?”, you would probably say something like “It’s whatever’s between the double quotes”. And you’d be right. So let’s call the email name with the quotes the “enclosed email name” because it’s enclosed by the double quotes.
Our enclosed email name could be matched by a rule that looks like this:
But now we need to understand something fundamental about computer syntax checkers. They are like those poor people who provide simultaneous translation at the UN: they have to start translating even before they know what the speaker is really going to say. Computer parsers like the ones created by Treetop start at the beginning of your text and make decisions as they go along, without knowing what comes next.
So here is a clearer definition that's closer to the way Treetop parsers try to understand your text:
a double-quote (")
followed by the email name
followed by another double-quote (")
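To make the left-to-right idea concrete, here is a minimal hand-written sketch of those three steps in plain Ruby, using the standard library's StringScanner. This is not what Treetop generates. The helper name match_enclosed_email_name is hypothetical, and the sketch assumes that the email name is simply everything up to the next double-quote and that the whole input must be consumed.

```ruby
require 'strscan'

# Hypothetical helper (not part of Treetop): walks the text left to right,
# committing to each step before looking at what comes next.
def match_enclosed_email_name(text)
  scanner = StringScanner.new(text)
  return nil unless scanner.scan(/"/)   # 1. a double-quote
  name = scanner.scan(/[^"]*/)          # 2. the email name (assumed: anything up to the next quote)
  return nil unless scanner.scan(/"/)   # 3. another double-quote
  return nil unless scanner.eos?        # assumed: nothing may be left over
  name
end

p match_enclosed_email_name('"Jena L. Dovie"')  # => "Jena L. Dovie"
p match_enclosed_email_name('Jena L. Dovie"')   # => nil (no opening quote)
```

Each step either succeeds and moves the scanner forward, or the whole match fails. That is roughly the behaviour we will be asking Treetop to provide for us.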
Save your double_quote.treetop program as email_list.treetop and edit it to match the following:
# a Treetop Grammar to parse email lists
#
grammar EmailList
  rule full_email_address
    '"' email_name '"'
  end
end
(Yes, there is something wrong with this program. I’m sure you’re smart enough to see it right off. However, let’s use this opportunity to see how Treetop tells us about errors.)
Save your program talk_to_me.rb as parse_email_list.rb and change the lines that load the Treetop grammar so that they load our new email_list grammar:
Treetop.load 'email_list'
puts 'Loaded email list grammar with no problems...'
parser = EmailListParser.new
Save and run your parse_email_list.rb program. Notice that it loads correctly. (If you followed my instructions exactly. If not, you know what to fix…)
Enter the following text into your test program:
"Jena L. Dovie"
You get an error message something like this:
(eval):43:in `_nt_full_email_address': undefined local variable or method `_nt_email_name' for #<EmailListParser:0x1005b0580> (NameError)
from /Library/Ruby/Gems/1.8/gems/treetop-1.4.4/lib/treetop/runtime/compiled_parser.rb:18:in `send'
from /Library/Ruby/Gems/1.8/gems/treetop-1.4.4/lib/treetop/runtime/compiled_parser.rb:18:in `parse'
from parse_email_list.rb:17
As usual, not very helpful. But let's focus on the key information in the first line of the error message, the two names that begin with _nt_:
(eval):43:in `_nt_full_email_address': undefined local variable or method `_nt_email_name' for #<EmailListParser:0x1005b0580> (NameError)
So, what Treetop is trying to tell us, in its arcane way, is that the rule full_email_address (_nt_ is the prefix Treetop puts in front of a rule's name) refers to the rule email_name, but we never defined any such rule!
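If you are wondering why a missing rule shows up as a Ruby NameError, the following plain-Ruby sketch reproduces the same kind of failure without Treetop. The class SketchParser is hypothetical and only imitates the shape of the generated code: one method per rule, each named with the _nt_ prefix.

```ruby
# Hypothetical stand-in for a generated parser: one method per rule.
class SketchParser
  def _nt_full_email_address
    # Refers to another rule's method, which was never defined.
    _nt_email_name
  end
end

error = nil
begin
  SketchParser.new._nt_full_email_address
rescue NameError => e
  error = e
  puts "#{e.class}: #{e.message}"
end
```

The exact wording of the message depends on your Ruby version, but the missing name it reports is the one to look for in your grammar.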
That’s easy to fix:
# a Treetop Grammar to parse email lists
#
grammar EmailList
  rule full_email_address
    '"' email_name '"'
  end
  rule email_name
    [^"]*
  end
end
Oops. Completely new concept here. What we have entered for email name is called a Regular Expression. Regular expressions match patterns of text, so it's like Treetop but much more condensed. The email name pattern [^"]* can be expressed as the following rule:
Match any character that is not a double-quote (")
And keep matching as long as possible
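You can try the same pattern outside of Treetop in plain Ruby (in irb, for example). This is only a sketch to show how the pattern behaves: the \A anchor, which pins the match to the start of the text, and the variable names are additions for the illustration.

```ruby
# The pattern from the email_name rule, anchored to the start of the text.
NOT_A_QUOTE = /\A[^"]*/

text = 'Jena L. Dovie" <jdovie_qs@agora.bungi.com>'
name = text[NOT_A_QUOTE]
p name            # => "Jena L. Dovie" (matching stops just before the quote)

# The * also allows zero characters, so the pattern "succeeds" with an
# empty match when the very first character is a double-quote.
empty = '"quoted"'[NOT_A_QUOTE]
p empty           # => ""
```

Notice that the match keeps going through the spaces and the period, and only stops when it reaches the double-quote.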
Some other useful regular expressions you might need in Treetop grammars:
Expression | Meaning |
---|---|
[a-zA-Z] | match any letter in the range a to z or A to Z (e.g. match 'a', 'e', 'W', 'Z', but don't match '%', '3' or '*'). |
.* | match any character, any number of them (even none at all!). |
[.]* | match any number of periods, e.g. '', '.', '..', '...'. |
c+ | match at least 1 c, e.g. 'c', 'cc', 'ccc'. |
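If you would like to check the table for yourself, here is a small plain-Ruby sketch. The helper whole_match? is hypothetical: it wraps each pattern so that it has to match the entire string, which is an assumption made here to keep the results easy to read.

```ruby
# Hypothetical helper: true only if the pattern matches the whole string.
def whole_match?(pattern, text)
  !(/\A(?:#{pattern.source})\z/ =~ text).nil?
end

p whole_match?(/[a-zA-Z]/, 'W')   # => true
p whole_match?(/[a-zA-Z]/, '%')   # => false
p whole_match?(/.*/, '')          # => true  (even none at all)
p whole_match?(/[.]*/, '...')     # => true
p whole_match?(/[.]*/, '.a')      # => false (the 'a' is not a period)
p whole_match?(/c+/, 'ccc')       # => true
p whole_match?(/c+/, '')          # => false (needs at least one c)
```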
If this is your first encounter with Regular Expressions, accept that it’s going to take you a while to digest them. You can read about Regular Expressions at Regular-Expressions.info. There is even a tutorial there you can try.
Before we go on, let’s fire up our test program and see if our grammar that has the regular expression for email_name works. Save the modified grammar and run your test program. Enter the following text when prompted:
"Jena L. Dovie"
Yes! It understands! But understands what? That’s for our next tutorial.