Treetop Introductory Tutorial Part 10 of 10 -- Summary and Advanced Debugging
Let’s summarize what we’ve learned as well as look at more in-depth debugging techniques to help you write more complex grammars.
Summary of Treetop Features
Comments start with #
, just like Ruby. They capture everything up to the end of the line, just like Ruby.
Grammars are identified by the keyword grammer
, have a CamelCase name, and end with the word end
, like Ruby classes and modules.
Note: you can also have modules in Treetop grammars, just like Ruby. Modules can be used to encapsulate complicated subgrammars. You can use the include
keyword to include other grammars.
Grammars consists of Rules. Each rule starts with the keyword rule
, has a lower-case name, and ends with the word end
, like Ruby methods.
Each rule generates an instance of the class SyntaxNode
, extended with the methods you defined between the curly brackets. Methods to access the named elements (child nodes) are also included.
Some useful methods available to a syntax node:
Method | Description |
---|---|
terminal? |
True if the node doesn't have children. |
non_terminal? |
True if the node does have children. |
text_value |
The text the node has matched. |
elements |
An array of all the children. |
inspect |
Pretty-print the contents of the node and (recursively) of its children. |
A simple program to load a Treetop grammar grammar_name and to parse something with it is:
#!/usr/bin/env ruby
require 'rubygems'
require 'treetop'
Treetop.load grammar_name
# or include grammar_name if Polyglot gem installed
parser=GrammarNameParser.new
result=parser.parse(input)
The parse
method returns nil
if the input cannot be parsed by the grammar. In that case parser.failure_reason
can be used to determine where and for what reason the parser failed. (Unless the reason is because the parser could not consume all the input. In that case, you can use parser.index
to determine how far it got.)
The parse method calls the first rule defined in the grammar. You can override this by setting parser.root
.
A rule consists of:
- Terminals (text in quotes)
- Non-Terminals (names of other rules)
- Regular Expressions (usually inside square brackets)
- Iterations (
*
matches 0 or more;+
matches 1 or more,?
matches 0 or 1) - Choices (separated by
/
) - Lookaheads (identified by
!
or&
) - Method Definitions (enclosed in curly brackets)
- and, of course brackets to override the standard order.
Here are some sample rules to illustrate the types of components:
grammar SampleGrammar
rule root_rule
'terminal' non_terminal [regular expression]
end
rule sample_iterations
optional_non_terminal?
'terminal repeated 0 or more times'*
[expression repeated one or more times]+
end
rule choice_sample
a / b tried_if_cant_match_a_and_does_match_b
end
rule bracket_override
(a / b) tried if matches_a_or_matches_b
end
rule lookaheads
matches_me !as_long_as_not_followed_by_me /
matches_me &only_if_followed_by_me
end
rule with_methods
non_terminal {
def sample_method
can_refer_to_by_name(non_terminal)
end
}
end
end
Advanced Debugging Capabilities
The tt
command provided with Treetop allows you to compile your Treetop grammar into Ruby code. The compiler takes your file_name.treetop
file and creates a file_name.rb
file. If you have problems loading your Treetop grammar, use the command-line and tt
to get more details on why your grammar does not compile.
If your grammar does not parse what you think it should parse, you can use the generated Ruby file to figure out what is going on.
A sample rule will generate code like this (at least in Treetop 1.4.4 — this aspect of Treetop is the one most likely to change):
def _nt_double_quote
# (some code about cached info)
if has_terminal?('"', false, index)
r0 = instantiate_node(SyntaxNode,input, index...(index + 1))
@index += 1
else
terminal_parse_failure('"')
r0 = nil
end
node_cache[:double_quote][start_index] = r0
r0
end
You can insert debug statements into this code and use require
in place of Treetop.load
to help you understand how the compiled parser is parsing your text.
Finally, you can start up the interactive interpreter irb
and execute your test code from there. In irb
, you can call Treetop.load; parser = GrammarNameParser.new
to reload your grammar each time you change it. In irb
, you can test out the value of the various syntax nodes and of the parser to determine which grammar will work for you.
If your grammar fails to parse, you can still get a sense of how far it got by examining parser.instance_variable_get :@node_cache
. But by now you’re getting into pretty heavy debugging…
So that’s it! You should have enough knowledge by now to set you on your way. Happy parsing!