From Wiki.crossplatform.ru


As Andreas already pointed out, we got lots of great feedback and suggestions from customers at Developers Days; thanks to everyone who participated and chatted with us. One question I got was “So, how do I use this QLALR thing?” As you might know, QLALR is a parser generator hosted on Trolltech Labs; just download it from there and (in theory) start using it. In Qt, we use QLALR to generate the parsers for QtScript and QXmlStream. Even though I’m still in a bit of a daze due to jet lag, I’ll now attempt to explain how to get started using QLALR to create a parser for your own super-duper language (or existing language XYZ), and embed it into your Qt application, QtScript-style.

For those of you who haven’t read a few books on compilers and aren’t familiar with similar tools like bison/yacc (and the related flex/lex), the QLALR documentation might feel a bit lackluster. For the rest of you, it probably feels the same. I’m not saying the QLALR docs are bad; I’m just saying they don’t exist. One way to get your feet wet is to have a look at the QLALR grammar (.g) files in the Qt sources; src/script/qscript.g for QtScript, and src/xml/stream/qxmlstream.g for QXmlStream. However, both of these are non-trivial and part of a greater whole, and thus difficult to rip out and use as the basis for your own parser. So I’ve made a simple example that shows the basic setup.

The example is a parser for a small toy language I call Qbicle. You can grab the source code here. To build Qbicle, do the usual qmake and make (you don’t need QLALR installed to try it). Run the example to see the result of evaluating some Qbicle statements. To embed Qbicle in your own application (it’s super-useful, I promise!), include qbicle.pri in your qmake project (.pro) file.

The main Qbicle class is QbicleEngine, whose declaration looks like this:

typedef QHash<QString, QVariant> QbicleEnvironment;

class QbicleEngine : public QObject
{
public:
    QbicleEngine(QObject *parent = 0);

    QbicleEnvironment environment() const;
    void setEnvironment(const QbicleEnvironment &env);

    QVariant evaluate(const QString &program);

    int errorLineNumber() const;
    QString errorMessage() const;
};

The QbicleEngine class is not generated; it just provides a nice API for evaluating Qbicle programs. Internally it uses the parser generated by QLALR to actually implement the parsing and evaluation of Qbicle programs.

There is a single global environment (a mapping from identifiers to QVariants) that can be set on the engine, and Qbicle programs passed to the evaluate() function can access and change this environment. So you can do stuff like:

QbicleEngine engine;
// initialize environment
QbicleEnvironment env;
env["foo"] = 123;
engine.setEnvironment(env); // hand the environment to the engine before evaluating
qDebug() << engine.evaluate("bar = foo + 3;").toDouble(); // 126
qDebug() << engine.environment(); // foo and bar

The Qbicle parser is defined in qbicle.g. This file is the input to QLALR; executing “qlalr qbicle.g” will generate qbiclegrammar.cpp, qbiclegrammar_p.h, qbicleparser.cpp and qbicleparser.h. The first part of qbicle.g contains definitions of the tokens used in the Qbicle language:

%token T_EQ "="
%token T_PLUS "+"
%token T_MINUS "-"
%token T_LPAREN "("
%token T_RPAREN ")"
%token T_SEMICOLON ";"

… and so on. The parser relies on a lexer (AKA scanner, AKA tokenizer) to break up the input stream (characters) into a sequence of such tokens. The %token definitions in the .g file result in an enum in the generated parser; the lexer uses this enum to communicate back to the parser that a certain token has been recognized. You have to provide the lexer yourself; you can use e.g. flex to generate a lexer from regular expressions, or you can hand-craft your own. Qbicle uses a simple hand-crafted lexer (see qbiclelexer.cpp). (Note that the string that you associate with a token in a %token definition is only used in error messages and for debugging purposes; your lexer is responsible for matching whatever input you want to associate with a token.)
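To give an idea of what “hand-crafted” means here, the following is a minimal sketch of such a lexer. It is not the actual qbiclelexer.cpp (which isn’t reproduced in this post), and it uses plain standard C++ instead of Qt types so that it stands on its own. The Token enum is a stand-in for the one QLALR generates; T_EOF, T_NUMBER and T_ERROR, the class name SketchLexer and the number() accessor are all assumptions made up for the illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Stand-in for the enum QLALR would generate from the %token lines.
// T_EOF, T_NUMBER and T_ERROR are hypothetical additions for this sketch.
enum Token {
    T_EOF, T_EQ, T_PLUS, T_MINUS, T_LPAREN, T_RPAREN,
    T_SEMICOLON, T_IDENTIFIER, T_NUMBER, T_ERROR
};

class SketchLexer
{
public:
    explicit SketchLexer(std::string input) : m_input(std::move(input)) {}

    // Returns the code of the next token; identifier() and number()
    // hold the payload of the most recent T_IDENTIFIER / T_NUMBER.
    Token lex()
    {
        // Skip whitespace, counting lines for error reporting.
        while (m_pos < m_input.size() && isSpace(m_input[m_pos])) {
            if (m_input[m_pos] == '\n')
                ++m_line;
            ++m_pos;
        }
        if (m_pos >= m_input.size())
            return T_EOF;

        const char c = m_input[m_pos];
        if (isAlpha(c) || c == '_') {
            const std::size_t start = m_pos;
            while (m_pos < m_input.size()
                   && (isAlnum(m_input[m_pos]) || m_input[m_pos] == '_'))
                ++m_pos;
            m_identifier = m_input.substr(start, m_pos - start);
            return T_IDENTIFIER;
        }
        if (isDigit(c)) {
            std::size_t consumed = 0;
            m_number = std::stod(m_input.substr(m_pos), &consumed);
            m_pos += consumed;
            return T_NUMBER;
        }

        ++m_pos; // single-character tokens
        switch (c) {
        case '=': return T_EQ;
        case '+': return T_PLUS;
        case '-': return T_MINUS;
        case '(': return T_LPAREN;
        case ')': return T_RPAREN;
        case ';': return T_SEMICOLON;
        default:  return T_ERROR;
        }
    }

    const std::string &identifier() const { return m_identifier; }
    double number() const { return m_number; }
    int lineNumber() const { return m_line; }

private:
    // <cctype> functions need an unsigned char value to be well-defined.
    static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
    static bool isAlnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    std::string m_input;
    std::size_t m_pos = 0;
    int m_line = 1;
    std::string m_identifier;
    double m_number = 0.0;
};
```

The parser driver would call lex() whenever it needs the next token, and fetch identifier() or number() from inside the production code when the token carries a value.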

The rest of qbicle.g contains the parser driver, the Qbicle language productions and the associated code to execute when a production has been recognized. The driver uses the tables generated by QLALR to actually implement the parsing. You can typically use the same (or very similar) driver for parsers of different languages (i.e. use qbicle.g as the starting point). The productions are what’s really interesting; they make up the grammar of your language (i.e. the legal form of input programs). Here are a couple of the Qbicle productions:

AdditiveExpression: AdditiveExpression T_MINUS PrimaryExpression ;
/.
case $rule_number: {
    sym(1) = sym(1).toDouble() - sym(3).toDouble();
} break;
./

LeftHandSideExpression: T_IDENTIFIER ;
/.
case $rule_number: {
    sym(1) = lexer->identifier();
} break;
./

The code between the /. and ./ markers will be output to the generated parser, and is executed when the parser has matched the preceding production. $rule_number will be replaced by the actual number that represents the production in the grammar; the rest of the code is passed through without modification. The sym() function is used to access the parser’s value stack; in this example, the value stack consists of QVariants. In the Qbicle example, no intermediate representation is generated; expressions are evaluated on the fly. I leave it as an exercise to the reader to implement abstract syntax tree (AST) generation, conditionals, bytecode compilation & interpretation, and Just-In-Time (JIT) compilation on all supported Qt platforms. (For hints on how to do everything except the JIT part, have a look at the QtScript sources.)
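To make the sym() mechanics a bit more concrete, here is a stripped-down sketch of a value stack and a reduce step. Treat it as an illustration under assumptions, not as the driver from qbicle.g: ValueStackSketch, shift(), reduce() and the Rule enum are hypothetical names, the values are plain doubles rather than QVariants, and a real driver also maintains a state stack and consults the tables generated by QLALR to decide when to shift and when to reduce.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical miniature of a parser's value stack (doubles, not QVariants).
class ValueStackSketch
{
public:
    enum Rule { AdditivePlus, AdditiveMinus };

    // Shifting a token pushes its semantic value; tokens without a
    // meaningful value (such as operators) still occupy a slot.
    void shift(double value = 0.0) { m_stack.push_back(value); }

    // Reducing runs the rule's action, then replaces the right-hand side
    // symbols with the single resulting value.
    // Returns false if the stack is too short for the rule.
    bool reduce(Rule rule)
    {
        const std::size_t rhsLength = 3; // both rules: Additive op Primary
        if (m_stack.size() < rhsLength)
            return false;
        m_base = m_stack.size() - rhsLength;
        switch (rule) {
        case AdditivePlus:
            sym(1) = sym(1) + sym(3);
            break;
        case AdditiveMinus:
            sym(1) = sym(1) - sym(3);
            break;
        }
        m_stack.resize(m_base + 1); // sym(1) now stands for the left-hand side
        return true;
    }

    // Only meaningful while the stack is non-empty.
    double top() const { return m_stack.back(); }
    std::size_t size() const { return m_stack.size(); }

private:
    // 1-based access to the symbols of the rule being reduced.
    double &sym(std::size_t index) { return m_stack[m_base + index - 1]; }

    std::vector<double> m_stack;
    std::size_t m_base = 0;
};
```

Shifting 10, a minus token and 4, and then calling reduce(ValueStackSketch::AdditiveMinus), leaves the single value 6 on the stack; that is the same shape as the AdditiveExpression action shown earlier, where the result is written back into sym(1).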

For completeness, here’s the grammar for Qbicle in BNF:

Program ::= Statement+

Statement ::= Expression ';'

Expression ::= AssignmentExpression

AssignmentExpression ::= LeftHandSideExpression '=' AssignmentExpression
                       | AdditiveExpression

LeftHandSideExpression ::= T_IDENTIFIER

AdditiveExpression ::= AdditiveExpression '+' PrimaryExpression
                     | AdditiveExpression '-' PrimaryExpression
                     | PrimaryExpression

PrimaryExpression ::= ...
                    | '(' Expression ')'
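If you want to get a feel for what this grammar accepts without running QLALR at all, the sketch below evaluates it with a hand-written recursive-descent parser. That is a different technique from the table-driven LALR(1) parser QLALR generates, so the left-recursive AdditiveExpression rule is turned into a loop. It assumes PrimaryExpression also covers identifiers and numeric literals (the listing above does not show those alternatives), uses doubles and std::map instead of QVariant and QHash, and the name GrammarSketch is made up for the illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Recursive-descent sketch; NOT the LALR(1) parser that QLALR generates.
class GrammarSketch
{
public:
    std::map<std::string, double> env; // identifier -> value

    // Program ::= Statement+ ; returns the value of the last statement.
    double evaluate(const std::string &program)
    {
        m_src = program;
        m_pos = 0;
        double result = 0.0;
        do {
            result = expression(); // Statement ::= Expression ';'
            expect(';');
            skipSpace();
        } while (m_pos < m_src.size());
        return result;
    }

private:
    static bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) || c == '_'; }
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    void skipSpace()
    {
        while (m_pos < m_src.size()
               && std::isspace(static_cast<unsigned char>(m_src[m_pos])))
            ++m_pos;
    }
    bool accept(char c)
    {
        skipSpace();
        if (m_pos < m_src.size() && m_src[m_pos] == c) { ++m_pos; return true; }
        return false;
    }
    void expect(char c)
    {
        if (!accept(c))
            throw std::runtime_error(std::string("expected '") + c + "'");
    }
    // Reads an identifier; returns an empty string if none starts here.
    std::string identifier()
    {
        skipSpace();
        const std::size_t start = m_pos;
        if (m_pos < m_src.size() && isAlpha(m_src[m_pos]))
            while (m_pos < m_src.size()
                   && (isAlpha(m_src[m_pos]) || isDigit(m_src[m_pos])))
                ++m_pos;
        return m_src.substr(start, m_pos - start);
    }

    double expression() { return assignment(); }

    // AssignmentExpression ::= LeftHandSideExpression '=' AssignmentExpression
    //                        | AdditiveExpression
    double assignment()
    {
        const std::size_t saved = m_pos;
        const std::string name = identifier();
        if (!name.empty() && accept('=')) {
            const double value = assignment();
            env[name] = value;
            return value;
        }
        m_pos = saved; // not an assignment after all; re-read as additive
        return additive();
    }
    // Left recursion in the grammar becomes a loop here.
    double additive()
    {
        double value = primary();
        for (;;) {
            if (accept('+')) value += primary();
            else if (accept('-')) value -= primary();
            else return value;
        }
    }
    double primary()
    {
        if (accept('(')) {
            const double value = expression();
            expect(')');
            return value;
        }
        const std::string name = identifier(); // assumed alternative
        if (!name.empty()) {
            const auto it = env.find(name);
            if (it == env.end())
                throw std::runtime_error("undefined identifier " + name);
            return it->second;
        }
        if (m_pos < m_src.size() && isDigit(m_src[m_pos])) { // assumed alternative
            std::size_t used = 0;
            const double value = std::stod(m_src.substr(m_pos), &used);
            m_pos += used;
            return value;
        }
        throw std::runtime_error("expected a primary expression");
    }

    std::string m_src;
    std::size_t m_pos = 0;
};
```

With env["foo"] set to 123, evaluating "bar = foo + 3;" returns 126 and stores bar in env, mirroring the QbicleEngine example from earlier in the post.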