Chapter 3

A small language for declarations

Top

In this chapter we'll look a small language thats helps a declare which people or groups of people are allowed to use something.

Keywords

Let's take a look at a small declarative part of a language with two keywords: 'Deny', 'Allow'.

Deny baduser
Allow admin

The semantics (or meaning) of the language could be anything. Parsing the language can be really easy with Marpa.

:start        ::= rules
rules         ::= rule+

rule          ::= cmd_type username

cmd_type        ~ 'Deny' | 'Allow'
username        ~ [\w]+

:discard        ~ ws
ws              ~ [\s]+

examples/deny-allow.pl

Trying to parse:
Deny baduser
Allow admin

Output:
$VAR1 = [
    [
        'Deny',
        'baduser'
    ],
    [
        'Allow',
        'admin'
    ]
];

The output shows that Marpa parsed two rules. The first rule with the 'Deny' keyword and the second with the 'Allow' keyword. It seems that the keywords could interfere with the usernames. The cmd_types are a subset of the usernames.

Let's see what happens when we change the first username from baduser to Allow. Can Marpa see the difference between keywords and usernames?

Output:
$VAR1 = [
    [
        'Deny',
        'Allow'
    ],
    [
        'Allow',
        'admin'
    ]
];

It doesn't matter that the first username matches a cmd_type. Marpa knows that the word 'Allow' could also be a username.

Now let's see what happens when we use an input that doesn't match by switching around the username and the keyword. A keyword could be a username, but it can't be the other way around.

I change the input to:

baduser Deny
admin Allow

When we run the examples we get the following output:

Error in SLIF G1 read: No lexemes accepted at position 0
* String before error: baduser
* The error  was at line 1, column 8, and at character 0x0020 (non-graphic character), ...
* here:  Deny\nadmin Allow\n
Marpa::R2 exception at examples/deny-allow.pl line 33.

Marpa tries to point to the error as best as it can. Marpa starts with the problem: No lexemes accepted at position 0. This is Marpa's way of telling us that Marpa couldn't find a way to match baduser to one of the expected lexemes, Deny or Allow.

More syntax

Now let's add a way to specify groups of users. The syntax could look like this.

admins = admin root
Deny baduser
Allow @admins

The first line specifies a list of admin users. The second line stays the same and the third line contains a reference to the admins list of users. The @ operators makes it a reference.

We start by changing the input in the file. We run the code and find that it doesn't parse. That's what we expected.

Now we need to add a rule for parsing the admins = ... line. In this case admins is similar to a username so let's use that.

user_list  ::= username '=' username username

This line should work. We add it in the grammar. Let's add it to the rule rule as an alternative. We use the | (or) operator.

rule    ::= cmd_type username
          | user_list

When we run this code we get the following:

Lexing failed at unacceptable character 0x0040 '@'
Marpa::R2 exception at examples/deny-allow2.pl line 37.

Marpa doesn't like the '@' character in the third line. Let's add that. We add another line to rule. The reference is like a username.

rule          ::= cmd_type username
                | cmd_type '@' username
                | user_list

examples/deny-allow2.pl

When we run this code, the parse succeeds, but the result is not completely right.

[
    'Allow',
    '@',
    'admins'
]

The @ and admins is parsed in two parts. We like the reference to be a single thing. Another problem with this version is that there can be spaces between the '@' and the name after it. This is not what we mean. The '@' and the name should always be part of the same lexeme. We change this by making a lexical rule for the list_ref.

When two tokens should never be separated, make them part of the same lexical rule.

rule          ::= cmd_type username
                | cmd_type list_ref
                | user_list

list_ref        ~ '@' username

examples/deny-allow3.pl

Now let's run deny-allow3.pl.

Symbol <username> is a lexeme in G1, but not in G0.
  This may be because <username> was used on a RHS in G0.
  A lexeme cannot be used on the RHS of a G0 rule.
Marpa::R2 exception at examples/deny-allow3.pl line 6.

This gives us a look into the inner workings of Marpa. This error message says that we can't use a lexeme in both the structural rules (G1) and lexical rules (G0). We need to create two lexical rules: one for rule and user_list and one for list_ref.

rule          ::= cmd_type user
                | cmd_type list_ref
                | user_list

user_list  ::= user '=' user user

list_ref        ~ '@' username
user            ~ username

examples/deny-allow4.pl

With these changes we can parse the input without problems. The output shows what we expected.

Multiple users

Even though the code works, it could be that you saw a problem with this. Ask yourself what would happen if you specify more or less than two users on the right side of a user_list? Try it.

The parser doesn't know what to do with the extra user in the list. Let's change user_list to allow multiple users.

user_list  ::= user '=' users
users      ::= user+

examples/deny-allow5.pl

This change now allows us to specify multiple users in a user_list. The change that allows us to specify multiple users on an Allow or Deny line is featured in Exercise 1.1.

With that change this language allows us to easily specify users that are allowed or denied access to a particular thing. The design of the language leaves out the specification of that particular thing to keep the design simple and general.

Exercise

  1. Multiple users — Change the grammar to allow multiple users on the Allow or Deny line. For example:

    Allow root admin
    Deny baduser cracker
    
  2. All users — Add a rule that allows you to specify all or 'everybody' as a rule that denies or allows everyone.

  3. Implementation — Implement a class Authorization with a method CanAccess($username) that return true if a user is allowed access and false if the user is denied access. The contructor of the class takes a source string in the language that we specified above. The rules that the parser creates should be checked in sequence.

Next step

Chapter 4: Parsing an expression