Chapter 3
In this chapter we'll look at a small language that helps us declare which people or groups of people are allowed to use something.
Let's start with a small declarative part of the language that has two keywords: 'Deny' and 'Allow'.
Deny baduser
Allow admin
We leave the semantics (or meaning) of the language open for now. Parsing the language is easy with Marpa, using the following grammar:
:start ::= rules
rules ::= rule+
rule ::= cmd_type username
cmd_type ~ 'Deny' | 'Allow'
username ~ [\w]+
:discard ~ ws
ws ~ [\s]+
Trying to parse:
Deny baduser
Allow admin
Output:
$VAR1 = [
[
'Deny',
'baduser'
],
[
'Allow',
'admin'
]
];
The output shows that Marpa parsed two rules: the first with the 'Deny'
keyword and the second with the 'Allow' keyword. It seems that the keywords
could interfere with the usernames, because every string that matches
cmd_type also matches username.
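The script that produced this output is not shown in the chapter. A minimal sketch of what such a driver could look like follows. It assumes Marpa::R2 is installed from CPAN, and the line `:default ::= action => ::array` is our own addition so that every rule returns its children as an array; the author's examples/deny-allow.pl may be written differently.

```perl
use strict;
use warnings;
use Marpa::R2;
use Data::Dumper;

# The grammar from above. The :default line is an assumption added here
# so that each rule evaluates to an array of its children.
my $dsl = <<'END_OF_DSL';
:default ::= action => ::array
:start ::= rules
rules ::= rule+
rule ::= cmd_type username
cmd_type ~ 'Deny' | 'Allow'
username ~ [\w]+
:discard ~ ws
ws ~ [\s]+
END_OF_DSL

my $input = <<'END_OF_INPUT';
Deny baduser
Allow admin
END_OF_INPUT

my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });
my $recce   = Marpa::R2::Scanless::R->new({ grammar => $grammar });

# read() throws an exception if the input cannot be parsed.
$recce->read(\$input);

# value() returns a reference to the parse result.
my $value_ref = $recce->value();
print Dumper(${$value_ref});
```

Changing the text in $input is all that is needed to try the other inputs in this chapter.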
Let's see what happens when we change the first username from baduser to Allow. Can Marpa see the difference between keywords and usernames?
Output:
$VAR1 = [
[
'Deny',
'Allow'
],
[
'Allow',
'admin'
]
];
It doesn't matter that the first username matches a cmd_type. Marpa knows
from the grammar that a username is expected after a cmd_type, so the word
'Allow' in that position is read as a username.
Now let's see what happens when we use an input that doesn't match, by swapping the username and the keyword on each line. A keyword can be a username, but a username can't be a keyword.
We change the input to:
baduser Deny
admin Allow
When we run the example we get the following output:
Error in SLIF G1 read: No lexemes accepted at position 0
* String before error: baduser
* The error was at line 1, column 8, and at character 0x0020 (non-graphic character), ...
* here: Deny\nadmin Allow\n
Marpa::R2 exception at examples/deny-allow.pl line 33.
Marpa tries to point to the error as best as it can. It starts with the
problem: No lexemes accepted at position 0. This is Marpa's way of telling us
that it couldn't match baduser to one of the expected lexemes, Deny or Allow.
The lines that follow show the text before the error and the remaining input.
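If we don't want the script to die on bad input, we can catch the exception ourselves. The sketch below is one way to do that with a plain eval block. It assumes Marpa::R2 is installed, and the `:default` line and the variable names are our own choices, not taken from the author's example script.

```perl
use strict;
use warnings;
use Marpa::R2;

my $dsl = <<'END_OF_DSL';
:default ::= action => ::array
:start ::= rules
rules ::= rule+
rule ::= cmd_type username
cmd_type ~ 'Deny' | 'Allow'
username ~ [\w]+
:discard ~ ws
ws ~ [\s]+
END_OF_DSL

# Username and keyword are swapped, so this input should be rejected.
my $input = <<'END_OF_INPUT';
baduser Deny
admin Allow
END_OF_INPUT

my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });
my $recce   = Marpa::R2::Scanless::R->new({ grammar => $grammar });

# read() dies on a parse error; eval turns that into a false return value.
my $ok    = eval { $recce->read(\$input); 1 };
my $error = $@;

if ($ok) {
    print "Input parsed\n";
}
else {
    # Show only the first line of Marpa's multi-line message.
    my ($summary) = split /\n/, $error;
    print "Parse failed: $summary\n";
}
```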
Now let's add a way to specify groups of users. The syntax could look like this.
admins = admin root
Deny baduser
Allow @admins
The first line specifies a list of admin users. The second line stays the same
and the third line contains a reference to the admins list of users. The '@'
operator makes it a reference.
We start by changing the input in the file. We run the code and find that it doesn't parse. That's what we expected.
Now we need to add a rule for parsing the admins = ... line. In this case
admins is similar to a username, so let's use that.
user_list ::= username '=' username username
This line should work, so we add it to the grammar. We also add it to the
rule named rule as an alternative, using the '|' (or) operator.
rule ::= cmd_type username
| user_list
When we run this code we get the following:
Lexing failed at unacceptable character 0x0040 '@'
Marpa::R2 exception at examples/deny-allow2.pl line 37.
Marpa doesn't like the '@' character in the third line of the input, so let's add support for it.
We add another alternative to rule. The name in the reference looks like a username.
rule ::= cmd_type username
| cmd_type '@' username
| user_list
When we run this code, the parse succeeds, but the result is not completely right.
[
'Allow',
'@',
'admins'
]
The '@' and admins are parsed as two separate parts. We would like the
reference to be a single thing. Another problem with this version is that it
allows spaces between the '@' and the name after it. This is not what we mean:
the '@' and the name should always be part of the same lexeme. We fix this by
making a lexical rule for list_ref.
When two tokens should never be separated, make them part of the same lexical rule.
rule ::= cmd_type username
| cmd_type list_ref
| user_list
list_ref ~ '@' username
Now let's run deny-allow3.pl.
Symbol <username> is a lexeme in G1, but not in G0.
This may be because <username> was used on a RHS in G0.
A lexeme cannot be used on the RHS of a G0 rule.
Marpa::R2 exception at examples/deny-allow3.pl line 6.
This gives us a look into the inner workings of Marpa. The error message says
that a symbol can't be a lexeme for the structural rules (G1) and also appear
on the right-hand side of a lexical rule (G0). We need to create two lexical
rules on top of username: one for use in rule and user_list, and one for
list_ref.
rule ::= cmd_type user
| cmd_type list_ref
| user_list
user_list ::= user '=' user user
list_ref ~ '@' username
user ~ username
With these changes we can parse the input without problems. The output shows what we expected.
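To see the whole grammar at this stage in one place, here is a sketch that puts the pieces together and runs them. It assumes Marpa::R2 is installed. The `:default` line, the helper try_parse, and the second test input (a reference with a space after the '@') are our own additions for illustration.

```perl
use strict;
use warnings;
use Marpa::R2;
use Data::Dumper;

# The grammar so far; :default is an assumption so rules return arrays.
my $dsl = <<'END_OF_DSL';
:default ::= action => ::array
:start ::= rules
rules ::= rule+
rule ::= cmd_type user
    | cmd_type list_ref
    | user_list
user_list ::= user '=' user user
cmd_type ~ 'Deny' | 'Allow'
list_ref ~ '@' username
user ~ username
username ~ [\w]+
:discard ~ ws
ws ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });

# Hypothetical helper: returns the parse tree, or undef if Marpa
# rejects the input.
sub try_parse {
    my ($input) = @_;
    my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
    my $value_ref = eval { $recce->read(\$input); $recce->value() };
    return defined $value_ref ? ${$value_ref} : undef;
}

my $input = <<'END_OF_INPUT';
admins = admin root
Deny baduser
Allow @admins
END_OF_INPUT

my $tree = try_parse($input);
print Dumper($tree);

# Whitespace is not discarded inside a lexical rule, so a space
# between '@' and the name should no longer be accepted.
my $spaced = try_parse('Allow @ admins');
print defined $spaced
    ? "spaced reference parsed\n"
    : "spaced reference rejected\n";
```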
Even though the code works, you may have spotted a problem. Ask yourself what
happens if you specify more or fewer than two users on the right side of a
user_list. Try it.
The parser doesn't know what to do with the extra user in the list.
Let's change user_list to allow multiple users.
user_list ::= user '=' users
users ::= user+
This change now allows us to specify multiple users in a user_list. The change
that allows us to specify multiple users on an Allow or Deny line is
featured in Exercise 1.1.
With that change this language allows us to easily specify users that are allowed or denied access to a particular thing. The design of the language leaves out the specification of that particular thing to keep the design simple and general.
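One thing worth checking with the user+ version: a keyword such as Deny also matches user, so a list of users followed by a Deny or Allow line may be readable in more than one way. The sketch below does not settle that question. It only prints every parse Marpa returns for the sample input, so you can see for yourself. It assumes Marpa::R2 is installed, and the `:default` line and variable names are our own.

```perl
use strict;
use warnings;
use Marpa::R2;
use Data::Dumper;

# The grammar with a variable-length user list; :default is an
# assumption so that rules return arrays.
my $dsl = <<'END_OF_DSL';
:default ::= action => ::array
:start ::= rules
rules ::= rule+
rule ::= cmd_type user
    | cmd_type list_ref
    | user_list
user_list ::= user '=' users
users ::= user+
cmd_type ~ 'Deny' | 'Allow'
list_ref ~ '@' username
user ~ username
username ~ [\w]+
:discard ~ ws
ws ~ [\s]+
END_OF_DSL

my $input = <<'END_OF_INPUT';
admins = admin root
Deny baduser
Allow @admins
END_OF_INPUT

my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });
my $recce   = Marpa::R2::Scanless::R->new({ grammar => $grammar });
$recce->read(\$input);

# Each call to value() returns the next parse, or undef when there
# are no more. An unambiguous input yields exactly one.
my $count = 0;
while ( defined( my $value_ref = $recce->value() ) ) {
    $count++;
    print "Parse $count:\n", Dumper( ${$value_ref} );
}
print "Number of parses: $count\n";
```

If more than one parse shows up, a separator between rules or a reserved spelling for keywords would be two possible ways to remove the ambiguity; neither is part of the language as designed in this chapter.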
Multiple users: Change the grammar to allow multiple users on the
Allow or Deny line. For example:
Allow root admin
Deny baduser cracker
All users: Add a rule that lets you specify 'all' or 'everybody' in place
of a username, to deny or allow everyone.
Implementation: Implement a class Authorization with a method
CanAccess($username) that returns true if a user is allowed access and false
if the user is denied access. The constructor of the class takes a source string
in the language that we specified above. The rules that the parser creates
should be checked in sequence.