Skip navigation

Monthly Archives: March 2009

So perhaps the most important set of features provided by Fera are those which together provide a cohesive aspect-oriented programming framework. This is still very much a work in progress (I’m in the process of defining and implementing it in the reference compiler). As far as I can find, there are no other languages for .NET which integrate aspect-oriented programming syntactically. There are many aspect-oriented frameworks but none have come close to standardization and I think cross-language AOP won’t be feasible until Microsoft, Mono or some other big player in the CLI world steps up and creates something everyone can agree on. So due to this limitation, Fera can only use aspects written in Fera.

The model chosen is similar to AspectJ and other AOP languages. You define member matching patterns (called ‘pointcuts’) and advice in the form of join methods or error conditions within special types called ‘aspects’. Aspect classes can inherit normal classes or aspects (but classes cannot inherit aspects). Aspects cannot be instantiated directly, as they are singleton classes. All aspects have an Instance property which retrieves this singleton instance.

All pointcuts and advice are named like any other CIL member. Pointcuts can match methods, nested join points in methods, fields, indexers, properties, events and other pointcuts. Join methods reference these pointcuts (or create them anonymously), and attach a join target (one of ‘in’, ‘before’ or ‘after’).

Methods can also define nested join points with the ‘join’ keyword, which can be matched by pointcuts like anything else. These join points have signatures just like methods, and data is passed from the method to the join point for consumption by join advice. Example:

public void Blah ():
    join Inner (int)
{
   join Inner (3);
}

A pointcut which matches the Inner nested point looks like this:

pointcut BlahInner {
   method void %.Blah().Inner (int);
}

The % character is used as a wildcard within pointcut match clauses. When targetting properties/fields/events with a pointcut, you can match individual events for those types. Field and property accepts ‘get’ and ‘set’, events accept add/remove. The parameters specified in method/indexer matches have special properties. You can use the ‘…’ symbol as a parameter to specify that there is zero or more arbitrary parameters. You can only have one ‘…’ in a pointcut match definition (this isn’t an arbitrary limitation, parameter matching becomes very ambiguous when this is not the case). The same pattern matching is available for the types of the parameters, but parameters using type pattern matching cannot be captured within join advice.

A join advice definition looks like this:

join (int i):
   BlahInner (int i)
{
   // do something here
}

This particular join matches all items of the BlahInner pointcut, adding a more specific version of the parameter list which overrides the one described within the BlahInner pointcut. I say “more specific” intentionally, you cannot attach an overridden parameter list which would not have been matched by the original pointcut. As you see, a name is attached to the int parameter specified in the BlahInner pointcut. This is called parameter capturing. The name (which is the same as the real parameter ‘int i’) is linked so that calls to the join from items matching BlahInner are provided with the method’s first int argument.

You can also define error/warning advice which causes compile-time errors/warnings respectively when a pointcut matches something. It is also possible to filter these errors based on the caller information.

All types/members created for aspect-related stuff is marked with attributes from the Fera.Runtime.Aspects namespace. Aspect classes are marked with AspectAttribute, Pointcuts are generated as constant null CIL fields with PointcutAttribute and a set of PointcutMatchAttributes. Join methods become normal methods with JoinAdviceAttribute and a set of PointcutMatchAttributes.

I’ve got the syntactic elements defined within the Fera grammar, and the IR representation contains the proper structures to support them. Next I need to work on the resolution of name patterns as well as resolving non-pattern builtin types into their fully qualified equivalents. Then I must create the aspect application code, which, to do efficiently, will probably require having a list of the aspects available for when each method is compiled. (that way you don’t start with an join() method and then search the entire type system to find items which match it, the opposite is much more efficient). More info later.

Over the last two days I implemented exception handling support in Fera. This was quite challenging, as the model used for try/catch/fault/finally blocks in CIL is rather cumbersome, and requires a lot of love and attention to get working properly for all cases. I figured I’d write about some of the things I’d learned.

First, CIL represents “protected” blocks via a special metadata table associated with each method body. Each protected block directive is an entry in this table. It’s first worth noting that each entry can only specify a singleĀ handler, and that handler mustĀ be one of catch, finally, filter, or fault. C# only exposes catch and finally, though it’s possible they may make use of the other two in some manner. Each try block is defined by a range of instructions, specified by their offsets. The ‘start’ part of the try block is the first protected instruction in the block, while the ‘end’ part of the try block is the instruction immediately proceeding the last instruction in the block. This is important to note, as setting the end instruction of a try block to the tailing ‘leave’ instruction which routes control flow to where it should go will result in verification errors from peverify/ilverify, and may even cause the runtime to emit an InvalidProgramException.

In addition, it is illegal to have more than one type of handler per try block, meaning something like:

try {
   // x x x 
} catch (Exception) {
   // x x x
} finally {
   // x x x 
}

must be split into a pair of try blocks like this:

try {
   try {
      // x x x 
   } catch (Exception) {
      // x x x
   }
} finally {
   // x x x 
}

Not doing this results in an InvalidProgramException when the compiled program is run.

Another quirk is the behavior of the ‘ret’ instruction within a protected block. If used, the finally handlers will not run (and more than likely you’ll get runtime complaints, in addition to verification errors). When returning inside a protected block, the ‘leave’ instruction should be used, with a target instruction which is at the tail of the method. When ‘leave’ executes, control is transferred to the finally block first.

If the method you’re compiling has a return value, the value to return must be saved in an intermediate variable, because transfer to the finally block will clear/change or otherwise mess up the stack. So to put these ‘return’ requirements together, it is necessary for the code controlling the method compilation to be aware of whether the method contains any try blocks which contain ‘return’ statements, unless you don’t mind generating some useless instructions that are unreachable when there aren’t any ‘return’ statements that use it.

On another note, I implemented a new part of the code flow analysis which checks for paths that don’t return when the method return type is not void. This doesn’t require dominators or anything crazy, just a basic code flow graph. I just winged it really but the algorithm’s time efficiency is O(nb) where n = the maximum depth of the code flow graph (ignoring recursions) and b = the average number of branches in a boundary block. The algorithm employs batch processing instead of recursion, so the memory requirements are very slightly higher, namely O(2n + b) worst case. It might be more efficient to use recursion, but managing to avoid following loops would be more difficult, and it would be harder to track for debugging. I’m happy with the one I’ve got, it required very little work to get working properly (causing no regressions even!)

Welcome to the new f(e) hosted on wordpress. I’ve imported all the old posts too, and I must say WP has made it a breeze!

I’ve managed to get more than 100 of fera’s testcases to pass, now! The exact number is 125 / 127 testcases (the remaining two are generics1 and doexits1, both features are not implemented yet). I finished implementing variable parameters and a lot of constructor / object-oriented stuff like base/this constructor calls.

Since it was inherently easy thanks to the way I built the compiler, it is currently possible to call the base version of a virtual method via the constructor-like base() syntax:

public override method void Write (string s):
   base (s)
{
   // x x x
}

This is less to type than using ‘base.Write (s)’ but is limited to situations where the base version should be called first. It only works for methods that return void right now, the compiler emits an error if this is not the case. Also you can only use ‘base’, the compiler doesn’t let you do ‘this’. I’m still on the fence about whether to keep this.

By the way, this isn’t new but I haven’t mentioned it in any other posts: Fera defines constructors and destructors like so:

public construct ()
{
}

public destruct ()
{
}

This is in sharp contrast to almost every other C-family language which make you type the class name of the class for the method name when defining constructors. Not only does this make typing out a new constructor take less time in most scenarios, but no distraction is necessary if you aren’t sure exactly what the class name is your looking at or you forgot the nuances of it’s naming.

I investigated some ideas for less common loop types from the Wikipedia article on control flow, and decided to integrate some into the design of Fera.

The first is the concept of a loop with a condition somewhere in the middle of it’s body. This can already be done in C# (and indeed any language with a while loop) with:

while (true) {
   // x x x
   if (condition) break;
   // x x x
}

In Fera, you can do the same thing with a little less typing and a bit more style:

do {
   // x x x
   break when (condition);
   // x x x
}

This also allows for the programmer to perform infinite loops, though the compiler will emit a warning, letting you know that you’ve forgot to put in some kind of condition for your loop. You can also use the ‘when’ construct with ‘next’ and ‘continue’ where ever they can be used. The compiler simply reverses the construct, so ‘next when (x);’ turns into ‘if (x) next;’.

Originally I went with ‘break if (x);’. Some languages allow if() tails on any instruction, but they are usually not whitespace-agnostic. I looked at this case to decide against it:

if (x)
   doSomething ();
else
   doSomethingElse ()      // OOPS! forgot my semicolon!

if (x)
   doSomethingTotallyDifferent ();

Forgetting semicolons happens all the time, and this error wouldn’t even be acknowledged by the compiler, making this bug very hard to find. Adding the ‘when’ keyword avoids this ambiguity, and fits in nicely with the do{} loop syntax. I’ll be further investigating whether it’s a good idea to allow this for all instructions (or most anyway).

There are some specific error conditions for the infinite ‘do’ loop; for one, the construct:

do;

causes a compile-time error, not because we disfavor pure infinite loops, but rather because it looks retarded. Also, only certain instructions are allowed to be put as the action of an infinite do loop. Curly brace groups, standalone expressions which have known/unknown side effects, conditionals, or loops are allowed. So this construct ‘do echo “annoying!”;’ is thankfully not allowed.

If you want a pure infinite loop, just use ‘loop;’. This will always emit a warning, telling you it should never be used in production code (unless you explicitly ignore it, or elevate the warning to be treated like an error).

Another type of loop (which is not implemented yet by my compiler) is do/exits.

var int x = 0;
var int y = 3;
do {
   // x x x
} exits {
   case x > 10:
      // x x x
   case y < 0:
      // x x x
}

This is syntactically very different from similar constructs in other languages, as it reuses the switch() syntax and logic and tosses it in with the do loop. The result is a pretty unobscured syntax that is easy to remember once you’ve learned the nuances of both.

Using ‘break’ in this loop will cause the exits to be evaluated, before control returns outside the loop. Using ‘next’ will also cause the exits to be evaluated, except that a lack of any exits will result in a new loop iteration. Within the ‘exits’ block, the ‘continue’ and ‘break’ work exactly like they do in switch/branch.

Owing to the fact that we use ‘next’ for loops and ‘continue’ for switches, you can use ‘next’ within an exit block to have the loop return to iteration. This is a unique property among the loop types that Fera offers, something which must be done with if() in C#.

Naturally the test casing marathon continues. Currently there are a total of 90 tests within the test suite, and 71 of them pass. 10 are expected fails, and the other 9 are regressions. I tested many of the earliest work on this project by hand, so now that I’ve added testcases for them, I discovered many bugs in the Fera grammar and of course the compiler itself.

One of the most perplexing problems was something I thought was happening in the compiler. The problem was happening in the do/while testcases which used a compound condition like so:

var int x = 0;

do {
   // x x x
   ++x;
} while (x > 0 && x < 5);

This particular formulation ensures that the compiler isn’t creating the FWhileLoop instruction which represents the loop with PostCondition = false. Doing so would result in the loop behaving like a while() loop, in which case it would never do the loop action.

The problem was that it was running infinitely, and there is a Console.Write(x) within it, which resulted in an infinite sea of digits across the console window.

I quickly figured out it was the compound nature of the condition, so I turned my attention to test ‘if3’, which tested the if() statement with a similar condition. It happened to be ‘x == 3 && y == 4’. The testcase failed because the action of the if() never runs though it should.

Inspecting the CIL showed the comparison code to be something like this:

0  ldloc.0     ; load x to the stack
1  ldc.i4.3    ; load constant zero to the stack
2  dup         ; save the top value
3  brfalse 8   ; skip the second condition if false
4  pop         ; pop the unused save value
5  ldloc.1     ; load y
6  ldc.i4.4    ; load 4
7  ceq         ; compare equality
8  nop         ; label

Naturally, there should be a ceq inserted at #2. The way it is now, the first condition would always end up being true. It looked like the FComparison expression operand was not outputting ‘ceq’. However upon inspecting that code it was fine and I was able to verify that it in fact was running the branch that emitted the instruction.

Only after much head scratching did I realize that the ‘ceq’ was there, but was far later in the body then it should be. At a hunch I checked the token listing for the file being compiled and sure enough, it was treating the expression like this:

x == (3 && y == 4)

Which would cause a more complete compiler to yelp but not here. So the tree of expression operands looked like:

FComparison
 | FVariableAccess
 | FBinaryBoolean
 |  | FConstant
 |  | FComparison
 |  |  | FVariableAccess
 |  |  | FConstant

I had to add a new level in the expression grammar called ‘CValue’ which is comprised of a ‘Value’ term and an optional set of ComparisonTails. Binary logic operations are handled in the level above it, causing comparisons to be treated as children of the binary logic operators. The tree thus became:

FBinaryBoolean
 | FComparison
 |  | FVariableAccess
 |  | FConstant
 | FComparison
 |  | FVariableAccess
 |  | FConstant

Pretty.

For people who don’t care about multilingual support in their grammars, a token type like this might be sufficient:
IDENTIFIER = <<[A-Za-z][A-Za-z0-9_]*>>

I wanted to expand this to support more than just the basic latin alphabet, hoping it would be as easy as reformulating this expression into character classes, now that Grammatica 1.5 apparently supports unicode regular expressions. First I tried something like:

IDENTIFIER = <<[[:alpha:]][[:alnum:]]*>>

However it appears that Grammatica entirely ignores this type of structure, instead treating it like a character set composed of :s and the letters a, l, p, h, a, etc. So I found out about the \p{Class} formulation for property classes…

IDENTIFIER = <<[\p{L&}][\p{L&}]*>>

The \p formulations like \p{L&} for these weren’t working until I consulted Java’s own set of property classes, which list Alpha and Alnum, so it turned into:

IDENTIFIER = <<[\p{Alpha}][\p{Alnum}]*>>

This made it past the grammar build, targetting .NET for the tokenizer code. But when the compiler runs, from all appearances I gather that it is using .NET’s regular expression library, which does not accept {Alpha} and {Alnum} but prefers the Unicode block names instead like {L&}.

At a prime moment of head scratching I came up with a solution (though it’s not as pretty as the formulations above:

IDENTIFIER = <<[\p{Ll}\p{Lu}\p{Lt}][\p{Ll}\p{Lu}\p{Lt}\p{Nd}]*>>

This makes it through grammar compilation and parsing, matching the correct input. It matches any alphabetic letter (upper, lower, title case) as the first character, and any alphanumeric character for the rest.

The inability to use the Java style classes looks like a bug/oversight in the C# port of Grammatica, unless I miss a more elegant way to pull this off?

In any case it works, so that’s what I’m using. Grammatica has a very long release cycle and they *just* released version 1.5. It’s unlikely there will be a new version for quite awhile. So for anyone who needs Unicode-level identifier tokens, your welcome!