parsing script files…

C#, Programming, ade's stuff

this week I've been working on a single function for cakepawn (a script editor for the pawn language, made in C#) that parses script files. originally it was meant as a bugfix for the previous sloppy method that grabbed functions from files in the "include" directory (for "method insight", the little tooltip that comes up when you type a function name and an opening parenthesis). I ended up putting tons of work into it though, so now I'm using the same function to parse the currently edited document in realtime and show the user's own functions in a little window.

the old function would lazily search for keywords like "native", "stock" or "public" and grab the function definition from that. however, pawn does not require you to use those words, so you can define a function with just its name and the parameters in parentheses, followed by a block ( { to } ) or a single line. pawn has a c-style syntax, but it isn't very strict, which created a lot of work.

so anyway, the first few days I tinkered with some methods to make the parser read the functions like I wanted it to, mainly using regular expressions and replacing stuff in the file. what I'd do was first run a regexp search for the block comments, for example, and just remove them from the string. like this:

// needs: using System.Text.RegularExpressions;
string parseStr = textBox1.Text;
// remove block comments (/* ... */)
string blockComments = "\\/\\*[^*]*\\*+([^/*][^*]*\\*+)*\\/";
Regex re1 = new Regex(blockComments);
parseStr = re1.Replace(parseStr, "");

then I'd do the same for line comments, string/char literals etc. it took me a good while to figure out the expressions for that stuff, but I finally got it working.
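
the exact expressions for those steps aren't shown in this post, so the following is only a rough sketch of what such a pass could look like, not the cakepawn code. the class and method names are made up, and it assumes backslash is the escape character inside literals (pawn lets you change that). everything is combined into one pattern so that a "//" inside a string literal isn't mistaken for a comment:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of cakepawn.
public static class LiteralAndCommentStripper
{
    // Alternatives are tried left to right at each position, so whichever
    // construct starts first (literal or comment) consumes its whole span.
    private static readonly Regex Tokens = new Regex(
        @"""(?:\\.|[^""\\\r\n])*""" +   // string literal, assuming \ as escape
        @"|'(?:\\.|[^'\\\r\n])*'" +     // char literal, same assumption
        @"|//[^\r\n]*" +                // line comment, up to end of line
        @"|/\*.*?\*/",                  // block comment, shortest match
        RegexOptions.Singleline);

    // Removes literals and comments from the source text.
    public static string Strip(string source)
    {
        return Tokens.Replace(source, "");
    }
}
```

for example, Strip applied to `a = "x // y"; // real` leaves `a = ; ` behind: the "//" inside the string goes away with the literal instead of eating the rest of the line.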

Here’s what the function finder looks like.

string functionDefs = "[\\r\\n]\\s*(native |stock |public |static |forward |.+:)*\\s*@?\\w+?\\s*\\([^\\)]*\\)";
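
to show how a pattern like that can be applied, here is a small sketch. the regexp is the one above (written as a verbatim string), but the FunctionFinder class and its Find method are hypothetical names for illustration. it assumes the input has already had comments, literals and block contents stripped out:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the function-finder pattern from the post.
public static class FunctionFinder
{
    private static readonly Regex FunctionDefs = new Regex(
        @"[\r\n]\s*(native |stock |public |static |forward |.+:)*\s*@?\w+?\s*\([^\)]*\)");

    // Returns the text of every match, trimmed of surrounding whitespace.
    public static List<string> Find(string strippedSource)
    {
        var found = new List<string>();
        // The pattern requires a preceding line break, so a newline is
        // prepended to let a definition on the very first line match too.
        foreach (Match m in FunctionDefs.Matches("\n" + strippedSource))
        {
            found.Add(m.Value.Trim());
        }
        return found;
    }
}
```

note that the pattern starts with a line break, which is why the sketch prepends one; without it a definition on the first line of the file would be missed.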

urgh… by the way, before I searched for functions, I removed all the text inside compound blocks ({…}), since function definitions sit outside any block. I didn't use regexps for that though, I just nested two while-loops and removed the text as I went along, removing top-level blocks one by one.
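
a sketch of the block-stripping idea, with two deliberate differences from what's described above: it uses a single pass with a depth counter instead of two nested while-loops, and it overwrites block contents with spaces instead of deleting them (so character positions stay the same). BlockStripper and BlankTopLevelBlocks are made-up names, and it assumes comments and string literals are already gone so that every brace is a real one:

```csharp
using System.Text;

// Hypothetical helper, not the cakepawn implementation.
public static class BlockStripper
{
    // Overwrites every top-level { ... } block, braces included, with spaces.
    // Line breaks are kept so line numbers still line up afterwards.
    public static string BlankTopLevelBlocks(string source)
    {
        var sb = new StringBuilder(source);
        int depth = 0; // current brace nesting level
        for (int i = 0; i < source.Length; i++)
        {
            char c = source[i];
            if (c == '{') depth++;
            bool inside = depth > 0;            // true for the braces themselves too
            if (c == '}' && depth > 0) depth--; // ignore stray closing braces
            if (inside && c != '\r' && c != '\n') sb[i] = ' ';
        }
        return sb.ToString();
    }
}
```

because nested blocks only change the depth counter, an inner { } never ends the outer block early.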

ok so I had all this working, there was only one problem: it was waaaay too slow. after I added the real-time parsing I loaded up my own main script, which is about 4000 lines long, and the slowdown of the interface was noticeable. so I installed a HighPerformanceTimer and did some benchmarking. it took about 0.88 seconds to parse the file.
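
the HighPerformanceTimer class itself isn't shown here, so as a stand-in this sketch uses System.Diagnostics.Stopwatch from the .NET framework, which does the same kind of job. ParseBenchmark and AverageMilliseconds are hypothetical names, and the action passed in would be whatever parse call is being measured:

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper; Stopwatch is swapped in for the
// HighPerformanceTimer mentioned in the post.
public static class ParseBenchmark
{
    // Runs the action once as a warm-up, then `runs` more times,
    // and returns the average duration of the timed runs in milliseconds.
    public static double AverageMilliseconds(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));
        action(); // warm-up, so JIT compilation isn't part of the measurement
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / runs;
    }
}
```

averaging over several runs smooths out one-off hiccups, which matters once the numbers get down into the millisecond range.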

the next few work sessions I spent optimizing the algorithm and fixing bugs. there were a lot of little bugs with the parsing, due to the complexity of parsing such a loosely formatted file. the first thing I did to optimize the algorithm was to make a list (System.Collections.Generic.List) of two-integer objects that kept track of the start and stop positions of removed text, instead of actually removing the text (and thus rebuilding the string tons of times). I later changed the list to a simple bool array, one bool for every character position, set to true if the character was marked as removed.
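
roughly, the bool-array idea could look like the sketch below. MaskedSource and its members are hypothetical names rather than anything from LibraryFile.cs; the point is only that the text is never modified, and "removing" something just flips flags:

```csharp
using System.Text.RegularExpressions;

// Hypothetical illustration of marking text as removed without rebuilding the string.
public sealed class MaskedSource
{
    public readonly string Text;    // the original, untouched source
    public readonly bool[] Removed; // Removed[i] == true means Text[i] should be ignored

    public MaskedSource(string text)
    {
        Text = text;
        Removed = new bool[text.Length]; // all false to begin with
    }

    // Marks `length` characters starting at `start` as removed.
    public void MarkRemoved(int start, int length)
    {
        for (int i = start; i < start + length; i++)
        {
            Removed[i] = true;
        }
    }

    // Marks every match of the pattern as removed.
    public void MarkMatches(Regex pattern)
    {
        foreach (Match m in pattern.Matches(Text))
        {
            MarkRemoved(m.Index, m.Length);
        }
    }
}
```

since Text keeps its original length, every position found later is already a valid position in the editor's document, with no offset bookkeeping needed.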

then I made a custom indexOf (simple string search) function that also checked if characters were removed. then I rewrote some regexps to custom iterations instead, because it was faster.
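
a minimal sketch of such a search, assuming "checked if characters were removed" means that a hit may not contain any removed character (the real function may well behave differently). MaskedSearch is a made-up name, and the bool array is expected to have one entry per character of the text:

```csharp
// Hypothetical removal-aware string search, not the cakepawn implementation.
public static class MaskedSearch
{
    // Returns the index of the first occurrence of `needle` at or after `start`
    // that contains no character marked as removed, or -1 if there is none.
    public static int IndexOf(string text, bool[] removed, string needle, int start)
    {
        if (needle.Length == 0) return -1; // nothing sensible to search for
        if (start < 0) start = 0;
        int last = text.Length - needle.Length; // last possible start position
        for (int i = start; i <= last; i++)
        {
            int j = 0;
            // Stop comparing as soon as a character differs or is marked removed.
            while (j < needle.Length && !removed[i + j] && text[i + j] == needle[j])
            {
                j++;
            }
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```

with a string literal marked as removed, a keyword inside the quotes is skipped and the search lands on the real keyword after it, which a plain string.IndexOf would not do.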

my first test run gave a whopping total time of 18 seconds (hrrm) but I believed in my code and fixed a lot of bugs and stuff, and soon it was below one second :) then I added some more timers to check and optimize every part of the function and kept making it faster and faster. the result was good!

old time: 0.88 seconds, new time: 11 milliseconds! I could spend more time making it even better, but I don't need to right now. a file of 20000+ lines takes about 75 milliseconds to parse and spit out all function/#define definition names and positions, and I don't need to parse the file too often, so it should be OK.

here’s the class as it looks at the time of writing, in case you need any part of it. LibraryFile.cs
you can also download the cakepawn source to get all files (new version not up yet). http://ade.se/projects/cakepawn