Blog Relocated
I’ve now relocated my blog to my personal website (mirroring all the existing content). You can find it here. Please update your feed list, as this WordPress.com blog will not receive any new posts.
Playlist Synchronisation for Portable Devices
I have recently been attempting to properly set up synchronisation between Windows Media Player and my portable music player (which happens to be my phone). Though I found that the Windows Media Player synchronisation tool does the job pretty well, it does fail in one respect: it cannot copy over playlist (WPL) files. For me, this was a bit of a nuisance, since I rely very much on playlists to categorise my music collection.
The solution for me was to write my own tool that synchronises a given set of playlists with any portable device compatible with WMP (Windows Media Player), as I believe many devices tend to be. The tool works simply by finding the appropriate place on the device to copy the playlist files to (a known XML descriptor file on the device should specify this), and then copying over these files, with the locations of the media files updated to point to those on the device.
Naturally, my choice of technology with which to write the thing was .NET/C# – this does mean that it’s not a fully standalone application, though it does only consist of a single EXE. However, thanks to a few particularly convenient features of the language/framework (primarily LINQ to XML), the code was largely trivial to write, and the majority of the ~200 lines is in fact error handling.
You can download the program here. As mentioned, it requires the Microsoft .NET Framework 3.5 (SP1) to run, which is not installed by default on any current version of Windows, so you will need to download and install it first if you don’t already have it. Also, if anyone is curious to see the code, I may be able to upload that at some point.
The tool should be run from the command line, and should be very straightforward to use. (Run the program with no arguments to see the help information.) An example command line to synchronise the playlists in the standard location of your user profile with a portable device on drive F might be:
pps F "C:\Users\username\Music\Playlists"
That’s all it takes. The task should finish within a matter of seconds and then either report some general information about the playlists it found and which ones it successfully synchronised, or return an error message.
NB: If you’re wondering how the synchroniser matches the media files on the device with those in the playlist, I have a small admission to make. Because the directory structure on the device is not guaranteed to match that of the source media, the current version simply matches media items by file name. This works perfectly well for me, though there is an obvious caveat: two different tracks that share a file name cannot be told apart. I am looking for an improvement on this method, and while I have a few ideas, I haven’t settled on one yet. Any recommendations from someone more knowledgeable on the subject would be appreciated.
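To make the file-name matching concrete, here is a minimal sketch of the rewriting step in C# with LINQ to XML. It is not the tool’s actual source, which hasn’t been published. It assumes the usual WPL layout, where each track is a `<media src="..."/>` element under `<smil><body><seq>`, and that the caller has already listed the media files on the device. `PlaylistRewriter` and `Rewrite` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not the actual pps source.
public static class PlaylistRewriter
{
    // File name after the last '\' or '/', regardless of the OS the code runs on.
    static string FileNameOf(string path)
    {
        return path.Substring(path.LastIndexOfAny(new[] { '\\', '/' }) + 1);
    }

    // Returns a copy of a WPL playlist in which each <media src="..."> points at the
    // device file with the same file name; tracks not found on the device are removed.
    public static XDocument Rewrite(XDocument playlist, IEnumerable<string> deviceFiles)
    {
        // Index device files by name, ignoring case as Windows file systems do.
        // Two device files with the same name collide, and the first one wins:
        // this is exactly the file-name caveat.
        var byName = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string path in deviceFiles)
        {
            string name = FileNameOf(path);
            if (!byName.ContainsKey(name))
                byName.Add(name, path);
        }

        var result = new XDocument(playlist); // deep copy, so the source playlist is untouched
        foreach (XElement media in result.Descendants("media").ToList())
        {
            string src = (string)media.Attribute("src");
            string target;
            if (src != null && byName.TryGetValue(FileNameOf(src), out target))
                media.SetAttributeValue("src", target);
            else
                media.Remove(); // no counterpart on the device
        }
        return result;
    }
}
```

Dropping unmatched tracks, rather than leaving them pointing at the PC, is one possible policy. Reporting them to the user would be an equally reasonable choice.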
Now, this program was designed primarily for my own use, but I did consciously attempt to make it usable with any WMP-compatible portable device, so hopefully people shouldn’t have any major problems using it.
Finally, it would be nice to hear any feedback regarding this little tool of mine, so please feel free to drop me a message (even if it’s just to say you’re using it). If I hear any suggestion for a worthwhile feature to add (or of course a valid bug report), I will gladly update the program.
Code Golf: Evaluating Mathematical Expressions
Yesterday I happened to stumble across a code golf question and for no particular reason (except for perhaps boredom) decided to create my own problem and to post it on StackOverflow for the community to reply with their solutions. It actually turned out to be much more popular than I might have anticipated.
A quick definition of code golf for those who are unaware of this enormous (though really quite enjoyable) time sink:
The objective of code golf is simply to write a program/function that solves a given problem using the fewest possible number of characters. This usually involves clever tricks related to the problem and whatever language you use, followed by heavy obfuscation.
Here is the problem specification, copied from my StackOverflow post:
Write a function that takes a single argument that is a string representation of a simple mathematical expression and evaluates it as a floating point value. A “simple expression” may include any of the following: positive or negative decimal numbers, +, -, *, /, (, ). Expressions use (normal) infix notation. Operators should be evaluated in the order they appear, i.e. not as in BODMAS, though brackets should be correctly observed, of course. The function should return the correct result for any possible expression of this form. However, the function does not have to handle malformed expressions (i.e. ones with bad syntax).
Examples of expressions:
1 + 3 / -8 = -0.5 (No BODMAS)
2*3*4*5+99 = 219
4 * (9 - 4) / (2 * 6 - 2) + 8 = 10
1 + ((123 * 3 - 69) / 100) = 4
2.45/8.5*9.27+(5*0.0023) = 2.68...
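For comparison with the golfed entries, here is an ungolfed sketch of an evaluator for exactly these rules, written as a small recursive-descent parser. The class and method names are hypothetical. As the specification allows, it assumes well-formed input. Operators at the same bracket level are folded into an accumulator strictly from left to right, and a bracketed group is simply an operand that recurses. Numbers are parsed with the invariant culture, so '.' is always the decimal point.

```csharp
using System;
using System.Globalization;

// Ungolfed reference evaluator for the rules above (a sketch, not a golf entry):
// no operator precedence, left-to-right evaluation, brackets evaluated first.
public static class LeftToRightEvaluator
{
    public static double Evaluate(string expr)
    {
        int pos = 0;
        return ParseSequence(expr, ref pos);
    }

    // sequence := operand (op operand)*, folded strictly left to right
    private static double ParseSequence(string s, ref int pos)
    {
        double acc = ParseOperand(s, ref pos);
        while (true)
        {
            SkipSpaces(s, ref pos);
            if (pos >= s.Length || s[pos] == ')')
                return acc; // end of input or of the current bracket
            char op = s[pos++];
            double rhs = ParseOperand(s, ref pos);
            switch (op)
            {
                case '+': acc += rhs; break;
                case '-': acc -= rhs; break;
                case '*': acc *= rhs; break;
                case '/': acc /= rhs; break;
                default: throw new FormatException("Unexpected operator '" + op + "'");
            }
        }
    }

    // operand := '(' sequence ')' | optionally negative decimal number
    private static double ParseOperand(string s, ref int pos)
    {
        SkipSpaces(s, ref pos);
        if (s[pos] == '(')
        {
            pos++; // consume '('
            double inner = ParseSequence(s, ref pos);
            pos++; // consume ')'
            return inner;
        }
        int start = pos;
        if (s[pos] == '-')
            pos++;
        while (pos < s.Length && (char.IsDigit(s[pos]) || s[pos] == '.'))
            pos++;
        return double.Parse(s.Substring(start, pos - start), CultureInfo.InvariantCulture);
    }

    private static void SkipSpaces(string s, ref int pos)
    {
        while (pos < s.Length && s[pos] == ' ')
            pos++;
    }
}
```

Brackets are handled by recursion rather than by textual substitution, so no intermediate result is ever converted back to a string. This sidesteps the number-formatting issues that a regex-replacement approach has to live with.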
Now, my own solution isn’t particularly astounding, but I did get it down to 403 characters, and have since cut off a few more (though haven’t bothered to re-obfuscate it). It is in fact my first proper attempt at code golf, so I don’t consider it too bad.
Here it is, in all its obfuscated ugliness:
float e(string x){float v=0;if(float.TryParse(x,out v))return v;x+=';';int t=0;char o,s='?',p='+';float n=0;int l=0;for(int i=0;i<x.Length;i++){o=s;if( x[i]!=' '){s=x[i];if(char.IsDigit(x[i])|s=='.'|(s=='-'&o!='1'))s='1';if(s==')') l--;if(s!=o&l==0){if(o=='1'|o==')'){n=e(x.Substring(t,i-t));if(p=='+')v+=n; if(p=='-')v-=n;if(p=='*')v*=n;if(p=='/')v/=n;p=x[i];}t=i;if(s=='(')t++;} if(s=='(')l++;}}return v;}
And in a rather more readable form (identical in behaviour):
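float Eval(string expr)
{
    float val = 0;
    // If the whole string is a plain number, return it directly.
    // Note: float.TryParse uses the current culture, so '.' must be its decimal separator.
    if (float.TryParse(expr, out val))
        return val;
    expr += ';'; // sentinel so that the final operand is flushed
    int tokenStart = 0;
    char oldState, state = '?', op = '+';
    float num = 0;
    int level = 0; // bracket depth
    for (int i = 0; i < expr.Length; i++)
    {
        oldState = state;
        if (expr[i] != ' ')
        {
            // '1' means "part of a number" (including a leading minus sign).
            state = expr[i];
            if (char.IsDigit(expr[i]) || state == '.' || (state == '-' && oldState != '1'))
                state = '1';
            if (state == ')')
                level--;
            // At bracket depth 0, a change of state ends the current token.
            if (state != oldState && level == 0)
            {
                if (oldState == '1' || oldState == ')')
                {
                    // Evaluate the finished operand recursively and apply the pending operator.
                    num = Eval(expr.Substring(tokenStart, i - tokenStart));
                    if (op == '+') val += num;
                    if (op == '-') val -= num;
                    if (op == '*') val *= num;
                    if (op == '/') val /= num;
                    op = expr[i];
                }
                tokenStart = i;
                if (state == '(')
                    tokenStart++; // a bracketed operand starts after the '('
            }
            if (state == '(')
                level++;
        }
    }
    return val;
}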
The current leading solution is one written in Haskell (a mere 226 chars), with another in Python (237 chars) taking second place. This hardly surprises me: functional and dynamic languages almost inevitably have more succinct syntax, besides generally being known to be more suitable for creating parsers. (If I hadn’t specified the absence of the BODMAS rules, I would surely have seen a solution containing little more than an “eval” statement!) Interestingly, the top two have both managed to avoid using regex altogether (though other solutions have used it with some success). In my opinion, it’s worth reading through the question to see how the various languages compare at performing the same task.
Please feel free to reply to the StackOverflow question or this post if you have a unique solution (in any language) that you’d like to share.
Update
I ended up spending a little longer on this task, since having seen some of the other solutions, it became pretty clear that I could get the character count down a good deal more. With the help of regex, my new solution stands at 294 characters. That in fact seems to be the winner amongst the solutions in C-style languages, so I’m quite pleased. (I have now promised myself not to spend any longer on this game, however.)
Here it is in a (relatively) clear form, in case anyone’s interested. (It assumes the System.Text.RegularExpressions namespace has been imported.)
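float e(string x)
{
    // Repeatedly replace each innermost bracketed sub-expression with its value.
    // Note: ToString and float.Parse use the current culture, so '.' must be its decimal separator.
    while (x.Contains("("))
        x = Regex.Replace(x, @"\(([^\(]*?)\)", m => e(m.Groups[1].Value).ToString());
    float r = 0;
    // Each match is an operator followed by an (optionally negative) number;
    // the leading "+" gives the first number an operator too.
    foreach (Match m in Regex.Matches("+" + x, @"\D ?-?[\d.]+"))
    {
        var o = m.Value[0];
        var v = float.Parse(m.Value.Substring(1));
        // Apply the operators strictly from left to right.
        r = o == '+' ? r + v : o == '-' ? r - v : o == '*' ? r * v : r / v;
    }
    return r;
}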
Circular Buffer for .NET
This is a quick announcement that I’ve released my code for an implementation of a generic circular buffer for .NET (written in C# 3.0). The release also includes an implementation of a circular stream, which is a wrapper over a circular buffer of bytes. You can view the project and download the source code over at CodePlex, where I’ve licensed it under the MS-PL, which should hopefully be fairly unrestrictive.
If you’re wondering where this idea came from, I recently came across an interesting use for a circular buffer, and upon finding out that the .NET BCL contained nothing along the lines of a circular buffer (or stream), I decided to implement one myself. I’ve attempted to do everything properly of course (i.e. in the style of the BCL data structures), so hopefully it should be immediately useful to anyone familiar with the concept.
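As a rough illustration of the underlying idea, a fixed-capacity circular buffer needs little more than an array, a head index and a count, with indices wrapping round via the modulo operator. This sketch is not the API of the CodePlex release, whose type and member names may well differ. `SimpleCircularBuffer`, `Put` and `Get` are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// A minimal fixed-capacity circular buffer (illustrative sketch only).
// Put appends at the tail and Get removes from the head; both are O(1).
public class SimpleCircularBuffer<T> : IEnumerable<T>
{
    private readonly T[] buffer;
    private int head;  // index of the oldest element
    private int count; // number of stored elements

    public SimpleCircularBuffer(int capacity)
    {
        if (capacity <= 0)
            throw new ArgumentOutOfRangeException("capacity");
        buffer = new T[capacity];
    }

    public int Capacity { get { return buffer.Length; } }
    public int Count { get { return count; } }

    // Appends an item; when the buffer is full, the oldest item is overwritten.
    public void Put(T item)
    {
        int tail = (head + count) % buffer.Length;
        buffer[tail] = item;
        if (count == buffer.Length)
            head = (head + 1) % buffer.Length; // the oldest element was just overwritten
        else
            count++;
    }

    // Removes and returns the oldest item.
    public T Get()
    {
        if (count == 0)
            throw new InvalidOperationException("The buffer is empty.");
        T item = buffer[head];
        buffer[head] = default(T); // drop the reference so the GC can reclaim it
        head = (head + 1) % buffer.Length;
        count--;
        return item;
    }

    // Enumerates from oldest to newest without removing anything.
    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; i < count; i++)
            yield return buffer[(head + i) % buffer.Length];
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

Overwriting the oldest item when the buffer is full is one common policy. Throwing an exception instead is just as valid and depends on the use case.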
More about this to come later.
My Guide to Science Fiction and Fantasy
Note: This guide is currently incomplete. I will try to get around to expanding the summaries in the near future (in particular in the Fantasy section). Moreover, there are almost certainly one or two books I have forgotten entirely, to my future disbelief.
This post is a summary of my recommendations regarding fiction; specifically, in the genres of science fiction and fantasy, which have most often attracted my interest, though they are not the only ones. I feel that I have now read a wide enough variety within these areas to produce something of a useful guide for anyone interested in finding the real gems of these forms of literature. It is important to realise that the summaries I have provided are not meant to be synopses or reviews of the books, but rather brief overviews of my personal take on them, as well as suggestions as to why you might be interested in them.
So without further preamble, here is my guide. I’ve listed my current favourites (by that I mean both most enjoyable and having greatest creative and literary merit) at the top, with others in a fairly arbitrary order.
Science Fiction
- Dune and the Dune series by Frank Herbert
In my opinion (and seemingly that of many others), this is perhaps the greatest science fiction novel ever written. The scope and storyline are truly unparalleled by anything else within the genre, and maybe by anything else in the entire body of 20th-century literature. In a certain respect, this work almost belongs within the fantasy genre because of its nature and some of its motifs: it’s much more of a superb story than superb speculative science, with a bit of philosophy thrown in. Of course, it still very much deserves the categorisation of sci-fi, mainly because of the interstellar travel (run by the monopolistic Spacing Guild) as well as the ecology/terraforming ideas involved. Overall, I think that classifying it as a planetary romance does it the most justice, though in all fairness there’s no real way to get a decent understanding of its unique style and grand themes without reading the work in its entirety. It is certainly not so-called “hard sci-fi”, but to be honest that’s not especially what I’m interested in, and no-one with a proper interest in literature should care either. I don’t want to say much more about this book, since I think everyone with an interest in sci-fi should read it once (and preferably reread it) from a fresh perspective and experience its wonder for themselves. Now if you make it through the original Dune novel (I’m not sure how you can finish it and be anything less than stunned), then I would without hesitation suggest that you also pick up the second and third of the series (Dune Messiah and Children of Dune), which are excellent reads in their own right, if not actually masterpieces. After that, I’m afraid to say that the standard of writing and storyline declines to a certain degree (with a short resurgence in the latter half of Heretics of Dune).
The rest of the series does tend to get slightly crazy (not helped by Herbert’s unusual and sometimes slightly opaque writing style), and even verges on becoming somewhat raunchy at times. Still, they are by no means poor works, albeit ones that I would only recommend to serious fans of Frank Herbert. (More precisely, they are still works of high quality, but they appeal to a much smaller subset of readers.) So that I don’t end on a seemingly sour note, I shall repeat and reinforce my statement that the original Dune, written in 1965, is a piece of literature that should not be missed by anyone with an interest in speculative fiction.
- Foundation series by Isaac Asimov
The original Foundation is perhaps the first truly epic sci-fi ever created, to some extent the precursor of Dune, and ultimately an inspiration for the Star Wars universe (arguably alongside Dune itself). Unlike Herbert’s series, however, this one peaks somewhere in the middle, with both the beginning and the end being only marginally lower in standard. Although it may not contain a single book that is the equal of Dune, the series as a whole is probably unbeatable. Its pace and sheer scale rarely drop, and there’s always a sense of the unexpected (though a few twists can be predicted, I believe this was entirely intentional). Again, similarly to Dune, this book is what it is because of its storyline, although there is undoubtedly a greater emphasis on the science, not surprisingly given that Asimov was in fact a scientist (a chemist) by training. It contains some genuinely interesting and imaginative scientific concepts (as well as a few strange and outdated ones, unsurprisingly for a work written over half a century ago), both in terms of hard science (mainly physics and astronomy) and soft science (psychohistory being the main one). The latter is particularly intriguing, as it demonstrates (albeit in superficial detail) an entirely new branch of science, which is in essence a blend of history, sociology, and psychology turned into a mathematical study for analysing and predicting the macro-events of the human race. As a matter of fact, I do believe that Asimov was showing some foresight in this respect. Though the extent to which he proposes using psychohistory may not be very realistic, the core idea gets me thinking seriously about the possibility of such a field opening up in the future. To summarise, he explores a number of both scientific and human themes with exceptional insight, yet with some subtlety too.
So if you want something enormous in scope, as well as superb entertainment (yes, these novels even include a bit of humour!), you assuredly cannot go wrong with this series.
- Space Odyssey series by Arthur C. Clarke
2001: A Space Odyssey was one of the first proper science fiction novels I read (as a young teenager), and probably what really sparked my fascination with the genre. Written alongside a (faithful) film script, the storyline ended up producing both a fantastic book and a fantastic film, and accounts for much of what made Clarke the celebrity he was. Sadly, the great author passed away less than a year ago now, though he continued to write with much of his former skill well into the 21st century. Clarke’s works have always tended to focus on space, the universe, and other intelligent species, and though they are perhaps not as grand in style as Dune or Foundation (he was clearly a realist, in contrast to the more romantic styles of the other two authors), they do have astoundingly good (hard) science, as well as an undeniable element of suspense that is present throughout almost all of his creations. The Space Odyssey series as a whole is definitely worth reading to the end. If I remember rightly, only one of the four books in the series (the others being 2010, 2061, and 3001) is a slight letdown, while the remainder (importantly, including the first and the last) are most enjoyable indeed. The first of the series, however, I must single out and include among my “big three” of sci-fi, the others being the original Dune and Foundation (despite a number of excellent sequels), as you might have suspected by now.
- Rendezvous with Rama and the Rama series by Arthur C. Clarke
Belonging almost as much in the mystery/thriller category as in science fiction, this is nevertheless an astonishing read. The highly unique view that this story offers of humanity’s first contact with other intelligent life (of a much more advanced form in this case) may not be an especially grand one, but the imagination that went into its creation was surely immense. Since it is a mystery, I’m not going to comment any further on the book, even in vague terms. (Or maybe I’m just not too sure what to think of it as a whole.) Specifically, I would recommend that if your introduction to the works of Arthur C. Clarke is (or has already been) a pleasant one beginning with 2001: A Space Odyssey, then this should definitely be the next thing on your reading list. Unfortunately, however, as with the Dune series, the quality of successive books deteriorates to some extent. (I can confidently say that the second is worth a read, though, while the subsequent ones lose some, if not all, of their novelty.) If you need any more convincing, let me point out that Rendezvous with Rama won both of the highest honours in science fiction, the Hugo Award and the Nebula Award, a feat that the original Dune also achieved.
- Brave New World by Aldous Huxley
Who hasn’t heard of this work, alongside The War of the Worlds, as a prime example of classic science fiction? Again, it is celebrated not so much for its scientific content as for the eloquent manner in which it conveys certain philosophical and speculative ideas. I won’t deny that this is a somewhat depressing read in some ways, but its philosophical and sociological implications are beyond doubt not only captivating but also quite relevant to modern society. Perhaps I do have something of a penchant for philosophical prophecy in literature, but I don’t think anyone can finish this book without having their own thoughts and outlook on life and society sincerely provoked, if not disturbed, by this powerful portrayal of a dystopian future world.
- Fahrenheit 451 by Ray Bradbury
A curious title, not to mention a curious book, and perhaps science fiction is not the most obvious classification for this work, though I think that in a sociological sense it is so speculative, and so strikingly relevant to our future, that it deserves the label. This book has many parallels with Brave New World, the most evident being the examination of dystopian societies (albeit two different forms) and philosophical warnings with undeniable cautionary overtones. Both are surely championing the freedom of independent thought and behaviour over the horrors that extreme conformism might bring. And finally, a book about the destruction of books: is there not a wonderful irony (perhaps even mockery) in this predominant theme?
There are additionally a few books of which I have heard very positive reports but unfortunately haven’t read yet. I’ll update the summaries when I do get around to reading at least some of them, which will most probably be some time over the coming summer break.
- Childhood’s End by Arthur C. Clarke
[Summary to come.]
- The Robot series by Isaac Asimov
[Summary to come.] For the time being, maybe pointing out that the film I, Robot was loosely based on Asimov’s short-story collection of the same name will be enough to convince some to read it?
Fantasy
- The Lord of the Rings trilogy and The Hobbit, or There and Back Again by J.R.R. Tolkien
How much do I need to say about this one? That it most likely surpasses all of the others in both of the categories I have listed here should say enough about my particularly high opinion of these works. (I am a self-confessed Tolkien fanatic, after all.) In fact, if you need convincing to read this epic, then I would suggest that you stop reading this list now (not that I would have expected you to get this far anyway)! I’ve mainly included this entry for completeness, not because it’s going to be of any great help to anyone. As a side note, The Hobbit, or There and Back Again (commonly referred to simply as The Hobbit) should without doubt be read alongside the trilogy; whether before or after ought not to make much difference to its impact, at least as I see it. Though I’m not sure to what extent this perception exists, I will nonetheless suggest that you dispel all notions of The Hobbit being a story for children: indeed, it is no less suitable for an adult than for a child, despite its comparative light-heartedness, which has possibly given it such a reputation. Like many others, I first read the novel when I was quite young, yet it has not since lost its endearing quality for me, and I see no reason why it should for anyone else.
- The Silmarillion by J.R.R. Tolkien
The epic historical prequel to The Hobbit and The Lord of the Rings, this may not be the easiest read when you first pick it up. (At least, it wasn’t for me, though perhaps that was because I was much younger at the time and the somewhat archaic language didn’t help; nowadays I can only say that it adds to the character and feeling of the story.) Nonetheless, I would argue that in certain ways it is almost more magnificent than The Lord of the Rings, being a true archetype of epic literature. (Though I haven’t read it myself, the Anglo-Saxon poem Beowulf may give some impression of its style; unsurprising, given that Tolkien was a professor who taught such works of literature.) To be honest, if you’re not immersed in it by the halfway point, then put it down, but I suspect this will not be the case for any lover of The Lord of the Rings, and you will hopefully become as immersed in the histories as I did.
- The Wheel of Time series by Robert Jordan
This is a series that I’ve actually not managed to finish yet. (My excuse is that it comprises 12 books, each roughly 700 to 1000 pages long. I’m never one to race through a book or series [at least not since I was younger], and in any case I’m at least getting some enjoyment out of the reading.) Influenced to a large extent by the works of both Tolkien and Herbert (and I didn’t even know this when I started reading it!), this series will, I believe, given time, become as renowned in its own right as those two. This is the series to read if you’re looking for fantasy that is both entertaining and has great depth to its characters, something that arguably even the great Tolkien’s works were at times missing.
- Shannara series by Terry Brooks
Shannara is Brooks’s best-known (and as yet unfinished) series. With a total of 14 books (a set composed of individual stories and series in their own right), it beats even The Wheel of Time in that respect, though the fact that the various storylines are largely disparate makes it significantly more manageable. It is epic high fantasy very much in the style of Tolkien (though not direct plagiarism, as some critics were too keen to condemn The Sword of Shannara for being). In my opinion, this series needs to be read at least until the conclusion of the second book (The Elfstones of Shannara), which in my mind has remained the best fantasy work outside of Tolkien’s collection since I first read it seven years ago.
- The Word and the Void trilogy by Terry Brooks
This is Brooks’s lesser-known series, though it is in fact considered by a sizable minority to be his best writing. With a dark, modern setting, this certainly isn’t his typical style (or at least not the one by which he gained his reputation), though it is perhaps his most creative composition. It is additionally notable in representing his final break from the influence of Tolkien (not that I can deem this a wholly undesirable event), and its plot includes some genuinely original content. Even if you were turned off by Shannara (or simply not particularly impressed), I would first respond in shock, but then suggest that this trilogy is worth a try regardless of your opinion.
Now, before some indignant Harry Potter fanboy comments on the absence of that series from my list (presuming, of course, that I have any readers), I should stress that these are not books that have simply slipped my mind. I’ve read them all (some more than once when I was a bit younger), and put plainly, they are decent light entertainment, but nothing worth placing alongside the other greats, I’m afraid.
To end, I would only like to say that it would be very gratifying to hear whether anyone is making use of these recommendations. It would honestly be quite interesting just to gauge whether you, as fellow fans of these genres, agree with at least some of the views presented here, or how your own views differ. If not, I think I can still convince myself that I enjoyed writing this guide for its own sake!
Numerical Analysis for .NET
During my ongoing work on a computational project for university, I recently discovered the need to perform some serious numerical analysis from my C# code. Unfortunately, I must admit that the .NET world only now seems to be catching up in terms of the free and open source libraries it offers for various tasks, and initially I was disheartened to find that there seemed to be nothing available for doing calculations on large (sparse) matrices. After a fair deal of searching, only a couple of somewhat incomplete and no longer maintained matrix libraries turned up. Being an avid user of StackOverflow, however, I decided that if anyone was aware of some library that could do what I needed, I would most likely find them there.
The result was much better than I had even hoped. dnAnalytics is a general-purpose package for numerical analysis in .NET that does almost everything I could possibly ask for, and from my first impressions, does it very well indeed. This wonderful find is a well-maintained, fully open-source library with great API documentation (not a wholly unexpected thing, but surprisingly uncommon among open-source projects). Several features stand out as particularly impressive. One is undoubtedly the set of I/O classes for Matlab and delimited files (among other formats). What is more, the library seems to offer both a fully managed version and one that wraps the Intel® Math Kernel Library. I’m not sure how the performance compares between the two (I haven’t yet tried the latter), but it is certainly nice to have both options available, much as the .NET BCL offers alternative implementations of cryptographic algorithms: a) a fully managed version, b) a version built on top of the Windows CryptoAPI, and c) a version that uses CNG (Cryptography API: Next Generation), introduced with Vista. Perhaps what appeals to me most about this library is that the developers have clearly made an effort to make it user-friendly, not only with regard to the documentation, but also by adding an interface friendly to F# coders (F# is likely to be a language of choice for future mathematical/scientific programming), and even debugger visualizers for Visual Studio (possibly the only library I’ve seen to date that includes them).
My particular usage of the library requires me to use the linear algebra (specifically, sparse matrix) classes. Although I must point out that the specific algorithm that I was intending to employ for the project was not available (see my later discussion), it did include a host of other ones, primarily focusing on direct and iterative matrix decomposition, which would appear to be quite handy in many circumstances. I haven’t yet had a chance to play with the other areas of the library, but I have noticed that it offers some statistical functions and methods as well as a number of modern pseudo-RNG algorithms such as the Mersenne Twister.
To conclude, I should come back to the point that the most important part of the analysis I require was not (at least directly) provided by the library: finding the eigenvalues or eigendecomposition of large matrices (thousands of rows/columns), in connection with spectral theory, in case you’re curious. Even so, since this is such a complex field and one fraught with difficulties when it comes to implementation (numerical instability is a huge problem), I was not surprised to find that no implementation of the Arnoldi or Lanczos algorithm was present. Fortunately, after a bit more searching around (by this point I knew specifically what I was looking for), I came across the ARPACK library, written in the archaic FORTRAN 77 language. It did, however, seem to be exactly what I was looking for: a set of fast routines to find the eigenvalues of large (either dense or sparse) matrices of various types. After only a small amount of pain messing about with MinGW, I managed to get the code compiled nicely into a DLL. At this point, I am of course perfectly able just to use the P/Invoke capabilities of .NET and do some hackery to integrate the ARPACK routines with my existing code and dnAnalytics. Yet I am also inclined to do this whole task properly and write a managed wrapper for ARPACK that conforms tightly with dnAnalytics. I could then perhaps submit these wrapper types (along with a few unit tests) as a patch to the dnAnalytics team, in the hope that they’ll include it in the next release. As with most other projects at this time, I will have to see what time permits, though I would certainly hope to contribute something substantial to what truly is a terrific project that I would love to see expand further.
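ARPACK’s implicitly restarted Arnoldi/Lanczos routines are far too involved to reproduce here. The operation they are built around, repeatedly multiplying a vector by a large sparse matrix, can however be shown with plain power iteration. This is a deliberately simpler, swapped-in technique: it finds only the eigenvalue of largest magnitude, and this sketch assumes a symmetric matrix. The compressed sparse row (CSR) layout and the `PowerIteration` class are illustrative assumptions, and belong to neither dnAnalytics’ nor ARPACK’s API.

```csharp
using System;

// Power iteration: a much simpler relative of the Arnoldi/Lanczos methods in ARPACK.
// It estimates only the eigenvalue of largest magnitude, and converges slowly when the
// two largest eigenvalues are close. Assumes a symmetric matrix stored in CSR form.
public static class PowerIteration
{
    // y = A x. Row i's non-zeros are values[rowPtr[i] .. rowPtr[i + 1] - 1],
    // in the columns given by the matching entries of colIdx.
    public static double[] Multiply(int[] rowPtr, int[] colIdx, double[] values, double[] x)
    {
        int n = rowPtr.Length - 1;
        var y = new double[n];
        for (int i = 0; i < n; i++)
            for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++)
                y[i] += values[k] * x[colIdx[k]];
        return y;
    }

    public static double DominantEigenvalue(int[] rowPtr, int[] colIdx, double[] values,
                                            int maxIterations = 1000, double tolerance = 1e-10)
    {
        int n = rowPtr.Length - 1;
        var x = new double[n];
        for (int i = 0; i < n; i++)
            x[i] = 1.0; // start vector; it must not be orthogonal to the dominant eigenvector

        double lambda = 0;
        for (int iter = 0; iter < maxIterations; iter++)
        {
            double[] y = Multiply(rowPtr, colIdx, values, x);

            // Rayleigh quotient (x . Ax) / (x . x) as the current eigenvalue estimate.
            double xy = 0, xx = 0, yy = 0;
            for (int i = 0; i < n; i++)
            {
                xy += x[i] * y[i];
                xx += x[i] * x[i];
                yy += y[i] * y[i];
            }
            double estimate = xy / xx;

            // Normalise the next iterate to avoid overflow or underflow.
            double norm = Math.Sqrt(yy);
            for (int i = 0; i < n; i++)
                x[i] = y[i] / norm;

            if (Math.Abs(estimate - lambda) <= tolerance * Math.Max(1.0, Math.Abs(estimate)))
                return estimate;
            lambda = estimate;
        }
        return lambda; // not converged within maxIterations; best estimate so far
    }
}
```

Each iteration costs one sparse matrix-vector product, proportional to the number of non-zeros. This is why Krylov methods such as Lanczos, which extract far more from the same products, scale to matrices with thousands of rows.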
LINQ to YAML
LINQ to XML is one of the many technologies introduced with the .NET Framework 3.5, and one that is certainly a step forward in terms of usability. It allows querying in both the functional style (using LINQ and lambda expressions) and the more traditional imperative one, meaning that it’s a great tool for concisely working with XML data in any sort of application, and undoubtedly a significant improvement over the old XML DOM that resides in the System.Xml namespace.
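As a small illustration of the two styles, the following sketch queries the same made-up XML fragment twice: once with a LINQ query expression, and once with explicit loops, a conditional and a sort. Both return the titles published before 1970, ordered by year. The data and method names are purely illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LinqToXmlDemo
{
    const string Xml =
        "<library>" +
        "<book year=\"1965\"><title>Dune</title></book>" +
        "<book year=\"1951\"><title>Foundation</title></book>" +
        "<book year=\"1973\"><title>Rendezvous with Rama</title></book>" +
        "</library>";

    // Functional style: a declarative query over the element tree.
    public static List<string> TitlesBefore1970Functional()
    {
        XElement library = XElement.Parse(Xml);
        return (from book in library.Elements("book")
                where (int)book.Attribute("year") < 1970
                orderby (int)book.Attribute("year")
                select (string)book.Element("title")).ToList();
    }

    // Imperative style: the same result with loops, a conditional and an in-place sort.
    public static List<string> TitlesBefore1970Imperative()
    {
        XElement library = XElement.Parse(Xml);
        var matches = new List<XElement>();
        foreach (XElement book in library.Elements("book"))
        {
            if ((int)book.Attribute("year") < 1970)
                matches.Add(book);
        }
        matches.Sort((a, b) => ((int)a.Attribute("year")).CompareTo((int)b.Attribute("year")));

        var titles = new List<string>();
        foreach (XElement book in matches)
            titles.Add((string)book.Element("title"));
        return titles;
    }
}
```

The explicit conversion operators on `XAttribute` and `XElement` (such as `(int)` and `(string)`) are what keep both versions free of manual parsing.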
In the spirit of LINQ, and with the advent of YAML, I recently decided it was about time that this new “markup language” were integrated with LINQ. Surprisingly, nothing akin to LINQ to YAML exists yet (though there are a couple of fairly usable implementations of a YAML reader/writer for .NET). This seemed to me like a good chance to create something that might be used by more than the odd .NET developer or two. My plan is to implement a LINQ to YAML provider either from scratch or on top of one of the existing YAML libraries. (Which option I choose will depend on the state of the existing projects, which I haven’t yet investigated properly. I suspect, however, that it might be worthwhile writing my own, since it would a) teach me all the intricacies of YAML, and b) allow me to support the latest version [1.2], which the existing libraries do not.)
Before I launch into an overview of my intended implementation, here is a little about YAML itself, for those who aren’t already familiar with it. Although technically YAML isn’t a markup language (after all, the recursive acronym stands for YAML Ain’t Markup Language), being rather a serialisation format, it does essentially fulfil the role that XML has traditionally played in a variety of common situations. I’m not going to try to sell the format to you right now; suffice it to say that you wouldn’t have read this far if you weren’t already at least intrigued! Without doubt, the format is actively gaining popularity because of its ultra-lightweight syntax and suitability for hand editing, perhaps the two points that best summarise its advantages over XML.
Anyway, here’s a short example of a YAML document (taken straight from the Wikipedia page), so you can see precisely how pleasant it is to work with (at least for humans).
---
receipt:     Oz-Ware Purchase Invoice
date:        2007-08-06
customer:
    given:   Dorothy
    family:  Gale
items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)
      price:     1.47
      quantity:  4
    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      price:     100.27
      quantity:  1
bill-to:  &id001
    street: |
        123 Tornado Alley
        Suite 16
    city:   East Westville
    state:  KS
ship-to:  *id001
specialDelivery:  >
    Follow the Yellow Brick
    Road to the Emerald City.
    Pay no attention to the
    man behind the curtain.
...
Of course, the great thing about YAML, as this example clearly demonstrates, is that you don’t need any real knowledge of YAML to understand exactly and immediately what the data represents. As a bonus, it doesn’t hurt your eyes to stare at for too long! Even the referencing syntax should be fairly self-evident. (&id001 and *id001 would surely look familiar to C programmers.)
The semantics as well as the syntax of YAML obviously differ greatly from those of XML, although there is almost always some correspondence between the features and possibilities that the two formats offer. The only notable missing feature compared with XML is attributes, and their usefulness is questionable anyway.
Right, so now I ought to explain a bit about how I actually plan to design this library. The basic framework will be virtually equivalent to that of LINQ to XML. In other words, the hierarchy will be largely based around an abstract YamlObject (YObject?) class, and will look very much like the one contained within System.Xml.Linq.
LINQ to YAML must of course accommodate the unique nature of the format. Even so, I would initially aim for minimal differences and only adjust the hierarchy significantly where necessary. Classes such as XCData and XDocumentType would not apply to YAML at all, yet there would need to be a place for a YReference or similar somewhere in the hierarchy. The referencing aspect of YAML will likely prove to be one of the more interesting challenges. YAML’s lists, maps (dictionaries), and combinations thereof should be relatively straightforward to model on the LINQ to XML design, but references introduce a substantially novel concept. Some form of lazy evaluation followed by concrete referencing should be able to solve the problem, but at this point I can’t predict how well that will work in practice.
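Here is a minimal sketch that makes the lazy-evaluation idea slightly more concrete. It assumes a parser that registers an anchored node only once the node has been fully built. The names AnchorTable<T> and LazyAlias<T> are hypothetical and belong to no existing library.

The case it targets is a recursive structure such as &a [ *a ], where the alias is read before the anchored node is complete. The alias therefore records only the anchor name, and it resolves to the concrete object on first access.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical: maps anchor names ("&name") to finished nodes.
public sealed class AnchorTable<T> where T : class
{
    private readonly Dictionary<string, T> anchors = new Dictionary<string, T>();

    // Called once the node carrying "&name" has been fully built.
    public void Define(string name, T node) { anchors[name] = node; }

    // Called when "*name" is read; resolution is deferred until Target is first used.
    public LazyAlias<T> CreateAlias(string name)
    {
        return new LazyAlias<T>(name, () => Resolve(name));
    }

    private T Resolve(string name)
    {
        T node;
        if (!anchors.TryGetValue(name, out node))
            throw new KeyNotFoundException("Undefined YAML anchor: " + name);
        return node;
    }
}

// Hypothetical: an alias node whose target is looked up lazily.
public sealed class LazyAlias<T> where T : class
{
    private readonly Lazy<T> target;

    public LazyAlias(string name, Func<T> resolve)
    {
        Name = name;
        // PublicationOnly does not cache exceptions, so a failed early lookup can be retried.
        target = new Lazy<T>(resolve, LazyThreadSafetyMode.PublicationOnly);
    }

    public string Name { get; private set; }

    // The concrete reference: the anchored object itself, not a copy.
    public T Target { get { return target.Value; } }
}
```

YAML allows an anchor name to be redefined later in a document, with each alias referring to the most recent preceding definition. A purely name-based lazy lookup like this one ignores that case, so a real implementation would need to handle it.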
What I realised only after deciding to create a LINQ to YAML library is that, among LINQ providers, LINQ to XML is somewhat special. Its LINQ support is built on top of LINQ to Objects (i.e. LINQ over IEnumerable<T> sequences), with only a relatively small number of extension methods specific to LINQ to XML. By contrast, most other LINQ providers (LINQ to SQL, for example) must implement the IQueryable<T> and IQueryProvider interfaces, supplying complex logic to translate and evaluate expression trees and return their results. All this means that I can pretty much just design a DOM in a certain style (i.e. one suited to functional code, like LINQ to XML) and let LINQ to Objects do everything else for me.
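Here is a rough sketch of the kind of DOM this implies. The names YObject, YContainer, YScalar, YSequence, YMapping and YQueries are hypothetical; nothing like them exists yet. Each container exposes its children through iterator methods returning IEnumerable<T>, much as XContainer.Descendants() does. Ordinary LINQ to Objects operators such as OfType and Select therefore work without any query provider.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical hierarchy loosely mirroring XObject / XContainer / XElement.
public abstract class YObject
{
    public YContainer Parent { get; internal set; }
}

// A leaf value such as "Dorothy" or "1.47" (kept as a string in this sketch).
public sealed class YScalar : YObject
{
    public YScalar(string value) { Value = value; }
    public string Value { get; private set; }
}

public abstract class YContainer : YObject
{
    // Direct children in document order.
    public abstract IEnumerable<YObject> Children();

    // Depth-first, document-order traversal written as an iterator.
    public IEnumerable<YObject> Descendants()
    {
        foreach (YObject child in Children())
        {
            yield return child;
            var container = child as YContainer;
            if (container != null)
                foreach (YObject descendant in container.Descendants())
                    yield return descendant;
        }
    }
}

public sealed class YSequence : YContainer
{
    private readonly List<YObject> items = new List<YObject>();
    public YSequence(params YObject[] initialItems) { foreach (YObject item in initialItems) Add(item); }
    public void Add(YObject item) { item.Parent = this; items.Add(item); }
    public override IEnumerable<YObject> Children() { return items.AsReadOnly(); }
}

public sealed class YMapping : YContainer
{
    // A list of pairs rather than a Dictionary, so key order is preserved.
    private readonly List<KeyValuePair<string, YObject>> entries = new List<KeyValuePair<string, YObject>>();

    public void Add(string key, YObject value)
    {
        value.Parent = this;
        entries.Add(new KeyValuePair<string, YObject>(key, value));
    }

    public YObject this[string key]
    {
        get { return entries.Where(e => e.Key == key).Select(e => e.Value).FirstOrDefault(); }
    }

    public override IEnumerable<YObject> Children() { return entries.Select(e => e.Value); }
}

public static class YQueries
{
    // Plain LINQ to Objects over the iterators above: no IQueryable, no query provider.
    public static List<string> ScalarValues(YContainer root)
    {
        return (from node in root.Descendants().OfType<YScalar>()
                select node.Value).ToList();
    }
}
```

YQueries.ScalarValues is only a usage example. A real library would more likely add Elements()-style extension methods on IEnumerable<YObject>, as System.Xml.Linq does for XElement sequences.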
As I can’t think of anything more worth mentioning about my project at this time, I shall leave more specific and complex details to a future post. Still, do by all means feel free to ask me about my plans. I would be glad to answer any questions, and even gladder to receive suggestions as to how you think I might design LINQ to YAML, or simply a nod that you might find it useful at some point. I don’t anticipate this being a very long project. However, both my work and free-time schedules are likely to be fairly messed up for the next month or two, so I won’t promise when I’ll get around to an initial release. Whenever that happens, I will duly post the link to the project page on Launchpad (or wherever I decide to host it).
Strongly-Typed CSV Reader in C#
As part of a project I’ve recently started working on, I found it necessary to write a class that reads entries from CSV files. Such a simple format, you might think, so why would I bother sharing such trivial code? It is indeed a relatively short class, but I thought I’d post it here nonetheless. My main reason is that I believe its usage promotes a design practice of which I am particularly fond, and which I suspect (hope) other people will appreciate as well. There are also a few bits of code that might be considered interesting (and unusual) from a language/design perspective.
When I decided to formalise the logic for reading from CSV files, my first thought was that it would be nice to write something in the spirit of .NET 3.5. In this case, that meant easily compatible with LINQ, fully generic (strongly-typed), and attribute-oriented (as seems to be the trend in APIs nowadays). Before I launch into any further discussion, here’s the code for the class in full.
using System;
using System.ComponentModel;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Text;

namespace NetworkAnalyser
{
    // Reads comma-separated lines into instances of TEntry, mapping each column
    // to the public instance field at the same position.
    public class CsvReader<TEntry> : IDisposable
        where TEntry : struct
    {
        private StreamReader streamReader;
        private FieldTypeInfo[] fieldTypeInfos;
        private bool isDisposed = false;

        public CsvReader(string path)
        {
            streamReader = new StreamReader(path);
            Initialize();
        }

        public CsvReader(Stream stream)
        {
            streamReader = new StreamReader(stream);
            Initialize();
        }

        ~CsvReader()
        {
            Dispose(false);
        }

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

        protected virtual void Dispose(bool disposing)
        {
            if (!isDisposed)
            {
                if (disposing)
                {
                    if (streamReader != null)
                        streamReader.Dispose();
                }
            }
            isDisposed = true;
        }

        // Lazily yields entries until the end of the stream is reached.
        public IEnumerable<TEntry> ReadAllEntries()
        {
            TEntry? entry;
            while ((entry = ReadEntry()).HasValue)
                yield return entry.Value;
        }

        // Returns the next entry, or null at the end of the stream.
        public TEntry? ReadEntry()
        {
            var line = streamReader.ReadLine();
            if (line == null)
                return null;

            var entry = new TEntry();
            var fields = line.Split(new char[] { ',' }, StringSplitOptions.None);
            FieldTypeInfo fieldTypeInfo;
            object fieldValue;
            // Ignore surplus columns instead of indexing past the end of fieldTypeInfos.
            int count = Math.Min(fields.Length, fieldTypeInfos.Length);
            for (int i = 0; i < count; i++)
            {
                fieldTypeInfo = fieldTypeInfos[i];
                fieldValue = fieldTypeInfo.TypeConverter.ConvertFromString(fields[i].Trim());
                // __makeref yields a TypedReference to the local struct itself,
                // so the field is set in place rather than on a boxed copy.
                fieldTypeInfo.FieldInfo.SetValueDirect(__makeref(entry), fieldValue);
            }
            return entry;
        }

        // Pairs each public instance field of TEntry with the TypeConverter used to parse it:
        // the one named by a [TypeConverter] attribute if present, else the default for its type.
        private void Initialize()
        {
            var entryType = typeof(TEntry);
            fieldTypeInfos = (from fieldInfo in entryType.GetFields(BindingFlags.Instance | BindingFlags.Public)
                              let fieldTypeConverterAttrib = fieldInfo.GetCustomAttributes(
                                  typeof(TypeConverterAttribute), true).SingleOrDefault() as TypeConverterAttribute
                              let fieldTypeConverter = (fieldTypeConverterAttrib == null) ? null :
                                  Activator.CreateInstance(Type.GetType(
                                      fieldTypeConverterAttrib.ConverterTypeName)) as TypeConverter
                              select new FieldTypeInfo()
                              {
                                  FieldInfo = fieldInfo,
                                  TypeConverter = fieldTypeConverter ??
                                      TypeDescriptor.GetConverter(fieldInfo.FieldType)
                              }).ToArray();
        }

        private struct FieldTypeInfo
        {
            public FieldInfo FieldInfo;
            public TypeConverter TypeConverter;
        }
    }
}
(Please excuse the lack of thorough comments in the code. Most of it is self-explanatory, but admittedly some parts are probably not. I put it together pretty quickly, but I may get around to commenting it properly some time soon. Some basic error handling might also be nice.)
At this point it may seem rather excessive just to read data from a CSV file, but I hope you’ll agree that it’s worthwhile once you see an example of typical usage.
The first step is to define a structure (struct) that holds each entry in memory. Here we’re going to define one that holds some basic information about a programming language.
public struct LanguageEntry
{
    public string Name;
    public string[] Paradigms;
    public string LatestVersion;
    [TypeConverter(typeof(CustomDateTimeConverter))]
    public DateTime InitialRelease;
    [TypeConverter(typeof(CustomDateTimeConverter))]
    public DateTime LatestRelease;
    public float Popularity;
}
The TypeConverter attributes are optional. You only need them for fields that have unusual formats and whose values you would like to convert to something simpler or more accessible (e.g. the string “Jun2002” to a DateTime object in this case). For any field of a type recognisable by the default type converter, you don’t need to bother, as shown for the float field Popularity. (This actually applies to a very large range of types within the BCL, including System.Drawing.Color. A Color can be specified in any format you might use in the property editor of Visual Studio, such as “DarkRed”.)
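The post references CustomDateTimeConverter without showing it. The body below is a guess based only on the “Jun2002” example, so treat it as a sketch.

SemicolonListConverter is entirely hypothetical. As far as I can tell, the default converter for string[] (ArrayConverter) cannot convert from a string, so a field such as Paradigms would need something like it. Its items must also use a separator other than a comma, because ReadEntry has already split each line on commas.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;
using System.Linq;

// Hypothetical body for CustomDateTimeConverter: parses month-year strings such as
// "Jun2002" with the invariant culture (the day defaults to the 1st).
public class CustomDateTimeConverter : DateTimeConverter
{
    public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
    {
        var text = value as string;
        if (text != null)
            return DateTime.ParseExact(text.Trim(), "MMMyyyy", CultureInfo.InvariantCulture);
        return base.ConvertFrom(context, culture, value);
    }
}

// Hypothetical converter for list-valued fields: "Functional; Imperative" -> string[].
public class SemicolonListConverter : TypeConverter
{
    public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
    {
        return sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);
    }

    public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
    {
        var text = value as string;
        if (text != null)
            return text.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
                       .Select(item => item.Trim())
                       .ToArray();
        return base.ConvertFrom(context, culture, value);
    }
}
```

Both classes have public parameterless constructors, which CsvReader needs because it creates converters with Activator.CreateInstance. To use the list converter, you would decorate the field as [TypeConverter(typeof(SemicolonListConverter))] public string[] Paradigms;.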
Finally, here’s a snippet showing how you might actually use the CsvReader<TEntry> class to read from a CSV file. This example reads all entries from the languages.csv file and prints the names of all functional languages to the console.
using (var languagesReader = new CsvReader<LanguageEntry>("languages.csv"))
{
    var languages = from lang in languagesReader.ReadAllEntries()
                    where lang.Paradigms.Contains("Functional")
                    select lang;
    foreach (var lang in languages)
        Console.WriteLine(lang.Name);
}
Hopefully that has convinced you that this is the right way to go about reading data entries from files. What this class provides is completely strongly-typed I/O (reading in this case, though it wouldn’t be very hard to create a similar CsvWriter class), and a declarative way of defining entry types (or records, to use database terminology).
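Here is a minimal sketch of such a CsvWriter<TEntry>, assuming the same conventions as the reader. Each public instance field becomes a column, each field may carry an optional [TypeConverter], and there is no quoting or escaping of commas within values. The class is hypothetical and not part of the original project.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Linq;
using System.Reflection;

// Hypothetical counterpart to CsvReader<TEntry>: one line per entry, one column per
// public instance field. Values containing commas are NOT quoted or escaped.
public class CsvWriter<TEntry> : IDisposable where TEntry : struct
{
    private readonly TextWriter writer;
    private readonly FieldInfo[] fields;
    private readonly TypeConverter[] converters;

    public CsvWriter(TextWriter writer)
    {
        this.writer = writer;
        fields = typeof(TEntry).GetFields(BindingFlags.Instance | BindingFlags.Public);
        // Same lookup as the reader: the [TypeConverter] if present, else the default.
        converters = fields.Select(field =>
        {
            var attrib = (TypeConverterAttribute)Attribute.GetCustomAttribute(
                field, typeof(TypeConverterAttribute));
            return attrib != null
                ? (TypeConverter)Activator.CreateInstance(Type.GetType(attrib.ConverterTypeName))
                : TypeDescriptor.GetConverter(field.FieldType);
        }).ToArray();
    }

    public void WriteEntry(TEntry entry)
    {
        // Boxing is harmless here: fields are only read, never written.
        object boxed = entry;
        var values = fields.Select((field, i) =>
            converters[i].ConvertToInvariantString(field.GetValue(boxed)));
        writer.WriteLine(string.Join(",", values));
    }

    public void Dispose()
    {
        writer.Dispose();
    }
}
```

Like the reader, this relies on Type.GetFields returning fields in declaration order. The documentation does not guarantee that ordering, even though it usually holds in practice. A custom converter such as CustomDateTimeConverter would also need to override ConvertTo if the written values are to be read back in the same format.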
I’m not going to delve too deeply into the implementation of the class, but a few specifics are worth highlighting. Going back to the code for the class, the first thing to notice is the Initialize method, which is where much of the interesting work happens. In summary, it loops over all the public instance fields of the type specified by TEntry. For each field, it gets the default type converter for the field’s type (or the one given by a TypeConverterAttribute, if present), and then stores the FieldInfo along with the TypeConverter in a simple struct.
The only other noteworthy point is the call to SetValueDirect in the ReadEntry method. It uses a keyword that is almost wholly unknown to C# developers (and undocumented!) by the name of __makeref; there are related ones named __reftype and __refvalue. I was certainly unaware of it before today. My first attempt used the SetValue method, which works perfectly well on classes but presents a problem with structs. Structs are value types and the obj parameter is of type object, so the argument must be boxed (copied into a reference-type wrapper on the heap). The boxed copy then gets altered, not the local variable you passed in. The __makeref keyword instead creates a TypedReference that refers directly to the local struct, which allows SetValueDirect to set the field in place.
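The boxing pitfall is easy to reproduce. This small sketch uses a hypothetical Counter struct to compare three approaches: FieldInfo.SetValue on a struct local, the documented workaround of boxing once and unboxing afterwards, and SetValueDirect with __makeref as used in ReadEntry.

```csharp
using System;
using System.Reflection;

// Hypothetical struct used only for this demonstration.
public struct Counter { public int Value; }

public static class BoxingDemo
{
    private static readonly FieldInfo ValueField = typeof(Counter).GetField("Value");

    // Pitfall: SetValue receives a boxed *copy*, so the local stays unchanged.
    public static int WithSetValue()
    {
        var counter = new Counter();
        ValueField.SetValue(counter, 42);
        return counter.Value;
    }

    // Workaround: box once, mutate the box, then unbox the result.
    public static int WithExplicitBox()
    {
        object boxed = new Counter();
        ValueField.SetValue(boxed, 42);
        return ((Counter)boxed).Value;
    }

    // The approach used in ReadEntry: a TypedReference to the local itself.
    public static int WithMakeRef()
    {
        var counter = new Counter();
        ValueField.SetValueDirect(__makeref(counter), 42);
        return counter.Value;
    }
}
```

The explicit-box version avoids the undocumented keyword entirely, at the cost of a heap allocation plus a copy when unboxing. In ReadEntry, it would mean boxing the entry once before the loop and unboxing it after.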
That’s enough explanation, I think. If you still aren’t sure precisely how it works, feel free to comment on this post. I’d also be quite happy to hear what anyone thinks of the general design and implementation.
Searching for Exceptions in .NET
I recently came across a rather interesting question on StackOverflow that posed the problem of discovering all the exceptions that a given method might throw under every circumstance.
Of course, in the great majority of situations, the XML documentation for the BCL (and ideally for any third-party libraries too) should list every exception that might be thrown, along with the reasons for it. Indeed, this is generally all one needs to write largely error-safe code. However, not every exception is documented. Production-quality applications also often need to ensure that there is no realistic chance of an unhandled exception occurring, which sometimes calls for a rigorous check for all possible exceptions. An application-level handler for unhandled (fatal) exceptions does the job to some extent, and it is always a good fallback to have. Even so, it is the least elegant way of coping with exceptions.
After some consideration, it became quite apparent that the general task is undecidable: determining whether a given throw statement can ever actually execute is equivalent to the halting problem. However, with a few simplifications, the problem becomes quite tractable. Most importantly, any complex logic that determines whether an exception will be thrown must be ignored. Instead, one must simply assume that any throw statement within a given method could cause an exception under certain conditions.
Here is the complete code for the algorithm I wrote. The GetAllExceptions method is an extension method that returns a read-only collection of exceptions, which makes it very straightforward and efficient to use.
Notably, the code detects all of
Exceptions are only counted when the appropriate throw instruction is encountered at some level. Also, as far as I can tell, the stack and local variables are handled correctly, so this method should work soundly in pretty much all cases. (It has been tested with a few quite complex methods within the BCL, as well as simpler user-defined ones.)
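Here is a much-simplified sketch of the same conservative idea; it is not the GetAllExceptions implementation described above. It reads a method’s raw IL with MethodBody.GetILAsByteArray. A newobj of an Exception-derived constructor is counted only when a throw instruction immediately follows it. The sketch also recurses into call/callvirt targets, down to a hypothetical maxDepth limit.

Unlike the code described above, it does not track the evaluation stack or local variables. Exceptions stored in a local before being thrown, or rethrown later, are therefore missed. FindThrownExceptionTypes, ExceptionScanner and Sample are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Reflection;
using System.Reflection.Emit;

public static class ExceptionScanner
{
    // Opcode value -> OpCode, built from the public static fields of OpCodes.
    private static readonly Dictionary<short, OpCode> OpCodeMap = BuildOpCodeMap();

    private static Dictionary<short, OpCode> BuildOpCodeMap()
    {
        var map = new Dictionary<short, OpCode>();
        foreach (FieldInfo field in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            var op = (OpCode)field.GetValue(null);
            map[op.Value] = op;
        }
        return map;
    }

    // Conservatively collects exception types that 'method', or anything it calls up to
    // maxDepth levels down, might throw, assuming every throw instruction is reachable.
    public static ReadOnlyCollection<Type> FindThrownExceptionTypes(this MethodBase method, int maxDepth = 3)
    {
        var found = new HashSet<Type>();
        Scan(method, maxDepth, found, new HashSet<MethodBase>());
        return new List<Type>(found).AsReadOnly();
    }

    private static void Scan(MethodBase method, int depth, HashSet<Type> found, HashSet<MethodBase> visited)
    {
        if (depth < 0 || !visited.Add(method)) return;
        MethodBody body = method.GetMethodBody();
        if (body == null) return; // abstract, extern or runtime-implemented: no IL

        byte[] il = body.GetILAsByteArray();
        Type[] typeArgs = method.DeclaringType != null ? method.DeclaringType.GetGenericArguments() : null;
        Type[] methodArgs = method.IsGenericMethod ? method.GetGenericArguments() : null;

        int pos = 0;
        while (pos < il.Length)
        {
            // Two-byte opcodes start with 0xFE; all others are a single byte.
            OpCode op = il[pos] == 0xFE
                ? OpCodeMap[unchecked((short)(0xFE00 | il[pos + 1]))]
                : OpCodeMap[il[pos]];
            pos += op.Size;
            int operandStart = pos;
            pos += OperandSize(op.OperandType, il, pos);
            if (op.OperandType != OperandType.InlineMethod) continue;

            MethodBase target;
            try { target = method.Module.ResolveMethod(BitConverter.ToInt32(il, operandStart), typeArgs, methodArgs); }
            catch (ArgumentException) { continue; } // token not resolvable in this context

            if (op == OpCodes.Newobj)
            {
                // Count only "newobj <Exception-derived ctor>" immediately followed by "throw".
                bool throwsNext = pos < il.Length && il[pos] == (byte)OpCodes.Throw.Value;
                if (throwsNext && typeof(Exception).IsAssignableFrom(target.DeclaringType))
                    found.Add(target.DeclaringType);
            }
            else if (op == OpCodes.Call || op == OpCodes.Callvirt)
            {
                Scan(target, depth - 1, found, visited);
            }
        }
    }

    // Operand sizes in bytes, per the CIL instruction encoding.
    private static int OperandSize(OperandType type, byte[] il, int pos)
    {
        switch (type)
        {
            case OperandType.InlineNone: return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar: return 1;
            case OperandType.InlineVar: return 2;
            case OperandType.InlineI8:
            case OperandType.InlineR: return 8;
            case OperandType.InlineSwitch: return 4 + 4 * BitConverter.ToInt32(il, pos);
            default: return 4; // metadata tokens, 32-bit branch targets, int32 and float32
        }
    }
}

// Example targets for the scanner.
public static class Sample
{
    public static void Inner() { throw new InvalidOperationException("inner"); }

    public static void Outer(string s)
    {
        if (s == null) throw new ArgumentNullException("s");
        Inner();
    }
}
```

The depth limit keeps the scan from wandering through large parts of the BCL. Combined with the visited set, though, it means a method first reached near the depth limit is never rescanned from a shallower call site. A more careful version would track the best depth seen per method.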
To be quite honest, I’m not sure whether I’ll need to use this code myself at any point, but I’ve posted it regardless for the benefit of anyone who might require such rigorous exception checking. It was definitely an interesting challenge, at the least.
Any further comments or suggestions would be welcome, as always.