Archive for September, 2008|Monthly archive page
Combining Ordered Lists in .NET
I recently came across an issue with LINQ involving the combination of ordered (sorted) lists. The problem does not seem to have a simple solution within the core libraries of .NET 3.5, so I decided to write my own (short) function to accomplish the task. Extensions methods and the static Enumerable class usually contain all the possible methods you might need for dealing with lists (or more generally enumerable collections). Combining ordered lists, however, isn’t quite so straightforward a process (to do efficiently) as it might first seem. Of course, you could simply do Enumerable.Concat(listA, listB).OrderBy(keySelector) but it should be apparent that this is very inefficient for large lists if you know your lists are already ordered. Moreover, the call will never return if either or both of the enumerable collections you pass are of infinite length. What you really want to do is select items from the lists by switching back and forth between them, picking whichever of the next items ought to come first (which is determined by the key selector and associated IComparable implementation).
public static IEnumerable<TSource> CombineOrdered<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector) where TKey : IComparable<TKey> { var firstEnumerator = first.GetEnumerator(); var secondEnumerator = second.GetEnumerator(); var firstCanAdvance = firstEnumerator.MoveNext(); var secondCanAdvance = secondEnumerator.MoveNext(); var firstCurKey = keySelector(firstEnumerator.Current); var secondCurKey = keySelector(secondEnumerator.Current); while (true) { if (firstCanAdvance && (!secondCanAdvance || firstCurKey.CompareTo(secondCurKey) <= 0)) { yield return firstEnumerator.Current; firstCanAdvance = firstEnumerator.MoveNext(); if (firstCanAdvance) firstCurKey = keySelector(firstEnumerator.Current); } else if (secondCanAdvance) { yield return secondEnumerator.Current; secondCanAdvance = secondEnumerator.MoveNext(); if (secondCanAdvance) secondCurKey = keySelector(secondEnumerator.Current); } else { yield break; } } }
To use the function, for example within a static class named Enumerable2, you can call it as such:
Note: The given implementation needs to take lists sorted in an ascending order and returns a combined list in the same order. To sort descending, change the <= sign to a >= sign in the code and insure that you pass lists sorted in a descending order.
Dynamic CSS for ASP.NET
I’ve recently being doing a bit of web development in ASP.NET and came across (as all web developers must commonly do) the problem of catering for IE’s poor standards compliance regarding rendering/CSS stylesheets. The common solution to the problem is to use CSS hacks, which perform some magic to get pages in IE looking as they ought to appear whilst still displaying correctly in good browsers (Firefox, Safari, etc.). When this fails as it often does, the best (yet slightly horrible) solution is to create separate stylesheets for IE (and possibly for different versions) and then put conditional tags around the references the CSS file(s) in the HTML head section.
<link rel="Stylesheet" type="text/css" href="StyleSheet.css" /> <!--[if lte IE 7]> <link rel="Stylesheet" type="text/css" href="StyleSheet-IE.css" /> <![endif]-->
This particular example only includes the stylesheet StyleSheet-IE.css if the browser version is less than or equal to IE 7.0 (whereas StyleSheet.css is always linked). Note that the conditional tags can be put around any HTML element so that they are only read when the browser matches the specified conditions. It can sometimes be useful to wrap JavaScript blocks with them.
The latter solution has until now worked well for all my purposes, though a more concise and elegant way of presenting variations of the CSS code to different browsers has always been asking. The problem arose when I was trying to reference stylesheets for ASP.NET themes (CSS files under the App_Themes folder) – because the ASP.NET runtime dynamically adds links to the stylesheets according to the current theme, there’s no way (that I could find) to implement the previous solution using conditional tags.
My own solution was to create a Dynamic CSS processor which runs on the server to process any CSS file according to the browser type/version before returning the content. I have implemented the processor as an HTTP handler that can enabled by a simple reference in the web.config file. The basic idea is similar to conditional tags in HTML, but instead the tags are in the actual CSS code and have the syntax.
[[?[!]{browser} {true result} | {false result}]]
The browser expression is simply the name of the browser for which to check (e.g. “IE”, “Firefox”) and can optionally be preceded by a ‘!’ symbol to invert the condition. The statement is simply evaluated to {true result} if the condition is found to be true, and {false result} otherwise. It can be applied to whole or parts of CSS declarations, but it is important to enclose one or more tags in comment tags (“/*” and “*/”) to avoid syntax errors generated by the IDE. The processor will not recognise the statement unless it is inclosed in comment tags.
Example:
div.main { /* margin-top: [[?IE -4px | 0 ]] */ }
Unfortunately I haven’t yet gotten round to implementing checking of browser version, but it is certainly something on my task list and I will post an update once it gets done.
The code for the HTTP handler is reasonably short and mainly involves RegEx expressions for parsing the CSS files with some lambda expressions for reducing the number of functions/loops. A caching feature is implemented to save the server from reprocessing a CSS file each time it is requested even when the file has remained unchanged.
You can view the full code for the HTTP handler here. Simply copy it to the App_Code folder and add the following references to web.config (the first is for II 6/Visual Studio test server and the second is for IIS 7).
<system.web> <httpHandlers> <add verb="*" path="*.css" type="StyleSheetHandler"/> </httpHandlers> </system.web> <system.webServer> <handlers> <add name="StyleSheetHandler" verb="*" path="*.css" type="StyleSheetHandler"/> </handlers> </system.webServer>
Feedback is welcome and I would be glad to hear if anyone is using this in their web projects.
Querying the Semantic Web
Although the Semantic Web is yet in its infancy and has a long way to go before widespread adoption, the evolution of some of its projects is finally starting to enable some interesting applications. DBpedia now provides a semantic framework for accessing much (though far from all) of the data in the 2.5 million articles currently on Wikipedia. Other projects are attemping to create semantic databases of music, books, geography, and photos, to name some of the larger ones. If you’re not very familiar with the concept of the Semantic Web, I recommend the Wikipedia article as a good introduction, though for the purposes of this post you won’t need to know the details. In summary, the eventual goal of the Semantic Web is to create a huge interlinked web of knowledge that can be accessed and utilised by computers for all sorts of tasks. This would ultimately enable a computer to perform most of the actions humans can currently perform on the WWW, such as researching knowledge, making bookings, or ordering products from online companies.
Having done some research into the current state of the Semantic Web, I have recently been considering the (admittedly rather ambitious) idea of querying the semantic web with human-language questions. The plan is to make use of two great sources of semantic data, DBPedia and WordNet (a lexical database of the English language) to give precise answers to advanced questions, similar to the Ask.com service but much more “intelligent”. The former allows a program to access an enormous amount of encyclopaedic information while the latter provides detailed specific information about the meanings of words and expressions in the English language. The data is accessible in RDF format and can be queried via SPARQL (an SQL-like language). RDF is the standard model for representing semantic data, consisting of simple statements called triples (see the RDF link for a detailed explanation). Combined with the appropiate AI, a computer could (at least in theory) answer any question contained, either explicitly or implicitly, by the contents of Wikipedia. The aim is to allow a person to enter a complex question in English and receive an accurate response (or set of ranked responses) from the system, displayed in whichever way is most appropiate. Examples of such questions are:
- “When was Microsoft founded and where are its current headquarters?”
- “Who succeeded Octavian as Emporer of Rome and when was he born?”
- “List all of the papers published by Albert Einstein in 1905.”
- “Through which countries do the Alps pass?”
- “Give me a list of all the computers costing more than £1000 manufactured by Dell between 1998 and 2000.”
It is clear that translating queries like these into computer-understandable ones is far from a simple process and will require a significant level of AI. Some can be directly queried against RDF with hardly any further processing but others will need some form of machine logic (to perform simple numerical or set operations, for example).
It is important to note that there are a few major obstacles to creating such a system and allowing it to achieve high accuracy, though some of them can be at least partially resolved by human training. Such training or evolution of the system could be accomplished effectively by making question askers utilise a user-interface that provides feedback.
- Human languages are inherently ambiguous methods of communication. Any algorithm used to interpret queries will necessarily involve a probabilistic model to resolve ambiguities. Also, more intricately phrased questions can be very difficult for AI to comprehend. A user would ideally use as simple and direct language as possible.
- DBpedia in its current form does not express in a semantic form a very large proportion of the information contained by Wikipedia since much of it is given in continuous prose. However, improvements in the quantity and density of information in DBpedia articles are likely to come in the near future as Wikipedia and the Semantic Web continue their growth. The system could additionally be expanded to search within other databases of knowledge apart from Wikipedia, such as Geonames and MusicBrainz.
- Similarly, WordNet is an incomplete lexical database of the English language; some words/expressions and links between them will inevitably be missing or poorly defined.
- There is no easy way to link the objects and concepts defined by WordNet to those in Wikipedia/DBpedia. In fact, it could prove all but impossible to do so without the aid of humans (or a very advanced and currently infeasable level of AI.) Still, there are various solutions to this issue and the topic will be a main focus point in upcoming posts.
- The processing or even actual intelligence required to accurately answer certain questions may be too great in certain cases. This does not present as big a problem as some of the other points, though it is desirable that either the human questioner or the AI recognises when a question is obviously unanswerable. An example of such a question would be:
“What was the mean average speed of computer processors between 1990 and 1995?”
Although there is a possibility that Wikipedia or other databases of knowledge implicitly contain the answer to such a query, it would require a very high degree of intelligence to answer it, which goes far beyond the purpose of the system. It should also be noted that this condition may not be differentiable to that where the information is not contained by the knowledge base. Questions which require opinionated replies however ought to be recognisable upon querying WordNet but before searching the knowledge base.
This post is only meant to be an introduction to my currently half-formed project idea of querying the semantic web for encyclopaedic information. I plan to discuss the details of high-level implementation in a series of following posts as I begin and continue work on this project. These posts ought to mainly include conceptual diagrams and images with explanations, plus some rare short snippets of code. I firmly believe that getting bogged down in low-level implementation details will not offer a good understanding of the system and should only serve to clarify key ideas. Explaining the architecture will be the focus of the series and certainly ought to fill enough posts! The project will become open-source once it (with with any luck) reaches the first development milestone, which I will define at a later date. Current plans are to carry out development in C#/.NET 3.5 using LinqToRDF to query RDF data.
Well, that’s all for now, but hopefully you now have a general understanding of the the core ideas. Comments on any aspect of the project are welcome.
Leave a Comment
Comments (5)
Leave a Comment