Strongly-Typed CSV Reader in C#
As part of a project I’ve recently started working on, I found it necessary to write a class that reads entries from CSV files. Such a simple format, you might think, so why would I bother sharing such trivial code? Indeed, it is a relatively short class, but I thought I’d post it here nonetheless, primarily because I believe its usage promotes a design practice of which I am particularly fond, and which I suspect (hope) other people may appreciate as well. There are also a few bits of code that might be considered interesting (and unusual) from a language/design perspective.
When I decided to formalise the logic for reading from CSV files, my first thought was that it would be nice to write something in the spirit of .NET 3.5: in this case, easily compatible with LINQ, fully generic (strongly-typed), and attribute-oriented (as seems to be the trend in APIs nowadays). Before I launch into any further discussion, here’s the code for the class in full.
```csharp
using System;
using System.ComponentModel;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Text;

namespace NetworkAnalyser
{
    public class CsvReader<TEntry> : IDisposable where TEntry : struct
    {
        private StreamReader streamReader;
        private FieldTypeInfo[] fieldTypeInfos;
        private bool isDisposed = false;

        public CsvReader(string path)
        {
            streamReader = new StreamReader(path);
            Initialize();
        }

        public CsvReader(Stream stream)
        {
            streamReader = new StreamReader(stream);
            Initialize();
        }

        ~CsvReader()
        {
            Dispose(false);
        }

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

        protected virtual void Dispose(bool disposing)
        {
            if (!isDisposed)
            {
                if (disposing)
                {
                    if (streamReader != null)
                        streamReader.Dispose();
                }
            }
            isDisposed = true;
        }

        // Lazily yields entries until the end of the stream is reached.
        public IEnumerable<TEntry> ReadAllEntries()
        {
            TEntry? entry;
            while ((entry = ReadEntry()).HasValue)
                yield return entry.Value;
        }

        // Parses one line into an entry; returns null at the end of the stream.
        // A plain Split on ',' does not handle quoted fields that contain commas.
        public TEntry? ReadEntry()
        {
            var line = streamReader.ReadLine();
            if (line == null)
                return null;

            var entry = new TEntry();
            var fields = line.Split(new char[] { ',' }, StringSplitOptions.None);
            // Ignore surplus columns instead of indexing past the end of fieldTypeInfos.
            var count = Math.Min(fields.Length, fieldTypeInfos.Length);
            FieldTypeInfo fieldTypeInfo;
            object fieldValue;
            for (int i = 0; i < count; i++)
            {
                fieldTypeInfo = fieldTypeInfos[i];
                // Invariant culture, so that e.g. "1.5" parses the same on every machine.
                fieldValue = fieldTypeInfo.TypeConverter.ConvertFromInvariantString(fields[i].Trim());
                // __makeref sets the field on 'entry' itself rather than on a boxed copy.
                fieldTypeInfo.FieldInfo.SetValueDirect(__makeref(entry), fieldValue);
            }
            return entry;
        }

        // Pairs each public instance field of TEntry with the converter used to parse it:
        // the one named by [TypeConverter] if present, otherwise the type's default converter.
        private void Initialize()
        {
            var entryType = typeof(TEntry);
            fieldTypeInfos = (from fieldInfo in entryType.GetFields(BindingFlags.Instance | BindingFlags.Public)
                              // GetFields does not guarantee declaration order; MetadataToken follows it in practice.
                              orderby fieldInfo.MetadataToken
                              let fieldTypeConverterAttrib = fieldInfo.GetCustomAttributes(
                                  typeof(TypeConverterAttribute), true).SingleOrDefault() as TypeConverterAttribute
                              let fieldTypeConverter = (fieldTypeConverterAttrib == null) ? null :
                                  Activator.CreateInstance(Type.GetType(
                                      fieldTypeConverterAttrib.ConverterTypeName)) as TypeConverter
                              select new FieldTypeInfo()
                              {
                                  FieldInfo = fieldInfo,
                                  TypeConverter = fieldTypeConverter ?? TypeDescriptor.GetConverter(fieldInfo.FieldType)
                              }).ToArray();
        }

        private struct FieldTypeInfo
        {
            public FieldInfo FieldInfo;
            public TypeConverter TypeConverter;
        }
    }
}
```
(Please excuse the scarcity of comments in the code. Most of it is self-explanatory, though admittedly some parts are probably not. I put it together fairly quickly, but I may get around to commenting it properly some time soon. Some basic error handling would also be nice.)
At this point, all this may seem rather excessive just to read data from a CSV file, but I hope you’ll agree that it’s worthwhile once you see an example of typical usage.
The first step is to define a structure (struct) that holds each entry in memory. Here we’re going to define one that holds some basic information about a programming language.
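```csharp
using System;
using System.ComponentModel;

public struct LanguageEntry
{
    public string Name;
    // The default converter for string[] (ArrayConverter) cannot parse a string,
    // so this field needs its own [TypeConverter] to be read by CsvReader.
    public string[] Paradigms;
    public string LatestVersion;
    [TypeConverter(typeof(CustomDateTimeConverter))]
    public DateTime InitialRelease;
    [TypeConverter(typeof(CustomDateTimeConverter))]
    public DateTime LatestRelease;
    public float Popularity;
}
```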
The TypeConverter attributes are completely optional, and are only required when you’re reading fields that have unusual formats and whose values you would like to convert to something simpler or more accessible (e.g. the string “Jun2002” to a DateTime object in this case). For any field whose type already has a suitable default type converter, you don’t need to bother, as shown for the float field Popularity. (This actually applies to a very large range of types within the BCL, including System.Drawing.Color, which can be specified in the same formats you might use in the property editor of Visual Studio, such as “DarkRed”.)
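The post does not show CustomDateTimeConverter, so the following is only a sketch of what it might look like, assuming dates such as “Jun2002” (an abbreviated English month name followed by a four-digit year). It also sketches a hypothetical SemicolonListConverter for the Paradigms field, since the default converter for string[] cannot parse a string; it assumes the items within that one CSV field are separated by ';', because ',' already separates the fields.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;
using System.Linq;

// Sketch of a converter for dates written as "Jun2002" (assumed format "MMMyyyy").
public class CustomDateTimeConverter : TypeConverter
{
    public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
    {
        return sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);
    }

    public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
    {
        var text = value as string;
        if (text != null)
        {
            // InvariantCulture supplies English abbreviated month names ("Jan".."Dec").
            // The input has no day component, so the parsed date falls on the 1st.
            return DateTime.ParseExact(text.Trim(), "MMMyyyy", CultureInfo.InvariantCulture);
        }
        return base.ConvertFrom(context, culture, value);
    }
}

// Hypothetical converter for string[] fields; assumes ';' separates the items.
public class SemicolonListConverter : TypeConverter
{
    public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
    {
        return sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);
    }

    public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
    {
        var text = value as string;
        if (text != null)
        {
            return text.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
                       .Select(item => item.Trim())
                       .ToArray();
        }
        return base.ConvertFrom(context, culture, value);
    }
}

public static class Program
{
    public static void Main()
    {
        var date = (DateTime)new CustomDateTimeConverter().ConvertFromInvariantString("Jun2002");
        Console.WriteLine(date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2002-06-01

        var paradigms = (string[])new SemicolonListConverter()
            .ConvertFromInvariantString("Functional; Object-oriented");
        Console.WriteLine(string.Join("|", paradigms)); // Functional|Object-oriented
    }
}
```

With these in place, Paradigms could be declared as `[TypeConverter(typeof(SemicolonListConverter))] public string[] Paradigms;`. Since CsvReader creates converters through Activator.CreateInstance, each converter needs a public parameterless constructor.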
Finally, here’s a snippet to show how you might actually use the CsvReader<TEntry> class to read from a CSV file. This example reads all entries from the languages.csv file and prints the names of all functional languages to the console.
```csharp
using (var languagesReader = new CsvReader<LanguageEntry>("languages.csv"))
{
    var languages = from lang in languagesReader.ReadAllEntries()
                    where lang.Paradigms.Contains("Functional")
                    select lang;
    foreach (var lang in languages)
        Console.WriteLine(lang.Name);
}
```
Hopefully that has now convinced you that this is the right way to go about reading data entries from files. What this class provides is completely strongly-typed I/O (reading in this case, though it wouldn’t be very hard to create a similar CsvWriter class), and a declarative way of defining entry types (or records, to use database terminology).
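As a rough illustration of that last point, here is a minimal sketch of what a CsvWriter<TEntry> counterpart might look like. It is hypothetical, not code from the post: it mirrors the reader’s reflection over public instance fields, but for brevity it ignores [TypeConverter] attributes on fields and does no quoting or escaping of values. The Sample struct exists only for the demonstration.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Linq;
using System.Reflection;

// Hypothetical writer sketch: one line per entry, fields joined by ','.
public class CsvWriter<TEntry> : IDisposable where TEntry : struct
{
    private readonly TextWriter writer;
    private readonly FieldInfo[] fields;
    private readonly TypeConverter[] converters;

    public CsvWriter(TextWriter writer)
    {
        this.writer = writer;
        // Public instance fields in metadata order (declaration order in practice).
        fields = typeof(TEntry)
            .GetFields(BindingFlags.Instance | BindingFlags.Public)
            .OrderBy(f => f.MetadataToken)
            .ToArray();
        converters = fields.Select(f => TypeDescriptor.GetConverter(f.FieldType)).ToArray();
    }

    public void WriteEntry(TEntry entry)
    {
        // GetValue takes object, so the struct is boxed; reading from a copy is harmless.
        object boxed = entry;
        var values = fields
            .Select((f, i) => converters[i].ConvertToInvariantString(f.GetValue(boxed)))
            .ToArray();
        writer.WriteLine(string.Join(",", values));
    }

    public void Dispose()
    {
        writer.Dispose();
    }
}

// Hypothetical entry type used only for this demonstration.
public struct Sample
{
    public int X;
    public int Y;
    public float Weight;
}

public static class Program
{
    public static void Main()
    {
        var output = new StringWriter();
        using (var csv = new CsvWriter<Sample>(output))
        {
            csv.WriteEntry(new Sample { X = 1, Y = 2, Weight = 0.5f });
        }
        Console.Write(output.ToString()); // 1,2,0.5
    }
}
```

Values are written with the invariant culture so that the output parses the same way regardless of the locale of the machine that reads it.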
I’m not going to delve too deeply into the implementation of the class, but I think it’s worth highlighting a few specifics. Going back to the code for the class, the first thing to notice is the Initialize method, which is where much of the interesting stuff happens. To summarise: it loops over all the public instance fields of the type specified by TEntry, gets the type converter for each field (the one given by TypeConverterAttribute if it exists, otherwise the default converter for the field’s type), and then stores the FieldInfo along with the TypeConverter in a simple struct.

The only other noteworthy point is the call to SetValueDirect in the ReadEntry method. This uses a keyword that’s almost wholly unknown (and undocumented!) to C# developers: __makeref (there are related ones named __reftype and __refvalue). I was certainly unaware of it before today. The problem I initially encountered was with the SetValue method, which works perfectly well on classes but presents a particular problem with structs: because structs are value types and SetValue’s obj parameter is of type object, the struct is boxed, i.e. copied into a new object on the heap. SetValue then alters that heap-based copy, not the variable you passed to the method (which is on the stack)! What the __makeref keyword does is create a TypedReference that refers directly to the stack-based variable, which allows SetValueDirect to set the field on it in place.
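To make the boxing problem concrete, here is a small self-contained sketch (the Entry struct is hypothetical) comparing three approaches: calling SetValue on the struct variable directly, which silently modifies a boxed copy; boxing once, calling SetValue on the box and unboxing afterwards, a documented alternative that avoids __makeref; and the SetValueDirect/__makeref combination used in CsvReader.

```csharp
using System;
using System.Reflection;

// Hypothetical struct used only to demonstrate the behaviour.
public struct Entry
{
    public int Value;
}

public static class Program
{
    public static void Main()
    {
        FieldInfo field = typeof(Entry).GetField("Value");

        // 1. The struct is boxed into a temporary object; only the temporary changes.
        var a = new Entry();
        field.SetValue(a, 42);
        Console.WriteLine(a.Value); // 0

        // 2. Box once, modify the box, then copy the result back by unboxing.
        var b = new Entry();
        object boxed = b;
        field.SetValue(boxed, 42);
        b = (Entry)boxed;
        Console.WriteLine(b.Value); // 42

        // 3. __makeref yields a TypedReference to the local variable itself.
        var c = new Entry();
        field.SetValueDirect(__makeref(c), 42);
        Console.WriteLine(c.Value); // 42
    }
}
```

The box-and-unbox approach sticks to documented language features at the cost of one extra copy per entry; __makeref avoids the copy but relies on an undocumented keyword.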
That’s enough explanation, I think. If you still aren’t sure exactly how it works, feel free to comment on this post. I’d also be happy to hear what anyone thinks of the general design and implementation.
Combining Ordered Lists in .NET
I recently came across an issue with LINQ involving the combination of ordered (sorted) lists. The problem does not seem to have a simple solution within the core libraries of .NET 3.5, so I decided to write my own (short) function to accomplish the task. Extension methods and the static Enumerable class usually provide all the methods you might need for dealing with lists (or, more generally, enumerable collections). Combining ordered lists, however, isn’t quite as straightforward to do efficiently as it might first seem. Of course, you could simply call Enumerable.Concat(listA, listB).OrderBy(keySelector), but this is needlessly inefficient for large lists when you know they are already ordered: OrderBy buffers and sorts every element, whereas a merge needs only a single pass. Moreover, because OrderBy must consume its entire input before yielding anything, enumerating the result will never produce a single element if either or both of the collections you pass are of infinite length. What you really want to do is select items from the lists by switching back and forth between them, picking whichever of the next items ought to come first (as determined by the key selector and the key type’s IComparable<TKey> implementation).
```csharp
public static IEnumerable<TSource> CombineOrdered<TSource, TKey>(
    this IEnumerable<TSource> first,
    IEnumerable<TSource> second,
    Func<TSource, TKey> keySelector)
    where TKey : IComparable<TKey>
{
    // 'using' ensures both enumerators are disposed, even if enumeration stops early.
    using (var firstEnumerator = first.GetEnumerator())
    using (var secondEnumerator = second.GetEnumerator())
    {
        var firstCanAdvance = firstEnumerator.MoveNext();
        var secondCanAdvance = secondEnumerator.MoveNext();
        // Only read Current when MoveNext succeeded (an empty sequence has no current item).
        var firstCurKey = firstCanAdvance ? keySelector(firstEnumerator.Current) : default(TKey);
        var secondCurKey = secondCanAdvance ? keySelector(secondEnumerator.Current) : default(TKey);

        while (true)
        {
            if (firstCanAdvance && (!secondCanAdvance || firstCurKey.CompareTo(secondCurKey) <= 0))
            {
                yield return firstEnumerator.Current;
                firstCanAdvance = firstEnumerator.MoveNext();
                if (firstCanAdvance)
                    firstCurKey = keySelector(firstEnumerator.Current);
            }
            else if (secondCanAdvance)
            {
                yield return secondEnumerator.Current;
                secondCanAdvance = secondEnumerator.MoveNext();
                if (secondCanAdvance)
                    secondCurKey = keySelector(secondEnumerator.Current);
            }
            else
            {
                yield break;
            }
        }
    }
}
```
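To use the function, place it within a static class (for example, one named Enumerable2). Since it is an extension method, it can then be called directly on any IEnumerable<TSource>, passing the second sequence and a key selector.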
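The following is a hedged usage sketch. It restates the merge logic of CombineOrdered inside a static class named Enumerable2 so that the example compiles on its own, then merges two sorted arrays. The Multiples helper is hypothetical; it produces an infinite sequence to show that, unlike Concat(...).OrderBy(...), the merge yields results lazily.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Enumerable2
{
    // Same merge logic as CombineOrdered in the post, restated so this example is self-contained.
    public static IEnumerable<TSource> CombineOrdered<TSource, TKey>(
        this IEnumerable<TSource> first,
        IEnumerable<TSource> second,
        Func<TSource, TKey> keySelector)
        where TKey : IComparable<TKey>
    {
        using (var firstEnumerator = first.GetEnumerator())
        using (var secondEnumerator = second.GetEnumerator())
        {
            var firstCanAdvance = firstEnumerator.MoveNext();
            var secondCanAdvance = secondEnumerator.MoveNext();
            var firstCurKey = firstCanAdvance ? keySelector(firstEnumerator.Current) : default(TKey);
            var secondCurKey = secondCanAdvance ? keySelector(secondEnumerator.Current) : default(TKey);

            while (firstCanAdvance || secondCanAdvance)
            {
                if (firstCanAdvance && (!secondCanAdvance || firstCurKey.CompareTo(secondCurKey) <= 0))
                {
                    yield return firstEnumerator.Current;
                    firstCanAdvance = firstEnumerator.MoveNext();
                    if (firstCanAdvance) firstCurKey = keySelector(firstEnumerator.Current);
                }
                else
                {
                    yield return secondEnumerator.Current;
                    secondCanAdvance = secondEnumerator.MoveNext();
                    if (secondCanAdvance) secondCurKey = keySelector(secondEnumerator.Current);
                }
            }
        }
    }
}

public static class Program
{
    // Hypothetical infinite sequence k, 2k, 3k, ...
    public static IEnumerable<int> Multiples(int k)
    {
        for (int n = k; ; n += k)
            yield return n;
    }

    public static void Main()
    {
        var evens = new[] { 2, 4, 6, 8 };
        var odds = new[] { 1, 3, 5, 7, 9 };
        var merged = evens.CombineOrdered(odds, x => x);
        Console.WriteLine(string.Join(",", merged.Select(x => x.ToString()).ToArray()));
        // 1,2,3,4,5,6,7,8,9

        // Lazy merge of two infinite sequences; Take stops the enumeration.
        var firstSix = Multiples(2).CombineOrdered(Multiples(3), x => x).Take(6);
        Console.WriteLine(string.Join(",", firstSix.Select(x => x.ToString()).ToArray()));
        // 2,3,4,6,6,8
    }
}
```

On equal keys the item from the first sequence is taken first, which is why the two 6s appear in that order.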
Note: The given implementation expects lists sorted in ascending order and returns a combined list in the same order. To combine lists sorted in descending order, change the <= sign to a >= sign in the code and ensure that you pass lists sorted in descending order.
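An alternative to editing the comparison by hand, sketched here as a variation rather than the post’s own code, is an overload that accepts an IComparer<TKey>, so the caller chooses the direction. The ReverseComparer class is a hypothetical helper that inverts another comparer; with it, descending inputs can be merged without changing the method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Enumerable2
{
    // Variant of CombineOrdered whose ordering comes from a caller-supplied comparer.
    public static IEnumerable<TSource> CombineOrdered<TSource, TKey>(
        this IEnumerable<TSource> first,
        IEnumerable<TSource> second,
        Func<TSource, TKey> keySelector,
        IComparer<TKey> comparer)
    {
        using (var e1 = first.GetEnumerator())
        using (var e2 = second.GetEnumerator())
        {
            bool has1 = e1.MoveNext();
            bool has2 = e2.MoveNext();
            TKey key1 = has1 ? keySelector(e1.Current) : default(TKey);
            TKey key2 = has2 ? keySelector(e2.Current) : default(TKey);

            while (has1 || has2)
            {
                // On ties, take from the first sequence (same rule as the original's <=).
                if (has1 && (!has2 || comparer.Compare(key1, key2) <= 0))
                {
                    yield return e1.Current;
                    has1 = e1.MoveNext();
                    if (has1) key1 = keySelector(e1.Current);
                }
                else
                {
                    yield return e2.Current;
                    has2 = e2.MoveNext();
                    if (has2) key2 = keySelector(e2.Current);
                }
            }
        }
    }
}

// Hypothetical helper: reverses another comparer to give descending order.
public sealed class ReverseComparer<T> : IComparer<T>
{
    private readonly IComparer<T> inner;
    public ReverseComparer(IComparer<T> inner) { this.inner = inner; }
    public int Compare(T x, T y) { return inner.Compare(y, x); }
}

public static class Program
{
    public static void Main()
    {
        var a = new[] { 9, 5, 1 };
        var b = new[] { 8, 6, 2 };
        var desc = a.CombineOrdered(b, x => x, new ReverseComparer<int>(Comparer<int>.Default));
        Console.WriteLine(string.Join(",", desc.Select(x => x.ToString()).ToArray())); // 9,8,6,5,2,1
    }
}
```

On .NET 4.5 and later, `Comparer<int>.Create((x, y) => y.CompareTo(x))` would serve the same purpose as the hand-written ReverseComparer.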