My Blog: On Constraint of Generics in Java 1.5 and C# 2.0

I've just read the posts on generics in C# and Java by Ian Griffiths (http://www.interact-sw.co.uk/iangblog/2004/03/14/generics) and Bruce Eckel (http://mindview.net/WebLog/log-0050).
Their criticism is essentially the following: why do these languages put constraints on type parameters?
First of all, let's try to understand the problem. Consider the following generic method (in C#):
 
public static void Frob<T>(T x) {
    Console.WriteLine(x.Bar());
}
 
In both Java 1.5 and C# 2.0 the compiler raises an error, complaining that type T doesn't provide a Bar() method.
Why are they surprised? The reason for the error is easy to understand: both implementations guarantee type safety at the point of definition of a generic type or method, whereas parametric polymorphism à la C++ is type-checked only at the point of instantiation.
Now, if the compiler knows nothing about the type parameter and wants to ensure that the type it stands for is used correctly, the only safe assumption is that T inherits from Object.
If the programmer wants to make assumptions about type T, a bound has to be specified on the parameter. In our example:
 
using System;
public interface BarMethod<T> {
  T Bar();
}
public class M {
  public static void Frob<T, V>(T x) where T : BarMethod<V> {
    Console.WriteLine(x.Bar());
  }
}
 
Now the compiler knows that type T has a method Bar returning a value of type V. Nothing is known about V, so the value is treated as an Object, and because Console.WriteLine has an overload for Object the program is correct.
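To see the bound at work from the caller's side, here is a minimal sketch of a call to the constrained Frob. The Greeter class and the FrobDemo entry point are hypothetical names introduced only for this illustration; the interface and the method are the ones above.

```csharp
using System;

public interface BarMethod<T> {
  T Bar();
}

// Hypothetical type, made up for this example: it satisfies the bound
// BarMethod<string> by implementing the interface.
public class Greeter : BarMethod<string> {
  public string Bar() { return "hello"; }
}

public class M {
  public static void Frob<T, V>(T x) where T : BarMethod<V> {
    Console.WriteLine(x.Bar());
  }
}

public class FrobDemo {
  public static void Main() {
    // V appears only in the constraint, so it cannot be inferred from the
    // argument: both type arguments are written explicitly.
    M.Frob<Greeter, string>(new Greeter());

    // The next line would be rejected by the compiler, because string
    // does not implement BarMethod<int>:
    // M.Frob<string, int>("oops");
  }
}
```

The point of the sketch is that a wrong instantiation is caught when the call is compiled, not when it runs.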
Can we force the compiler to accept what we want (Ian suggests a couldbeanything constraint)? As recorded by Peter Sestoft (http://www.dina.dk/~sestoft/gcsharp/), when we were at Microsoft Research in Cambridge working with Andrew Kennedy and Don Syme (the authors of the first implementation of C# generics), I came up with a trick to work around the type checker:
 
using System;
public interface BarMethod<T> {
  T Bar();
}
public class M {
  public static void Frob<T, V>(T x) {
    Console.WriteLine(((BarMethod<V>)(Object)x).Bar());
  }
}
 
We have eliminated the constraint by relying on subtype polymorphism: if x is of type T we can always upcast it to Object and then downcast it to whatever type U we like: (U)(Object)x. The check that x really is a U is simply postponed to run time.
With this the type checker is happy, but beware! The generic method we have defined isn't really generic! It makes assumptions that are not declared in its signature...
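A minimal sketch of what the hidden assumption costs, under the assumption that we call the unconstrained Frob once with a type that implements BarMethod and once with one that doesn't. Greeter, CastDemo and FailsAtRunTime are hypothetical names used only here.

```csharp
using System;

public interface BarMethod<T> {
  T Bar();
}

// Hypothetical type, made up for this example.
public class Greeter : BarMethod<string> {
  public string Bar() { return "hello"; }
}

public class M {
  // No constraint: the requirement on T is hidden inside the body.
  public static void Frob<T, V>(T x) {
    Console.WriteLine(((BarMethod<V>)(Object)x).Bar());
  }
}

public class CastDemo {
  // Hypothetical helper: returns true when the call fails with an
  // InvalidCastException, false when it completes normally.
  public static bool FailsAtRunTime<T, V>(T x) {
    try {
      M.Frob<T, V>(x);
      return false;
    } catch (InvalidCastException) {
      return true;
    }
  }

  public static void Main() {
    // Greeter implements BarMethod<string>, so the downcast succeeds.
    Console.WriteLine(FailsAtRunTime<Greeter, string>(new Greeter()));

    // int does not implement BarMethod<string>. The compiler accepts the
    // call anyway, and the mistake shows up only when the cast executes.
    Console.WriteLine(FailsAtRunTime<int, string>(42));
  }
}
```

Both calls compile. Only the second one throws, and only when it is actually executed, which is exactly the kind of late surprise that a declared bound rules out.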
Now it's time to tell you a story: I contributed to the development of a C++ search engine called IXE. It relies on template meta-programming techniques, so we use lots of templates. It took me two years to notice that a generic method was buggy, because no one had used it before and so the type checker had never run on it! That was bad, because we were developing a library and somebody else might have found the problem!
It is plain that having to specify bounds on generic type parameters is an additional burden on the programmer. The real question is: is it worth spending the time to do it? What benefits do we get from these annotations?
The first answer is: ensuring that a generic type is well typed just by looking at its definition is better for code reuse and leads to more robust libraries. This applies to both Java and C#.
A second answer is: avoiding code bloat. A well-known limitation of generating a class (or method) for each instantiation of the type parameters is code bloat at run time. In execution environments like the JVM and the CLR, however, where most types are reference types, the implementation of a generic type can be shared among different instantiations. Thus Stack<String> and Stack<Form> will have a single class at run time and not two.
As explained in my previous blog post, Java simply strips away the type parameters, and for the JVM there are no generic parameters at all.
For .NET the story is quite different: the intermediate language has been extended so that type parameters are known to the execution environment. It is the loader that instantiates generic types by JITting the appropriate machine code. This approach leads to more efficient code without the code bloat side effect: all the instantiations of a generic type whose parameters are bound to reference types (i.e. non-value types) can share the same JITted code. Nevertheless, the loader generates different code for each instantiation with value types such as int and structs. Thus Stack<string> and Stack<Form> will share the same JITted code, whereas Stack<int> will get a different version.
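A small sketch of what "known to the execution environment" means in practice. It only shows that type arguments survive to run time. Whether two instantiations share JITted code is an internal matter of the runtime that this program cannot observe. ReifiedDemo and Describe are hypothetical names made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

public class ReifiedDemo {
  // Hypothetical helper: the type argument is still available at run time,
  // so a generic method can ask which T it was instantiated with.
  public static string Describe<T>() {
    return typeof(T).Name;
  }

  public static void Main() {
    // Different instantiations are different types for the runtime.
    Console.WriteLine(typeof(Stack<string>) == typeof(Stack<int>));  // False

    // The type argument can be read back from inside generic code...
    Console.WriteLine(Describe<int>());                              // Int32

    // ...and from an instance through reflection.
    Type arg = new Stack<string>().GetType().GetGenericArguments()[0];
    Console.WriteLine(arg);                                          // System.String
  }
}
```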
An approach to parametric polymorphism without bounds wouldn't allow these kinds of optimizations.
Of course, we can ask for more type inference, so that the compiler infers types on our behalf! There is still much more to do!
At first I tried to understand C# generics using my past experience with C++ templates, and I was wrong! Now I greatly prefer the more semantic approach taken by Java and C# to the macro-expansion system of C++, in particular because in a dynamic environment it is not really safe to rely on the good faith of programmers ;-)
 

Category: Programming
Created at 3/20/2004 11:14  by Antonio Cisternino 
Last modified at 3/21/2004 23:31  by Antonio Cisternino