Generalizing Generics: Type variables with both upper and lower bounds
Context
Generics are a type system facility that was added to the Java programming language in 2004 as a part of the Java 5 release; it represents the most significant update to the Java type system since its initial conception. Generics allow to track in the type system that a class or method may operate on objects of various types. To this end, type variables are employed in the typing of such generic classes or methods. This provides compile-time type safety and allows to omit redundant cast operations.
Generics are frequently used in the implementation of the Java Collection library. Perhaps the best-known example is a generic interface List<X>
, where X
is a type variable that may be bound to different types: List<String>
, List<Number>
, etc.
Other object-oriented languages such as C# and Scala have also adopted Generics, which illustrates that this is a very relevant topic today.
Problem
Historically, type variables are a concept from typed functional programming languages such as ML and Haskell. Instead, object-oriented languages predominantly used subtyping: since the subtype relation is based on the class hierarchy, this corresponds nicely to the concept of polymorphism in object-oriented languages.
When type variables are employed in an object-oriented language, it is very useful to provide some integration with the subtype relation. In Generics, this occurs primarily through the concept of bounded quantification. This means that the type variable is equipped with a subtype constraint, for example an upper bound. Consider the following example:
public static <X extends Number> X genUp(X x) { System.out.println(x.intValue()); return x; }
The generic method genUp
is defined with the type variable X
, which has the upper bound Number
. In other words, X
represents an unknown type which is a subtype of Number
. Examples of such types would be Number
itself, Integer
, Float
, BigInteger
, etc. As a consequence of the upper bound declaration, the body of method genUp
may employ a value of type X
as a Number
, in order to invoke the intValue
method. So the above example is legal Java code.
The concept symmetrical to upper bounds would be lower bounds. When a type variable is declared with a lower bound Integer
, it represents an unknown type which is a supertype of Integer
. Consequently, it is possible to treat a value of the lower bound type as a value of the unknown type represented by the type variable. For example, we may consider a method that - depending on some condition - returns either its argument or a fixed value:
public static <X super Integer> X genLow(X x) { if(someConditionHolds()) return x; else return 4; }
The method genLow
should return a value of type X
. Due to the lower bound of X
, we may return an Integer
, since any Integer
also belongs to type X
.
Unfortunately, upper bounds are not commonly considered in type systems, and the upper bound example is not legal Java code. A number of non-fundamental reasons are sometimes cited for this design, but a type system where type variables can have both lower and upper bounds (possibly at the same time) would provide an important improvement.
Approach
You would begin by studying the literature on Generics. Next, you would develop a proposal for a type system that includes both upper and lower bounds for type variables. This proposal should already deal with the specifics of including both kinds of bounds, possibly at the same time.
Afterwards, the thesis research may continue in one of two directions. The first direction is to investigate the practical implementation of the proposal. In Java, Generics is implemented by means of an erasure of the type variables. Other systems preserve the type variables at runtime. One of these systems would need to be selected and adapted to the proposal. The second direction would be to do a theoretical investigation of the proposal. To this end, you would extend the Featherweight Generic Java calculus with your proposal for both lower and upper bounds. We have experience formalizing and evaluating this calculus with the Coq proof assistant. This software allows the expression of formal assertions and mechanically checks the proofs of these assertions (it is not an automated theorem prover but includes some automated theorem proving tactics and various decision procedures). Compared to the traditional paper-and-pencil proofs, mechanical proofs checked by tools such as Coq rule out the possibility of mistakes by the proof author. They represent the current state-of-the-art in formal rigor. It would therefore be appropriate to apply this system here.
Over the course of the thesis, there may be a possibility to collaborate with researchers on the topic at École des Mines de Nantes, France and/or Aarhus Universitet, Denmark.
Contact
Contact Bruno De Fraine to discuss details or possible variations of this thesis proposal.
References
- Maurice Naftalin and Philip Wadler. Java Generics and Collections. O’Reilly Media, Inc., October 2006.
- Atsushi Igarashi, Benjamin Pierce, and Philip Wadler. Featherweight Java: A minimal core calculus for Java and GJ. In Proc. of OOPSLA ’99, pages 132–146. ACM Press, October 1999.
- Yves Bertot and Pierre Castéran. Interactive Theorem Proving and Program Development — Coq’Art: The Calculus of Inductive Constructions, volume XXV of Texts in Theoretical Computer Science. Springer, 2004