
The tense relationship between JPA, enums, and generics

In the last two months, I've come to understand in excruciating detail the various tradeoffs between using generics and enums in my JPA-ready entity library. Most recently, I've been inspired to write down some of my notes, to save myself and others some headache in the future.

First, as always, an example is needed to illustrate the problems I've run into. Consider the task of persisting a unit definition. Here are some instances of a Unit class: kilogram, second, meter, candela.

Clearly these objects would all benefit from having name, description and ID fields. But consider these as well: joule, watt, newton. All four units in the first group are SI base units. The second three are derived units, and each can be defined in terms of the first three (kilogram, second and meter). For example, a newton is equal to m·kg·s⁻². So it becomes clear that units need to be definable in terms of other units, through an underlying set of terms.

So, let's say we have a Term class, with a coefficient, a radix and an exponent. The newton Unit instance would now contain a List of three Term instances. The first term has a coefficient of 1, a radix of the "meter" instance, and an exponent of 1. The second term is much like the first, save for the fact that its radix is equal to the "kilogram" instance of the Unit class. The third follows a similar pattern, but in addition to referencing the "second" Unit instance, it also has an exponent of -2.

Now, for consistency and convenience, we'll define base units in terms of themselves. So, a "meter" instance contains a list of one Term, with a coefficient of 1, a radix of the "meter" instance, and an exponent of 1.

Finally, we can also give the Unit class a boolean field, in order to distinguish between base units and derived units. This very basic definition of units and terms will suffice for what I'm trying to explain.
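A minimal sketch of the Unit/Term model described above, in plain Java with no JPA annotations so that it compiles on its own. The helper names (`base(...)`, `definition()`) and the numeric IDs are illustrative assumptions, not the library's actual API; `Term` is written as a Java 16+ record.

```java
import java.util.ArrayList;
import java.util.List;

class UnitModel {

    // One factor of a unit definition: coefficient * radix^exponent.
    record Term(double coefficient, Unit radix, int exponent) {}

    // Minimal Unit: ID, name, description, base/derived flag and its defining terms.
    static final class Unit {
        final long id;
        final String name;
        final String description;
        final boolean base;
        final List<Term> terms = new ArrayList<>();

        Unit(long id, String name, String description, boolean base) {
            this.id = id;
            this.name = name;
            this.description = description;
            this.base = base;
        }

        // Base units are defined in terms of themselves: 1 * self^1.
        static Unit base(long id, String name, String description) {
            Unit unit = new Unit(id, name, description, true);
            unit.terms.add(new Term(1, unit, 1));
            return unit;
        }

        // Renders the definition as radix^exponent factors (coefficients are not printed here).
        String definition() {
            StringBuilder sb = new StringBuilder();
            for (Term t : terms) {
                if (sb.length() > 0) sb.append(" * ");
                sb.append(t.radix().name).append('^').append(t.exponent());
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Unit meter = Unit.base(1, "meter", "SI base unit of length");
        Unit kilogram = Unit.base(2, "kilogram", "SI base unit of mass");
        Unit second = Unit.base(3, "second", "SI base unit of time");

        // newton = m * kg * s^-2, a derived unit
        Unit newton = new Unit(4, "newton", "SI derived unit of force", false);
        newton.terms.add(new Term(1, meter, 1));
        newton.terms.add(new Term(1, kilogram, 1));
        newton.terms.add(new Term(1, second, -2));

        System.out.println(meter.definition());  // meter^1
        System.out.println(newton.definition()); // meter^1 * kilogram^1 * second^-2
    }
}
```

The self-reference in `base(...)` is what "defining base units in terms of themselves" amounts to: the meter's single term points back at the meter.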

Now, so far, we have two entities, Unit and Term, and a very simple Many-to-Many relationship between them. But here's where the trouble starts.

Part of the purpose of capturing the concept of a Unit in an object-oriented manner is so that we can use units to create constraints on logical behavior that are richer and more efficient than we could achieve were they mere text fields. We also want to capture, in some way, the relationships between different units.

To illustrate the first point, consider that you're writing an application where you want to add together a collection of measurements to determine their sum total. Now, imagine you expect the result to be a combined measurement of mass. If you have an object that represents "3 kilograms" and another that represents "5 kilograms", you can easily accomplish this. But what if, in addition to these, someone slipped in an object representing "4 seconds"? What is 3 kg + 5 kg + 4 s?

Now we're dealing with dimensional analysis, whose rules stipulate that you cannot add these together; doing so will break your logic. So what you want to do, what you need to do, is to somehow capture the idea that your "kilogram" units are tied to the concept of "mass", that your "second" units are tied to the concept of "time", that your "tesla" units are tied to the concept of "magnetic flux density", and so forth.
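As a baseline, here is the check we want, written as an ordinary runtime test. This is a hypothetical sketch: `Measurement` and `sum` are illustrative names, the quantity is a bare String, and unit conversion (grams to kilograms, say) is deliberately not handled, so mixed units are rejected as well.

```java
import java.util.List;

class DimensionCheck {

    // Hypothetical measurement: a magnitude, a unit symbol and the quantity it measures.
    record Measurement(double value, String unit, String quantity) {}

    // Sums measurements, rejecting any mix of quantities (3 kg + 4 s has no meaning).
    // Unit conversion is out of scope, so mixed units of one quantity are rejected too.
    static Measurement sum(List<Measurement> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("nothing to sum");
        }
        Measurement first = items.get(0);
        double total = 0;
        for (Measurement m : items) {
            if (!m.quantity().equals(first.quantity())) {
                throw new IllegalArgumentException(
                        "cannot add " + m.quantity() + " to " + first.quantity());
            }
            if (!m.unit().equals(first.unit())) {
                throw new IllegalArgumentException("unit conversion not supported");
            }
            total += m.value();
        }
        return new Measurement(total, first.unit(), first.quantity());
    }

    public static void main(String[] args) {
        Measurement a = new Measurement(3, "kg", "mass");
        Measurement b = new Measurement(5, "kg", "mass");
        Measurement c = new Measurement(4, "s", "time");

        System.out.println(sum(List.of(a, b)).value()); // 8.0
        try {
            sum(List.of(a, b, c));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // cannot add time to mass
        }
    }
}
```

This works, but the mistake is only caught when the code runs; the rest of the post is about getting the compiler to catch it instead.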

But what are these concepts "time", "mass", "magnetic flux density", "photoelastic work", "molar entropy" and so forth? Well, it turns out there is some fog in the answer to that question. Depending on which poorly written Wikipedia article you find, these concepts are collectively called "quantities", "dimensions", "magnitudes" or even "quantitative properties of particles". There is no highly rigorous name for them, but since object-oriented programming is nothing if not nominalist and Aristotelian in nature, they needed to be named. In order not to limit my future use of the above terms, I decided against adopting any of them and named these objects according to their relationship to the Unit. Since these concepts serve to limit the scope of what a unit can be used to quantify, I refer to them as the Quantitative Scopes of a unit. Please, if you're a physicist studying dimensional analysis, don't be upset.

Whatever these are called, this is where we start straying into trouble with Java. Case in point: how does one best represent a quantitative scope?

Well, generics give us one tantalizing option. I'd like to be able to create a new Unit<Mass>() and keep it in a Set<Unit<Mass>>. I think this would afford me the best and easiest way to constrain the use of these units. However, that leaves us with the difficult problem of how "Mass" itself is represented.
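What the generic option looks like, as a sketch: `Mass` and `Time` are marker interfaces, and the type parameter `Q` is a phantom type that exists only for the compiler and is never stored. The class layout and `getName()` are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class PhantomScopes {

    // Marker interfaces naming quantitative scopes.
    interface QuantitativeScope {}
    interface Mass extends QuantitativeScope {}
    interface Time extends QuantitativeScope {}

    // Q is never stored; it only lets the compiler tell Unit<Mass> from Unit<Time>.
    static final class Unit<Q extends QuantitativeScope> {
        private final String name;
        Unit(String name) { this.name = name; }
        String getName() { return name; }
    }

    public static void main(String[] args) {
        Set<Unit<Mass>> massUnits = new HashSet<>();
        massUnits.add(new Unit<>("kilogram"));
        massUnits.add(new Unit<>("gram"));

        Unit<Time> second = new Unit<>("second");
        // massUnits.add(second); // does not compile: Unit<Time> is not a Unit<Mass>

        System.out.println(massUnits.size()); // 2
    }
}
```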

We have only two options here: a class or an interface. A class is problematic, because every instance of "Mass" would be identical to every other. So it makes more sense to use interfaces. Ah, but herein lies the rub, because our original goal was to make these objects persistent. And interfaces, bless their bytecode, certainly do not fit the bill.

So, it seems classes are the only option. But this, once again, leaves us with the problem of instance control. I could rattle off 126 examples of a QuantitativeScope, each differing from the others only in name. But if we define each as a class, then presumably we'd have 126 database tables filled with carbon-copied records, which is just not going to happen. Thus the quandary: interfaces cannot be made persistent, but classes are the wrong instrument to accomplish the goal.

Well, what about enums? It's an idea. An enum would nicely solve the persistence problem, but it doesn't save us on the application layer, because an enum constant is a value, not a type, and so cannot be used as a type argument: there is no Unit<Scope.Mass>. Furthermore, an enum cannot be declared with type parameters of its own. This makes sense, given what enums are for, but it leads us back to the same problem. How, given all three of these tools, are we to discriminate easily between units of different quantitative scope without abandoning the ability to persist the objects?
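A small illustration of both halves of that point, assuming a three-constant `Scope` enum (written in upper case here, whereas the post writes `Scope.Mass`). The `name()`/`valueOf` round trip approximates what `@Enumerated(EnumType.STRING)` does with the column value; no JPA provider is involved, so treat it as a simulation rather than the provider's actual code path.

```java
class EnumLimits {

    // The persistable side: one constant per quantitative scope (three here, not 126).
    enum Scope { MASS, TIME, LENGTH }

    public static void main(String[] args) {
        // With EnumType.STRING, the constant's name is what lands in the column...
        String column = Scope.MASS.name();
        // ...and the value read back is resolved by that name.
        Scope restored = Scope.valueOf(column);
        System.out.println(column + " -> " + restored); // MASS -> MASS

        // Neither of the following compiles:
        // Set<Unit<Scope.MASS>> massUnits;  // MASS is a value, not a type
        // enum Scope<Q> { MASS, TIME }      // enum declarations cannot have type parameters
    }
}
```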

I came up with an ugly hack to solve this problem. The good news is that, as far as I can determine, it makes the best use of the available technology. The bad news is that it's an ugly hack, and it fills me with doubt about Java and JPA. But I'm invested in making this work, so here goes:

First, I created a "Scope" enum with 126 different values in it: Scope.Mass, Scope.Time, etc. Then I made 126 corresponding interfaces, "Mass", "Time", etc., that extend a base "QuantitativeScope" interface. Third, I made a generified "Graft" class that binds one of these interfaces, as its type parameter Q extends QuantitativeScope, to one of the enum values. Finally, I defined a library class containing 126 public static final instances of this "Graft" class, each mapped to the appropriate "Scope" enum value. With these graft instances, I was set.

Now, when defining a Unit, I can pass one of these "Graft" objects to the UnitFactory. The factory picks up the generic type from the Graft's type parameter and assigns the Graft.getScope() enum value to a "scope" field in the Unit class. When I persist the Unit with JPA, the "scope" field, which is annotated @Enumerated(EnumType.STRING), goes into the database. When I get a collection of Unit objects back out of the database, I can run them through a sieve that inspects each Unit.getScope() value in a switch statement and places each unit into an appropriately generified Set. This piece works sort of like a coin sorter, but when I'm done I can ask for the Set<Unit<Mass>> and know that my results are reliable.
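The hack described in the last two paragraphs, condensed into one self-contained sketch with three scopes instead of 126. `Scope`, `QuantitativeScope`, `Mass`, `Time`, `Graft`, `Graft.getScope()`, `UnitFactory` and `Unit.getScope()` come from the post; the library class name `Grafts`, the `Sieve` class, the `create(...)` signature and the upper-case constant names are my assumptions. JPA annotations are only mentioned in comments, so the code compiles without a persistence provider.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GraftSketch {

    // Persistable side: one enum constant per quantitative scope.
    enum Scope { MASS, TIME, LENGTH }

    // Type-level side: one marker interface per scope.
    interface QuantitativeScope {}
    interface Mass extends QuantitativeScope {}
    interface Time extends QuantitativeScope {}
    interface Length extends QuantitativeScope {}

    // Binds the type parameter Q to a single enum constant.
    static final class Graft<Q extends QuantitativeScope> {
        private final Scope scope;
        private Graft(Scope scope) { this.scope = scope; }
        Scope getScope() { return scope; }
    }

    // Library of grafts: one public static final instance per scope (hypothetical name).
    static final class Grafts {
        public static final Graft<Mass> MASS = new Graft<>(Scope.MASS);
        public static final Graft<Time> TIME = new Graft<>(Scope.TIME);
        public static final Graft<Length> LENGTH = new Graft<>(Scope.LENGTH);
        private Grafts() {}
    }

    // As a JPA entity this would carry @Entity, with the scope field annotated
    // @Enumerated(EnumType.STRING). Q itself is erased and never persisted.
    static final class Unit<Q extends QuantitativeScope> {
        private final String name;
        private final Scope scope;
        Unit(String name, Scope scope) { this.name = name; this.scope = scope; }
        String getName() { return name; }
        Scope getScope() { return scope; }
    }

    static final class UnitFactory {
        // Q is inferred from the graft at compile time; the enum value is copied at runtime.
        static <Q extends QuantitativeScope> Unit<Q> create(String name, Graft<Q> graft) {
            return new Unit<>(name, graft.getScope());
        }
    }

    // The "coin sorter": routes untyped units into typed sets by their scope field.
    static final class Sieve {
        final Set<Unit<Mass>> mass = new HashSet<>();
        final Set<Unit<Time>> time = new HashSet<>();
        final Set<Unit<Length>> length = new HashSet<>();

        @SuppressWarnings("unchecked") // safe only if scope and Q always agree
        void sort(List<Unit<?>> loaded) {
            for (Unit<?> unit : loaded) {
                switch (unit.getScope()) {
                    case MASS -> mass.add((Unit<Mass>) unit);
                    case TIME -> time.add((Unit<Time>) unit);
                    case LENGTH -> length.add((Unit<Length>) unit);
                }
            }
        }
    }

    public static void main(String[] args) {
        Unit<Mass> kilogram = UnitFactory.create("kilogram", Grafts.MASS);
        Unit<Time> second = UnitFactory.create("second", Grafts.TIME);
        Unit<Length> meter = UnitFactory.create("meter", Grafts.LENGTH);

        // Stand-in for a query result: only getScope() tells the units apart now.
        List<Unit<?>> loaded = List.of(kilogram, second, meter);

        Sieve sieve = new Sieve();
        sieve.sort(loaded);
        for (Unit<Mass> unit : sieve.mass) {
            System.out.println(unit.getName()); // kilogram
        }
    }
}
```

The unchecked casts in `Sieve.sort` are where the design's safety actually rests: nothing at runtime verifies that a unit whose scope is MASS was really created as a `Unit<Mass>`, so the guarantee holds only as long as every unit goes through `UnitFactory`.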

The main problem with this workaround is that I had to duplicate a lot of data and encase it into interfaces, a large enum, and a graft object library in order to make it function. There are other lingering problems with this approach as well, and I'm sure that I'll uncover more and more of them as I continue.

What this has taught me is that enums and generics are exceedingly tricky to use. Although generics let types serve as compile-time constraints on behavior, type arguments are erased during compilation, so they don't help you at runtime and are therefore tough to work into a persistence application. This worsens the intrinsic impedance mismatch between the application layer and the persistence layer. Further, enums in Java behave like pseudotypes, somewhere between interfaces and classes, but because enum constants cannot be used as type arguments, they aggravate the impedance mismatch even further when worked into a persistent application design. If I could have used the enum values as type arguments, this would be a non-problem.
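A short demonstration of the erasure problem, using the same kind of hypothetical marker-interface setup: at runtime a `Unit<Mass>` and a `Unit<Time>` are indistinguishable, which is why the sieve needs the enum field at all.

```java
class ErasureDemo {

    interface QuantitativeScope {}
    interface Mass extends QuantitativeScope {}
    interface Time extends QuantitativeScope {}

    // Q is a compile-time label only; nothing about it survives into the object.
    static final class Unit<Q extends QuantitativeScope> {}

    public static void main(String[] args) {
        Unit<Mass> kilogram = new Unit<>();
        Unit<Time> second = new Unit<>();

        // Both type arguments are erased, leaving a single runtime class.
        System.out.println(kilogram.getClass() == second.getClass()); // true
        System.out.println(kilogram.getClass().getSimpleName());      // Unit

        // Object obj = kilogram;
        // obj instanceof Unit<Mass>  // does not compile: Q cannot be checked at runtime
    }
}
```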

Finally, the best solution for my particular problem might have nothing to do with enums or generics after all. What I'm really trying to replicate through this design is invariants, preconditions and postconditions on method behavior, class definitions and collection compositions. Languages such as VDM-SL and Eiffel wonderfully exemplify this sort of language feature, and older tools such as iContract might bring it to Java, but it's a shame that these useful tools are not built into the language itself.
