
Implications of persistent SI standard units

(Log of current work)

I'm very glad I've emphasized writing cascading-behavior tests for CORM. While writing them tonight I caught a potentially very tricky issue, and it relates to standardized SI base units and derived units. Ideally, for the sake of maintainability on the database end of things, each of the SI base units would have a single row in an SIBASEUNIT table. There are very few of these base units (seven). So few, in fact, that it might not even be necessary to add them to the database. It might be better to define them as an enum and store them as a string-valued column in another table, perhaps TERM.
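To make the enum idea concrete, here is a minimal sketch, assuming a hypothetical SiBaseUnit enum whose constant name is what gets written to the string column. None of these names come from CORM, and a real JPA mapping would probably lean on the provider's own enum-to-string support rather than hand-written helpers.

```java
import java.util.Optional;

/** Hypothetical enum of the seven SI base units; not part of CORM. */
enum SiBaseUnit {
    METRE("m"), KILOGRAM("kg"), SECOND("s"), AMPERE("A"),
    KELVIN("K"), MOLE("mol"), CANDELA("cd");

    private final String symbol;

    SiBaseUnit(String symbol) {
        this.symbol = symbol;
    }

    String symbol() {
        return symbol;
    }

    /** The string that would be written to a column such as TERM's unit column (assumed). */
    String toColumnValue() {
        return name();
    }

    /** Reads a column value back; empty when the string does not name a base unit. */
    static Optional<SiBaseUnit> fromColumnValue(String value) {
        for (SiBaseUnit unit : values()) {
            if (unit.name().equals(value)) {
                return Optional.of(unit);
            }
        }
        return Optional.empty();
    }
}
```

Note that a lookup for something like "NEWTON" comes back empty: the enum only knows about base units.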

The problem with this approach is that Term objects can also be constructed with derived units, and a derived unit has no constant in a base-unit enum. It shouldn't matter how people construct their Terms and Units, so long as the data can be persisted and put to use later.

So, imagine this scenario: I create a class called "SI", and it contains several hundred units. These are all units the end user could create on their own. SI is just a convenience class that provides a giant collection of pre-built units, and maybe even a collection of conversion contexts. There is no reason end users could not reproduce every bit of functionality provided by this class.

Imagine also that I implement Unit.java in such a way that if you created two units with identical data, unit1.equals(unit2) would evaluate to true.
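A minimal sketch of that kind of value-based equality follows. The class name ValueUnit and its two fields (a dimension and a symbol) are assumptions for illustration; the real Unit.java may carry different data.

```java
import java.util.Objects;

/** Hypothetical stand-in for Unit.java: two units with identical data are equal. */
final class ValueUnit {
    private final String dimension; // e.g. "mass" (assumed field)
    private final String symbol;    // e.g. "kg" (assumed field)

    ValueUnit(String dimension, String symbol) {
        this.dimension = Objects.requireNonNull(dimension);
        this.symbol = Objects.requireNonNull(symbol);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ValueUnit)) {
            return false;
        }
        ValueUnit other = (ValueUnit) o;
        // Equality depends only on the data, never on object identity.
        return dimension.equals(other.dimension) && symbol.equals(other.symbol);
    }

    @Override
    public int hashCode() {
        // Must agree with equals: same data, same hash.
        return Objects.hash(dimension, symbol);
    }
}
```

Two separately constructed ValueUnit("mass", "kg") instances are distinct objects but compare equal, and they hash alike.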

Furthermore, imagine that each Unit instance is an entity with its own ID field. Now, imagine you call:

Unit unit1 = SI.KILOGRAM;
Unit unit2 = UnitFactory.baseUnit(SI.mass, "kg");
// unit1.equals(unit2) == true

This is just a pre-built unit, provided for your convenience, right? Great. Now use a JPA entity manager to persist unit1 to a database, and you'll find that unit1.getID() returns a nonzero value. Here comes the kicker. Ready for it? What happens when you persist unit2?

Moreover, what happens if you refer to SI.KILOGRAM from other classes? Do they all now use the same object in the datastore? Probably not. But should they? Maybe. How do we tell? What's the correct way to build this?

It's a sticky problem. Equivalence on the logical tier might not imply equivalence in the datastore. If the same data is duplicated across numerous rows in the datastore, what are the implications for table efficiency, sorting, and the like? Should I push for third normal form? How far down should I disassemble these units?
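The two possible outcomes can be sketched with an in-memory stand-in for a unit table. This is not JPA and not CORM code: UnitRow, UnitTable, insert, and findOrInsert are all hypothetical names, and the sketch only contrasts "every new instance becomes a new row" with "reuse the row of a logically equal unit".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical unit entity: value-based equality plus a datastore ID. */
final class UnitRow {
    long id; // 0 until stored; deliberately excluded from equals/hashCode
    final String dimension;
    final String symbol;

    UnitRow(String dimension, String symbol) {
        this.dimension = Objects.requireNonNull(dimension);
        this.symbol = Objects.requireNonNull(symbol);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof UnitRow)) {
            return false;
        }
        UnitRow other = (UnitRow) o;
        return dimension.equals(other.dimension) && symbol.equals(other.symbol);
    }

    @Override
    public int hashCode() {
        return Objects.hash(dimension, symbol);
    }
}

/** In-memory stand-in for a unit table (an assumption, not a JPA API). */
final class UnitTable {
    private final List<UnitRow> rows = new ArrayList<>();
    private final Map<UnitRow, Long> idsByValue = new HashMap<>();
    private long nextId = 1;

    /** Naive insert: every unsaved instance becomes its own row. */
    long insert(UnitRow unit) {
        if (unit.id != 0) {
            return unit.id; // already stored
        }
        unit.id = nextId++;
        rows.add(unit);
        idsByValue.putIfAbsent(unit, unit.id);
        return unit.id;
    }

    /** Find-or-create: a logically equal unit reuses the existing row's ID. */
    long findOrInsert(UnitRow unit) {
        Long existingId = idsByValue.get(unit);
        if (existingId != null) {
            unit.id = existingId;
            return existingId;
        }
        return insert(unit);
    }

    int rowCount() {
        return rows.size();
    }
}
```

With insert, two equal kilogram units end up as two rows with different IDs. With findOrInsert, the second one picks up the first one's ID and the table keeps a single row. Which behavior is "correct" is exactly the open question.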

So, that's what I'm brooding over tonight.
