Goodbye, JScience. Hello, corm-quantity
I finished stripping the JScience dependencies out of my CORM project modules, and replacing them with the new corm-quantity module. Like a proud new parent, I'm very excited to see corm-quantity in action. It is not perfect, but it's a good solution for a JPA-capable representation of units of measurement. Plus, even with all its warts, the first cut of corm-quantity is 56k, while JScience 4.3.1 weighs in at 668k. That's a 12-fold improvement!
While working on this package I learned a lot about dimensional analysis. In short, dimensional analysis is a fancy way to describe the algebraic manipulation of units of measurement. The best example I can conjure would be Google's unit conversion utility. Or:
1 watt / 1 joule = 1 hertz
Since "1 watt" equals "1 m^2∙kg∙s^-3", and "1 joule" equals "1 m^2∙kg∙s^-2", "1 watt / 1 joule" equals "1 m^2∙kg∙s^-3 / 1 m^2∙kg∙s^-2". Or, "1 m^2∙kg∙s^-3 ∙ 1 m^-2∙kg^-1∙s^2". When you really think about it this way, it's no different from "aX^2∙bY^3 ... "
Each of these terms (1 m^-2, for example) is composed of a coefficient, a radix, and an exponent, just like any other algebraic term. Once you have represented units as a name mapped to a list of these terms, you can perform simple and elegant conversions to create derivative units. In our example above, we could perform the following transformations:
- Suppress all coefficients of "1", as they are redundant. i.e., don't print them.
- Add the exponents of terms with the same radix.
- Any term with an exponent of 0 and a coefficient of 1 reduces to a term of unity, and is removed.
These transformations would yield:
"m^(2-2)∙kg^(1-1)∙s^(2-3)", or
"m^0∙kg^0∙s^-1", or
"1∙1∙s^-1", or
"s^-1", which is the definition of
"1 hertz".
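The watt / joule reduction above can be sketched in a few lines of Java. This is only an illustration, not corm-quantity code: the class name DimensionDemo and its helper methods are made up for this post, and coefficients are left out (all of them are 1 here), so a unit is reduced to a map from radix symbol to exponent.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from corm-quantity.
final class DimensionDemo {

    // Build the exponent map for a unit expressed in m, kg and s.
    static Map<String, Integer> unit(int m, int kg, int s) {
        Map<String, Integer> terms = new LinkedHashMap<>();
        terms.put("m", m);
        terms.put("kg", kg);
        terms.put("s", s);
        return terms;
    }

    // Division is multiplication by the inverse: negate every exponent.
    static Map<String, Integer> invert(Map<String, Integer> terms) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : terms.entrySet()) {
            result.put(e.getKey(), -e.getValue());
        }
        return result;
    }

    // Add the exponents of terms with the same radix, then drop unity terms.
    static Map<String, Integer> multiply(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> result = new LinkedHashMap<>(a);
        for (Map.Entry<String, Integer> e : b.entrySet()) {
            result.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        result.values().removeIf(exponent -> exponent == 0);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> watt = unit(2, 1, -3);
        Map<String, Integer> joule = unit(2, 1, -2);
        System.out.println(multiply(watt, invert(joule))); // prints {s=-1}
    }
}
```

The only term left is s with an exponent of -1, which is the hertz.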
Now, the important part in all of this is that the transformation logic used to build and combine units is not part of the units themselves! It can be implemented in any number of packages--an entire library of useful unit building packages and pre-built unit types could be designed. But, the bottom line is that each unit consists of the following elements:
- A name
- A description
- A symbol
- A list of terms.
Each term consists of:
- A coefficient
- A radix, which is another unit
- An exponent
If the unit is a base unit, it contains exactly one term, which is self-referential. For example, a "meter" contains the term "1 meter^1". Thus, base units are self-defined. Derived units, on the other hand, are defined in terms of base units. Now, with only three entities, we can represent units of all kinds.
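Here is a rough sketch of that unit and term structure in plain Java. The class and method names (UnitModelSketch, base, derived, isBase) are invented for illustration and are not the actual corm-quantity API; the coefficient is a plain long here just to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the unit/term model; not the corm-quantity classes.
final class UnitModelSketch {

    // A term: coefficient, radix (another unit) and exponent.
    static final class Term {
        final long coefficient;
        final Unit radix;
        final int exponent;

        Term(long coefficient, Unit radix, int exponent) {
            this.coefficient = coefficient;
            this.radix = radix;
            this.exponent = exponent;
        }
    }

    // A unit: name, description, symbol and a list of terms.
    static final class Unit {
        final String name;
        final String description;
        final String symbol;
        private final List<Term> terms = new ArrayList<>();

        Unit(String name, String description, String symbol) {
            this.name = name;
            this.description = description;
            this.symbol = symbol;
        }

        // A base unit holds exactly one term that points back at itself.
        static Unit base(String name, String description, String symbol) {
            Unit unit = new Unit(name, description, symbol);
            unit.terms.add(new Term(1, unit, 1));
            return unit;
        }

        // A derived unit is defined by terms that refer to other units.
        static Unit derived(String name, String description, String symbol, List<Term> terms) {
            Unit unit = new Unit(name, description, symbol);
            unit.terms.addAll(terms);
            return unit;
        }

        boolean isBase() {
            return terms.size() == 1 && terms.get(0).radix == this;
        }

        List<Term> terms() {
            return Collections.unmodifiableList(terms);
        }
    }

    public static void main(String[] args) {
        Unit second = Unit.base("second", "SI unit of time", "s");
        Unit hertz = Unit.derived("hertz", "SI unit of frequency", "Hz",
                Collections.singletonList(new Term(1, second, -1)));
        System.out.println(second.isBase() + " " + hertz.isBase()); // prints true false
    }
}
```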
Almost.
I found, for the sake of unit conversion, that it was best for my purposes to represent the coefficient as a rational number. This helps reduce the risk of overflow during unit conversions. I struggled for a long time trying to determine the best way to implement this, including using Number and permitting users to retrieve the value using longValue(). However, this didn't work. Spectacularly. It has to do with the inability to distinguish between an integral type and a floating point type in Java. If I could create an "Integral" object, and reliably get int, long or short out of it, that'd be great. But with the ever-looming threat of precision loss because of floating-point multiplication, I gave up and fell back on rational representation using long numeric primitives. Remember--my most important goal with this package is persistence. So, is this the best way to get the job done? It depends. For my job, certainly. So I re-jiggered a Rational implementation to use longs and brought the entity count to four.
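To give a feel for what a long-based rational looks like, here is a minimal sketch. It is not the Rational class that ended up in corm-quantity; the name LongRational and the cross-reduction in times() are my own assumptions for this example, and edge cases such as Long.MIN_VALUE are ignored.

```java
// Hypothetical sketch of a rational number backed by two longs.
final class LongRational {
    final long numerator;
    final long denominator;

    LongRational(long numerator, long denominator) {
        if (denominator == 0) {
            throw new ArithmeticException("zero denominator");
        }
        // Normalize: reduce by the gcd and keep the sign on the numerator.
        long g = gcd(Math.abs(numerator), Math.abs(denominator));
        long sign = denominator < 0 ? -1 : 1;
        this.numerator = sign * (numerator / g);
        this.denominator = sign * (denominator / g);
    }

    private static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    LongRational times(LongRational other) {
        // Cross-reduce before multiplying so intermediate values stay small.
        long g1 = gcd(Math.abs(numerator), other.denominator);
        long g2 = gcd(Math.abs(other.numerator), denominator);
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return new LongRational(
                Math.multiplyExact(numerator / g1, other.numerator / g2),
                Math.multiplyExact(denominator / g2, other.denominator / g1));
    }

    @Override
    public String toString() {
        return numerator + "/" + denominator;
    }

    public static void main(String[] args) {
        LongRational third = new LongRational(1, 3);
        System.out.println(third.times(new LongRational(3, 1))); // prints 1/1
    }
}
```

Unlike a double, one third times three comes back as exactly one, and both fields are plain longs that map to ordinary numeric columns.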
And my work was nearly complete. But for a few important considerations. First, unit conversion. I want to be able to represent a US pound as somehow related to a US ounce. Now, for many units, the relationship is fixed and will never change. However, for many other units, particularly those in a market context, the exchange ratio between units is in flux and subject to change. So, I used dimensional analysis for the simplest possible answer. I created the notion of a conversion context which is itself very similar to a unit. A conversion context is just a list of terms, from which arbitrary unit transforms can be made.
For example, if I define a unit of "gold" with the symbol "Au" as a base unit, and a second unit, "thorium", with the symbol "Th" as a base unit, I could create a conversion context containing the terms "1 Au^1, 100 kg^-1, 1 Th^-1". This essentially represents "1 Au / 100 kg Th". Now, if I multiply 200 kg Th by this ratio, I get the result of "2 Au". The actual formulas that provide the conversion can be written by anyone capable of high-school algebra. The important part is that the units themselves, as well as the conversion definition, are all simple, small, and able to be persisted in an EJB3 system.
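The gold and thorium example works out like this in code. Again, this is a sketch under my own assumptions rather than the corm-quantity implementation: the names ConversionSketch, Term and Quantity are hypothetical, radixes are plain strings instead of units, coefficients are assumed positive, and a term is read as (coefficient x radix)^exponent, so "100 kg^-1" means 1 / (100 kg).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of applying a conversion context; not corm-quantity code.
final class ConversionSketch {

    static final class Term {
        final long coefficient;
        final String symbol;
        final int exponent;

        Term(long coefficient, String symbol, int exponent) {
            this.coefficient = coefficient;
            this.symbol = symbol;
            this.exponent = exponent;
        }
    }

    // An amount (kept as a reduced fraction of longs) plus an exponent per radix.
    static final class Quantity {
        final long numerator;
        final long denominator;
        final Map<String, Integer> exponents;

        Quantity(long numerator, long denominator, Map<String, Integer> exponents) {
            this.numerator = numerator;
            this.denominator = denominator;
            this.exponents = exponents;
        }

        static Quantity of(Term... terms) {
            return new Quantity(1, 1, new LinkedHashMap<String, Integer>()).times(terms);
        }

        // Multiply by each term: scale the amount and add the exponents.
        Quantity times(Term... terms) {
            long n = numerator;
            long d = denominator;
            Map<String, Integer> exps = new LinkedHashMap<>(exponents);
            for (Term t : terms) {
                for (int i = 0; i < Math.abs(t.exponent); i++) {
                    if (t.exponent > 0) {
                        n = Math.multiplyExact(n, t.coefficient);
                    } else {
                        d = Math.multiplyExact(d, t.coefficient);
                    }
                }
                exps.merge(t.symbol, t.exponent, Integer::sum);
            }
            exps.values().removeIf(e -> e == 0); // drop unity terms
            long g = gcd(n, d);
            return new Quantity(n / g, d / g, exps);
        }

        @Override
        public String toString() {
            return numerator + "/" + denominator + " " + exponents;
        }
    }

    static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    public static void main(String[] args) {
        Quantity thorium = Quantity.of(new Term(200, "kg", 1), new Term(1, "Th", 1));
        // The conversion context "1 Au^1, 100 kg^-1, 1 Th^-1".
        Quantity gold = thorium.times(
                new Term(1, "Au", 1), new Term(100, "kg", -1), new Term(1, "Th", -1));
        System.out.println(gold); // prints 2/1 {Au=1}
    }
}
```

The kg and Th terms cancel, and what remains is 2 Au. Swapping in a different context (say, tomorrow's exchange ratio) changes the result without touching the units.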
In order to illustrate these conversion formulas, I built a derived unit factory and provided a stub collection of SI units and simple utilities, as well as a smattering of US and British units. And at this point I was very nearly finished.
But not quite. There was one outstanding matter, and it was more difficult than either of the above. It involved what JScience referred to as "Quantity". Examples of this include "length", "area", "time". I did a lot of research on this in the "dimensional analysis" field, and basically boiled down the following observations:
- There are a LOT of names for this thing--JScience called it "quantity". Others call it "dimension". Still others call instances of this concept "quantitative properties of matter", or "quantitative properties of things". I struggled between all of these names and was never satisfied until I realized that:
- A unit is limited in the scope of what it can quantify. A "meter" cannot quantify radiation dose absorbed or weight. A "second" cannot quantify mass. So "quantitative scope" seemed like a more useful name for the object.
- Each unit should have a reference to this quantitative scope, so that units can be sorted and more efficiently searched by this field.
- In Java, it is particularly useful to be able to say something like "new Unit<Length>()", and deal with units and their quantitative scope as generic types. This is especially useful in collections. Thus, the instances of these quantitative scopes could not be gathered into a Java enum.
- The list of quantitative scopes is not conclusive or final. This supports the earlier conclusion not to represent these using a Java enum.
- If the relationship between the unit and its quantitative scope is to persist, the types (area, length, time, etc.) cannot be interfaces. Interfaces cannot be persisted with JPA.
- Thus, the only option is to represent each currently known quantitative scope as its own class.
- I found 48 of them by digging online.
- Every instance of "area" or of "length", etc., should be identical to every other instance of the same type.
- The best way to represent these objects is as a single table hierarchy with a discriminator column. It's efficient, and I can use an application-defined primary key field instead of a generated one.
These were my observations, so that's how I built the system. It isn't as perfect or elegant as what I could have written in another language or if persistence were not a requirement. But it's pretty clean.
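Roughly, the shape I am describing looks like the sketch below. The names (ScopeSketch, QuantitativeScope, Length, Time) are illustrative and not necessarily what ships in corm-quantity, and the JPA annotations are deliberately left out so the example compiles without a persistence provider; the comments only note where a single-table mapping would go.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of quantitative scopes as a class hierarchy.
final class ScopeSketch {

    // Root of the hierarchy. In a JPA mapping this is where
    // @Inheritance(strategy = InheritanceType.SINGLE_TABLE) and
    // @DiscriminatorColumn would be declared.
    static abstract class QuantitativeScope {
        private final String id; // application-defined key, not generated

        protected QuantitativeScope(String id) {
            this.id = id;
        }

        String id() {
            return id;
        }

        // Every instance of a given scope equals every other instance of it.
        @Override
        public boolean equals(Object other) {
            return other != null && other.getClass() == getClass();
        }

        @Override
        public int hashCode() {
            return getClass().hashCode();
        }
    }

    // One concrete class per scope; each would carry its own discriminator value.
    static final class Length extends QuantitativeScope {
        Length() { super("length"); }
    }

    static final class Time extends QuantitativeScope {
        Time() { super("time"); }
    }

    // A unit is parameterized by the scope it is able to quantify.
    static final class Unit<Q extends QuantitativeScope> {
        final String name;
        final Q scope;

        Unit(String name, Q scope) {
            this.name = name;
            this.scope = scope;
        }
    }

    public static void main(String[] args) {
        List<Unit<Length>> lengths = new ArrayList<>();
        lengths.add(new Unit<>("meter", new Length()));
        lengths.add(new Unit<>("foot", new Length()));
        // lengths.add(new Unit<>("second", new Time())); // would not compile
        System.out.println(lengths.get(0).scope.equals(lengths.get(1).scope)); // prints true
    }
}
```

The generic parameter is what an enum could not give me: the compiler keeps a "second" out of a collection of length units.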
Now I can extrapolate from this and be able to represent prices of different quantitative scopes, including "money" and "product", and use it to implement a currency market or even a commodities market (or a combination of both!). For my needs, it's a better tool. Smaller, cleaner, JPA compatible, and it has a very clear separation between entity representation, systemization of units, and transformational logic. No funny business under the hood, either. Just a small, clean, OO solution for a commercial object relational model.
I'm going to finish some tests for it over the weekend, tag an interim release, and then write all of the JPA annotations and draft up a handful of persistence tests. Expect news of a release very shortly.
