Entity = Class ...don't do it!

In a recent discussion with a colleague, Bryn and I expressed that we thought the entity=class equation was a bad idea. He of course asked us to clarify, which is what I’ll lightly attempt to do with this posting.  Note that this post doesn’t attempt to address the argument some have made for the equation class=scalar type; we’ll save that for another day.  In this post, I’ll call the real-world things we are modeling in an application (e.g. Customer, Order, Part) Entities, to be consistent with common usage and to avoid overlapping with relational or OO nomenclature.

In a classic OO language the closest thing to a tuple from relation land is a class, where property/field pairs are like attributes; collections, lists, hashtables, etc. are roughly analogous to relations.  Though it has become almost standard to capitalize on this mapping, we believe there are reasons why doing so is not a good idea.  Primarily:

1. Abstraction begs separation.  In the absence of a framework we are left accommodating the same patterns repeatedly: the user is presented with a string of text for an attribute, which has a title, is valid if xyz, etc.  With a framework, we are able to declare the essence of the structures (there is an entity X with a, b, c attributes, xyz rules, a title, etc.) and everything we need to manipulate that entity is already accommodated.  The trouble is that in a not-so-dynamic language it is difficult to model the entity and the framework at the same level, due to the separation of code and data.  In C#, for example, if our hypothetical framework knows about entities in the abstract, and entities are classes, then in order to “talk” about classes we must use reflection or have our entity data generated.  These happen to be exactly the two mechanisms used by today’s OO entity frameworks: reflection and generation.

2. OO languages stink as a data model.  The structural aspect is best described as a complicated network of structs and lists; the manipulative aspect is textbook imperative; integrity is provided through built-in types combined with information hiding.  As a result, it’s just plain harder to get things done as reliably and quickly as it would be in a relational system.  It's hard to ensure that business rules are met.  It's hard to “query” the data (even with declarative query syntax such as LINQ, the hierarchical nature of OO data makes querying difficult and intrinsically tied to physical representations).  Seemingly simple questions, like “does this entity equal that one”, become complicated by issues of reference vs. value semantics.

3. Generated entity classes are based on base table types, not any tuple type.  Not every derived conceptual entity for an application looks like some nested combination of base tables.  If generation doesn’t yield an entity class with the structure needed, then the developer must create their own.  Due to the overhead of entity frameworks, this is not trivial.

4. The time and resources it takes to generate, compile, load, and serialize a large number of entity classes can be a serious detriment to productivity and performance.
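
To make the first two points concrete, here is a minimal sketch. It is in Python rather than the C# the post has in mind, and the `Customer` class and `attribute_names` helper are hypothetical. It only illustrates that generic code has to reflect over an entity class to learn its attributes, and that two instances holding the same data are not equal by default.

```python
from typing import List


class Customer:
    """Hypothetical entity-as-class: one class per real-world entity."""

    def __init__(self, customer_id: int, name: str) -> None:
        self.customer_id = customer_id
        self.name = name


def attribute_names(entity: object) -> List[str]:
    """Generic code can only discover an entity's attributes by reflection."""
    return sorted(vars(entity))


a = Customer(1, "Acme")
b = Customer(1, "Acme")

# Same data, but default equality compares object identity (reference semantics).
same_object = a == b

# Comparing by value means reflecting over each instance's state.
same_value = vars(a) == vars(b)

print(attribute_names(a))
print(same_object, same_value)
```

Value equality can of course be written by hand or generated for each class, which is exactly the repeated accommodation that the first point complains about.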

We’ve been working for some months on a project that utilized NetTiers, a set of code generation templates for producing entity classes en masse from table definitions.  We inherited this particular aspect of the architecture, and went along with it until recently.  Unfortunately, our initial concerns turned out to be justified and we ended up carving it out and replacing it with a custom-rolled solution.

So what did we replace the generated entity classes with?  Well, if you must work in a non-relational language, as most of us often must, then why not at least construct a pseudo-relational sub-system?  This doesn’t necessarily mean building a full-blown relational engine.  A workable compromise is to build classes that roughly mimic the most essential structural, manipulative, and integrity aspects of the relational model.  For instance, a tuple can be simulated by a generic Entity class whose instances are each of some EntityType, which is capable of describing any entity (defined at run time).  Attribute and relation variable level constraints can be modeled as constraint objects associated with the Entities and their columns.  Queries can be defined through declarative query structures such as filters.  In this way, many of the benefits of the relational model can be attained while preserving the surrounding front-end and back-end logic.
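
As a rough illustration of that idea, here is a minimal sketch in Python. It is not the framework described in this post: every name in it (`EntityType`, `Entity`, `Constraint`, `where`) and the sample Part data are assumptions made up for the example. It shows an entity type described at run time, constraints held as objects attached to columns, value-based equality, and a small declarative filter.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# All names below are hypothetical, chosen only for this sketch.


@dataclass(frozen=True)
class Constraint:
    """A rule attached to one column of an entity type."""
    column: str
    description: str
    check: Callable[[Any], bool]


@dataclass(frozen=True)
class EntityType:
    """Describes an entity at run time: its name, columns, and constraints."""
    name: str
    columns: Tuple[str, ...]
    constraints: Tuple[Constraint, ...] = ()


class Entity:
    """A generic, tuple-like value of some EntityType."""

    def __init__(self, entity_type: EntityType, **values: Any) -> None:
        if set(values) != set(entity_type.columns):
            raise ValueError(f"{entity_type.name}: columns do not match the type")
        # Integrity is enforced in one place, for every entity type.
        for c in entity_type.constraints:
            if not c.check(values[c.column]):
                raise ValueError(f"{entity_type.name}.{c.column}: {c.description}")
        self.entity_type = entity_type
        self._values = dict(values)

    def __getitem__(self, column: str) -> Any:
        return self._values[column]

    def __eq__(self, other: object) -> bool:
        # Value semantics: same type and same column values.
        return (isinstance(other, Entity)
                and self.entity_type == other.entity_type
                and self._values == other._values)


def where(rows: List[Entity], **criteria: Any) -> List[Entity]:
    """Declarative equality filter: keep rows matching every criterion."""
    return [r for r in rows
            if all(r[col] == val for col, val in criteria.items())]


part_type = EntityType(
    name="Part",
    columns=("id", "name", "color"),
    constraints=(
        Constraint("name", "must be 1 to 30 characters",
                   lambda v: isinstance(v, str) and 0 < len(v) <= 30),
    ),
)

parts = [
    Entity(part_type, id=1, name="Nut", color="Red"),
    Entity(part_type, id=2, name="Bolt", color="Green"),
    Entity(part_type, id=3, name="Screw", color="Red"),
]

red_names = [p["name"] for p in where(parts, color="Red")]
print(red_names)
```

Because `part_type` is ordinary data, a new or derived entity shape needs a new `EntityType` value rather than a newly generated and compiled class.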

Here are some of the arguments that were made for generated/reflected entity classes, and some responses:

1. Provide a means to bring the entity into the “native” language of the application.  This would be true, but is subject to the assumption that the native language has reasonable isomorphisms for what is being modeled (more below on this).

2. In theory, one might assert that getting and setting column values from application code would be faster with entity classes.  In practice, however, the entity class requires things like notification, state management, and validation on each property setter, so the generic entity is probably nearly the same “speed” (and this is not likely to be a source of any measurable performance impact on an application anyway).

3. “Strong typing”.  That is, because “native” types are being used for the types of the properties, there will be an added measure of integrity.  This is of course not the case overall; at best it might catch a major mismatching of types at compile time rather than at runtime, but a) this assumes that there are native types that adequately represent the entity’s attributes, and b) this issue can be mitigated in a runtime-based framework by using typed accessors (e.g. AsInteger).  To expand on point (a), note that an application schema may demand all manner of rules on columns; far richer constraints than “the value must be a string”, etc.  Where a property of type string might be used in an entity class, the real requirement may be a 30 character valid identifier, which is actually more “strongly typed”.

Though the client was nervous about making such major changes late in the project, the benefits of the new framework were immediately apparent.  They include:

1. Much faster compile times – Several minutes down to a few seconds.

2. Schema changes are now much easier – it used to take half a day to make, regenerate, commit, and test a single schema or template change.

3. Integrity and security are consistently applied throughout the application.

4. Queryability has dramatically improved; most procedures and custom access paths are eliminated.

5. Most of the server side code is simply gone.  Thousands upon thousands of lines of it.

6. Load and runtime performance is much better, even before tuning.

7. Almost all CRUD can be done without writing data access code.
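
Returning to the “strong typing” response above, here is a minimal sketch in Python of what typed accessors and a richer-than-string rule might look like. The `Row` class, its `as_integer` and `as_string` methods (in the spirit of AsInteger), and the exact identifier rule are all assumptions for illustration, not the framework's actual API.

```python
import re
from typing import Any, Dict

# Hypothetical rule: a letter or underscore followed by letters, digits,
# or underscores, at most 30 characters in total.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,29}")


def is_valid_identifier(value: Any) -> bool:
    """A constraint richer than 'the value must be a string'."""
    return isinstance(value, str) and IDENTIFIER.fullmatch(value) is not None


class Row:
    """Hypothetical generic row with typed accessors."""

    def __init__(self, values: Dict[str, Any]) -> None:
        self._values = dict(values)

    def as_integer(self, column: str) -> int:
        value = self._values[column]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{column} is not an integer: {value!r}")
        return value

    def as_string(self, column: str) -> str:
        value = self._values[column]
        if not isinstance(value, str):
            raise TypeError(f"{column} is not a string: {value!r}")
        return value


row = Row({"quantity": 12, "code": "part_42"})
quantity = row.as_integer("quantity")
code_ok = is_valid_identifier(row.as_string("code"))
too_long_ok = is_valid_identifier("x" * 31)

print(quantity, code_ok, too_long_ok)
```

The type mismatch surfaces at run time rather than compile time, but the identifier rule is one that a plain string property could not express at all.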
