
April 16, 2008

They charge for that? Seriously?

I have always been under the impression that the Developer Edition of SQL Server 2005 was free for development use. In fact, I'm pretty sure that was the case at some point. However, as I was informed by the MSDN Online Concierge today, it does require a license. Oh, and by the way, the license is not included with your MSDN Premium Subscription.

It makes no sense to me to charge for the development edition. As Nate pointed out, even Oracle with all its bloat is free for development use, but never mind that. In an effort to show my dismay, I resorted to a childish display of sarcasm which I thought was sure to elicit a free license. After all, it's only $50 we're talking about. However, as the following chat excerpt shows, no such luck:

 

4:50:39 AM

 
 You
 The SQL Server Developer edition is not included in Visual Studio Professional with MSDN Premium.
4:51:07 AM
 
 Bryn
 Wow.
4:51:31 AM
 
 Bryn
 And just when I was starting to think SQL Server was a good alternative to MySQL.
4:51:58 AM
 
 Bryn
 So let me get this straight, Microsoft wants to discourage me from developing solutions for their technologies?
4:53:23 AM
 
 You
 No, if you buy a Visual Studio Developer, or above Edition with MSDN Premium you can get SQL Server Developer edition.
4:54:55 AM
 
 Bryn
 Okay, I'll start looking for a different technology. Thanks.
4:55:10 AM
 
 You
 You're welcome.

I guess I'll have to try harder next time :)

Bryn

April 03, 2008

Entity = Class ...don't do it!

In a recent discussion with a colleague, Bryn and I said that we thought the entity=class equation was a bad idea. He of course asked us to clarify, which is what I’ll lightly attempt to do in this post.  Note that this post doesn’t attempt to address the argument some have made for the equation class=scalar type; we’ll save that for another day.  In this post, I’ll call the real-world things we are modeling in an application (e.g. Customer, Order, Part) Entities, to be consistent with common usage and to avoid overlapping relational or OO nomenclature.

In a classic OO language the closest thing to a tuple from relation land is a class, where property/field pairs are like attributes; collections, lists, hashtables, etc. are roughly analogous to relations. Though it has become almost standard to capitalize on this mapping, we believe there are reasons not to do so.  Primarily:

1. Abstraction begs separation.  In the absence of a framework we are left accommodating the same patterns repeatedly… the user is presented with a string of text for an attribute, which has a title, is valid if xyz, etc.  With a framework, we are able to declare the essence of the structures (there is an entity X with a,b,c attributes, xyz rules, a title, etc.), and everything we need to manipulate that entity is already accommodated.  The trouble is that in a not-so-dynamic language it is difficult to model the entity and the framework at the same level, due to the separation of code and data.  In C#, for example, if our hypothetical framework knows about entities in the abstract, and entities are classes, then in order to “talk” about classes we must use reflection or have our entity data generated.  These happen to be exactly the two mechanisms used by today’s OO entity frameworks: reflection and generation.

2. OO languages stink as a data model.  The structural aspect is best described as a complicated network of structs and lists; the manipulative aspect is textbook imperative; integrity is provided through built-in types combined with information hiding.  As a result, it’s just plain harder to get things done as reliably and quickly as it would be in a relational system.  It's hard to ensure that business rules are met.  It's hard to “query” the data (even with declarative query syntax such as LINQ, the hierarchical nature of OO data makes querying difficult and intrinsically tied to physical representations).  Seemingly simple things like “does this entity equal that one” become complicated by issues of reference vs. value semantics.

3. Generated entity classes are based on base table types, not any tuple type.  Not every derived conceptual entity for an application looks like some nested combination of base tables.  If generation doesn’t yield an entity class with the structure needed, then the developer must create their own.  Due to the overhead of entity frameworks, this is not trivial.

4. The time and resources it takes to generate, compile, load, and serialize a large number of entity classes can be a serious detriment to productivity and performance.
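The reference vs. value issue from point 2 shows up in just a few lines. The sketch below uses Python rather than C# purely for brevity, and the Customer class is a hypothetical stand-in, not code from the project:

```python
class Customer:
    """A plain entity class (hypothetical, for illustration only)."""

    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name


a = Customer(1, "Acme")
b = Customer(1, "Acme")

# Default object equality is identity-based, so two objects that carry
# exactly the same attribute values still compare as different.
same_as_objects = (a == b)

# Tuples compare by value, which is how the relational model treats rows.
same_as_tuples = ((a.customer_id, a.name) == (b.customer_id, b.name))

print(same_as_objects, same_as_tuples)
```

Every entity class has to opt in to value semantics on its own (and keep that code in sync with its attributes), whereas a tuple gets it by definition.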

We’ve been working for some months on a project that utilized NetTiers, a set of code generation templates for producing entity classes en masse from table definitions.  We inherited this particular aspect of the architecture, and went along with it until recently.  Unfortunately, our initial concerns turned out to be justified and we ended up carving it out and replacing it with a custom-rolled solution.

So what did we replace the generated entity classes with?  Well, if you must work in a non-relational language, as most of us often must, then why not at least construct a pseudo-relational sub-system?  This doesn’t necessarily mean building a full-blown relational engine.  A workable compromise is to build classes that roughly mimic the most essential parts of the structural, manipulative, and integrity aspects of the relational model.  For instance, a tuple can be simulated by a generic Entity class whose instances each have some EntityType; an EntityType is defined at run time and is capable of describing any entity.  Attribute-level and relation-variable-level constraints can be modeled as constraint objects associated with the Entities and their columns.  Queries can be defined through declarative query structures, such as filters.  In this way, many of the benefits of the relational model can be attained while preserving the surrounding front-end and back-end logic.
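To make the idea concrete, here is a minimal sketch of such a sub-system. It is not the framework we built: it is written in Python rather than C# for brevity, and every name in it (Column, EntityType, Entity, where, is_identifier, the Part example) is an assumption made up for illustration. It also reads the identifier rule as "at most 30 characters", which is a guess. It shows an entity type defined at run time, column-level constraints checked on every set, a typed accessor, value-based equality, and a simple declarative filter.

```python
import re


class Column:
    """A named column with zero or more constraints (predicates on a value)."""

    def __init__(self, name, *constraints):
        self.name = name
        self.constraints = constraints


class EntityType:
    """Describes an entity at run time; no class is generated per entity."""

    def __init__(self, name, *columns):
        self.name = name
        self.columns = {column.name: column for column in columns}


class Entity:
    """A generic tuple-like value belonging to some EntityType."""

    def __init__(self, entity_type, **values):
        self.entity_type = entity_type
        self._values = {}
        for column_name, value in values.items():
            self.set(column_name, value)

    def set(self, column_name, value):
        # Unknown column names raise KeyError; invalid values raise ValueError.
        column = self.entity_type.columns[column_name]
        for constraint in column.constraints:
            if not constraint(value):
                raise ValueError(f"{column_name}: invalid value {value!r}")
        self._values[column_name] = value

    def get(self, column_name):
        return self._values[column_name]

    def as_integer(self, column_name):
        # Typed accessor: converts (or fails) at the point of use.
        return int(self._values[column_name])

    def __eq__(self, other):
        # Value semantics: same type and same column values.
        return (isinstance(other, Entity)
                and self.entity_type is other.entity_type
                and self._values == other._values)


def where(entities, **criteria):
    """Declarative filter: keep entities whose columns equal the criteria."""
    return [e for e in entities
            if all(e.get(name) == value for name, value in criteria.items())]


def is_identifier(value):
    # Assumed rule: a letter or underscore, then up to 29 more word characters.
    return (isinstance(value, str)
            and re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]{0,29}", value) is not None)


def is_non_negative_int(value):
    return isinstance(value, int) and value >= 0


part_type = EntityType(
    "Part",
    Column("Code", is_identifier),
    Column("Quantity", is_non_negative_int),
)

parts = [
    Entity(part_type, Code="bolt_m8", Quantity=40),
    Entity(part_type, Code="nut_m8", Quantity=0),
]

in_stock = where(parts, Quantity=40)
print([p.get("Code") for p in in_stock])
```

Because the framework and the entity descriptions live at the same level (both are ordinary data), adding a column or a rule is a change to a definition rather than a regenerate-and-recompile cycle.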

Here are some of the arguments that were made for generated/reflected entity classes, and some responses:

1. Provide a means to bring the entity into the “native” language of the application.  This would be true, but is subject to the assumption that the native language has reasonable isomorphisms for what is being modeled (more below on this).

2. In theory, one might assert that getting and setting column values from application code would be faster with entity classes.  In practice, however, the entity class requires things like notification, state management, and validation on each property setter, so the generic entity is probably nearly the same “speed” (and this is not likely to be a source of any measurable performance impact on an application anyway).

3. “Strong typing”.  That is, because “native” types are being used for the types of the properties, there will be an added measure of integrity.  This is of course not the case overall; at best it might catch a major mismatch of types at compile time rather than at runtime, but a) this assumes that there are native types that adequately represent the entity’s attributes, and b) this issue can be mitigated in a runtime-based framework by using typed accessors (e.g. AsInteger).  To expand on point a), note that an application schema may demand all manner of rules on columns, far richer constraints than “the value must be a string”.  Where a property of type string might be used in an entity class, the real requirement may be a 30-character valid identifier, which is actually more “strongly typed”.

Though the client was nervous about making such major changes late in the project, the benefits of the new framework were immediately apparent.  They include:

1. Much faster compile times – Several minutes down to a few seconds.

2. Schema changes are now much easier – it used to take ½ day to change, regenerate, commit, and test a single schema or template change.

3. Integrity and security are consistently applied throughout the application.

4. Queryability has dramatically improved; most procedures and custom access paths are eliminated.

5. Most of the server side code is simply gone.  Thousands upon thousands of lines of it.

6. Load and runtime performance is much better, even before tuning.

7. Almost all CRUD can be done without writing data access code.

The Dataphor Codex

We have often talked about having a public forum where developers can share code, from interesting tidbits all the way up to complete applications and everything in between. We now have such a forum: The Dataphor Codex.

The Codex is an open-source project repository that allows projects to share common libraries. Each new project is simply a new top-level folder in the repository. Anyone can commit to the Codex, but submitters are encouraged to register as users of the repository to allow commit-level tracking in the repository.

The projects in the Codex are all released under the same Modified BSD license as Dataphor, allowing the source code to be freely distributed and used in any endeavor.

Currently, the Codex contains a Common project with a Codex.Base library that contains a few simple types likely to be used in any application, and the first few skeleton modules of the Brooks Project Management System, yet another defect tracker with less code!

In its current form, the Codex is nothing fancy, just a single repository, so issues like version control and change management will have to be considered at some point. However, we feel that something is better than nothing, and hope the Codex will finally provide a place for Dataphor developers to freely exchange ideas and solutions.

For the URL to access the repository, visit the Source Code Access page on the Dataphor.org site:

http://www.dataphor.com/index.php?title=Development:Source_Code

Regards,
The Dataphor.org Team