« August 2007 | Main | November 2007 »

September 20, 2007

A Name Does Not a Data Type Make!

I've always been dissatisfied with the level of user-defined type support available in existing SQL-based DBMSs, namely very little. Take Microsoft SQL Server, for example, which at least allows for a pretty close approximation of a user-defined scalar type. Or at least they used to. In the interest of clarity, I will describe the existing support:

Microsoft SQL Server allows the developer to create an alias for a system-provided type. For example, one could define a type called Phone as an alias for varchar(20). In addition, the type alias is allowed to specify a default nullability for columns defined of this type, and a 'rule' expression can be applied, similar to a column-level check constraint, except that the developer only has to provide the definition one time, on the type alias. Then all columns declared to be of that type (loosely speaking), are subject to that rule.

Of course, it is not an actual data type, because local variables declared using these type-aliases are not subject to the rule defined for the type. The documentation even has this little gem:

"Because rules do not test variables, do not assign a value to an alias data type variable that would be rejected by a rule that is bound to a column of the same data type"

But hey, I'll take what I can get. So I tend to use this functionality rather extensively in Microsoft SQL Server applications, as they provide the closest approximation to the full-fledged type support that I consider indispensible. In Microsoft SQL Server 2005, they even introduced syntax for them to make it easier to use these beasties, the create type and create rule statements. It's not complete though, because you still have to use the sp_bindrule procedure to actually enforce the rule, but at least they tried, right?

At any rate, one of the projects on which I am currently working uses one of these type-alias bound rules to enforce a particular constraint. Recently however, the DBMS began rejecting inserts for a table with one of these rules bound to a column. So I determine that the error is occurring inside a stored procedure generated by netTiers for the purpose of inserting data into the table. However, when I attempt to insert the data myself by copying the insert statement out of the procedure, it works!

How is this possible? Well, according to the DBMS, the error is

'insert or update statement conflicted with a rule previously imposed by a create rule statement'

But the insert has a null for the column that is causing the violation! So if you have a rule defined on a column and you have an insert statement that attempts to insert a null into that column and that insert statement is compiled into a stored-procedure, you will get an error, even though the same insert statement executed in a batch will work fine. Great. So I go looking for the bug report, but find nothing. So I verify my database version. By the way, if you don't have this link on your top 10 list, you should, this site maintains an excruciatingly detailed list of build numbers for SQL Server and what the officially sanctioned Microsoft title is for them:

http://www.sqlsecurity.com/FAQs/SQLServerVersionDatabase/tabid/63/Default.aspx

But I have the latest version, so no help there. And now we come to the most painful part. Other people have encountered this problem, but the official line is that rules are a deprecated feature so don't count on getting this defect resolved. Aaarrrggghhh!!!!

So I workaround the issue by redefining the rule to explicitly test for and allow the column to be null. But what is going on here? Why is Microsoft deprecating the best reason for defining type-aliases in the first place? Without rules, they're just names, and not terribly useful. The rules actually allowed the developer to approximate first class types, which are a crucial part of enforcing the integrity of the database. I've said it before and I'll say it again, 'A Name Does Not a Data Type Make!"

Bryn Rhodes
Database Consulting Group LLC

September 19, 2007

Cursor you Microsoft SQL Server!

At least one ideal to which a database language should strive is the ability to express complex logic simply, concisely, and above all, expressively. This means that having to invoke imperative statements to solve a particular problem should be a last resort within the language. And I suppose that was the case in the story I am about to relate, but the point is, it shouldn't have been!

The problem at hand is a classic hierarchical query with a twist: the tree structure is expressed in a partially temporalized manner that allows for historical as well as future-dated information. Without being able to get into the details, I'll just say that the query to get the actual information as of a given date requires the use of an aggregate. And this leads me to the first problem I had: bit values cannot be used in aggregates. Because no-one would ever want to take the Max() of a bit value. Right.

 Well it just so happens that I did need to take the Max() of a bit, and I assumed that would just work when I was writing the stored procedure embedded in an XML document produced by a C# class generated by a CodeSmith template, but I digress... The point is, it takes about 20 minutes to generate all the classes that actually write the SQL and then rebuild the database with the newly generated SQL procedures. Only then do I actually get the error that Max() cannot be invoked on a column of type bit. So what's the work around:

Max(case when BitColumn = 0 then 0 else 1 end)

Aarrgghhh... But if there's one thing I've learned about T-SQL, it's that the only thing that is consistent about the language is that it is inconsistent. Moving on to the recursive bit of the problem, now that I'm able to express (albeit with some circumlocution) the necessary core part of the query, I decide to use the fancy new common table expression feature of Microsoft SQL Server 2005 to provide a recursive solution to the hierarchical query problem. Beautiful! Or so I hoped...

After quite some time digging through the documentation on these things, I finally figure out how to actually express the solution to my particular problem as a recursive query using a common table expression (the syntax is almost impenetrable, but again, I digress...) and now the compiler blissfully informs me that, wait for it:

'agreggate invocation is not allowed within a recursive query definition'!

Are you kidding me! So once again, a sparkly new feature of SQL Server is hamstrung by an artificial restriction. Hey, at least the product is consistent. Every time I try to use a new feature, I find out that there is some restriction on its usage that renders the feature useless for any real-world scenario.

Take cascading referential integrity, for example. Yeah! We finally have cascading referential integrity! No wait, you can't use it to maintain a hierarchy, that would be far too useful!

Take instead of triggers. Yay! We finally have instead of triggers! No wait, you can't use them to distribute an update to a view that also has an instead of trigger defined, that would be far too useful!

 Take user-defined aggregate functions. Yay! We finally have user-defined aggregate functions! No wait, you can't define them in T-SQL, you have to write a .NET assembly to implement them!

Take disabling of foreign key constraints. Yay! We can disable foreign key constraints instead of having to drop and re-create them! No wait, you still can't truncate a table with foreign key constraints defined, even if they are disabled! That would be far too useful.

Take table-valued functions. Yay! We finally have table-valued functions! No wait, you can't have a table-valued function that takes a table-valued argument! That would be far too useful.

And this list is by no means exhaustive, I could go on, but I seriously digress. The point of all this is that to solve the problem at hand, I was forced to use a cursor. Not because of some theoretical limitation of relational algebra, but because of an implementation detail. Now that's frustrating.

I have encountered problems that I have been unable to express as a cursor, usually things involving inherent ordering or side-effects, but this was not one of them. I should have been able to, and I couldn't, end of. My database language should help me solve problems, not create problems of its own. And that's why we need a new database language. Enter Xix...

Bryn Rhodes
Database Consulting Group LLC