
August 21, 2007

Testable User Interfaces by Layering

User interfaces are notoriously difficult to test.  In recent architecture work, I've noticed that today's development culture, especially among Test-Driven Development (TDD) advocates, will significantly alter frameworks and methodologies to achieve a higher degree of testability.  For example, a primary justification cited for the added time and complexity of implementing the Model View Presenter (MVP) pattern is its purported testability benefit.  The pattern's advocates assert that moving as much user interface logic as possible into a non-user-interface encapsulation minimizes the amount of logic tied to the harder-to-test user interface.  Given the TDD stance that tests are written before the code, such layering would seem particularly imperative.

Let's step back a moment, however, and ask why user interfaces are difficult to test.  One reason might be that user interfaces are nearly the last thing to mature in an application, making them subject to the most variables.  Another reason, perhaps, is that GUI frameworks are often not designed with enough reflective APIs to make programmatic test automation practical.  Probably the primary reason, however, is simply complexity.  Unlike most programmatic APIs, even a simple user interface typically consists of myriad potential action sequences, modalities, and interdependencies.

I'll get to my point.  Though user interactions may be complex, standard software engineering practices (abstraction, layering, encapsulation, and the like) should be applied to mitigate the complexity.  This may seem obvious, yet common practice is to build user interfaces at relatively low levels of abstraction.  There are several conceptual levels of abstraction between forms, controls, and GUI logic on one side and the desired logical outcomes of data manipulation and analysis on the other.  For instance, there is a layer at the level of "here is a set of attributes that are to be edited by the user", which is typically stated at a much lower level ("here are some controls which are bound to...").  Among other advantages, making more layers explicit allows each layer to be tested and verified independently.

These ideas may all seem theoretical, but they are in fact based on my experience with Dataphor.  Because more of these conceptual layers were explicit, I found that user interface testing was almost never an issue.  Defects in the user interface were almost entirely either: a) in one of the abstraction layers (e.g. controls, or the derivation engine); or b) manifestations of logical errors.  In case a, the defect could be resolved once in the system, which fixed all similar scenarios.  These instances were relatively rare, however, because it is practical to build automated tests for limited-scope layers, and the reuse of lower-level layers naturally flushes out defects.  In case b, the exposure of logic errors is a desirable side effect of declarative systems.  Due to these facets of Dataphor, I was able to build (mostly generate) literally hundreds of almost defect-free forms in a matter of a few weeks.

To conclude, rather than let the testing tail wag the architecture dog, it would seem wiser to identify and automate the various conceptual abstraction layers and use the deductive method to maximize quality.  Architects should identify and automate more of what is common between their user interfaces, and concentrate on testing that automation.  With that, it won’t be necessary to test things like tab order and accelerator key conflicts, because such things will have been automated.
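
As a rough illustration of what automating such a layer might look like, here is a minimal C# sketch. It is not Dataphor's derivation engine; the names AttributeSpec, FieldLayout, and FormDeriver are hypothetical and exist only for this example. The form is stated as a list of attributes to edit, and tab order and accelerator keys are derived from that list rather than set by hand on each form.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical declaration of one attribute the user may edit.
public sealed class AttributeSpec
{
    public string Name { get; }
    public string Caption { get; }

    public AttributeSpec(string name, string caption)
    {
        Name = name;
        Caption = caption;
    }
}

// Hypothetical result of the derivation for one attribute.
public sealed class FieldLayout
{
    public string Name { get; }
    public int TabIndex { get; }
    public char? Accelerator { get; } // null when no unused character remains

    public FieldLayout(string name, int tabIndex, char? accelerator)
    {
        Name = name;
        TabIndex = tabIndex;
        Accelerator = accelerator;
    }
}

public static class FormDeriver
{
    // Tab order follows declaration order. Each caption receives the first
    // letter or digit not already taken by an earlier caption, so two fields
    // can never share an accelerator.
    public static List<FieldLayout> Derive(IEnumerable<AttributeSpec> attributes)
    {
        var used = new HashSet<char>();
        var result = new List<FieldLayout>();
        int tabIndex = 0;

        foreach (AttributeSpec attribute in attributes)
        {
            char? accelerator = null;
            foreach (char c in attribute.Caption)
            {
                char key = char.ToUpperInvariant(c);
                if (char.IsLetterOrDigit(key) && used.Add(key))
                {
                    accelerator = key;
                    break;
                }
            }
            result.Add(new FieldLayout(attribute.Name, tabIndex++, accelerator));
        }
        return result;
    }
}
```

With captions "Name", "Notes", and "Number", this sketch assigns N, O, and U in that order. The point is that only Derive needs a test; every form generated through it inherits a consistent tab order and conflict-free accelerators.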

August 16, 2007

So You Want To Know What You're Bound To?

While the Data Binding infrastructure provided by the Windows Presentation Foundation (WPF) is an extremely powerful and flexible tool, that added power comes with a significant amount of complexity. One issue I ran into recently seems like a simple enough question to answer, and one that would come up often in daily development: 'To what am I bound?'

The simplest answer is the DataContext of the control. This works for most scenarios because DataContext is an inheritable dependency property, so reading it returns whatever DataContext is set nearest up the containment hierarchy. However, an individual binding can override its source (for example through the Source, RelativeSource, or ElementName properties of Binding), so this is not always the right answer.

So after a fair amount of digging we find the BindingExpression.DataItem property. This is better because it gives us a way to find the data source for the binding independent of the way the binding is actually configured. However, because of the power of the PropertyPath, we still only know the root item to which we are bound, not the actual instance that immediately contains the property to which we are bound.
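
To make the limitation concrete, here is a minimal sketch of reading DataItem through the public API. BindingOperations.GetBindingExpression and BindingExpression.DataItem are real WPF members; the wrapper class BindingRootProbe and its method name are hypothetical and only serve the example.

```csharp
using System.Windows;
using System.Windows.Data;

public static class BindingRootProbe // illustrative name, not a WPF type
{
    // Returns the root source object of the binding applied to the given
    // property, or null when the property carries no single Binding.
    public static object GetRootItem(DependencyObject target, DependencyProperty property)
    {
        BindingExpression be = BindingOperations.GetBindingExpression(target, property);
        return be == null ? null : be.DataItem;
    }
}
```

For a hypothetical TextBox whose Text is bound with Path=Address.City against a person object, I would expect GetRootItem(textBox, TextBox.TextProperty) to hand back the person, not the Address instance that actually owns City. That gap is exactly the problem described above.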

So I turned to the web. Unfortunately, I found nothing. After hours of searching MSDN documentation, WPF blogs, and anything I could get my hands on, I still had nothing. But it's clear (because binding works) that internally, the BindingExpression knows what it is bound to.

So I turned to ildasm! Through a combination of run-time watches and ildasm references, I found the structures which contain the resolved property path for the binding expression. Unfortunately, they are all hidden in the MS.Internal.Data namespace deep in the private bowels of the framework.

So I turned to reflection! And now I have the answer. If I had access to all the MS.Internal.Data namespace structures (and private members :) I could simply write this expression to get the information:

SourceValueState[] svs =
  ((MS.Internal.Data.ClrBindingWorker)be._worker)._pathWorker._arySVS;
return ((WeakReference)svs[svs.Length - 1].item).Target;

The _arySVS array contains weak references to each instance variable along the resolved property path for the binding. So all we have to do is use reflection to get at the values and we can determine the immediate data source for any binding!

BTW, Sorry for the formatting, I'm still trying to figure out how to get that to work :)

using System;
using System.Reflection;
using System.Windows.Data;

public static class BindingInspector // container class name is illustrative
{
  // Walks private WPF fields (_worker, _pathWorker, _arySVS) found with ildasm.
  // They are not a public contract and may differ in other framework versions.
  public static object GetImmediateDataSource(BindingExpression be)
  {
    const BindingFlags flags = BindingFlags.Instance | BindingFlags.NonPublic;
    // BindingExpression._worker: the worker that resolves the binding.
    object worker = typeof(BindingExpression).GetField("_worker", flags).GetValue(be);
    // _pathWorker: the object that walks the property path.
    object pathWorker = worker.GetType().GetField("_pathWorker", flags).GetValue(worker);
    // _arySVS: one SourceValueState per step of the resolved path.
    Array sourceValueState = (Array)pathWorker.GetType().GetField("_arySVS", flags).GetValue(pathWorker);
    // The last entry's item field weakly references the instance that owns the bound property.
    FieldInfo itemFieldInfo = sourceValueState.GetType().GetElementType().GetField("item");
    object last = sourceValueState.GetValue(sourceValueState.Length - 1);
    return ((WeakReference)itemFieldInfo.GetValue(last)).Target;
  }
}

And there you have it! A generic method for retrieving the immediate source instance for any particular binding expression. I used this as part of a larger class that handled the differences between the different types of bindings (binding, multi- or priority) so I could generically find the set of objects to which any given control was bound. Couple this with the LocalValueEnumerator to get all the bindings for a control, and you can do some pretty generic wire-up code for things like field audit history display and the like.
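
For the LocalValueEnumerator part, a minimal sketch of collecting the bindings on a control might look like the following. The WPF calls (GetLocalValueEnumerator, BindingOperations.IsDataBound, BindingOperations.GetBindingExpressionBase) are public API; the class BindingEnumerator and the method GetLocalBindings are hypothetical names, and this is not the larger class mentioned above.

```csharp
using System.Collections.Generic;
using System.Windows;
using System.Windows.Data;

public static class BindingEnumerator // illustrative name, not a WPF type
{
    // Collects the binding expression of every locally set, data-bound
    // property on the target. The result may contain BindingExpression,
    // MultiBindingExpression, or PriorityBindingExpression instances.
    public static List<BindingExpressionBase> GetLocalBindings(DependencyObject target)
    {
        var result = new List<BindingExpressionBase>();
        LocalValueEnumerator locals = target.GetLocalValueEnumerator();
        while (locals.MoveNext())
        {
            DependencyProperty property = locals.Current.Property;
            if (BindingOperations.IsDataBound(target, property))
            {
                result.Add(BindingOperations.GetBindingExpressionBase(target, property));
            }
        }
        return result;
    }
}
```

Because the enumerator covers only local values, I would not expect this sketch to see bindings supplied by a style or template. Each plain BindingExpression in the result can then be passed to GetImmediateDataSource.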

I hope this post will help others that have had the same question!

Bryn Rhodes

August 05, 2007

Why References Aren't Pointers

I recently found myself confronted with the question: should all tables have a surrogate key?  Well, not confronted in exactly this manner; rather, Bryn and I were asserting that a certain database should use some of its natural keys rather than all surrogates.  On being challenged on this point, I choked, reciting some vague statement about references being data, not pointers.  My argument seemed weak, so I felt compelled to address the topic more rigorously.

Let's start with some bad reasons for exclusively using surrogate keys:

  • References to surrogates take less space than natural ones.  Those familiar with the relational model will instantly recognize this argument as crossing logical/physical boundaries.  In other words, an issue such as "space" is purely dependent on the system which implements the given logical schema.  A hypothetical system, for instance, might store related relation variables in a common structure where the common attributes are not duplicated, or may have some surrogate mechanism built in.  Even for systems with more direct physical isomorphisms, altering the logical schema for such physical considerations is tantamount to optimizing for storage.
  • No need to worry about deducing the natural keys.  It should be clear that if the database designer isn't considering the cardinality of various attribute combinations, that designer is not likely to arrive at a successful schema design.
  • Allows auditing of what was changed in a particular data "slot".  It is not very likely that the auditing requirements call for knowing that the current home phone number was entered in place of the former work phone number.  I've only seen this as meaningful in cases where there is a natural order to the data, in which case a monotonically increasing series of numbers--as is often used for surrogates--actually is a natural key.

Reasons for using natural keys:

  • References to natural keys automatically enforce certain constraints.  This mechanism allows for referencing relvars to state dependencies that would otherwise be complicated to state.
  • Simplicity.  By introducing additional attributes and keys, surrogates increase the complexity of the schema.
  • Queriability is often improved by proliferating meaningful rather than meaningless attributes.  Joins can be avoided in cases where the key contains the desired information.  Depending on the implementation of the system, this might also provide a performance benefit.
  • Logical defects manifest themselves earlier.  Surrogates often hide design problems until systems are far along or are in production.  This is so for at least two reasons: 1) it is easier to neglect careful thought about the key; and 2) it is possible to enter data that might otherwise violate an integrity rule.

Reasons for using surrogate keys:

  • When a row in the relation variable (relvar) is logically unique regardless of the other attributes.  This goes back to the notion of a "slot".
  • If there is no clear natural key.  Much has been written on this topic, but basically: only choose a natural key that is relatively concise, unchanging, and unquestionably unique.
  • If the natural key is compound.  This is a gray area because referencing compound keys often allows for the automation of otherwise difficult to enforce integrity constraints.  At the same time, heavily compound references often imply integrity constraints that are not desired.
  • Surrogates don't violate some logical purity rule.  The real world is littered with examples of surrogates, so in fact surrogates are often natural!  The real world is also a useful reference for examples of where surrogates work well and not so well (think SSNs).

The practice of using surrogate keys for all relvars is probably as common as it is due to the industry's familiarity with Object Oriented languages, which provide purely surrogate semantics for addressing.  For those looking to get more out of their database system than data storage for their objects, natural keys should be considered thoroughly before resorting to surrogates.

In conclusion, as has been stated many times in other places, this is a design issue and thus I can offer no solid rules, just general ideas.  I will say though, that considering all of the above trade-offs, it should be pretty clear that any design that takes the extreme of using all surrogates is probably not the best design.

August 02, 2007

Impressions of an Agile Process

Tomorrow is the last day of my first scrum sprint, and after a good deal of initial distaste, I wonder if I'm not warming up to the idea of agile development. My first impressions of agile were that, as a methodology, it places too little faith in the experience and architectural capability of developers. Indeed, it seemed to me to force developers to abandon their natural instincts for abstraction in favor of a 'get it done now' mentality that actually encouraged hackery and sloppy development. Then I was given this paper by Martin Fowler:

http://www.martinfowler.com/articles/designDead.html

Not that Martin Fowler gave me the paper, but that he wrote it. Anyway, it's well worth the read, and has given me a renewed hope that agile just might have something to offer the struggle to produce better software. I remain skeptical of some of the more extreme aspects of extreme programming, but this paper does a good job of explaining some of the why behind the agile approach instead of the relentless drumbeat of the how.

For me, the paper highlighted two extremely important points: 1) Agile does not dispense with design, it simply takes a different approach to it, and 2) It is a central tenet of agile to write code once, and only once.

With respect to the first point, the paper describes two broad approaches to design, planned and evolutionary, and argues that neither approach works in and of itself. It goes further: planned design is doomed to failure, and only the evolutionary approach can be made to work, and then only if the disciplines of agile development are applied, such as continuous integration, testing, and (most importantly for me) refactoring.

Now it seems to me that refactoring is just another word for abstraction, which is something developers have always done (at least the good ones, in my experience). After all, development is really about the pursuit of laziness, even to the point of being ambitious about it. At the end of the day, we want to write as little code as possible to achieve the most functionality. And it seems to me that this is the part of agile that I had misunderstood. In order to make the evolutionary approach to design work, the code base must be constantly refactored (i.e. abstracted). There are probably lots of XPerts out there gnashing their teeth at the fact that I have equated refactoring and abstraction, but that's how I see it currently (I reserve the right to change my mind).

And therein lies the rub. If the discipline of refactoring is left out of XP, what you end up with is the worst-case scenario of evolutionary design: a lava flow of spaghetti, and any other metaphor you can think of to describe horrific code. And that was probably my biggest problem with agile, in that I saw it as encouraging this kind of hackery.

Which brings me to the second point, 'Once and only once.' The takeaway being that agile does not dispense with design, it simply argues that the best way to achieve a good design is to develop only what you know (and really know you know) and leave the rest out until you actually have a case for it. In this light, it seems to me that developers have already been doing this for a long time. Chris Date calls it the Principle of Cautious Design, namely that unless you know exactly what the requirements are, you can't design a proper solution and so you're better off leaving it out.

In short, I highly recommend reading the above paper to anyone who is skeptical of agile development. While I'm not completely sold yet, I'm at least convinced there is something to it.

Bryn Rhodes
Database Consulting Group LLC