Migrating a Remoting Service to WCF

The Need To Migrate

In extending Dataphor to include a Silverlight client, one of the biggest changes that had to be made was the communication layer. Since its initial version, Dataphor has always used .NET Remoting as the primary communication technology. However, since Silverlight does not natively support .NET Remoting, we had a choice to make. We could opt for a more primitive model and drop all the way down to sockets and take care of the messaging and marshaling ourselves, or we could migrate the existing .NET Remoting infrastructure to WCF and take advantage of Silverlight's built-in WCF support. In the end, we chose to migrate to WCF, mostly because it allowed us to increase the potential number of technologies in which a Dataphor client could be built. This post details some of the roadblocks we encountered along the way, and the solutions we came up with. Hopefully, it will shorten someone else's journey.

Asynchronicity I

The first issue to be tackled was the lack of a synchronous model for invoking a WCF service from Silverlight. The Dataphor CLI was designed as a set of interfaces, somewhat resembling a traditional DBMS CLI, with layers corresponding to the different layers of calls that can be made (such as Server, Session, Process, Cursor, etc.). Each of these layers exposes calls for performing operations against the server, and each call is by design a blocking call. Because the server supports multiple connections, asynchronous operations can be built as a layer above the CLI if necessary. However, because Silverlight does not support synchronous service invocation, we needed a way to wait for the results of every call.
 
Now, there is no shortage of material on the relative merits of synchronous versus asynchronous calling, and this post is not going to add anything to that debate. Suffice it to say that without completely re-engineering the client side, we needed to be able to invoke our CLI calls synchronously. So the first step towards a solution was to verify that a simple service could be synchronously invoked from a Silverlight application. The idea was to use an invocation thread that would perform all the network communication, waiting on the IAsyncResult.AsyncWaitHandle returned by the Begin call of the service operation. Once that wait returns, we invoke the End method to get the result and voilà, we have a synchronous call. So long as we keep that call off the main thread, everything works fine. Problem solved:
 
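      // Start the call; no callback or state is needed because we wait below.
      IAsyncResult LResult = LService1.BeginGetData(4, null, null);
      // Block this (non-UI) thread until the response has arrived.
      LResult.AsyncWaitHandle.WaitOne();
      // Complete the call and hand back its result.
      return LService1.EndGetData(LResult);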
 
The reason we have to keep the call off the main thread is that all network traffic in Silverlight appears to be threaded through the main UI thread, so if you block that thread waiting for the result, you'll never get the callback. That's probably an overly simplified description of what's happening, but the solution we've come up with works fine.
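
The same three steps can be factored into a small helper so that every CLI call does not have to repeat them. What follows is only a sketch: SyncInvoker and its Invoke method are hypothetical names, not part of Dataphor or WCF, and the helper assumes it is always called from a worker thread rather than the UI thread.

```csharp
using System;

// Hypothetical helper (not part of Dataphor or WCF) that turns any
// Begin/End pair into a blocking call.
public static class SyncInvoker
{
    // Blocks the calling thread until the asynchronous operation completes,
    // then returns its result. Must not be called from the Silverlight UI
    // thread, for the reason described above.
    public static T Invoke<T>(
        Func<AsyncCallback, object, IAsyncResult> ABegin,
        Func<IAsyncResult, T> AEnd)
    {
        // Start the operation without a callback; we wait on the handle instead.
        IAsyncResult LResult = ABegin(null, null);
        LResult.AsyncWaitHandle.WaitOne();
        // Complete the operation and return whatever the End method produces.
        return AEnd(LResult);
    }
}
```

For an operation that takes arguments, the Begin call can be wrapped in a lambda, for example SyncInvoker.Invoke((ACallback, AState) => LService1.BeginGetData(4, ACallback, AState), LService1.EndGetData).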
 

Asynchronicity II

One of the things that became clear as we were building this proof-of-concept is that when you define an operation (at least a two-way one) in a service contract, you are really defining both a message and its associated response. As a result, a service can be invoked asynchronously on the client, even if the service contract is defined synchronously on the server. (This is probably obvious to everyone but me, so bear with me.) For example, if I define the following service contract:
 
      ///<summary>
      /// Describes the interface for the Dataphor listener.
      ///</summary>
      [ServiceContract(Name = "IListenerService",
            Namespace = "http://dataphor.org/dataphor/3.0/")]
      public interface IListenerService
      {
            ///<summary>
            /// Enumerates the available Dataphor instances.
            ///</summary>
            [OperationContract]
            [FaultContract(typeof(ListenerFault))]
            string[] EnumerateInstances();
      }

I can consume this service synchronously using the IListenerService interface directly, or I can define an asynchronous version:

      [ServiceContract(Name = "IListenerService",
            Namespace = "http://dataphor.org/dataphor/3.0/")]
      public interface IClientListenerService
      {
            ///<summary>
            /// Enumerates the available Dataphor instances.
            ///</summary>
            [OperationContract(AsyncPattern = true)]
            [FaultContract(typeof(ListenerFault))]
            IAsyncResult BeginEnumerateInstances(
                  AsyncCallback ACallback, object AState);
            string[] EndEnumerateInstances(IAsyncResult AResult);
      }

Of course, this is exactly what the Add Service Reference feature of a Silverlight project in Visual Studio does. It follows that, unless the Silverlight version of the WCF communication code is substantially different from the standard .NET one, there is no technical reason a Silverlight client couldn't invoke a service synchronously: so long as the actual invocation is kept off the main thread, a synchronous version of the contract should work. Unfortunately, attempting to feed the synchronous version of the interface to the ChannelFactory in Silverlight gives the error "The contract 'IListenerService' contains synchronous operations, which are not supported in Silverlight…" I for one am convinced that this is not a technological limitation, just an error thrown in to push developers toward the asynchronous programming model in Silverlight.

Migrating MarshalByRefObject

The second major issue to be tackled was the fact that the Dataphor CLI uses the instancing and lifetime management services provided by .NET Remoting. Each layer of the Dataphor CLI is modeled by a MarshalByRefObject descendant that implements the interface containing the calls appropriate to that layer. WCF, on the other hand, is essentially solving a different problem, and does not have any facilities for cross-process instancing. As a result, we were faced with another decision: either re-engineer the entire CLI to work without instancing, or recreate the lifetime and instance management facilities provided by .NET Remoting and expose them via a WCF service.

Because the Dataphor CLI was already layered into a 'developer-friendly' version meant to be used directly from code, and a 'network-friendly' version optimized to reduce network traffic, building the instancing and lifetime management facilities could be done relatively easily and would enable all the existing client and server side infrastructure to be used as is.

The Way It Was

First, a little background: the core CLI is defined by the IServerXXX interfaces. This is the development-level interface actually exposed to the code, and is designed to be as easy as possible to use from a development perspective. On the server-side, these interfaces are implemented directly by ServerXXX classes that make up the actual running server.

The network-level CLI is defined by the IRemoteServerXXX interfaces, and is designed to minimize network round-trips and message sizes. On the server-side, these interfaces are implemented by a set of RemoteServerXXX classes that sit on top of the ServerXXX classes and route the calls to and from the network layer.

On the client-side, the IServerXXX interfaces are implemented by the LocalXXX classes, which are responsible for consuming the IRemoteServerXXX proxies returned by the remoting layer and converting the network-level CLI back into the development-level CLI. The result is that whether a client is accessing the Dataphor Server in- or out-of-process, the programming model is identical.

The Way It Is

In order to preserve this programming model (and the mountains of code written on top of it), the WCF-enabled architecture effectively acts as a shim between the server- and client-side implementations of the IRemoteServerXXX interfaces.

To avoid multiple channels on the client, instead of a group of interfaces, the entire CLI is exposed via the IDataphorService interface, and each level of the CLI is modeled with handles. On the server-side, a DataphorService implements the actual service and simply wraps up the existing RemoteServerXXX classes. Each object that would have been marshaled in .NET Remoting is assigned a Handle and tracked by the DataphorService. Information that would have been marshaled via properties of those objects is now packaged in Descriptor structures.
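
To make the handle idea concrete, here is a minimal sketch of the kind of table a service could use to track objects by handle. It is an illustration under assumptions, not the actual DataphorService code: HandleManager, SessionDescriptor, and all of their members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical descriptor: carries the state that a remoting proxy would
// previously have exposed through properties.
public struct SessionDescriptor
{
    public int Handle;
    public int SessionID;
}

// Hypothetical handle table: maps integer handles to the server-side objects
// that .NET Remoting would previously have marshaled by reference.
public class HandleManager
{
    private readonly object FSyncRoot = new object();
    private readonly Dictionary<int, object> FObjects = new Dictionary<int, object>();
    private int FNextHandle = 1;

    // Registers an object and returns the handle the client passes on later calls.
    public int Register(object AObject)
    {
        lock (FSyncRoot)
        {
            int LHandle = FNextHandle++;
            FObjects.Add(LHandle, AObject);
            return LHandle;
        }
    }

    // Resolves a handle back to its object, failing on stale or wrongly typed handles.
    public T Resolve<T>(int AHandle) where T : class
    {
        lock (FSyncRoot)
        {
            object LObject;
            if (!FObjects.TryGetValue(AHandle, out LObject))
                throw new ArgumentException("Unknown handle: " + AHandle);
            T LTyped = LObject as T;
            if (LTyped == null)
                throw new ArgumentException("Handle " + AHandle + " is not a " + typeof(T).Name);
            return LTyped;
        }
    }

    // Forgets a handle; returns false if it was not registered.
    public bool Release(int AHandle)
    {
        lock (FSyncRoot)
        {
            return FObjects.Remove(AHandle);
        }
    }
}
```

With something like this in place, a service operation takes a handle as its first argument, resolves it to the wrapped server-side object, and forwards the call.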

On the client-side, the IRemoteServerXXX interfaces are implemented by ClientXXX classes that mirror the RemoteServerXXX objects on the server side. All communication is channeled through the DataphorService, and the object state is unpackaged by the ClientXXX and exposed through the IRemoteServerXXX interfaces back to the existing LocalXXX implementations. As a result, all the existing client-side code still works; it just uses WCF now instead of remoting.

Watch Out For Out

Another aspect of the .NET version of the CLI that had to be changed was the use of ref and out parameters. These work fine for the synchronous version of the service, but in the asynchronous version, the ref and out parameters were never being set. This makes sense if you think about it, since the Begin call returns before the response has arrived, but if there was ever a good place for an exception, this would be it. How about: "Ref and out parameters cannot be used with asynchronous invocation."
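
One way to sidestep the problem is to package everything a call produces into its return value. The sketch below shows that shape under assumptions: IExampleCursor, NextResult, IExampleCursorService, and CursorAdapter are hypothetical names invented for illustration, not the actual Dataphor contracts.

```csharp
using System;

// Hypothetical "before" shape: a value comes back through an out parameter.
public interface IExampleCursor
{
    bool Next(out string ABookmark);
}

// Hypothetical "after" shape: everything the call produces travels in the
// return value. A real contract type would normally be marked [DataContract].
public class NextResult
{
    public bool Moved;
    public string Bookmark;
}

// Asynchronous service contract shape with no ref or out parameters.
// (Service attributes are omitted to keep the sketch free of WCF references.)
public interface IExampleCursorService
{
    IAsyncResult BeginNext(int ACursorHandle, AsyncCallback ACallback, object AState);
    NextResult EndNext(IAsyncResult AResult);
}

public static class CursorAdapter
{
    // Client-side unpacking: re-exposes the packaged result through the
    // original out-parameter signature so existing callers stay unchanged.
    public static bool Next(NextResult AResult, out string ABookmark)
    {
        ABookmark = AResult.Bookmark;
        return AResult.Moved;
    }
}
```

The client-side wrapper is the only place that knows about the result type; code written against the original out-parameter signature does not have to change.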

Lifetime Management

When the CLI was exposed via .NET Remoting, we were able to take advantage of the fact that .NET tied the lifetime of the proxies to the lifetime of the connection. Using lifetime services, if a remote object failed to renew its lease, the remoting infrastructure would disconnect the object and notify the RemoteServerXXX layer that disconnection had occurred. In the new WCF architecture, no such services exist.

It should be noted that we looked at using WCF sessions to enable this functionality and decided against it for several reasons. First, the session management built into WCF isn't an exact fit to the way sessions are managed in the Dataphor CLI, so we would have ended up having to build a shim architecture on top of that anyway. Second, the session management required the use of the WSHttpBinding, which, at least at the time of the migration, was not supported in Silverlight, our primary target for the migration in the first place.

In the .NET version of the service, we used a client-side thread that simply posted a do-nothing message (a ping, if you will) to the server on a timer. The lifetime lease for each object was set to renew for a little over twice the time of the client timer, and so long as the client could reach the server, the remote object would stay live.

In the WCF version, we left the client-side mechanism alone, and simply added a daemon to the Dataphor Service to check the last 'ping' time for each connection. If a connection's last ping is older than the idle timeout, the connection is assumed to be lost and all the sessions it supported are closed.

Because the ping is running on a separate thread in the client, it will occur even when the client is busy, so the service does not need to do anything to track activity occurring below the session.
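
The server-side check can be sketched roughly as follows. This is an illustration under assumptions rather than the actual daemon: ConnectionTracker and its members are hypothetical names, connections are identified by a plain string, and the current time is passed in explicitly so the logic is easy to exercise.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tracker for the last ping time of each connection.
public class ConnectionTracker
{
    private readonly object FSyncRoot = new object();
    private readonly Dictionary<string, DateTime> FLastPing = new Dictionary<string, DateTime>();
    private readonly TimeSpan FIdleTimeout;

    public ConnectionTracker(TimeSpan AIdleTimeout)
    {
        FIdleTimeout = AIdleTimeout;
    }

    // Called by the service whenever a ping arrives for a connection.
    public void Ping(string AConnectionName, DateTime ANow)
    {
        lock (FSyncRoot)
        {
            FLastPing[AConnectionName] = ANow;
        }
    }

    // Called periodically by the daemon. Removes and returns every connection
    // whose last ping is older than the idle timeout, so that the caller can
    // close the sessions those connections supported.
    public List<string> RemoveIdle(DateTime ANow)
    {
        var LIdle = new List<string>();
        lock (FSyncRoot)
        {
            foreach (KeyValuePair<string, DateTime> LEntry in FLastPing)
                if (ANow - LEntry.Value > FIdleTimeout)
                    LIdle.Add(LEntry.Key);
            // Remove after enumerating; the dictionary cannot change mid-loop.
            foreach (string LName in LIdle)
                FLastPing.Remove(LName);
        }
        return LIdle;
    }
}
```

A timer or background thread would call RemoveIdle at some interval shorter than the timeout and close the sessions belonging to each returned connection.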

Exception Management

When the CLI was exposed via .NET Remoting, exception management was fairly simple. All the RemoteServerXXX layers had to do was make sure that any exception that hit a remoting boundary was serializable, and deserializable by the client (i.e. the exception class was available to the client app domain). We did this by making sure that all exceptions thrown across remoting were descended from our own DataphorException class, and that all relevant exception classes were available on the client (that is another story).

In the WCF implementation, however, exceptions always cross the service boundary as a fault. The simplest solution was just to turn on IncludeExceptionDetailInFaults. This was safe from the service perspective because we already knew that every exception coming out of the service was a known-good remotable exception. However, the problem was that when the exception was surfaced on the client-side, it became a FaultException<T>, with T being a basic ExceptionDetail class. There were several problems with this. First, the ExceptionDetail class only has the information carried by the base Exception. Our exception classes carry other information (such as syntax and compiler error line information, system-level error codes, etc.) and this information was being lost. And second, the client-side code downstream from the service was written to expect exceptions to be of the appropriate type.

To solve these problems, we introduced a DataphorFault. This fault class was simply a combination of all the information that could be carried by a DataphorException or any of its descendants. Then each operation contract was marked with a fault contract specifying this fault type. In the implementation of the DataphorService, each call is wrapped with a catch that converts any exception into a FaultException<DataphorFault>. With that in place on the server-side, we no longer needed the IncludeExceptionDetailInFaults setting on the service behavior.

On the client side, each call is also wrapped with a catch block that converts any FaultException<DataphorFault> back into the appropriate DataphorException descendant with all the relevant information from the fault. In this way, exceptions are transported across the WCF boundary without the server or client ever being the wiser. All the existing exception management code on both sides remains the same.
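
The round trip can be sketched as a pair of conversion functions. The types below are simplified stand-ins, not the real Dataphor classes: the Code and Line fields, the SyntaxException class, and the FaultConverter helper are assumptions made for illustration, and the WCF wrapper itself is left out so that the sketch compiles without a reference to System.ServiceModel.

```csharp
using System;

// Simplified stand-in for the base exception; the Code field is an assumption.
public class DataphorException : Exception
{
    public int Code;
    public DataphorException(int ACode, string AMessage) : base(AMessage) { Code = ACode; }
}

// Hypothetical descendant carrying extra information (a source line).
public class SyntaxException : DataphorException
{
    public int Line;
    public SyntaxException(int ACode, string AMessage, int ALine) : base(ACode, AMessage) { Line = ALine; }
}

// Simplified stand-in for the fault: a flat union of everything any
// exception in the hierarchy can carry.
public class DataphorFault
{
    public string ExceptionClassName;
    public int Code;
    public string Message;
    public int Line = -1;
}

public static class FaultConverter
{
    // Server side: flatten an exception into the fault that crosses the wire.
    public static DataphorFault ExceptionToFault(DataphorException AException)
    {
        var LFault = new DataphorFault();
        LFault.ExceptionClassName = AException.GetType().Name;
        LFault.Code = AException.Code;
        LFault.Message = AException.Message;
        var LSyntax = AException as SyntaxException;
        if (LSyntax != null)
            LFault.Line = LSyntax.Line;
        return LFault;
    }

    // Client side: rebuild the most specific exception type the fault names.
    public static DataphorException FaultToException(DataphorFault AFault)
    {
        if (AFault.ExceptionClassName == "SyntaxException")
            return new SyntaxException(AFault.Code, AFault.Message, AFault.Line);
        return new DataphorException(AFault.Code, AFault.Message);
    }
}
```

In a service built this way, the server-side catch would throw new FaultException<DataphorFault>(FaultConverter.ExceptionToFault(E), E.Message), and the client-side catch would throw FaultConverter.FaultToException(E.Detail).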

No Configuration Files Required

An aspect of WCF that we wanted to avoid was the astonishing proliferation of .config files that are required to enable even the simplest WCF scenarios. Of course, configurability is a good thing, but in this case, we already had configuration for the important aspects of the server (host name, server instance, port number, etc.), and we did not want the migration to WCF to add any administrative overhead if we could avoid it.

So rather than specify service behavior and endpoint configurations in config files that would become part of the deployment, we built that in programmatically. We were able to control every aspect of WCF service and hosting behavior programmatically, and added zero configuration to the deployment of a standard, network-accessed Dataphor Server.
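
For readers who have only seen WCF configured through XML, a minimal example of config-free hosting looks something like the following. It is a sketch, not the Dataphor hosting code: the EchoService contract, the port, and the address are hypothetical, and BasicHttpBinding is assumed here because it is a binding Silverlight can consume.

```csharp
using System;
using System.ServiceModel;

// Hypothetical contract and implementation, used only to have something to host.
[ServiceContract(Name = "IEchoService", Namespace = "http://example.org/echo/")]
public interface IEchoService
{
    [OperationContract]
    string Echo(string AValue);
}

public class EchoService : IEchoService
{
    public string Echo(string AValue)
    {
        return AValue;
    }
}

public static class Program
{
    public static void Main()
    {
        // No app.config: the host, binding, and endpoint address are all built in code.
        // The port and path are placeholders; a real server would read them from
        // its own existing configuration.
        var LHost = new ServiceHost(typeof(EchoService));
        LHost.AddServiceEndpoint(
            typeof(IEchoService),
            new BasicHttpBinding(),
            "http://localhost:8061/echo");
        LHost.Open();

        Console.WriteLine("Service is listening. Press Enter to stop.");
        Console.ReadLine();
        LHost.Close();
    }
}
```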

For Silverlight, we had to tackle the problem of 'cross-domain access'. For this, we simply built a Cross Domain Service to serve up a clientaccesspolicy.xml file. The only tricky part here was figuring out how to get the 'Web' behavior specified so that a URI request coming in would be treated as a web request, rather than a SOAP action. This can be done programmatically by adding a WebHttpBehavior to the behaviors of the newly created Endpoint. However, because we implemented a separate service, it was easier just to use a WebServiceHost rather than a ServiceHost.
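
A cross-domain policy service along these lines might look like the sketch below. The interface, class, and method names are hypothetical, and the policy shown is the most permissive one (any caller, any path), chosen only to keep the example short. Silverlight requests clientaccesspolicy.xml from the root of the target host and port, so the base address passed in is assumed to be that root.

```csharp
using System;
using System.IO;
using System.ServiceModel;
using System.ServiceModel.Web;
using System.Text;

// Hypothetical contract: one GET operation mapped to the policy file's URI.
[ServiceContract]
public interface ICrossDomainService
{
    [OperationContract]
    [WebGet(UriTemplate = "clientaccesspolicy.xml")]
    Stream GetPolicyFile();
}

public class CrossDomainService : ICrossDomainService
{
    // Wide-open policy for illustration; a real deployment would restrict it.
    private const string CPolicy =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
        "<access-policy><cross-domain-access><policy>" +
        "<allow-from http-request-headers=\"*\"><domain uri=\"*\"/></allow-from>" +
        "<grant-to><resource path=\"/\" include-subpaths=\"true\"/></grant-to>" +
        "</policy></cross-domain-access></access-policy>";

    public Stream GetPolicyFile()
    {
        // Serve the policy as XML rather than as a SOAP response.
        WebOperationContext.Current.OutgoingResponse.ContentType = "application/xml";
        return new MemoryStream(Encoding.UTF8.GetBytes(CPolicy));
    }
}

public static class CrossDomainHostExample
{
    // ABaseAddress is assumed to be the root of the host and port that the
    // Silverlight client will call, e.g. http://localhost:8061/.
    public static WebServiceHost Start(Uri ABaseAddress)
    {
        // WebServiceHost sets up the web endpoint and behavior itself,
        // so no WebHttpBehavior has to be added by hand.
        var LHost = new WebServiceHost(typeof(CrossDomainService), ABaseAddress);
        LHost.Open();
        return LHost;
    }
}
```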

Conclusions

So at the end of the day, what did we get out of the migration? Well, besides a deeper understanding of Yet Another Remote Procedure Call Technology From Microsoft (YARPCTFM), we did get some pretty substantial benefits:

· Increased Exposure – A Dataphor Server can now be exposed via HTTP/S as an industry-standard Web Service, something we never had before. And with both the standard CLI and the new Native CLI exposed, accessing a Dataphor Server is possible from pretty much any technology now known.

· Network Resilience – Communications with a Dataphor Server are now stateless from the networking perspective. A dedicated connection is no longer required, with session management being built into the CLI and calling protocol rather than baked into the network layer. This will give Dataphor clients much greater resilience to intermittent network connections.

· Silverlight Capability – Following from the increased exposure bullet above, it is now possible to build a Silverlight Dataphor client, a project that is nearing completion.

· Leverage On Existing Code – By building the WCF replacement the way we did, we were able to preserve the existing Dataphor code base on both sides of the network boundary. We dropped in an entirely new communication layer and neither side knows the difference. Fantastic.