Friday, December 14, 2007

JMS and Distributed Software Development

Nice article on Java Lobby by Roger Voss.


Using JMS For Distributed Software Development

Let's start with some background on JMS. The Java Message Service is an API specification only. It does not define any standard for how JMS messages are conveyed over the wire. This means that JMS implementations from different vendors cannot directly interact with one another. However, most vendors have bridge products that enable integration with other messaging services. (Also, it is not very difficult to write a bridge. For instance, at our company we wrote a bridge that is deployed as a JMX MBean service in a JBoss application server and bridges Oracle AQ to Tibco EMS. This enables PL/SQL developers to publish messages to AQ within Oracle and have those messages bridged into Tibco EMS queues or topics. Hence we can disseminate server-side push notifications from the rear-most back-end tier of our n-tiered distributed architecture.)

Some JMS Implementations

There are a number of JMS implementations - both proprietary and open source. Of the products that come up in this article, Tibco EMS, SonicMQ, and FioranoMQ are proprietary, while JBossMQ is open source.

This listing is not exhaustive but merely representative.

I use Tibco EMS in my professional software development, so my experience will be from the perspective of that product. In conducting a JMS evaluation for our company a few years ago, I also became familiar with other products and will interject an occasional comparison or contrast with Tibco.

Central Server, Star Topology

JMS is typically implemented as a star topology hubbed to a central server. Messaging end-points connect to the server where they - in loosely coupled fashion - interact with other end-points via queues and topics. Enterprise-ready JMS implementations will offer a high availability clustering capability, and in some cases an option to scale the JMS service via clustering. A typical approach for high availability is the active/passive cluster. At least one physical server is the active JMS server while other servers in the cluster are in passive standby. Tibco offers this approach to a HA solution so at our customer sites we establish two JMS servers, one running actively while the other is in standby mode.

Persistent Connections and Push Notification

Because JMS messaging can take place bidirectionally, it is usual that JMS end-points connect to the central server using a persistent TCP/IP socket connection. A very useful capability in JMS is publish/subscribe topics. This is a loosely coupled one-to-many notification pattern. Business logic in the middle-tier can publish to a topic (which is the only thing it has to know how to connect to) while zero or many end-points may be connected to the topic as consumers. Thus the persistent connection makes it feasible to do server-side push of notification messages out to end-points.

Pub-sub pattern

JMS doesn't mandate the over-the-wire implementation, so it is conceivable that JMS might be implemented on top of connectionless UDP. This approach might be useful with respect to wireless devices. (Our company makes use of industrial hand-held wireless devices, for instance. Tibco offers another product called SmartSockets, though, that is perhaps better specialized for this. Tibco also offers a bridge for tying a SmartSockets subnetwork into a JMS-based enterprise messaging system.)

Scalability Issue

Because JMS is usually predicated on end-points establishing persistent connections, scaling a JMS service is a significant issue. Prior to the introduction of Java NIO with the Java 1.4 release, a JMS server written in Java would be hampered in that each socket connection would consume a thread. Most operating systems can sustain only a relatively low number of concurrently active threads before context switching becomes too much overhead - effectively, say, a few hundred. As a consequence of this problem with old-style Java socket servers, Tibco chose to implement their JMS server in the C language where the socket select call could be leveraged. This enables their server to scale to thousands of end-point connections per physical server. Nonetheless, the Tibco EMS server is available on a large array of platforms: IBM RS6000 AIX, Sun Solaris, Linux, Microsoft Windows, and Apple Mac OS X. (Of course Tibco EMS has Java client bindings for Java end-points.) SonicMQ is implemented in Java but makes use of the Java NIO library such that a few threads can service many socket connections. Consequently it too is able to scale well on a single physical server. (JBossMQ is currently being rewritten to address performance and scaling concerns.)
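To make the NIO point concrete, here is a minimal sketch of one thread servicing many socket connections through a java.nio Selector. It is only an illustration of the technique: it is a plain echo server, not a JMS server, and it says nothing about how SonicMQ or any other product is actually written. The class name SelectorEchoServer is hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

/** Hypothetical sketch: a single thread multiplexing many connections. */
final class SelectorEchoServer implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel listener;

    SelectorEchoServer(int port) throws IOException {
        selector = Selector.open();
        listener = ServerSocketChannel.open();
        // Port 0 asks the operating system for any free port.
        listener.bind(new InetSocketAddress("127.0.0.1", port));
        listener.configureBlocking(false);
        listener.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return listener.socket().getLocalPort();
    }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(4096);
        try {
            while (selector.isOpen()) {
                selector.select(); // blocks until some channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        SocketChannel client = listener.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buffer);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // Selector closed or listener failed: leave the loop.
        }
    }

    /** Reads whatever is available and writes it straight back. */
    private static void echo(SocketChannel client, ByteBuffer buffer) {
        try {
            buffer.clear();
            if (client.read(buffer) < 0) {
                client.close(); // peer disconnected
                return;
            }
            buffer.flip();
            // Simplification: a real server would register OP_WRITE
            // instead of looping when the send buffer is full.
            while (buffer.hasRemaining()) {
                client.write(buffer);
            }
        } catch (IOException e) {
            try {
                client.close();
            } catch (IOException ignored) {
                // nothing more to do for this connection
            }
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        listener.close();
    }
}
```

Running new Thread(new SelectorEchoServer(0)).start() gives a server whose thread count stays at one no matter how many clients connect, which is the property that matters for a hub with thousands of persistent end-point connections.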

Ironically, in some circles the fact that the Tibco EMS server is implemented in C is a comfort factor - obviously to those that don't know or feel comfortable with the Java language. What computer language the JMS server is written in is immaterial to the JMS software programmer as he/she will deal only in the client bindings. Yet for old school C/C++ programmers and even some C# folks, the notion of a C-implemented server tends to lessen their skepticism toward messaging a little when going into it for the first time.

Fail-over State Management

High availability cluster solutions for JMS that I am aware of use two different approaches for managing active state. Tibco EMS requires that the active and passive standby server pair have access to the same shared state on a file system. When fail-over occurs and the passive standby server becomes active, it will look to the persistent state in the shared file system to resume operation. (Non-persistent messages could be lost during the transition, though, so if the message data is crucial, then one should choose persistence when queuing it.)

SonicMQ offers an option to use a distributed two-phase commit replication of state between the active and passive standby server pair.

Minimal HA Installation

The difference between the two solution approaches boils down to performance vs monetary cost. A shared file state can be higher performing - but the quality of the cluster-aware file system that is used is crucial, as a poor implementation of such a file system could cripple performance and even lead to catastrophic data loss. A distributed replication of data means that the local file system of each server is ultimately used to store persistent state but with the overhead of performing lockstep synchronous disk writes on two different machines.

The shared state approach can be significantly more expensive in monetary terms as a fully redundant hardware storage device must be established - down to redundant drive interface cards and redundant power supplies. Not having such redundancy would risk this storage device becoming a single point of failure - which would defeat the purpose of even having a high availability cluster.

FioranoMQ offers an interface to JDBC storage such that a SQL relational database could be used for persisting state. If the database itself is clustered for high availability, then this might be a feasible solution. However, a SQL relational database used for messaging persistence will be slower relative to the solutions that use the file system directly. For modest messaging loads and where the back-end database is already redundant, this approach might be rather economical while still offering integrity against data or service loss.

Heterogeneous Client Support

If you write enterprise distributed software systems as I do then chances are you find that you're often faced with integrating software spanning across platforms, languages, and software providers (i.e., vendors). In other words, you deal with a heterogeneous computing network environment. When choosing an enterprise-capable messaging system, support for diverse client end-points was weighted very heavily in my decision matrix.

With Tibco EMS, I get client bindings for Java, C (and thus C++), .NET, and Compact Framework .NET for WinCE. For software I personally create, I write my middle-tier using Java and for my rich client apps I use C# .NET. I also have WinCE embedded computers that run Compact Framework .NET and that participate in JMS messaging. As to my external vendor relations, some use C# .NET while others use C++ (and thus use Tibco's C client library). Internally in my company we have an important product group that adopted Tibco EMS for use and they write their software throughout in C# .NET.

All of this software uses XML for the message payload format, which naturally can be readily processed by any of these end-points. A JMS messaging system is the one ring that binds them all.

At the time I made my JMS product selection other vendors in the running did not have client support for Compact Framework .NET for WinCE (an important platform for my company because of industrial WinCE devices we use that are a part of our distributed software systems). The Tibco JMS client binding for WinCE is 100% .NET managed code so there is no complication during deployment as to which CPU the WinCE devices are designed with (MIPS, StrongARM, Intel, Hitachi, etc.). It also has support for SSL connections.

Now I recall from my JMS product evaluation that there was one particular client implementation for .NET where C# wrapper classes were draped over an underlying implementation that was written in C++. This limited client support to desktop Windows .NET while greatly complicating that JMS vendor's prospects of porting to Compact Framework .NET on WinCE. I also uncovered issues in this particular implementation in my testing. My word of advice is to be sure that you select a vendor that has a 100% managed code implementation for .NET if that particular platform is important to you.

Administrative Configuration

While dwelling on the subject of client bindings, I should note that Tibco EMS provides administrative APIs for both Java and .NET. These APIs are specific to Tibco and are relevant to control and operation of their servers. Once you become immersed in using messaging, you're likely to find yourself writing various tools and utilities that interact with your messaging server at this level.

Out of the box Tibco EMS provides an interactive command line admin tool. This turns out to be an easy tool to use and it has mostly adequate online help. Of course there is more thorough HTML-style electronic documentation and PDF versions as well. Any configuration action can be performed using this command line admin tool. It also can remotely attach to the JMS server that is to be configured. Administrative usernames and passwords can be set to restrict access.

Tibco has yet another command line tool that can be used to process a configuration script text file in a batch fashion. This is the approach I currently employ. I created an all-encompassing configuration script and checked it into my Perforce source code repository. This one script addresses all of the configuration that is necessary for the distributed software products that my organizational group is responsible for. I use a Java tool to preprocess this script file such that replacement variables are replaced with Java system property values of the same name. I apply this script from an Ant build file so that the property values are easy to set up. This makes it straightforward to customize some aspects of the configuration with regard to the specific customer site.
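A preprocessing step of this kind can be sketched in a few lines of Java. This is not the tool described above, only an illustration under stated assumptions: the ${name} placeholder syntax, the class name ScriptPreprocessor, and the sample script line and property name are all hypothetical, since the article does not show the actual variable form.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a config-script preprocessor. */
final class ScriptPreprocessor {
    // Assumed placeholder syntax: ${name}. The real form is not given in the article.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z0-9_.-]+)\\}");

    /** Replaces every ${name} with the property of the same name. */
    static String expand(String script, Properties props) {
        Matcher m = PLACEHOLDER.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = props.getProperty(name);
            if (value == null) {
                // Fail loudly rather than ship a half-configured server.
                throw new IllegalArgumentException(
                        "No property defined for placeholder: " + name);
            }
            // quoteReplacement keeps '$' and '\' in values literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // System properties can be supplied with -Dname=value,
        // for example from a build file.
        Properties props = new Properties(System.getProperties());
        props.setProperty("site.queue", "acme.orders.inbox"); // illustrative value
        System.out.println(expand("create queue ${site.queue}", props));
    }
}
```

Throwing on an undefined placeholder is a deliberate choice in this sketch: when the same script is applied at many customer sites, a missing property is better caught at build time than discovered on a live server.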

Configuration is all-important with JMS servers as the queues, topics, user accounts, permissions, bridge rules, etc., will all need to be administratively set up prior to running the messaging software. My group manages numerous customer sites so we want to be exact in our deployments. The batch script approach to configuration makes that straightforward to do.

Tibco additionally offers an optional product called Hawk that can be used to administer a domain of JMS servers across the enterprise WAN. It has a GUI interface. SonicMQ likewise offers a nice graphical UI domain administration tool.

Special Messaging Features

The JMS API specification is aimed at software developers that are using asynchronous messaging for distributed communication. There are some features which JMS implementations offer that exceed the specification. However, the ones I'm going to highlight do not require using any special extensions to the JMS API, and so do not impact the portability of software.

Bridge Rule Selection Expressions

This is a feature offered in the Tibco EMS server that has become one of my favorites over time. It builds on top of the message filtering and routing capability that is inherent with JMS via message properties and selection expressions. A JMS client can filter the messages it receives from, say, a topic that it is a subscriber to by specifying a selection expression that uses message property values to select which messages will actually be conveyed to the client. (Standard JMS message filtering and routing is a hugely useful feature and it is very difficult to imagine that there are other kinds of messaging systems that exist which have no such capability.) The expression syntax is based on SQL so will be familiar to most developers.

A bridge rule uses this same JMS selection expression syntax for specifying the routing of messages amongst queues and topics that exist in the server. Administratively one can create a bridge rule in the server that will copy messages from one place where they are being published to yet another place where there is a newly defined interest in consuming that message.
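The copying behavior of a bridge rule can be modeled in plain Java. The sketch below is a conceptual in-memory model only: it does not use the JMS API or any Tibco interface, a Java Predicate stands in for the SQL-style selection expression, and the names BridgeModel, addBridge, and the destination names in the usage note are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Predicate;

/** Hypothetical in-memory model of server-side bridge rules. */
final class BridgeModel {
    /** A message reduced to string properties and a text body. */
    static final class Message {
        final Map<String, String> properties;
        final String body;

        Message(Map<String, String> properties, String body) {
            this.properties = properties;
            this.body = body;
        }
    }

    /** Copies matching messages from one destination to another. */
    private static final class BridgeRule {
        final String source;
        final String target;
        final Predicate<Message> selector;

        BridgeRule(String source, String target, Predicate<Message> selector) {
            this.source = source;
            this.target = target;
            this.selector = selector;
        }
    }

    private final Map<String, Queue<Message>> destinations = new HashMap<>();
    private final List<BridgeRule> rules = new ArrayList<>();

    void addBridge(String source, String target, Predicate<Message> selector) {
        rules.add(new BridgeRule(source, target, selector));
    }

    /** Stores the message, then applies every rule bridged from this destination. */
    void publish(String destination, Message message) {
        store(destination, message);
        for (BridgeRule rule : rules) {
            // One hop only in this model: copies are not re-bridged.
            if (rule.source.equals(destination) && rule.selector.test(message)) {
                store(rule.target, message);
            }
        }
    }

    /** Returns the next message for a destination, or null if it is empty. */
    Message poll(String destination) {
        Queue<Message> queue = destinations.get(destination);
        return queue == null ? null : queue.poll();
    }

    private void store(String destination, Message message) {
        destinations.computeIfAbsent(destination, k -> new ArrayDeque<>()).add(message);
    }
}
```

With a rule such as addBridge("app.outbox", "monitor.inbox", m -> "alarm".equals(m.properties.get("type"))), the publisher keeps writing to its own outbox and never learns that a second destination now receives a filtered copy, which is the decoupling the bridge feature provides.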

This is a feature that begins to make sense once one has accumulated an extensive portfolio of messaging applications. As the application population grows one becomes more focused on deployment and configuration issues. It is a simplifying matter if a JMS client application is written to consume messages from a single queue or topic and to publish to a single queue or topic. Think of this as an inbox and outbox per messaging application.

Now if some application is currently publishing messages to its outbox topic and another existing application needs to be altered to begin consuming some of those messages, it is easy to add a bridge rule that will copy the desired messages from where they're being published to where they now need to be consumed. It will not be necessary to recode the consuming application to become a subscriber to yet another topic and establish yet another JMS session. Instead the programmer need merely add new message handling code for the new type of message to be consumed. (Indeed, in C# .NET or Java it doesn't take much effort to build an architecture for applications where message handlers can be added as plug-in modules.)

There is also a case where a message being published into a JMS topic must be copied into a JMS queue before it can be consumed by an intended end-point application. This arises typically where a JEE Message Driven Bean (MDB) is consuming messages from a queue. If one is using a cluster of JEE application servers then it is necessary to specifically use a queue in conjunction with the MDB in order to prevent a message from being processed more than once. (I will note here that the JMS server also does round-robin load balancing to a cluster of queue end-point subscribers.) So a bridge rule can be used to very nicely solve this particular dilemma.
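The difference between the two delivery semantics is easy to see in a toy model. The following sketch is plain Java with no JMS API involved, and the names DeliveryModel, Topic, and WorkQueue are hypothetical. A topic hands every message to every subscriber, while a queue hands each message to exactly one consumer in round-robin order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical model contrasting topic fan-out with queue round-robin. */
final class DeliveryModel {
    /** Every subscriber sees every message. */
    static final class Topic {
        private final List<Consumer<String>> subscribers = new ArrayList<>();

        void subscribe(Consumer<String> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(String message) {
            for (Consumer<String> subscriber : subscribers) {
                subscriber.accept(message);
            }
        }
    }

    /** Each message goes to exactly one consumer, chosen round-robin. */
    static final class WorkQueue {
        private final List<Consumer<String>> consumers = new ArrayList<>();
        private int next = 0;

        void addConsumer(Consumer<String> consumer) {
            consumers.add(consumer);
        }

        void send(String message) {
            if (consumers.isEmpty()) {
                // Simplification: a real queue would hold the message instead.
                throw new IllegalStateException("no consumer attached");
            }
            consumers.get(next).accept(message);
            next = (next + 1) % consumers.size();
        }
    }
}
```

If two application server nodes each attach the same handler to a Topic, every message is processed twice; attached to a WorkQueue, the same two nodes split the work and each message is processed once.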

Server-to-Server Bridging

This form of bridging is similar to the previous bridge rule feature but this time messages will be routed and copied between distinct JMS servers. It is a capability that will usually be found in the enterprise class of JMS products. If applications have been built to use messaging for distributed computing interactions, then using this feature is an effective way to tie applications together that are geographically far flung.

For instance, let us say that at one location a data center exists, hosting relational database storage and JEE middle-tier application servers. At various branch offices rich client applications are in use that are designed to interact with the data center middle-tier via messaging. A JMS server would be set up at the data center site and a JMS server would be installed locally at each branch office. Client applications in the branch office would connect for their JMS sessions against their local JMS server. That server would forward all message traffic to the data center JMS server over a corporate WAN connection (or out over the public Internet using secure SSL connections). The JEE hosted middle-tier applications would likewise sustain only local connections against their network local JMS server.

The benefit of this arrangement is that the JMS servers can easily and reliably sustain hundreds of connections to local client applications while establishing and managing just a few remote connections over the WAN for the purpose of bridging JMS servers together. A client will consequently start up and establish connectivity very quickly as it only has to establish network local socket connections. Message publishing operations by client apps will also happen very quickly as out-going messages are swiftly queued up in local servers. The message traffic out-going over the WAN will be efficiently buffered as data is held safely in a JMS server until transmitted over the WAN connection and received by the JMS server at the other end. Momentary interruptions of WAN-spanning connectivity will not result in breaking the network local connections of end-user client applications - lending a better overall user experience.

SalesForce.com style SOA

If an enterprise needs to funnel many far flung users into a common data center (turning the data center into the messaging-based SOA analog of SalesForce.com), then bridged JMS servers are a better way to do that rather than have remote client applications attempt to connect directly to the data center middle-tier. Yet to take advantage of this manner of arrangement, end-user applications need to be designed to use asynchronous messaging. All of my end-user applications are built to use messaging exclusively (with exceptions to that rule being some streaming video and HTTP as used for retrieval of images from an archive server). After a few years of creating such software I can say it is very much doable and is indeed very much a superior way to design applications.

Message Trace Logging

Tibco EMS has a feature for enabling trace logging of specified message traffic. All JMS message types support the toString() method in some manner. When dumping a message for tracing, all properties and time-stamp information of a message will also be displayed. Of course this can be handy for debugging and verifying that the contents of messages are coming through as expected (if the content is clear text such as XML, name-value pair data, or JSON). With Tibco EMS, tracing should not be enabled for production as trace output is logged to file (the Unix tail tool is handy for viewing these). It thus incurs the additional overhead of the file I/O, but also, if the messaging traffic being logged is heavy, the log file could grow to exhaust file storage and cause an operating system crash. On a production JMS server such a crash would not go unnoticed.

Shadow Queue Monitoring

Another technique that is easy to rig up, though, is to create what I call a shadow queue. A bridge rule can be added to the JMS server that will copy messages to this shadow queue. One can then run a JMS console-based client app that will dump messages from the shadow queue to the console display. Most console mode programs on Windows, Macintosh, or Linux can have the console memory buffer significantly increased. This makes it possible to capture a fair amount of message traffic that one can scroll back and forth through to examine.

A message expiration policy of no more than a few seconds should be set on the shadow queue. When the message dump tool is not connected to the queue, messages copied into this queue will just expire and be harmlessly expunged. (One would not want messages to ever pile up in this shadow queue without being consumed as that could exhaust virtual memory and ultimately crash the JMS server.)
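The effect of the expiration policy can be illustrated with a small in-memory model. This is not how a JMS server implements expiration; it is a sketch, with the hypothetical name ExpiringQueue, in which the caller passes the current time explicitly so the behavior is easy to follow. It assumes time values never go backwards.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a queue whose messages expire after a fixed time. */
final class ExpiringQueue {
    private static final class Entry {
        final String body;
        final long expiresAtMillis;

        Entry(String body, long expiresAtMillis) {
            this.body = body;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long ttlMillis;

    ExpiringQueue(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Adds a message that will expire ttlMillis after nowMillis. */
    void offer(String body, long nowMillis) {
        purgeExpired(nowMillis);
        entries.addLast(new Entry(body, nowMillis + ttlMillis));
    }

    /** Returns the oldest unexpired message, or null if there is none. */
    String poll(long nowMillis) {
        purgeExpired(nowMillis);
        Entry entry = entries.pollFirst();
        return entry == null ? null : entry.body;
    }

    /** Number of unexpired messages currently held. */
    int depth(long nowMillis) {
        purgeExpired(nowMillis);
        return entries.size();
    }

    // All entries share one TTL, so the oldest entry is always at the head.
    private void purgeExpired(long nowMillis) {
        while (!entries.isEmpty() && entries.peekFirst().expiresAtMillis <= nowMillis) {
            entries.removeFirst();
        }
    }
}
```

With a three second time to live, a burst of messages copied in while no monitoring client is attached is gone a few seconds later, so the depth of the queue stays bounded by the traffic of the last few seconds rather than growing without limit.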

What is nice about this approach is that it can be used in production situations, as the overhead from the shadow queue is minimal enough to be of no consequence.

My company deals with hardware-based automation systems that result in the publishing of JMS messages. It is very useful to attach to a production JMS server to examine these messages when something may have gone out of adjustment or otherwise ceased to function correctly. Needless to say, the shadow queue monitoring technique has become a staple of my JMS messaging toolbox. (Notice how the bridge rule feature figures into this technique? It is just a wonderfully handy feature.)

JMS Auto-Discovery

When I first started using JMS messaging, one of the very first things I did was write a JMX MBean that is hosted as a high availability singleton in my JEE application server cluster. Its purpose is to provide a UDP-based auto-discovery service that client applications can use to discover the JMS cluster to connect to. My JMS client applications broadcast UDP datagrams that the MBean will respond to by sending back the JMS connection URL (which for a high availability JMS cluster will consist of multiple URLs). The client applications will then proceed to connect to the JMS cluster. Once connected to JMS, the client app can proceed to retrieve a global configuration file from a config file repository. The retrieved config file will contain such things as what queues and topics to connect to.
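The request and reply exchange behind such a discovery service can be sketched with plain UDP sockets. This is not the MBean described above: the JMX hosting and high availability parts are left out, and the class name JmsDiscovery, the request text, and the URL string in the usage note are all hypothetical.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of UDP-based discovery of a JMS connection URL. */
final class JmsDiscovery {
    // Assumed request text; any agreed-upon token would do.
    static final String REQUEST = "JMS-DISCOVER";

    /** Answers discovery requests with a fixed connection URL string. */
    static final class Responder implements Runnable, AutoCloseable {
        private final DatagramSocket socket;
        private final byte[] reply;

        Responder(int port, String connectionUrls) throws SocketException {
            socket = new DatagramSocket(port); // port 0 picks any free port
            reply = connectionUrls.getBytes(StandardCharsets.UTF_8);
        }

        int port() {
            return socket.getLocalPort();
        }

        @Override
        public void run() {
            byte[] buf = new byte[256];
            while (!socket.isClosed()) {
                try {
                    DatagramPacket request = new DatagramPacket(buf, buf.length);
                    socket.receive(request);
                    String text = new String(request.getData(), 0,
                            request.getLength(), StandardCharsets.UTF_8);
                    if (REQUEST.equals(text)) {
                        // Reply directly to whoever asked.
                        socket.send(new DatagramPacket(reply, reply.length,
                                request.getAddress(), request.getPort()));
                    }
                } catch (IOException e) {
                    // Socket closed or a transient error; the loop condition decides.
                }
            }
        }

        @Override
        public void close() {
            socket.close();
        }
    }

    /**
     * Sends one discovery request and waits for the first reply.
     * Throws SocketTimeoutException (an IOException) if nobody answers.
     */
    static String discover(InetAddress target, int port, int timeoutMillis)
            throws IOException {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setBroadcast(true); // allows a broadcast address as target
            socket.setSoTimeout(timeoutMillis);
            byte[] request = REQUEST.getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(request, request.length, target, port));
            byte[] buf = new byte[1024];
            DatagramPacket response = new DatagramPacket(buf, buf.length);
            socket.receive(response);
            return new String(response.getData(), 0, response.getLength(),
                    StandardCharsets.UTF_8);
        }
    }
}
```

A client would call something like discover(InetAddress.getByName("255.255.255.255"), port, 2000) and, on a timeout, fall back to a locally configured URL. The reply is an opaque string here, for example a comma-separated pair such as "tcp://jms-a:7222,tcp://jms-b:7222", and the client simply passes it on to its JMS connection factory.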

This approach makes it possible to deploy zero configuration client applications - which is not bad considering it is distributed software. Of course, if necessary, it is possible to fall back to using a local config file containing JMS connection URL info. On occasion my product group encounters customer sites where the UDP broadcast approach is problematic and we have to fall back to old-fashioned manual local configuration.

I've mentioned this home brew feature to various relevant folk such that I think we'll see this manner of JMS auto-discovery showing up in JMS implementations as an out of the box feature. It doesn't break with compatibility to the JMS spec as it is just a way to determine what JMS service to connect to. Once the connection URL is determined, the JMS APIs are programmed to as usual. If the JMS product you're interested in doesn't support anything like this, then ask for it.

Conclusion

I've been developing software as a livelihood since 1986. Like a lot of folks, I've built my fair share of network distributed software over the years. JMS messaging is the best technology I've yet used for doing this. Now I didn't really dip down into the programming of JMS - there are books on how to do that. Instead I wanted to present a higher level perspective of using this kind of software technology - giving consideration to things that are important in enterprise computing: high availability, scaling, platform support, configuration, deployment, monitoring, etc. I hope this experience write-up will be helpful to those considering JMS.
