Sunday, December 07, 2008

Someone to Give Me the Time

It's been really interesting to see the responses from Blitz, Fly Object Space and GigaSpaces concerning state management as well as Newton and Rio concerning service discovery. I'm definitely learning as I go, but the good thing is that it seems like there are many in the community eager to help.

Now I'm working on another issue with enterprise service development - scheduled services. There are some services out there that may want to have an event fire in 1000 milliseconds, or five minutes, or an hour, or somewhere in between. This would appear to be an easy thing to solve at first blush - until you consider volume, quality of service and scalability. It's a steep drop into complexity at that point.

Here's the thing: you could easily just do a scheduled executor in J2SE, but once your VM dies then your pending events die too. You could submit a scheduled job to something like clustered Quartz instances, but then you must have a reliable back-end database to write to (no native replication). You could use something like Moab Cluster Suite, but it seems to live outside the muuuuuuuuch more simple realm of event scheduling.
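For reference, the in-VM option looks roughly like the following. This is only a minimal sketch: the class and method names (InMemoryTimer, fireAfter, lostOnShutdown) are made up for illustration, and shutdownNow() is merely standing in for a VM death. The point it tries to show is that pending events live on the heap and nowhere else.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical names throughout; only the java.util.concurrent calls are real API.
class InMemoryTimer {

    // Schedule a one-shot event and report whether it fired before the timeout.
    static boolean fireAfter(long delayMillis, long timeoutMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<String> pending =
                    scheduler.schedule(() -> "fired", delayMillis, TimeUnit.MILLISECONDS);
            return "fired".equals(pending.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (Exception e) {
            return false; // interrupted, timed out, or the task itself failed
        } finally {
            scheduler.shutdownNow();
        }
    }

    // Queue up events an hour out, then kill the scheduler (standing in for a dead VM).
    // Returns how many events never got the chance to fire.
    static int lostOnShutdown(int events) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        for (int i = 0; i < events; i++) {
            scheduler.schedule(() -> { }, 1, TimeUnit.HOURS);
        }
        List<Runnable> neverRan = scheduler.shutdownNow();
        return neverRan.size();
    }

    public static void main(String[] args) {
        System.out.println("fired in time: " + fireAfter(1000, 5000));
        System.out.println("events lost on shutdown: " + lostOnShutdown(3));
    }
}
```

Nothing here is written anywhere durable, so whatever shutdownNow() hands back is simply gone unless you persist and reschedule it yourself.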

So let's think outside the box and use some replicated object store that isn't necessarily meant for scheduling. How about we slap a time to live (TTL) on a JMS message, throw it on a queue and wait for it to hit the dead letter queue? That might work at times, but TTLs are really intended for quality of service and not for scheduled events. Unless a consumer is attached to the original queue and constantly polling for messages, an expired message isn't guaranteed to land in the dead letter queue.
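To make the TTL trap concrete, here is a toy in-memory model of a queue that only checks expiry when somebody tries to consume. It is an assumption-laden sketch, not any broker's actual implementation: LazyExpiryQueue and its methods are invented for illustration, and time is passed in by hand so the behavior is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: expiry is evaluated lazily, at consumption time.
class LazyExpiryQueue {

    private static final class Entry {
        final String body;
        final long expiresAt;

        Entry(String body, long expiresAt) {
            this.body = body;
            this.expiresAt = expiresAt;
        }
    }

    private final Deque<Entry> live = new ArrayDeque<>();
    private final List<String> deadLetters = new ArrayList<>();

    // Put a message on the queue with a time to live.
    void send(String body, long nowMillis, long ttlMillis) {
        live.add(new Entry(body, nowMillis + ttlMillis));
    }

    // Expired messages are moved to the dead letter list only when a consumer asks for work.
    String receive(long nowMillis) {
        while (!live.isEmpty()) {
            Entry head = live.poll();
            if (head.expiresAt <= nowMillis) {
                deadLetters.add(head.body);
                continue;
            }
            return head.body;
        }
        return null; // nothing deliverable
    }

    List<String> deadLetters() {
        return deadLetters;
    }

    public static void main(String[] args) {
        LazyExpiryQueue queue = new LazyExpiryQueue();
        queue.send("wake up", 0L, 1000L);
        // Pretend four seconds pass with no consumer attached: nothing has been dead-lettered.
        System.out.println("before receive: " + queue.deadLetters());
        queue.receive(5000L);
        System.out.println("after receive: " + queue.deadLetters());
    }
}
```

In this model the "scheduled event" fires whenever a consumer next shows up, not when the TTL elapses, which is exactly the wrong guarantee for a timer.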

How about using Camel's Delayer Enterprise Integration Pattern? Nope - that's just a Thread.sleep on the local VM. Doesn't do you much good once the VM dies. How about a delayed message using JBoss Messaging? I've heard tell that it exists, but I can't find much reference to it in the documentation.

This isn't a new problem - JSR 236 is even intended to address it. But it's been hanging around since 2004 with very little activity of note, so I doubt it's going to have much hope of working by Monday.

Until JSR 236 is addressed I'll likely have to just find a way to deal with this on my own. Maybe create a JobStore for Quartz that's backed by a JMS topic? Or just suck it up and build a clustered Quartz instance with a fault-tolerant database?

Gah. Sticky wicket.

5 comments:

  1. Anonymous, 3:36 PM

    Hi there-

    How to send a scheduled message is explained here http://www.jboss.org/file-access/default/members/jbossmessaging/freezone/docs/userguide-1.4.1.Beta1/html/configuration.html#conf.destination.queue.attributes.scheduledmessagecount in the JBM documentation.

    I admit this is somewhat buried and hard to find, but we'll be doing a big overhaul of the docs for JBM 2.0.

    Hope that helps :)

    Tim Fox

    Disclosure: I'm the JBM project lead

  2. Anonymous, 3:38 PM

    Hmm, your blog truncated my link. But in any case it's in section 6.7.2.1.11 of the 1.4.1 docs. :)

  3. Ah-ha! There it is! Thanks so much for the link Tim... I was seconds away from drilling a hole in my head. I'm going to take a few hours & try to get this rolling with Camel.

    I'm glad to hear that the docs are getting a big overhaul in 2.0 - sounds like lots of good things are coming up for that release.

  4. Holger Hoffstätte, 3:17 AM

    The subtle problem with queues and TTLs is that a) most queue backing stores are pretty inefficient when it comes to fine-grained TTL handling with decent accuracy (for many events), and b) queue congestion can mask TTL expiry.
    This problem is much, much more difficult than many people realize.

  5. Definitely true. And it appears to be compounded by the fact that (in ActiveMQ) TTL expiry is an event-driven mechanism, requiring either Queue initialization or consumption. For example, I can't get an idle queue to even begin thinking about calculating expiry until I connect a queue browser - using something ridiculous like session.createBrowser(queue, "SchrodingersCat IS NOT NULL").close().
