It's been really interesting to see the responses from Blitz, Fly Object Space and GigaSpaces concerning state management as well as Newton and Rio concerning service discovery. I'm definitely learning as I go, but the good thing is that it seems like there are many in the community eager to help.
Now I'm working on another issue with enterprise service development - scheduled services. There are some services out there that may want to have an event fire in 1000 milliseconds, or five minutes, or an hour, or somewhere in between. This would appear to be an easy thing to solve at first blush - until you consider volume, quality of service and scalability. It's a steep drop into complexity at that point.
Here's the thing: you could easily just do a scheduled executor in J2SE, but once your VM dies then your pending events die too. You could submit a scheduled job to something like clustered Quartz instances, but then you must have a reliable back-end database to write to (no native replication). You could use something like Moab Cluster Suite, but it seems to live outside the muuuuuuuuch more simple realm of event scheduling.
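To make the J2SE option concrete, here is a minimal sketch of why a plain scheduled executor falls short. The class and method names are made up for illustration, and shutdownNow() is only a stand-in for the VM dying: the point is that pending events live in an in-memory queue and nowhere else.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not taken from any library or from this project.
public class InMemoryScheduling {

    // The easy case: schedule one event and block until it fires.
    public static String fireAfter(long delayMillis) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<String> pending =
                scheduler.schedule(() -> "fired", delayMillis, TimeUnit.MILLISECONDS);
            return pending.get();
        } finally {
            scheduler.shutdown();
        }
    }

    // The failure case: queue two long-delay events, then tear the executor down.
    // shutdownNow() hands back the tasks that never ran; nothing was persisted,
    // so a real VM crash would lose them without even this list to inspect.
    public static int pendingLostOnShutdown() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.schedule(() -> System.out.println("one hour event"), 1, TimeUnit.HOURS);
        scheduler.schedule(() -> System.out.println("five minute event"), 5, TimeUnit.MINUTES);
        List<Runnable> dropped = scheduler.shutdownNow();
        return dropped.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fireAfter(1000));
        System.out.println("pending events lost: " + pendingLostOnShutdown());
    }
}
```

Anything that has to survive a restart needs those pending tasks written somewhere outside the VM, which is exactly where the other options come in.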
So let's think outside the box and use some replicated object store that isn't necessarily meant for scheduling. How about we slap a time to live (TTL) on a JMS message, throw it on a queue and wait for it to hit the dead letter queue? That might work at times, but TTLs are really intended for quality of service and not for scheduled events. Unless you have a consumer attached to the original queue constantly polling for messages, an expired message isn't guaranteed to land in the dead letter queue.
How about using Camel's Delayer Enterprise Integration Pattern? Nope - that's just a Thread.sleep on the local VM. Doesn't do you much good once the VM dies. How about a delayed message using JBoss Messaging? I've heard tell that it exists, but I can't find much reference to it in the documentation.
This isn't a new problem - there's even a JSR, JSR 236, that is intended to address it. But it's been hanging around since 2004 with very little activity of note, so I doubt it's going to have much hope of working by Monday.
Until JSR 236 is addressed I'll likely have to just find a way to deal with this on my own. Maybe create a JobStore for Quartz that's backed by a JMS topic? Or just suck it up and build a clustered Quartz instance with a fault-tolerant database?
Gah. Sticky wicket.