Monday, May 22, 2006

XML Integration Gate, or a lesson in how systems interact :)

Warning: the following contains generalizations of technical information that you would only understand if you worked in the programming department at my former employer at the same time I did. Unless you did, you probably won't understand any of what I'm talking about.

Well, recently there has been some excitement over several thousand dollars my former employer ended up having to pay a vendor due to a system error. My interest in the subject (even though I no longer work there) is that one of my systems might have been the cause of this error. Now, not that all of my code is perfect and never breaks, but I was pretty sure my code (or at least my code without significant help) could not be the cause of such an issue. So here are my thoughts on the subject, and why I ultimately believe that my code wins in the end.

The issue was the system getting caught in an infinite loop placing orders. When an order is placed in my former employer's system, it automatically makes a request to a particular vendor, which charges my former employer a small fee. Somehow, a queuing system I wrote for an XML integration got caught in an endless loop re-placing the same test order over and over again (presumably in their production environment).

When I first heard of this issue and was given the initial data on the circumstances surrounding what was happening, I suggested that the person investigating the issue check into a hack another programmer (Lion) had added to the XML integration. This "hack" allowed the system to cancel orders before our system actually placed them, while they still existed only in our queue. I'm not sure how much investigating this person actually did, as they happen to be in extremely high demand over there. I doubt they had much time to check into it.

This is where the saviour comes in: a person DreamTeam dubs Constellation.

What Constellation found is that the issue occurred only when the middleman in the integration sent our system events in the following order (all for the same transaction identification number):

Order placement
Order cancellation
Order placement (new product)

The reason this was an issue is the nature of how this particular integration works. When I wrote the queuing system, I was forced to write it to deal with multiple records in our database as a single transaction (as if they were transmitted in the same XML document). This was to ensure extra orders were not placed in my former employer's system. Because of this, what ended up happening was the following:

Orders placed, new order records at status 'CREATED'
Orders cancelled, new order records presumably at status 'COMPLETE'
Orders placed, some records for this transaction at 'COMPLETE', some at 'CREATED'
Queue picks up the 'CREATED' portion of the transaction, marks the entire transaction 'LOADED'
Only the records which were previously at 'CREATED' status are processed; records which were previously at 'COMPLETE' are now set to 'CREATED' (thus the previously cancelled order is alive again and will be processed)
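To make steps 4 and 5 concrete, here is a small Python model of a single pickup. It is only a sketch under assumptions: the `Record` class, the `pick_up` function and the event labels are hypothetical, and the rule that leftover records drop back to 'CREATED' is just the behavior the list above describes, not the queue's actual code (which I no longer have access to). It models one pickup and says nothing about what the real system did after that.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare records by identity, not by field values
class Record:
    event: str   # which XML event produced the record (illustrative label)
    status: str  # "CREATED", "LOADED" or "COMPLETE", as named in the post


def pick_up(transaction):
    """Apply steps 4 and 5 to one transaction's records (assumed model).

    Returns the records that are handed off for processing.
    """
    # Step 4: remember which records are pending, then mark the whole
    # transaction LOADED as a single unit.
    to_process = [r for r in transaction if r.status == "CREATED"]
    for r in transaction:
        r.status = "LOADED"
    # Step 5: only the pending records go on to processing; the records that
    # had been COMPLETE fall back to CREATED (assumed mechanism), which is
    # what revives the cancelled order.
    for r in transaction:
        if r not in to_process:
            r.status = "CREATED"
    return to_process


# Replay placement / cancellation / placement for one transaction.
txn = [
    Record("place (original)", "COMPLETE"),
    Record("cancel", "COMPLETE"),
    Record("place (new product)", "CREATED"),
]
picked = pick_up(txn)
print([r.event for r in picked])  # ['place (new product)']
# → [('place (original)', 'CREATED'), ('cancel', 'CREATED'), ('place (new product)', 'LOADED')]
print([(r.event, r.status) for r in txn])
```

Grouping by transaction is what protects against duplicate orders in the normal case, but it also means a status change aimed at the new placement touches the old, already-finished records too.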

Here is where my lack of access to the code limits my understanding of the situation... Rather than the previously cancelled order simply being processed and marked 'COMPLETE' as it should have been, the system ended up in an endless loop re-placing the "cancelled" order.

Now, one aspect of the cancellation system that Lion wrote which I have not touched on yet is a reposting mechanism. If for some reason the cancellation fails, the system is supposed to automatically repost the cancellation and try again. At this point I can only guess that perhaps Lion's reposting system considered the cancellation a failure and endlessly retried to cancel the orders, while my order processing queue endlessly considered the order placement event for those orders a failure and endlessly tried to re-place the order.
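The repost described above is "try again until it works", with no limit that I know of. A retry loop like that never ends if the action can never succeed. The sketch below is hypothetical (the function names are mine, and this is not Lion's code); it shows the same loop with an optional attempt cap, which is a common safeguard rather than anything the integration actually had.

```python
def repost_until_done(action, max_attempts=None):
    """Call action() until it reports success.

    max_attempts=None repeats forever, like an unconditional repost;
    an integer caps the number of attempts. Returns (succeeded, attempts).
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        if action():
            return True, attempts
    return False, attempts


def cancel_order_never_placed():
    # Assumption: a cancel for an order that was never placed always fails.
    return False


# With a cap, the failing cancellation gives up instead of spinning forever.
print(repost_until_done(cancel_order_never_placed, max_attempts=5))  # (False, 5)
```

Called with `max_attempts=None`, the same failing action would loop forever. On the order placement side, every extra attempt is another request to the vendor, and another fee.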

Now, admittedly the order entry queue could have been a bit smarter and done a few things to check that the order existed in the system; however, this is yet another project that I ended up writing in less than a week, so... I guess I'm not a super duper hero, just a super one (or something like that).
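One of those "few things" would be checking whether an order had already been placed before placing it again. Here is a minimal sketch, assuming an order can be identified by its transaction ID plus product (a hypothetical key; the real integration may identify orders differently) and that the vendor request is passed in as a function. `OrderEntryQueue`, `submit`, "T1" and "widget" are all made up for illustration.

```python
class OrderEntryQueue:
    """Hypothetical order entry queue that refuses to place an order twice."""

    def __init__(self, place_with_vendor):
        # place_with_vendor(txn_id, product) stands in for the call that
        # triggers the vendor request (and its fee).
        self._place_with_vendor = place_with_vendor
        self._placed = set()  # (txn_id, product) keys already sent

    def submit(self, txn_id, product):
        """Place the order unless it was already placed; return True if placed."""
        key = (txn_id, product)
        if key in self._placed:
            return False  # already in the system: skip it, don't pay again
        self._place_with_vendor(txn_id, product)
        # Only record the key after the call returns, so a placement that
        # raised an error can still be retried later.
        self._placed.add(key)
        return True


sent = []
queue = OrderEntryQueue(lambda txn_id, product: sent.append((txn_id, product)))
queue.submit("T1", "widget")
queue.submit("T1", "widget")  # a revived or retried event for the same order
print(sent)  # [('T1', 'widget')]
```

An in-memory set only lasts as long as the process; a real check would look the order up in the database (or ask the vendor) so it survives restarts.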

But at the same time, Lion's order cancellation system could never have worked (at least in this situation). The best he could have hoped for was for the system to simply not cancel the order. So clearly more testing was required on this project; apparently it shouldn't have gotten past QA.

So in the end, I feel that the order entry queue was fine (though it obviously had *some* weaknesses); the system worked exactly as it should have as long as people didn't start screwing with it externally. Amlett's assessment that my system was broken was incorrect. My system wasn't broken; it was just being affected by a broken order cancellation system.

Well, that's all for tonight. Sometime later, Bear and I will post the story of how DreamTeam came into being on our respective blogs. It is super exciting stuff, so I'm sure none of you will want to miss it.

Good night

2 Comments:

Anonymous said...

Don't you miss all the good times at MIS? I sure do! I can see it now... (cue bell tree and wavy screen effect)

Jeff: "What the fuck! Something is broken. Carlo! Jon! Fix it!"

Jon: "Jesus fucking christ, not again... Why do we always get stuck with this crap?"

Carlo: "Because we're the only ones who can fix things the right way."

Jon: "Oh yeah."

Time passes. Carlo and Jon look at some code.

Jeff: "What are you two doing?"

Jon: "We're looking at code."

Jeff: "Well... What's wrong?"

Carlo: "If we knew we would fix it."

Jeff: "I don't like your attitudes!"

Carlo and Jon: "Fine, we quit! Hooray!"

6:26 PM  
Carlo said...

Yeah, that sounds like the average day at the old office... Thank god the new office is so much better!

7:37 PM  
