Wednesday, March 7, 2012

When did “almost” become “good enough”?

Has the expression “mission critical” become too ubiquitous and is this diluting the message of late? When should business truly consider mission-critical computing?

Does a time finally arrive when we become blasé about tags and brands? Do we reach a point, eventually, where we simply make assumptions and where we convince ourselves that yes, we know exactly what the label means?

For the past couple of months I have been following the news coming out of Australia detailing the outages to the ATM networks there, where the major national banks have all suffered considerable embarrassment from the bad press these outages have produced. It is a story I have chronicled in my most recent post to the ATMmarketplace ePublication – for more on this, check out “Hey Wally, what’s up?” – and I don’t plan to revisit all that has transpired in this post, except to say that these outages appear to be happening far too often.

The latest outage that I referenced was the National Australia Bank (NAB) where a remark that really stood out for me was one made by a NAB spokeswoman that appeared in the Feb 22nd, 2012 edition of the Sydney Morning Herald (SMH). “We can confirm that NAB experienced system problems last night which meant some customers were unable to successfully transact using NAB banking channels including ATMs, Eftpos, HiCAPS, Internet Banking and Telephone Banking,” she said, then added “the issue was identified at 8:20pm and the issue was resolved, with customers able to transact as per normal, from 2:00am.”

By my calculation, this is an outage of some 340 minutes, and it comes after an earlier, almost as lengthy, outage back in November of 2011. All up, the bank has struggled to maintain even three 9s of availability, and this raises the question, of course: does the NAB view its many banking channels as mission critical? Or has the definition of mission critical somehow evolved (and gained acceptance) along the lines of, well, “almost” is “good enough” these days?
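The arithmetic behind "three 9s" is worth spelling out. Here is a minimal Python sketch of the downtime budgets the nines imply; the yearly basis and the function name are my own framing, and only the 340-minute figure comes from the outage described above:

```python
# Rough downtime budgets implied by "N nines" of availability,
# measured over one calendar year. Figures are approximate.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960 minutes

def downtime_budget_minutes(nines: int) -> float:
    """Minutes of downtime per year permitted at the given number of nines."""
    availability = 1 - 10 ** (-nines)
    return MINUTES_PER_YEAR * (1 - availability)

for nines in (3, 4, 5):
    print(f"{nines} nines: ~{downtime_budget_minutes(nines):.0f} min/year")

# The 340-minute NAB outage, plus a similar one earlier in the year,
# already blows through the three-nines budget of roughly 526 minutes.
two_outages = 340 * 2
print(two_outages > downtime_budget_minutes(3))  # True
```

In other words, a single incident of this length consumes most of a year's three-nines allowance on its own, which is why two of them put the bank below that mark.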

Following an earlier outage at NAB, back in November 2010, NAB’s chief executive Cameron Clyne was reported to have told reporters “unfortunately in any large organization these things happen from time to time.” Well, perhaps this is de rigueur at the NAB, but for every one of us who has worked with NonStop, this is a surprising turn of events. When you look at what is included within the overall scope of mission-critical applications, the direct customer-facing interactions that ATM transactions represent must be at the fore of all that is considered mission-critical.

Or at least, I would have thought so, but if major banks are experiencing outages of this ilk, and regularly at that, is all the effort we put into designing applications that are available 24x7 no longer important? Could we even suggest that mission-critical applications are not as critical to the mission of service as they once were? If PCs fail all the time and phones drop calls routinely, have we become a generation of users who are not all that concerned about availability?

In a recent exchange with comForte’s Thomas Burg on this topic, when it came to defining what constituted mission critical, Burg proposed rather informally that “if the service is down and it REALLY, REALLY hurts, and costs (you) a lot of money, its operation is mission critical.” And the applications at the heart of this, according to Burg, are “networks of ATMs, POSs, mobile phones and even stock exchanges prior to millisecond trading.” In other words, where money is at stake and where not having access to our funds for whatever reason – spending, buying, talking about it and perhaps even investing – generates real pain whenever we are forced to change direction and do something else, then yes, the application is mission critical.
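Burg's informal test can be sketched as a back-of-the-envelope calculation. The function name, the per-minute cost, and the pain threshold below are all illustrative assumptions of mine, not figures from the exchange:

```python
# A crude version of the "it REALLY, REALLY hurts" test: an application
# is mission critical if an outage costs more than the business can
# simply shrug off. All numbers here are hypothetical.

def is_mission_critical(outage_minutes: float,
                        cost_per_minute: float,
                        pain_threshold: float) -> bool:
    """Does the estimated outage cost exceed the tolerable-pain threshold?"""
    return outage_minutes * cost_per_minute > pain_threshold

# e.g. a 340-minute ATM outage at an assumed $5,000 per minute in lost
# transactions and goodwill, against a $100,000 pain threshold:
print(is_mission_critical(340, 5_000, 100_000))  # True
```

The point of the sketch is only that the classification follows from cost, not from technology: the same arithmetic flags ATM networks, POS switches and exchanges long before it flags an internal reporting tool.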

These mission critical applications are every bit as important today as they ever have been and the scope of mission critical applications has changed little over the years. When HP Senior VP and General Manager, Martin Fink, gave his keynote at last year’s HP Discover event, among the slides he used was one carrying the title “Best fit solutions for critical workloads”, where workloads were broken into three categories – Improved reliability, Mission-Critical resiliency and then, Zero downtime.

Grouped within “Improved reliability” were Windows and Linux solutions with ProLiant servers, and within “Mission-Critical resiliency” was HP-UX with Integrity servers. When it came to the best fit solutions for critical workloads that mandated “Zero downtime”, there was only NonStop. Yes, mission-critical remains a very important classification, and from HP there come the Integrity NonStop systems whenever zero downtime is a requirement. “Everything we develop at comForte,” Burg added, “is focused on how best to maximize availability and for comForte, ensuring we do not contribute to any lessening of the NonStop server’s ability to support mission-critical applications remains an important consideration.”

Again, for every one of us who has worked with NonStop, these are not new or surprising assertions. Mission-critical is important but it covers a very broad spectrum of application scenarios, so much so, that for many within the industry it’s hard to exclude any application from being considered mission-critical. But there are some applications where failure and outages really cost the business its reputation, its customers and its bottom line. For this, thankfully, there is NonStop.

I guess I must have stepped out of the meeting, or somehow missed the key determination that yes, when facing thousands of users depending on the instant access to their cash, “these things do happen” and almost is good enough would have to do. Again.

However, mission-critical is just too important an issue today to be ignored, and becoming satisfied with just three nines, or even fewer, of availability will hurt any business and influence customer allegiances long after public apologies have faded from our memories. But it still leaves you wondering why so many business executives simply don’t get it, and why more CEOs aren’t pushing back and saying no, these things should never happen, at any time!


  1. Almost became good enough when MBAs got involved in software development (i.e. prodding product/development management). Then it became "look good" instead of "being good" (so bonuses et al. are met).

  2. IMHO, the average programmer today cares little about dealing with exceptions and reporting them, or about having a program survive this or that failure. Because of this, it becomes more and more complicated to isolate what causes a program or application to fail.

    This is WHY these things are happening, and will continue to happen even more so, in the future!

  3. The programmers I know would tend to disagree with this observation – they are definitely passionate about their craft. On the other hand, if this is the case then these programmers should look for alternate careers ...

    The career path for programmers is based on exceptional skills, and those with average ambitions, looking at it as a job and nothing more, will always struggle even as the work they turn in ends up being expensive to maintain ...

    But again, that's my observation but I may be the exception here on this ...