Web server manager speculates on Mizuho Bank service outage

From: Akira Urushibata
Subject: Web server manager speculates on Mizuho Bank service outage
Date: Wed, 3 Mar 2021 07:41:20 +0900

On Sunday, February 28 (Japan time), the ATM network of Mizuho Bank,
Japan's third largest bank, suffered a massive service outage.  More
than 4,000 machines stopped working.  There were more than
5,000 cases in which the affected machines stopped processing after
taking the customers' cash cards, leaving the customers unable to
retrieve them.  Problems persisted into the following Monday.

Mizuho Bank explains that an internal data relocation process
involving 700,000 savings accounts overloaded the entire system and
affected ATM cash withdrawals.  The bank's chairman apologized for
failing to anticipate the effect of the procedure on the system.


A comment on a Japanese-language online article about the snafu, posted
by an engineer who has been maintaining web servers for financial
organizations for nearly twenty years, caught my attention.

He (or she) does not work for Mizuho Bank and admits he can only
speculate about what actually happened, but his story sounds familiar.
I'd like to share the tale with you through a rough translation.

  Some companies do not take their system administration and
  maintenance sections in high regard.  When engineers do their job
  right, there are no problems.  Management sees this, erroneously
  concludes that the maintenance crew is "doing nothing" worthy of
  their expensive salaries, and decides to axe them.
  The nasty part of this is that problems won't surface for a while
  thanks to the fine job done by the former crew.  Problems may occur
  but they can be dealt with using makeshift solutions improvised by
  less qualified engineers.  At this point it appears that the
  management did the right thing by laying off highly-paid specialists.
  Each fix may seem small, but in the aggregate they turn the system
  into chaos.  At some point a "last straw" is thrown upon the pile and
  breaks the mule's back.  The system breaks down in a calamity
  affecting a large number of clients.  Management may hire expert
  engineers to deal with the emergency but with the system in such a
  mess even they can't figure things out.

It was an episode like this that spurred me to look for better ways to
do things and eventually I noticed free software.

Experience tells me that the likes of the less qualified crew
described above are not likely to keep proper records of the changes
made.  When the emergency team arrives, lack of documentation becomes
a serious obstacle.  Paradoxically, this puts the less qualified crew
who made the mess in an advantageous position.  The emergency team
members have to beg for the information they need; they can't say or
do anything that would offend the old crew, lest it stop cooperating.
Indeed, those who caused the mess may even argue that the newly hired
hands are technically inferior, with claims such as "those outside guys
do not know real systems."
