Date: 27th Feb 2010
Early detection technology for cloud computing
system failures from Fujitsu
Fujitsu Laboratories has developed a technology to detect
system failures before they happen, by improving the ability
to analyze cloud system data and gather information, narrowing
down the causes of failures, and automatically resolving
them. This new technology reduces the workload of administrators
and allows users to utilize the cloud with confidence.
Technologies such as this improve the reliability and stability
of cloud systems.
1. Detects signs of failures
Fujitsu Laboratories has developed two technologies for
detecting signs of failure, one for each type of failure.
(1) Detection of failures through the analysis of system
messages:
This technology focuses on specific patterns in messages
that are generated just before failures occur and detects
warning signs. By comparing the pattern of generated messages
with messages from previous system failures, the technology
can pick up on signs of failure.
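The comparison described above can be illustrated with a minimal sketch. The article does not say how Fujitsu matches message patterns, so everything here is an assumption: the message names, the stored signatures, the sliding window, and the use of Jaccard similarity as the matching score are all hypothetical.

```python
from collections import deque

# Hypothetical signatures: message types seen shortly before past failures.
FAILURE_SIGNATURES = {
    "disk_failure": {"IO_RETRY", "SMART_WARN", "FS_REMOUNT_RO"},
    "memory_exhaustion": {"SWAP_HIGH", "GC_PAUSE_LONG", "OOM_SCORE_RAISED"},
}


def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_warning_signs(messages, window=5, threshold=0.6):
    """Slide a window over the message stream and report every position
    where the recent messages resemble a stored failure signature."""
    recent = deque(maxlen=window)
    alerts = []
    for index, msg in enumerate(messages):
        recent.append(msg)
        observed = set(recent)
        for name, signature in FAILURE_SIGNATURES.items():
            score = jaccard(observed, signature)
            if score >= threshold:
                alerts.append((index, name, round(score, 2)))
    return alerts


stream = ["LOGIN_OK", "IO_RETRY", "SMART_WARN", "FS_REMOUNT_RO", "LOGIN_OK"]
alerts = detect_warning_signs(stream)
print(alerts)
```

In this sketch the first alert is raised once three of the four distinct messages in the window match the disk-failure signature. A production system would likely weigh message order and timing as well.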
(2) Detection of potential failures that do not generate
messages:
When configuring equipment such as servers, human error
can lead to the input of incorrect settings. In this kind
of situation, the server will operate according to the settings
and may not generate any error messages. An effective way
to detect failures in this case is to capture the data packets
that travel across the networks linking servers and systems,
and then analyze minor changes at the packet level, such as
data loss, retransmitted packets, and transmission delays.
To monitor the large-scale systems involved in cloud computing,
Fujitsu Laboratories has developed a technology that is
compatible with 10 Gbps high-speed communication and detects
network and server system failures in real time.
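As a rough illustration of packet-level analysis, the sketch below flags flows whose retransmission rate or mean round-trip time exceeds a threshold. It is not Fujitsu's method: the `Packet` record, the flow labels, and both thresholds are hypothetical, and a real 10 Gbps monitor would work on captured traffic rather than an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    flow: str      # hypothetical flow label, e.g. "web->db"
    seq: int       # sequence number within the flow
    rtt_ms: float  # measured round-trip time for this packet


def flow_anomalies(packets, max_retx_rate=0.05, max_mean_rtt_ms=50.0):
    """Return {flow: [reasons]} for flows that look unhealthy."""
    seen = defaultdict(set)
    totals = defaultdict(int)
    retx = defaultdict(int)
    rtt_sum = defaultdict(float)
    for p in packets:
        totals[p.flow] += 1
        rtt_sum[p.flow] += p.rtt_ms
        if p.seq in seen[p.flow]:
            retx[p.flow] += 1  # same sequence number seen again: a resend
        seen[p.flow].add(p.seq)

    report = {}
    for flow, total in totals.items():
        reasons = []
        if retx[flow] / total > max_retx_rate:
            reasons.append("retransmissions")
        if rtt_sum[flow] / total > max_mean_rtt_ms:
            reasons.append("delay")
        if reasons:
            report[flow] = reasons
    return report


packets = [
    Packet("web->db", 1, 10.0), Packet("web->db", 2, 10.0),
    Packet("web->db", 2, 10.0), Packet("web->db", 3, 10.0),
    Packet("app->cache", 1, 80.0), Packet("app->cache", 2, 90.0),
    Packet("lb->web", 1, 5.0), Packet("lb->web", 2, 5.0),
]
report = flow_anomalies(packets)
print(report)
```

Neither flagged flow needs to produce an error message to be noticed, which is the point of watching the network rather than the logs.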
2. Narrows down causes of failures
The technology scans through detected signs pointing towards
system failure and makes inferences about the most likely
areas that have generated these signs. Using the observed
symptoms as a point of origin, the technology employs network
and system configuration information to trace the symptoms'
causes. It then overlays the results of the traces made
from multiple points of origin and infers the most likely
causes from the areas with the most overlap or with no
normal activity.
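The overlay step can be sketched as a graph traversal. The article gives no implementation details, so the dependency map, the component names, and the choice of breadth-first search are assumptions made for illustration only.

```python
from collections import Counter, deque

# Hypothetical configuration: each component maps to what it depends on.
DEPENDS_ON = {
    "web1": ["switchA", "app1"],
    "web2": ["switchA", "app1"],
    "app1": ["switchB", "db1"],
    "batch1": ["switchB", "db1"],
    "db1": ["storage1"],
    "switchA": [],
    "switchB": [],
    "storage1": [],
}


def reachable(origin):
    """Every component a symptom at `origin` could trace back to."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for dep in DEPENDS_ON.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def likely_causes(symptom_origins):
    """Overlay the traces from all origins and keep the components
    that appear in the largest number of traces."""
    overlap = Counter()
    for origin in symptom_origins:
        overlap.update(reachable(origin))
    if not overlap:
        return []
    top = max(overlap.values())
    return sorted(c for c, n in overlap.items() if n == top)


candidates = likely_causes(["web1", "web2", "batch1"])
print(candidates)
```

Here the three symptoms share `switchB`, `db1`, and `storage1`, so those become the candidates, while `switchA` is ruled out because `batch1` does not depend on it.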
3. Resolves causes of failures
The system leverages past knowledge of how to deal with
system failures, including system log information, and presents
administrators with the most suitable methods for dealing
with the determined causes of the failures. Because past
failures often recur, the system stores previous failure
cases and the history of procedures used to resolve them
in its knowledge base, so that it can quickly determine
a solution to the cause of a failure.
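A knowledge base of this kind can be pictured as a lookup from a diagnosed cause to the procedures that resolved it before. The sketch below is an assumption, not a description of Fujitsu's system: the `KnowledgeBase` class, its method names, and the idea of ranking by how often a procedure was used are all hypothetical.

```python
from collections import Counter


class KnowledgeBase:
    """Stores past failure cases and the procedures that resolved them."""

    def __init__(self):
        self._cases = {}  # cause -> Counter of procedures

    def record(self, cause, procedure):
        """Log that `procedure` resolved a failure caused by `cause`."""
        self._cases.setdefault(cause, Counter())[procedure] += 1

    def suggest(self, cause, limit=3):
        """Return the most frequently used procedures for this cause."""
        history = self._cases.get(cause, Counter())
        return [proc for proc, _ in history.most_common(limit)]


kb = KnowledgeBase()
kb.record("db1 disk full", "rotate logs")
kb.record("db1 disk full", "extend volume")
kb.record("db1 disk full", "rotate logs")
suggestions = kb.suggest("db1 disk full")
print(suggestions)
```

An unknown cause returns an empty list, which an administrator would treat as a new case to resolve manually and then record.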