Title: Cooperative Recovery of Multiple-Node Failures in Distributed Storage Systems
Abstract: In cloud storage systems, data are encoded in order to guarantee high data reliability. Regenerating codes are a new class of erasure codes with two design objectives. Firstly, a data collector should be able to decode the original data by connecting to a certain number of storage nodes. It does not matter which storage nodes the data collector downloads data from; the original data should be successfully decoded as long as the number of connections reaches some threshold value. Secondly, upon the failure of some storage nodes, the failed nodes can be repaired by contacting some surviving nodes. The total traffic generated in the repair process is called the repair-bandwidth. As the amount of data stored in a storage node is large, we want to minimize the repair-bandwidth. Most studies on regenerating codes focus on single-failure recovery, but it is not uncommon to see two or more nodes fail at the same time in large storage networks. To exploit the opportunity of repairing multiple failed nodes simultaneously, we investigate a cooperative repair mechanism, in which the nodes being repaired can exchange data among themselves. In this talk, we present the fundamental tradeoff between the repair-bandwidth and the amount of encoded data stored in each storage node. The result is derived by reducing the problem to a single-source multicast problem in network coding.
Biography: K. W. Shum received the B.Eng. degree in Information Engineering from the Chinese University of Hong Kong, and the Ph.D. degree in Electrical Engineering from the University of Southern California in 2000. He is now a research fellow at the Institute of Network Coding, the Chinese University of Hong Kong. His research interests include coding for distributed storage systems and sequence design.