MD-POR: Multisource and Direct Repair for Network Coding-Based Proof of Retrievability

When data owners publish their data to cloud storage, data integrity and availability become major concerns because the cloud servers are not trusted. To address these problems, researchers proposed the Proof of Retrievability (POR) protocol, which allows a verifier to check and repair the data stored in the cloud servers. Within the POR protocol, the network coding technique is commonly applied to increase the efficiency of data transmission and data repair. However, most previous schemes neither consider a practical scenario nor use network coding efficiently. In this paper, a lightweight network coding-based POR scheme, called MD-POR (Multisource and Direct Repair for Proof of Retrievability), is proposed. Unlike previous schemes, the proposed MD-POR scheme allows multiple clients who have different secret keys to participate in the scheme. Moreover, the MD-POR scheme supports a direct repair feature in which corrupted data can be recovered by the servers without burdening the clients. The MD-POR scheme also supports a public authentication feature in which a third party auditor is employed to check the servers, so the client is freed from the responsibility of periodically checking the servers. Furthermore, the MD-POR scheme is constructed in a symmetric key setting.


Introduction
Since data volumes are increasing exponentially, database owners tend to publish their data to storage providers, called clouds, in order to reduce the burden of data storage and maintenance. Clients can thus access, manage, and share their data from anywhere via the Internet. However, such service providers are untrustworthy and present three basic challenges to data security: (i) integrity, (ii) availability, and (iii) confidentiality. For confidentiality, there are two research approaches: the cryptographic approach (e.g., RSA) and the information-theoretic approach (e.g., secret sharing schemes). Compared to the cryptographic approach, the information-theoretic approach achieves a security level determined by a threshold. We choose the information-theoretic approach because our security analysis is derived purely from information theory. In this paper, we deal with integrity, availability, and information-theoretic confidentiality.
To check the cloud servers, researchers proposed the Proof of Retrievability (POR) protocol [1][2][3], which enables the servers (provers) to demonstrate to the verifier that the data stored in the servers is intact and available, and enables the clients to recover the data when an error is detected. Within the POR protocol, integrity and availability assurance mainly rely on three techniques: replication [4], erasure coding [5], and network coding [6][7][8][9]. In the replication technique, the client stores a file replica in each server. When a corrupted server is detected, the client uses one of the healthy replicas to repair it. However, the drawback of this technique is its high storage cost, because the client must store a whole file in each server. The erasure coding technique was then applied to reduce the storage cost. Erasure coding allows the client to store redundant file blocks in each server instead of a full file replica. However, when corrupted data is repaired, the client has to retrieve the entire original file before generating new coded blocks. Therefore, its computation and communication costs increase during data repair. The network coding technique was then applied to improve the efficiency of data repair. The main advantage of network coding is that the client does not need to retrieve the entire file before generating new coded blocks.
International Journal of Distributed Sensor Networks
Consequently, in this paper, we focus on the network coding technique. Our goal is to construct a network-coding-based POR that satisfies the following aims.
(i) Practical scenario: the system should consist of multiple clients, each keeping a different secret key. This is because in many distributed storage systems today, such as Dropbox, each client has personal data; hence, each client should use his own secret key to ensure integrity and confidentiality.
(ii) Lightweight: first, the clients should be freed from the two heaviest tasks, namely periodically checking the servers and repairing the corrupted servers. Second, the system should be constructed in a symmetric key setting, which is well known to be more lightweight than an asymmetric key setting.
Network Coding-Based POR Schemes. A few notable network-coding-based PORs have been proposed. Dimakis et al. [10] were the first to apply network coding to distributed storage systems. Li et al. [11] proposed a tree-structured data regeneration for network coding that optimizes network bandwidth by using a maximum spanning tree. Chen et al. [12] then adapted the scheme of Dimakis et al. to propose the Remote Data Checking for Network Coding-based distributed storage systems (RDC-NC) scheme, which provides an elegant data repair by recoding the encoded blocks in healthy servers during repair. Cao et al. [13] applied the Luby transform (LT) code to reduce the computation cost, because the LT code is a special network code that works in the finite field of order two and uses only exclusive-OR (XOR) operations. Chen et al. [14] proposed the NC-Cloud scheme, which improves the cost-effectiveness of repair using the functional minimum storage regenerating (FMSR) code and lightens the encoding requirement of storage nodes during repair. However, none of these schemes achieves our aims. Their system models have only a single client. Furthermore, their check and repair phases place a heavy burden on the client because (i) the client has to periodically check the servers and (ii) when a corrupted server is detected, the healthy servers provide their blocks to the client, and the client then has to verify them, compute the new blocks, and send these new blocks to the new server. Le and Markopoulou later proposed the NC-Audit scheme [15], in which a third party auditor is employed and delegated the responsibility of checking the servers instead of the client. The authors also discussed a new repair mechanism in which the new server can compute the new blocks by itself without involving the client. We call this mechanism direct repair. Unfortunately, their direct repair is incomplete because they mainly focused on preventing data leakage to the third party auditor. Furthermore, their scheme is constructed in an asymmetric key setting and does not deal with multiple clients.
Contribution. In this paper, a new network-coding-based POR named MD-POR is proposed. To the best of our knowledge, we are the first to propose a direct repair for the POR based on a symmetric key setting; furthermore, the proposed MD-POR scheme also supports multiple clients and public authentication.
(i) Direct Repair. If a corrupted server is detected, the healthy servers are required to provide their coded blocks directly to the new server instead of sending them back to the client. Afterwards, the new server verifies the coded blocks it received and computes the new coded blocks by itself without involving the client. This mechanism reduces both the communication cost and the burden on the client.
(ii) Multiclient. To support multiple clients, our method does not simply duplicate the single-client process into multiple parallel processes. Instead, in the proposed MD-POR scheme, the processes of multiple clients are mixed together without losing the data confidentiality of individual clients.
To enable such a multiclient setting, we employ the InterMac technique [16], which was proposed for the network scenario. The InterMac technique allows multiple sources to send their packets into the network using different secret keys and allows the recipients to verify the packets they receive.
(iii) Symmetric Key Setting. Unlike schemes built in an asymmetric key setting, the MD-POR scheme uses only secret keys and no public key.
(iv) Public Authentication. Not only the client but also any entity that is given the appropriate verification information can check the cloud servers while learning nothing about the secret key of each client. We employ a third party auditor (TPA) to check the servers periodically on behalf of the clients. By delegating the responsibility of checking the servers to the TPA, the clients are freed from this burden. Without a TPA, the clients would have to check the servers periodically themselves, and the public authentication feature could not be supported because only the clients could check the servers. Although the MD-POR scheme supports public authentication, it does not use an asymmetric key setting.
Organization. The system model, the background on the Proof of Retrievability, the network coding technique, the InterMac technique, the notations, and the definitions are described in Section 2. The adversarial model is presented in Section 3. The MD-POR scheme is proposed in Section 4. The security analysis and efficiency analysis are given in Sections 5 and 6, respectively. The performance evaluation of the MD-POR scheme is shown in Section 7. The conclusion and future work are drawn in Section 8.

Preliminaries
2.1. System Model. The system model of the MD-POR scheme is depicted in Figure 1. There are three types of entities.
(i) Clients: these entities have data to be stored in the cloud and rely on the cloud for data storage, computation, and maintenance.These clients can be either enterprises or individual customers.
(ii) Cloud servers: the cloud servers are managed and monitored by a cloud service provider to offer a data storage service, and they have abundant storage space and computation resources.
In the cloud storage service, the clients can store their data into a set of servers in a simultaneous and distributed manner.
(iii) Third party auditor (TPA): this entity is delegated the responsibility to check the servers on behalf of the clients.The TPA is assumed to be trusted to perform the task of periodically checking the servers.
A system model consisting only of the clients and the servers, without a TPA, is sufficient for data checking. To enable the public authentication feature, the TPA is employed under the assumption that it is an honest-but-curious entity. Several previous papers make the same assumption about the TPA, for example, [15][17][18][19].

Proof of Retrievability (POR).
To check the servers, researchers proposed the Proof of Retrievability (POR) [1][2][3] which is a challenge-response protocol between a verifier (client) and a prover (server).The POR has four phases as follows.
(1) keygen($1^\lambda$): given a security parameter $\lambda$, the client runs this algorithm to generate a secret key (sk) and a public key (pk). In the symmetric key setting, pk is set to null.
(2) encode(sk, $F$): the client runs this algorithm to encode an original file $F$ into an encoded file $\tilde{F}$ and then sends $\tilde{F}$ to the server for storage.
(3) check(sk): the client uses his secret key sk to generate a challenge (chal) and sends chal to the server. The server then computes a response (resp) and sends resp back to the client. Finally, the client verifies whether the file $F$ is intact based on chal and resp.
(4) repair(): the client runs this algorithm only when a failure is detected in the check phase. The repair technique depends on the underlying redundancy technique, for example, replication, erasure coding, or network coding.
To be suitable for our system model, we modify the POR such that the verifier is the TPA and there are multiple clients as follows.
(1) keygen($1^\lambda$): given a security parameter $\lambda$, the algorithm generates a set of secret keys $\{\mathrm{sk}_i\}_{i \in \{1, \dots, s\}}$ for $s$ clients and a secret key $k$ for the TPA.
(2) encode($\mathrm{sk}_i$, $F_i$): each client $i$ uses his secret key $\mathrm{sk}_i$ to encode his original file $F_i$ into an encoded file $\tilde{F}_i$ and then sends $\tilde{F}_i$ to the servers. Each server then linearly combines all $\tilde{F}_i$ ($i \in \{1, \dots, s\}$) and stores the combined blocks.
(3) check($k$): the TPA uses his key $k$ to generate a challenge chal and sends chal to the servers. Each server then computes a response resp and sends resp back to the TPA. Finally, the TPA verifies whether each $F_i$ is intact.
(4) repair(): this algorithm is executed when a failure is detected in the check phase. The repair technique depends on each specific scheme.

Network Coding.
Network coding [6][7][8][9] is commonly used in network transmission to obtain a good trade-off in terms of bandwidth and data repair. Network coding was first proposed for the network scenario and was later applied to the distributed storage system scenario.
Fundamental Concept. In the network scenario, suppose that a source node $\mathcal{S}$ wants to send its message to a receiver node $\mathcal{R}$. Before transmitting, $\mathcal{S}$ breaks the message into $n$ blocks $v_1, \dots, v_n$; each file block belongs to $\mathbb{F}_p^m$, where $\mathbb{F}_p^m$ denotes the $m$-dimensional vector space over the finite field $\mathbb{F}_p$ with a prime $p$. $\mathcal{S}$ augments each file block $v_i$ ($i \in \{1, \dots, n\}$) with a vector of length $n$ that has a single "1" in the $i$th position and "0"s elsewhere. Let $w_1, \dots, w_n$ be the augmented blocks. Each augmented block has the following form:
$$w_i = (v_i, \underbrace{0, \dots, 0, 1, 0, \dots, 0}_{n,\ \text{1 in position } i}) \in \mathbb{F}_p^{m+n}.$$
These augmented blocks are then sent as packets into the network. When an intermediate node $\mathcal{I}$ in the network receives $q$ packets, $\mathcal{I}$ generates $q$ coefficients, linearly combines the $q$ packets using these coefficients, and transmits the result to its adjacent nodes. Consequently, the receiver node $\mathcal{R}$ receives linear combinations of the augmented blocks, and it can recover all $n$ augmented blocks from any set of $n$ linearly independent combinations. Suppose that $\mathcal{R}$ receives $n$ packets $y_1, \dots, y_n \in \mathbb{F}_p^{m+n}$; $\mathcal{R}$ then solves for the $n$ augmented blocks $w_1, \dots, w_n \in \mathbb{F}_p^{m+n}$ using the accumulated coefficients contained in the last $n$ coordinates of each packet. Afterwards, the file blocks $v_1, \dots, v_n$ are obtained from the first $m$ coordinates of the augmented blocks. Finally, the original message is reconstructed by concatenating all file blocks.
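The augment, forward, and decode cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the helper names (`augment`, `combine`, `decode`), the toy prime 257, and the retry loop for rank-deficient draws are our own assumptions. Decoding is plain Gauss-Jordan elimination over $\mathbb{F}_p$ on the last $n$ coordinates, and it relies on `pow(x, -1, p)`, which is available from Python 3.8.

```python
import random

P = 257  # small prime for illustration only; practical deployments use a large prime

def augment(blocks):
    """w_i = (v_i, e_i): append the i-th unit vector of length n to file block v_i."""
    n = len(blocks)
    return [list(v) + [1 if j == i else 0 for j in range(n)] for i, v in enumerate(blocks)]

def combine(packets, coeffs, p=P):
    """What an intermediate node forwards: sum_j coeffs[j] * packets[j] over F_p."""
    return [sum(a * pkt[k] for a, pkt in zip(coeffs, packets)) % p
            for k in range(len(packets[0]))]

def decode(received, n, p=P):
    """Gauss-Jordan elimination on the last n coordinates; returns the file blocks."""
    rows = [[x % p for x in r] for r in received]
    m = len(rows[0]) - n
    for col in range(n):
        pivot = next((r for r in range(col, len(rows)) if rows[r][m + col]), None)
        if pivot is None:
            raise ValueError("received packets do not have full rank")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][m + col], -1, p)
        rows[col] = [x * inv % p for x in rows[col]]
        for r in range(len(rows)):
            f = rows[r][m + col]
            if r != col and f:
                rows[r] = [(x - f * y) % p for x, y in zip(rows[r], rows[col])]
    # row i now equals w_i, so its first m coordinates are the file block v_i
    return [row[:m] for row in rows[:n]]

if __name__ == "__main__":
    file_blocks = [[10, 20], [30, 40], [50, 60]]  # n = 3 blocks, m = 2
    w = augment(file_blocks)
    rng = random.Random(1)
    while True:
        # the receiver collects n random combinations forwarded by intermediate nodes
        received = [combine(w, [rng.randrange(P) for _ in w]) for _ in range(3)]
        try:
            print(decode(received, n=3))
            break
        except ValueError:
            pass  # rank-deficient draw (probability roughly n/p); collect again
```

The receiver never needs to know which coefficients the intermediate nodes used, because they travel in the last $n$ coordinates of every packet; combinations of combinations decode the same way.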

Application in Distributed Storage System. In the network scenario described above, there are three types of entities: the source node, the intermediate nodes, and the receiver node. However, when network coding is applied to the distributed storage system scenario, there are two types of entities: a client and servers. Suppose that a client has an original file $F$ consisting of $n$ file blocks $(v_1, \dots, v_n)$. The client wants to store redundant encoded blocks in the servers in such a way that it can reconstruct the original file $F$ and can repair the encoded blocks of a corrupted server. From these file blocks, the client first creates $n$ augmented blocks $(w_1, \dots, w_n)$. The client then chooses $n$ coding coefficients $\alpha_1, \dots, \alpha_n \in \mathbb{F}_p$, computes a coded block as the linear combination $c = \sum_{i=1}^{n} \alpha_i \cdot w_i$, and stores the resulting coded blocks in the servers. To reconstruct $F$, any $n$ coded blocks are needed to solve for the $n$ augmented blocks $w_1, \dots, w_n$ using the accumulated coefficients contained in the last $n$ coordinates of each coded block. After these augmented blocks are solved, the $n$ file blocks $v_1, \dots, v_n$ are obtained from the first $m$ coordinates of the augmented blocks. Finally, $F$ is reconstructed by concatenating the file blocks. Note that the matrix consisting of the coefficients used to construct any $n$ coded blocks should have full rank. Koetter and Medard [20] proved that if the prime $p$ is large enough and the coefficients are chosen randomly, this matrix has full rank with high probability. Once a corrupted server is detected, the client repairs it as follows: the client retrieves coded blocks from the healthy servers and linearly combines them to regenerate new coded blocks. An example of data repair with network coding is given in Figure 2.
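The repair step in the storage setting can be sketched in the same spirit, following the Figure 2 example (three augmented blocks, three servers holding two coded blocks each). This is an illustrative sketch under our own assumptions: the Mersenne prime $2^{31}-1$ stands in for a "large enough" $p$, and the names `encode`, `repair`, and `rank_mod_p` are hypothetical. It rebuilds $S_3$ from one recoded block of $S_1$ and one of $S_2$, then checks that every pair of servers still holds coded blocks whose rank is $n$.

```python
import random

P = 2**31 - 1  # a prime; a large p keeps random coefficient matrices full rank w.h.p.

def combine_random(blocks, rng):
    """Random linear combination (nonzero coefficients) of equal-length vectors over F_p."""
    coeffs = [rng.randrange(1, P) for _ in blocks]
    return [sum(a * b[k] for a, b in zip(coeffs, blocks)) % P for k in range(len(blocks[0]))]

def rank_mod_p(rows):
    """Rank over F_p by Gaussian elimination (entries assumed already reduced mod P)."""
    rows, rank = [list(r) for r in rows], 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, P)
        rows[rank] = [x * inv % P for x in rows[rank]]
        for r in range(len(rows)):
            f = rows[r][col]
            if r != rank and f:
                rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def encode(augmented, num_servers, per_server, rng):
    """Client side: each server stores `per_server` random combinations."""
    return [[combine_random(augmented, rng) for _ in range(per_server)]
            for _ in range(num_servers)]

def repair(healthy_servers, per_server, rng):
    """Each healthy server sends one recoded block; these are mixed into new blocks."""
    partial = [combine_random(blocks, rng) for blocks in healthy_servers]
    return [combine_random(partial, rng) for _ in range(per_server)]

rng = random.Random(7)
n, m = 3, 4
augmented = [[rng.randrange(P) for _ in range(m)] + [int(i == j) for j in range(n)]
             for i in range(n)]
servers = encode(augmented, num_servers=3, per_server=2, rng=rng)  # S1, S2, S3
servers[2] = repair(servers[:2], per_server=2, rng=rng)            # S3 failed; rebuild it
for a, b in [(0, 1), (0, 2), (1, 2)]:
    # any two servers (four coded blocks) should still span all n augmented blocks
    print((a + 1, b + 1), rank_mod_p(servers[a] + servers[b]) == n)
```

Because the coefficients are random in a large field, each rank check fails only with negligible probability, in line with the Koetter and Medard result cited above.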

InterMac.
Before describing how InterMac works, we explain why it is used in the proposed MD-POR scheme. Consider a network in which multiple sources are supported simultaneously and each source owns a different secret key. Because the nodes mix the data of all sources, the data of each source cannot be checked on its own. Instead, each source uses its secret key to compute additional information, a Message Authentication Code (MAC), for each data block. A MAC is also called a tag. Each source then transmits packets consisting of the data blocks and the corresponding tags to the next adjacent node in the network. A node in the network linearly combines the received blocks and the homomorphic tags. Herein lies the difficulty: when a recipient node receives a packet, how can it verify the received linearly combined block against the linearly homomorphic tag without knowing any of the secret keys? Traditional methods, that is, MAC or HMAC, are inadequate for this task because they are not linearly homomorphic. Some recent schemes related to this problem have been proposed, for example, [21][22][23]; unfortunately, they all use an asymmetric key setting, which is not our aim.
The InterMac technique [16] is suitable for generating such secret keys for multiple sources. Its key characteristic is that the key of source $C_i$ ($i \in \{1, \dots, s\}$, where $s$ denotes the number of sources) is orthogonal to all the augmented blocks that do not belong to $C_i$. This characteristic lets the verifier check the received packets without knowing any individual source's secret key.
Construction. Let $w_{11}, \dots, w_{sd} \in \mathbb{F}_p^{m+n}$ be the augmented blocks, which span a space $V$, and represent them as row vectors (where $s$ denotes the number of sources, $d$ denotes the number of file blocks per source, and $n = s \cdot d$). For each $i \in \{1, \dots, s\}$, let $M_i$ be the matrix whose rows are the vectors in the following set:
$$\{ w_{jl} : j \in \{1, \dots, s\} \setminus \{i\},\ l \in \{1, \dots, d\} \}.$$
In other words, $M_i$ is the matrix consisting of the augmented blocks of all sources except $C_i$, and $\mathrm{rank}(M_i) = n - d$. Let $V_i$ denote the space spanned by the rows of $M_i$.
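A small sketch of this key construction follows. We assume, as the key-generation step $r_j \leftarrow f(K, i, j)$, $j \in \{1, \dots, m+d\}$, used in the security proof suggests, that $k_i$ is a pseudorandom linear combination of a basis of the null space of $M_i$. HMAC-SHA256 reduced modulo $p$ stands in for the PRF $f$; the master key, the prime $2^{31}-1$, and all function names are hypothetical. The script reads a null-space basis off the reduced row echelon form of $M_i$ and confirms that each $k_i$ is orthogonal to every augmented block of the other sources.

```python
import hashlib
import hmac

P = 2**31 - 1  # prime modulus (illustrative size)

def prf(key, i, j):
    """Stand-in PRF f(K, i, j) -> F_p built from HMAC-SHA256 (an assumption)."""
    mac = hmac.new(key, f"{i}|{j}".encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % P

def null_space_mod_p(rows, ncols):
    """Basis of {x : M x = 0 over F_p}, read off the reduced row echelon form of M."""
    rref, pivots = [list(r) for r in rows], []
    for col in range(ncols):
        pivot = next((r for r in range(len(pivots), len(rref)) if rref[r][col]), None)
        if pivot is None:
            continue
        top = len(pivots)
        rref[top], rref[pivot] = rref[pivot], rref[top]
        inv = pow(rref[top][col], -1, P)
        rref[top] = [x * inv % P for x in rref[top]]
        for r in range(len(rref)):
            f = rref[r][col]
            if r != top and f:
                rref[r] = [(x - f * y) % P for x, y in zip(rref[r], rref[top])]
        pivots.append(col)
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        vec = [0] * ncols
        vec[free] = 1
        for r, pc in enumerate(pivots):
            vec[pc] = -rref[r][free] % P  # solve the pivot variable of row r
        basis.append(vec)
    return basis

def build_augmented(files):
    """files[i][j] = v_ij; returns aug[i][j] = (v_ij, unit vector e_{i*d+j} of length n)."""
    s, d = len(files), len(files[0])
    return [[list(v) + [int(g == i * d + j) for g in range(s * d)]
             for j, v in enumerate(blocks)] for i, blocks in enumerate(files)]

def intermac_key(aug, i, master_key):
    """k_i = sum_j r_j b_j, with {b_j} a basis of the null space of M_i."""
    others = [w for c, blocks in enumerate(aug) if c != i for w in blocks]  # rows of M_i
    ncols = len(aug[0][0])
    basis = null_space_mod_p(others, ncols)  # m + d vectors
    r = [prf(master_key, i, j) for j in range(len(basis))]
    return [sum(rj * b[k] for rj, b in zip(r, basis)) % P for k in range(ncols)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b)) % P

files = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]  # s = 3, d = 2, m = 2
aug = build_augmented(files)
keys = [intermac_key(aug, i, b"hypothetical-master-key") for i in range(3)]
for i, k_i in enumerate(keys):
    # k_i should be orthogonal to every augmented block that does not belong to source i
    print(i, all(dot(w, k_i) == 0 for c in range(3) if c != i for w in aug[c]))
```

With $s = 3$ sources, $d = 2$ blocks each, and $m = 2$, each $M_i$ has four rows of length $m + n = 8$, so its null space has dimension $m + d = 4$.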

Notations and Definitions.
The notations and definitions used throughout this paper are listed in the Notation section.

Adversarial Model
In the MD-POR scheme, only the clients are trusted because they are the data owners. The following entities are untrusted and considered to be adversaries: (i) attackers outside the system; (ii) the cloud servers in the system; (iii) the TPA in the system (the TPA is assumed not to collude with the servers; we explained this assumption in Section 2.1).
Concretely, the adversaries can perform the following attacks.

Mobile Attack.
This attack is performed by an adversary $\mathcal{A}$ outside the system. $\mathcal{A}$ can potentially corrupt all the servers over the full lifetime of the system. The restriction on $\mathcal{A}$ is that it can control only $(h - t)$ out of the $h$ servers in any given time step. Let an epoch denote such a time step. In each epoch, the servers are checked. If a corruption is detected on a certain server, the blocks stored in that server are repaired from the redundancy in the intact servers. Without the server checks, $\mathcal{A}$ could corrupt all the servers of the system within $h/(h - t)$ epochs.

Figure 2 (client-side and server-side views; new coded block): From three augmented blocks $\{w_1, w_2, w_3\}$, the client computes six coded blocks and stores two coded blocks in each of the servers $S_1$, $S_2$, $S_3$. Suppose that $S_3$ is corrupted: the client asks $S_1$ and $S_2$ to create new blocks using linear combinations, then mixes them using linear combinations to obtain two new coded blocks, and stores these in the new server.

Curious Adversary.
This attack is performed by the TPA or a new server. In the check phase, the TPA is given a key $k$, which is constructed from all the secret keys of the clients. In the repair phase, a new server is given another key $k'$, which is also constructed from all the secret keys of the clients. Given their keys, these adversaries try to learn the clients' secret keys, because once all the secret keys are obtained, they can fake a valid response when they are checked.

Response Forgery.
This forgery is performed by the servers.In the check phase, the verifier checks all the servers to ensure that they are not corrupted.Each server has to send a response to the verifier in order to demonstrate that the server is healthy.However, a checked server may forge the response to deceive the verifier.If the forged response from the adversarial server satisfies the verification, that server can pass the check phase.

Pollution Attack.
This attack is performed by the servers. Its purpose is to break the linear independence of the encoded blocks. In a network, if a node is malicious and forwards invalid packets, the receivers obtain multiple packets and cannot tell which of them are corrupt. In other words, the purpose of this attack is to inject invalid packets to prevent data recovery. In the POR, this attack happens when a malicious server uses correct data to pass the check phase but then provides invalid data in the repair phase. For example, the client encodes the augmented blocks $w_1$, $w_2$, and $w_3$ into six coded blocks: $c_{11}$, $c_{12}$ (stored in server $S_1$), $c_{21}$, $c_{22}$ (stored in server $S_2$), and $c_{31}$, $c_{32}$ (stored in server $S_3$). In the check phase, suppose that $S_3$ is detected as being corrupted. Then, in the repair phase, $S_3$ should be repaired using two coded blocks: $c'_{31}$ (a linear combination of $c_{11}$ and $c_{12}$) and $c'_{32}$ (a linear combination of $c_{21}$ and $c_{22}$). However, at this time, $S_1$ turns malicious without being detected, because this is the repair phase, not the check phase. The client still believes that $S_1$ is healthy; thus, to recover $S_3$, the client requests coded blocks from $S_1$ and $S_2$, but $S_1$ provides an invalid coded block $c^{*}_{31}$ instead of $c'_{31}$.
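The following toy example, with numbers of our own choosing (the paper gives none), shows why pollution is dangerous. The polluted block $c^{*}_{31}$ keeps the coefficient vector of the honest $c'_{31}$ but alters one data symbol, so decoding still succeeds at full rank yet silently returns a wrong file. The prime 257, the coefficient choices, and the helper names `add` and `decode` are hypothetical.

```python
P = 257  # small prime, for illustration only

def add(*terms):
    """Linear combination over F_p; terms are (coefficient, vector) pairs."""
    return [sum(a * v[k] for a, v in terms) % P for k in range(len(terms[0][1]))]

def decode(coded, n):
    """Gauss-Jordan on the last n coordinates; row i ends up equal to w_i."""
    rows = [list(r) for r in coded]
    m = len(rows[0]) - n
    for col in range(n):
        piv = next(r for r in range(col, len(rows)) if rows[r][m + col])
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = pow(rows[col][m + col], -1, P)
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(len(rows)):
            f = rows[r][m + col]
            if r != col and f:
                rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[col])]
    return [row[:m] for row in rows[:n]]

w1, w2, w3 = [5, 1, 0, 0], [7, 0, 1, 0], [9, 0, 0, 1]             # m = 1, n = 3
c11, c12 = add((1, w1), (1, w2)), add((1, w2), (1, w3))            # stored on S1
c21, c22 = add((1, w1), (1, w3)), add((1, w1), (1, w2), (1, w3))   # stored on S2
honest = add((1, c11), (2, c12))          # c'_31 that S1 should send during repair
polluted = [honest[0] + 1] + honest[1:]   # c*_31: same coefficients, altered data
print(decode([c21, c22, honest], 3))      # [[5], [7], [9]]
print(decode([c21, c22, polluted], 3))    # [[4], [7], [10]]: full rank, silently wrong
```

Nothing in the coded blocks alone reveals the error, which is why the repair phase needs a separate pollution check.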

The Proposed MD-POR Scheme
Before describing the proposed MD-POR scheme in detail, the technical roadmap is depicted in Figure 3. The file blocks are used to generate the augmented blocks. Then, the augmented blocks are combined with random values to compute the keys. Meanwhile, the augmented blocks are linearly combined into coded blocks using network coding. Finally, the coded blocks are tagged using the keys. The coded blocks and the tags are the outputs. Network coding is used because it provides the repair feature (Section 2.3). InterMac is used because it provides the multiclient feature (Section 2.4). Both network coding and InterMac are built on linear combinations; therefore, they combine naturally in the proposed scheme. Let $C_1, \dots, C_s$ be the set of $s$ clients. Each client $C_i$ ($i \in \{1, \dots, s\}$) keeps a secret key $k_i$ and has a file $F_i = (v_{i1}, \dots, v_{id})$, where $d$ is the number of file blocks. Each file block $v_{ij} \in \mathbb{F}_p^m$ ($j \in \{1, \dots, d\}$). $C_i$ creates $d$ augmented blocks $(w_{i1}, \dots, w_{id})$ from its $d$ file blocks $(v_{i1}, \dots, v_{id})$. Each augmented block $w_{ij}$ has the form as in [16]:
$$w_{ij} = (v_{ij}, \underbrace{0, \dots, 0}_{(i-1)d}, \overbrace{0, \dots, 0, 1, 0, \dots, 0}^{d,\ \text{1 in position } j}, \underbrace{0, \dots, 0}_{(s-i)d}) \in \mathbb{F}_p^{m+n},$$
where $i \in \{1, \dots, s\}$, $j \in \{1, \dots, d\}$, and $n = sd$. Each client $C_i$ uses its secret key $k_i$ to compute the tag $\sigma_{ij}$ for each augmented block $w_{ij}$. The augmented blocks and the tags are then linearly combined and transmitted to all the servers. In every epoch, when the servers are checked by the TPA, the servers have to combine the coded blocks and the tags again and send them back to the TPA. Finally, the TPA can verify the aggregated coded blocks and tags even though it does not know any secret key $k_i$ ($i \in \{1, \dots, s\}$).
The proposed MD-POR scheme is now described in detail via each phase of the POR. Using InterMac (Section 2.4), a key set $\{k_1, \dots, k_s\}$ is created. Then, each $k_i \in \mathbb{F}_p^{m+n}$ is assigned to client $C_i$ as its secret key, and the sum of all the keys, $k = k_1 + \dots + k_s \in \mathbb{F}_p^{m+n}$, is assigned to the TPA via a secure channel. The security of the secret keys is proved later.
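The tag-and-verify flow can be illustrated end to end. This sketch assumes that the tag of an augmented block is the inner product $\sigma_{ij} = w_{ij} \cdot k_i$, the homomorphic-MAC form under which the TPA's check $c \cdot k = \sigma$ holds for $k = k_1 + \dots + k_s$. It also collapses the client-side and server-side combinations into a single aggregation and, for brevity, draws each $k_i$ uniformly from the null space of $M_i$ through a closed form instead of a PRF-weighted basis. The names, the prime $2^{31}-1$, and the parameter sizes are hypothetical.

```python
import random

P = 2**31 - 1  # prime modulus (illustrative; the paper suggests e.g. a 160-bit prime)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b)) % P

def build_augmented(files):
    """files[i][j] = v_ij; returns aug[i][j] = (v_ij, e_{i*d+j}) in F_p^{m+n}."""
    s, d = len(files), len(files[0])
    return [[list(v) + [int(g == i * d + j) for g in range(s * d)]
             for j, v in enumerate(blocks)] for i, blocks in enumerate(files)]

def client_key(files, i, rng):
    """Uniform vector in the null space of M_i (blocks of all clients except i).

    For x = (x_v, x_e): w_jl . x = v_jl . x_v + x_e[j*d+l], so choosing x_v and
    client i's own d coordinates freely and setting x_e[j*d+l] = -(v_jl . x_v)
    for j != i makes x orthogonal to every other client's augmented block.
    """
    m = len(files[0][0])
    x_v = [rng.randrange(P) for _ in range(m)]
    x_e = [rng.randrange(P) if j == i else -dot(v, x_v) % P
           for j, blocks in enumerate(files) for v in blocks]
    return x_v + x_e

def aggregate(blocks, tags, rng):
    """Server side: the same random coefficients combine blocks and tags."""
    alpha = [rng.randrange(P) for _ in blocks]
    c = [sum(a * b[k] for a, b in zip(alpha, blocks)) % P for k in range(len(blocks[0]))]
    return c, sum(a * t for a, t in zip(alpha, tags)) % P

def tpa_verify(c, sigma, k):
    """TPA check with the summed key k = k_1 + ... + k_s."""
    return dot(c, k) == sigma

rng = random.Random(2024)
s, d, m = 3, 2, 3
files = [[[rng.randrange(P) for _ in range(m)] for _ in range(d)] for _ in range(s)]
aug = build_augmented(files)
keys = [client_key(files, i, rng) for i in range(s)]        # k_i, one per client
k = [sum(col) % P for col in zip(*keys)]                     # TPA key
blocks = [w for i in range(s) for w in aug[i]]
tags = [dot(w, keys[i]) for i in range(s) for w in aug[i]]   # sigma_ij = w_ij . k_i
c, sigma = aggregate(blocks, tags, rng)
print("honest response accepted:", tpa_verify(c, sigma, k))
c_bad = c[:]
c_bad[0] = (c_bad[0] + 1) % P                                # corrupt one data symbol
print("tampered response accepted:", tpa_verify(c_bad, sigma, k))
```

The honest check passes because $w_{ij} \cdot k = w_{ij} \cdot k_i = \sigma_{ij}$ for every block, and linearity carries this to any combination $c$. Altering one symbol of $c$ changes $c \cdot k$ unless the corresponding entry of $k$ is zero, which happens with probability $1/p$.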

Dynamic Keys for a New Server (Keygen2).
When a repair phase is executed, the new server is given a key $k' = (k_1 + \dots + k_s) + k_{\mathrm{repair}} = k + k_{\mathrm{repair}}$. The new server uses $k'$ to detect pollution attacks during the repair phase. The key $k$ is already computed in Keygen1; only $k_{\mathrm{repair}}$ changes at each repair. This ensures that an adversary cannot attack the new server to obtain $k_{\mathrm{repair}}$ and then use it to pass the pollution attack check in later repair phases (we explain this in Section 5.4). When $k_{\mathrm{repair}}$ is constructed for the first time, the basis of the corresponding null space is computed and saved. In later repairs, this basis is reused to save computation, and only the random coefficients $r_j$ are regenerated to compute $k_{\mathrm{repair}}$.
$k_{\mathrm{repair}}$ has to be orthogonal to all augmented blocks of all the clients. Keygen2 is quite similar to Keygen1. The difference is that an index $\iota \notin \{1, \dots, s\}$ is randomly chosen in $\mathbb{F}_p$ such that $\iota > s$ at every repair. Since $k_{\mathrm{repair}}$ must be orthogonal to all augmented blocks of all the clients, the matrix $M_{\mathrm{repair}}$ now consists of all the augmented blocks of all the clients. Put differently, the rows of $M_{\mathrm{repair}}$ are the vectors in the following set:
$$\{ w_{jl} : j \in \{1, \dots, s\},\ l \in \{1, \dots, d\} \}.$$
The set consists of $n = sd$ augmented blocks, and each augmented block belongs to $\mathbb{F}_p^{m+n}$. For the $n \times (m+n)$ matrix $M_{\mathrm{repair}}$, the rank-nullity theorem yields
$$\mathrm{rank}(M_{\mathrm{repair}}) + \mathrm{nullity}(M_{\mathrm{repair}}) = m + n.$$
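Since $\mathrm{rank}(M_{\mathrm{repair}}) = n$, the nullity is $\mathrm{nullity}(M_{\mathrm{repair}}) = m + n - n = m$, and a basis of the null space of $M_{\mathrm{repair}}$ is $\{b_1, \dots, b_m\}$. Let $g$ be another PRF, $g: \mathcal{K} \times (\mathcal{P} \times [1, m]) \to \mathbb{F}_p$, where $\mathcal{P}$ denotes the domain of $\iota$. The following steps generate the key: (i) $r_j \leftarrow g(K, \iota, j) \in \mathbb{F}_p$, $\forall j \in \{1, \dots, m\}$; (ii) $k_\iota \leftarrow \sum_{j=1}^{m} r_j b_j$. Let $k_{\mathrm{repair}}$ denote $k_\iota$ (to distinguish it from the notation $k_i$ of Keygen1). Keygen2 is executed, and $k' = k + k_{\mathrm{repair}}$ is given to a new server, only when a repair phase happens. The key $k$ is already computed in Keygen1 as static information, and only $k_{\mathrm{repair}}$ changes at each repair.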
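The effect of Keygen2 can be checked numerically. In this hypothetical sketch, $k_{\mathrm{repair}}$ is drawn uniformly from the $m$-dimensional null space of $M_{\mathrm{repair}}$ through a closed form (choose the data part $x_v$ at random and force each coefficient coordinate to $-(v \cdot x_v)$) rather than through the PRF $g$; tags are assumed to be $\sigma = w \cdot k_i$, and all names and sizes are our own. Because $k_{\mathrm{repair}}$ is orthogonal to every augmented block, $c \cdot k' = c \cdot k$ for any valid coded block $c$, so the new server accepts honest blocks from healthy servers, while a block whose data no longer matches its coefficients is rejected with high probability.

```python
import random

P = 2**31 - 1  # prime modulus (illustrative size)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b)) % P

def combine(vectors, coeffs):
    return [sum(a * v[k] for a, v in zip(coeffs, vectors)) % P for k in range(len(vectors[0]))]

def orthogonal_key(files, owner, rng):
    """Uniform vector orthogonal to every augmented block not owned by `owner`.

    With owner=None it is orthogonal to ALL augmented blocks, i.e. it lies in the
    m-dimensional null space of M_repair, which is what k_repair requires.
    """
    m = len(files[0][0])
    x_v = [rng.randrange(P) for _ in range(m)]
    x_e = [rng.randrange(P) if j == owner else -dot(v, x_v) % P
           for j, blocks in enumerate(files) for v in blocks]
    return x_v + x_e

rng = random.Random(11)
s, d, m = 2, 2, 3
n = s * d
files = [[[rng.randrange(P) for _ in range(m)] for _ in range(d)] for _ in range(s)]
aug = [list(v) + [int(g == i * d + j) for g in range(n)]          # w_ij, flattened
       for i, blocks in enumerate(files) for j, v in enumerate(blocks)]
keys = [orthogonal_key(files, i, rng) for i in range(s)]          # Keygen1 stand-in
k = [sum(col) % P for col in zip(*keys)]                          # static key k
tags = [dot(aug[i * d + j], keys[i]) for i in range(s) for j in range(d)]

def new_server_key(rng):
    """k' = k + k_repair with a fresh k_repair drawn for each repair."""
    k_repair = orthogonal_key(files, None, rng)
    return [(a + b) % P for a, b in zip(k, k_repair)]

# a healthy server recodes its data and tags with the same coefficients
alpha = [rng.randrange(P) for _ in aug]
c = combine(aug, alpha)
sigma = sum(a * t for a, t in zip(alpha, tags)) % P
k_new = new_server_key(rng)
polluted = c[:]
polluted[1] = (polluted[1] + 5) % P  # altered data, unchanged coefficients
print("valid block accepted:", dot(c, k_new) == sigma)
print("polluted block accepted:", dot(polluted, k_new) == sigma)
```

Drawing a fresh $k_{\mathrm{repair}}$ at each repair changes $k'$ every time while leaving the verification of honest blocks unaffected.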

Correctness of the Verification (*)
Consider … As described in Section 4.1, … (iv) send the package consisting of the coded block and its tag to the new server.
Step … To recover the original files, the $n$ augmented blocks $(w_{11}, \dots, w_{1d}, \dots, w_{s1}, \dots, w_{sd})$ are viewed as the unknowns to be solved. To solve for these $n$ unknowns, at least $n$ coded blocks are needed such that the coefficient matrix has full rank, because the number of unknowns in a linear system must not exceed the number of independent equations:

Security Analysis
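Therefore, in each epoch, at least $t$ servers, which collectively store at least $n$ coded blocks ($t \cdot \ell \ge n$, where each server stores $\ell$ coded blocks), are required: $\lceil n/\ell \rceil \le t < h$.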
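As a quick arithmetic aid for this threshold and for the mobile-attack bound $h/(h - t)$ of Section 3, here is a hypothetical helper. Rounding $h/(h - t)$ up to a whole number of epochs is our own interpretation, and the function names and sample sizes are illustrative.

```python
def min_servers_for_recovery(n, per_server):
    """Smallest t such that t * per_server >= n, i.e. ceil(n / l)."""
    return -(-n // per_server)

def threshold_ok(n, per_server, h, t):
    """Check ceil(n / l) <= t < h."""
    return min_servers_for_recovery(n, per_server) <= t < h

def epochs_to_corrupt_all(h, t):
    """Epochs an unchecked mobile adversary needs if it controls h - t servers per epoch."""
    if not 0 <= t < h:
        raise ValueError("expected 0 <= t < h")
    return -(-h // (h - t))

# hypothetical sizes: s = 3 clients with d = 2 blocks each (n = 6), l = 2 blocks per server
n, per_server, h = 6, 2, 5
t = min_servers_for_recovery(n, per_server)
print(t, threshold_ok(n, per_server, h, t), epochs_to_corrupt_all(h, t))  # 3 True 3
```

Integer ceiling division (`-(-a // b)`) avoids floating-point rounding for large parameter values.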

Security against Curious Adversaries.
The following theorem gives the probability of the adversary to recover the secret keys and shows that the probability is negligible.
Theorem 2. The secret keys of the clients are secured from the TPA and the new server.
Proof. The TPA checks the $h$ servers $(S_1, \dots, S_h)$ in the check phase using the key $k = k_1 + \dots + k_s$. Similarly, the new server $S'$ checks $t$ healthy servers in the repair phase using the key $k' = (k_1 + \dots + k_s) + k_{\mathrm{repair}}$. The security problem thus reduces to solving for $s$ unknowns (in the case of the TPA) and $s + 1$ unknowns (in the case of $S'$) given a single equation. The only way to solve for these unknowns is trial and error: enumerate candidate assignments by brute-force search and test whether they satisfy the equation. Let $\mathcal{K}$ denote the key space. Each $k_i$ ($i \in \{1, \dots, s\}$), $k_{\mathrm{repair}}$, $k$, and $k'$ belong to the vector space $\mathbb{F}_p^{m+n}$ (whose elements have a bit length of $(m+n)\log_2 p$), and therefore $|\mathcal{K}| = p^{m+n}$. The number of trials is $|\mathcal{K}|^{s-1}$ in the case of the TPA and $|\mathcal{K}|^{s}$ in the case of $S'$. Therefore, the probability of guessing the $s$ unknowns is $1/p^{(m+n)(s-1)}$ in the case of the TPA, and the probability of guessing the $s + 1$ unknowns is $1/p^{(m+n)s}$ in the case of $S'$. If $p$ is chosen as a large prime (e.g., 160 bits), $k_1, \dots, k_s$ and $k_{\mathrm{repair}}$ cannot be solved for in polynomial time. Ergo, the success probabilities of the TPA and $S'$ are negligible.
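For a sense of scale, the bounds in this proof can be evaluated for one hypothetical parameter choice, $\log_2 p \approx 160$, $m = 4$, $s = 2$, and $d = 2$ (so $n = sd = 4$ and $m + n = 8$):
$$\frac{1}{p^{(m+n)(s-1)}} \approx 2^{-160 \cdot 8 \cdot 1} = 2^{-1280}, \qquad \frac{1}{p^{(m+n)s}} \approx 2^{-160 \cdot 8 \cdot 2} = 2^{-2560}.$$
The TPA's exponent vanishes when $s = 1$, which is consistent with the fact that the TPA's key $k$ then equals the single client's key $k_1$.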

Security against Response Forgeries.
After controlling $S_j$, suppose that, in the check phase, the adversary $\mathcal{A}$ sends a pair consisting of a forged coded block and a forged tag $(c'_j, \sigma'_j)$ to the TPA instead of a valid pair $(c_j, \sigma_j)$.

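Theorem 3. The advantage of a forgery adversary to pass the check phase is $\mathrm{Adv}_{\mathcal{A}}(\mathrm{verify}) = \mathrm{Adv}_{\mathcal{A}}(\mathrm{PRF}) + 1/p^{(m+n)s}$, which is negligible.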
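Proof. To generate $(c'_j, \sigma'_j)$ that satisfies the verification $\sigma'_j = c'_j \cdot k$, the adversary $\mathcal{A}$ has to obtain $k$. Since the TPA is assumed not to collude with any server and $k$ is delivered to the TPA through a secure channel, a possible way for $\mathcal{A}$ is to attack Keygen1, in which the key $k_i$ of $C_i$ ($i \in \{1, \dots, s\}$) is computed as (i) $r_j \leftarrow f(K, i, j) \in \mathbb{F}_p$, $\forall j \in \{1, \dots, m + d\}$; (ii) $k_i \leftarrow \sum_{j=1}^{m+d} r_j b_j$, where $\{b_1, \dots, b_{m+d}\}$ is a basis of the null space of $M_i$. The advantage of $\mathcal{A}$ on $r_j$ is $\mathrm{Adv}_{\mathcal{A}}(\mathrm{PRF})$. Since $k_i \in \mathbb{F}_p^{m+n}$, the advantage of $\mathcal{A}$ on $k_i$ is $1/p^{m+n}$, and the advantage of $\mathcal{A}$ on $k = \sum_{i=1}^{s} k_i$ is $1/p^{(m+n)s}$. Therefore, $\mathrm{Adv}_{\mathcal{A}}(\mathrm{verify}) = \mathrm{Adv}_{\mathcal{A}}(\mathrm{PRF}) + 1/p^{(m+n)s}$. If $f$ is a secure PRF and $p$ is chosen large enough, for example, 160 bits, the advantage of $\mathcal{A}$ is negligible: $\mathrm{Adv}_{\mathcal{A}}(\mathrm{verify}) < \epsilon$.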

Security against Pollution Attack.
Suppose that the server $S_j$ is detected as corrupted and $S_{i_1}, \dots, S_{i_t}$ are checked as healthy servers in the check phase. Then, $S_{i_1}, \dots, S_{i_t}$ are required to repair $S_j$ by providing their coded blocks and tags to the new server $S'$. In the repair phase, the adversary $\mathcal{A}$ attacks $S_{i_a}$ ($S_{i_a} \in \{S_{i_1}, \dots, S_{i_t}\}$) and then provides an invalid packet to the new server $S'$ (pollution attack). Similar to Theorem 3, the advantage of $\mathcal{A}$ to pass the pollution attack check (Step 2 in the repair phase) is
$$\mathrm{Adv}_{\mathcal{A}}(\mathrm{pollution}) = \mathrm{Adv}_{\mathcal{A}}(\mathrm{PRF}) + 1/p^{(m+n)(s+1)}.$$
The difference is that the advantage of $\mathcal{A}$ on $k' = k_{\mathrm{repair}} + \sum_{i=1}^{s} k_i$ is $1/p^{(m+n)(s+1)}$, not $1/p^{(m+n)s}$ as in Theorem 3, because the adversary does not own $k'$.
We also consider a stronger adversary $\mathcal{A}$ who attacks $S'$ right after the repair phase to steal $k'$ from $S'$. $\mathcal{A}$ then tries to use $k'$ to pass the pollution attack check in later repair phases. However, since $k_{\mathrm{repair}}$ is different at each repair, as explained in Keygen2 (Section 4.1.2), the advantage for $\mathcal{A}$ to guess the new $k_{\mathrm{repair}}$ is $\mathrm{Adv}_{\mathcal{A}}(\mathrm{PRF}) + 1/p^{m+n}$.

Efficiency Analysis
Table 1 compares the features and efficiency of the proposed MD-POR scheme with some previous schemes. The RDC-NC [12] and NC-Audit [15] schemes are chosen for the comparison because their scenarios are the closest to that of the MD-POR scheme. Because the RDC-NC and NC-Audit schemes consider only a single client, unlike the MD-POR scheme, we assume that $s$ clients participate in the RDC-NC and NC-Audit schemes so that the comparison is fair. However, these $s$ clients can only run independently in parallel, rather than being combined simultaneously as in the MD-POR scheme. The parameter $s$ in the RDC-NC and NC-Audit schemes does not affect the checking and repairing complexity, because only one client can check and repair its servers. It only affects the server-side storage cost and the communication cost of the encode phase in the RDC-NC and NC-Audit schemes.
6.1. Storage Cost
6.1.1. Client-Side. In the RDC-NC scheme, because the client keeps five secret keys in $\mathbb{F}_p^{m+n}$, the client storage is $O(5(m+n)\log_2 p)$. In the NC-Audit scheme, because the client keeps only one secret key in $\mathbb{F}_p^{m+n}$, the client storage is $O((m+n)\log_2 p)$. Meanwhile, the MD-POR scheme has $s$ keys for $s$ clients, each in $\mathbb{F}_p^{m+n}$, and thus the storage cost per client is $O((m+n)\log_2 p)$.

6.1.2. Server-Side.
The size of a file block is $|v| = |F|/d$. The form of an augmented block is $w_i = (v_i, \overbrace{0, \dots, 0, 1, 0, \dots, 0}^{n})$, as indicated in Section 2.3. In the RDC-NC and NC-Audit schemes, since $s = 1$, the size of an augmented block is $|w| = |F|/d + d$. In the MD-POR scheme, since $n = sd$, the size of an augmented block is $|w| = |F|/d + sd$. Furthermore, the size of a coded block is $|c| = |w|$, because each coded block is a linear combination of augmented blocks. The number of servers is $h$, and each server stores $\ell$ coded blocks. The $s$ clients are assumed to participate in the RDC-NC and NC-Audit schemes in parallel. Therefore, the server storage in the RDC-NC and NC-Audit schemes is $O(sh\ell(|F|/d + d))$, whereas the server storage in the MD-POR scheme is $O(h\ell(|F|/d + sd))$.
6.1.3. TPA-Side.
The RDC-NC scheme does not have a TPA. In the NC-Audit scheme, the TPA keeps not only a verification key in $\mathbb{F}_q^{z+m}$, which costs $O((z+m)\log_2 q)$, but also the coding coefficients in $\mathbb{F}_q$ used to compute all coded blocks, which cost $O(\ell h\alpha m\log_2 q)$. Hence, the total TPA storage in the NC-Audit scheme is $O((z + m + \ell h\alpha m)\log_2 q)$. In the MD-POR scheme, the TPA is given $K = K_1 + \cdots + K_\ell \in \mathbb{F}_q^{z+\ell m}$ (Section 4.1.1), a single vector regardless of the number of clients. The TPA storage in the MD-POR scheme is thus $O((z+\ell m)\log_2 q)$.
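The aggregated TPA key is a coordinate-wise sum over the prime field, so the TPA stores one vector no matter how many clients contribute. A minimal sketch follows; the toy modulus, the key dimension, and the helper name `aggregate_keys` are illustrative assumptions, while the five clients match the evaluation setup.

```python
import random

Q = 2**61 - 1  # toy prime modulus standing in for the 160-bit prime


def aggregate_keys(client_keys, q=Q):
    """TPA key K = K_1 + ... + K_l, added coordinate-wise over F_q."""
    return [sum(coords) % q for coords in zip(*client_keys)]


rng = random.Random(7)
KEY_DIM = 6      # hypothetical key dimension
NUM_CLIENTS = 5  # number of clients in the evaluation
client_keys = [[rng.randrange(Q) for _ in range(KEY_DIM)] for _ in range(NUM_CLIENTS)]
tpa_key = aggregate_keys(client_keys)
print(len(tpa_key))  # 6: a single key vector, whatever the number of clients
```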
6.2. Encoding Cost
6.2.1. Computation on Client-Side.
In the RDC-NC and NC-Audit schemes, during the encode phase, each client creates $h\alpha$ coded blocks, each of which combines $m$ augmented blocks ($O(m)$), in order to store $\alpha$ coded blocks on each of the $h$ servers. The cost in these schemes is thus $O(mh\alpha)$. In the MD-POR scheme, each client only needs to combine its $m$ augmented blocks ($O(m)$) and distribute the result to all the servers, which then create the coded blocks themselves. The cost in the MD-POR scheme is thus $O(m)$.

6.2.2. Computation on Server-Side.
In the RDC-NC and NC-Audit schemes, the servers do not compute anything; they only receive the coded blocks computed by the clients. The cost in these schemes is thus $O(1)$. In the MD-POR scheme, each of the $h$ servers combines the $\ell$ blocks received from the clients to compute its own $\alpha$ coded blocks. The cost in the MD-POR scheme is thus $O(\ell h\alpha)$.
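The server-side encoding step can be sketched as follows, mainly to show where its cost comes from: each server builds each of its coded blocks as a random combination of the blocks received from the clients, so one server performs (coded blocks per server) × (clients) scalar-vector multiplications. This is a hypothetical toy model, not the MD-POR implementation: `server_encode`, the modulus, the block length, and the reuse of the same received blocks at every server are assumptions, while the numbers of clients, servers, and coded blocks per server come from the evaluation setup.

```python
import random

Q = 2**61 - 1  # toy prime modulus


def server_encode(client_blocks, alpha, rng, q=Q):
    """Hypothetical server-side encoding: build `alpha` coded blocks, each a
    random linear combination of the blocks received from the clients.
    Returns the coded blocks and the number of scalar-vector multiplications."""
    coded, mults = [], 0
    for _ in range(alpha):
        coeffs = [rng.randrange(q) for _ in client_blocks]
        block = [0] * len(client_blocks[0])
        for c, v in zip(coeffs, client_blocks):
            block = [(b + c * x) % q for b, x in zip(block, v)]
            mults += 1
        coded.append(block)
    return coded, mults


rng = random.Random(5)
NUM_CLIENTS, ALPHA, NUM_SERVERS = 5, 100, 10  # values from the evaluation setup
# For counting purposes, every server receives the same toy blocks (length 8).
received = [[rng.randrange(Q) for _ in range(8)] for _ in range(NUM_CLIENTS)]
total = 0
for _ in range(NUM_SERVERS):
    _, mults = server_encode(received, ALPHA, rng)
    total += mults
print(total)  # NUM_SERVERS * ALPHA * NUM_CLIENTS = 5000
```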

6.2.3. Computation on TPA-Side.
The RDC-NC scheme has no TPA. In the NC-Audit and MD-POR schemes, the TPA does nothing during the encode phase, so the cost is $O(1)$.
International Journal of Distributed Sensor Networks
6.2.4. Communication.
In the RDC-NC scheme, each client creates $h\alpha$ coded blocks and sends $\alpha$ coded blocks to each of the $h$ servers. The size of a coded block in this scheme is $O(|F|/m + m)$, as mentioned in Section 6.1.2, and there are $\ell$ clients. Therefore, the communication cost is $O(\ell h\alpha(|F|/m + m))$. The communication in the NC-Audit scheme is similar, except that each client sends to the servers not only the coded blocks but also all $h\alpha m$ coefficients used to create them. The cost in the NC-Audit scheme is thus $O(\ell h\alpha(|F|/m + m) + \ell h\alpha m)$. In the MD-POR scheme, each of the $\ell$ clients sends its aggregated coded block to each of the $h$ servers. The size of a coded block in the MD-POR scheme is $O(|F|/m + \ell m)$ (see (5)), so the cost in the MD-POR scheme is $O(\ell h(|F|/m + \ell m))$.
6.3. Checking Cost
6.3.1. Computation on Client-Side.
In the RDC-NC scheme, the client receives an aggregated coded block from each of the $h$ servers and verifies each of them using his/her secret key, so the cost is $O(h)$. In the NC-Audit and MD-POR schemes, the TPA checks the servers instead of the client, so the client-side cost in these schemes is $O(1)$.

6.3.2. Computation on Server-Side.
In all three schemes, each of the $h$ servers combines its $\alpha$ coded blocks and sends the result (an aggregated coded block) back to the verifier, which is the client in the RDC-NC scheme and the TPA in the NC-Audit and MD-POR schemes. The cost in all three schemes is $O(h\alpha)$.

6.3.3. Computation on TPA-Side.
The RDC-NC scheme has no TPA. In the NC-Audit and MD-POR schemes, the TPA verifies the aggregated coded block received from each of the $h$ servers, and each verification takes only one operation. The cost in the NC-Audit and MD-POR schemes is thus $O(h)$.
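The check phase can be illustrated with a toy sketch in which a server aggregates its stored coded blocks using the verifier's random challenge, and the verifier tests the response with a single inner product. The verification key here uses a null-key construction (a key orthogonal to every augmented block), which is our illustrative assumption and not necessarily the construction MD-POR actually uses; the helper names and the toy modulus are also hypothetical.

```python
import random

Q = 2**61 - 1  # toy prime modulus


def dot(a, b, q=Q):
    """Inner product over F_q."""
    return sum(x * y for x, y in zip(a, b)) % q


def combine(vectors, coeffs, q=Q):
    """Linear combination sum_i coeffs[i] * vectors[i] over F_q."""
    return [sum(c * v[t] for c, v in zip(coeffs, vectors)) % q
            for t in range(len(vectors[0]))]


def make_null_key(blocks, rng, q=Q):
    """Hypothetical null key: a random nonzero first part, then one coordinate
    per block chosen so that <w_i, K> = 0 for every augmented block w_i."""
    head = [rng.randrange(1, q) for _ in range(len(blocks[0]))]
    tail = [(-dot(v, head, q)) % q for v in blocks]
    return head + tail


rng = random.Random(42)
m, z, alpha = 4, 5, 3  # toy sizes: m blocks of z symbols, alpha coded blocks per server
blocks = [[rng.randrange(Q) for _ in range(z)] for _ in range(m)]
augmented = [v + [1 if j == i else 0 for j in range(m)] for i, v in enumerate(blocks)]
key = make_null_key(blocks, rng)

# The server stores alpha coded blocks (random combinations of augmented blocks).
stored = [combine(augmented, [rng.randrange(Q) for _ in range(m)]) for _ in range(alpha)]

# Check phase: the server answers the challenge with ONE aggregated block.
challenge = [rng.randrange(Q) for _ in range(alpha)]
response = combine(stored, challenge)

# The verifier needs a single inner product per server response.
print(dot(response, key) == 0)  # True for an honest response
tampered = [(response[0] + 1) % Q] + response[1:]
print(dot(tampered, key) == 0)  # False: the modified response is rejected
```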

6.3.4. Communication.
In the RDC-NC and NC-Audit schemes, during the check phase, each of the $h$ servers sends its aggregated coded block, of size $O(|F|/m + m)$, to the verifier (the client in the RDC-NC scheme and the TPA in the NC-Audit scheme). The cost in these schemes is thus $O(h(|F|/m + m))$. The MD-POR scheme uses the same mechanism, except that a coded block has size $O(|F|/m + \ell m)$; the cost in the MD-POR scheme is thus $O(h(|F|/m + \ell m))$.

6.4. Repairing Cost
6.4.1. Computation on Client-Side.
In the repair phase of the RDC-NC scheme, the client first has to check for a pollution attack in the $k$ coded blocks provided by $k$ healthy servers ($O(k)$). Thereafter, the client computes $\alpha$ new coded blocks for the new server, each combining the $k$ provided coded blocks ($O(\alpha k)$). Hence, the computation cost on the client side in the RDC-NC scheme is $O(k(\alpha+1))$. In the NC-Audit and MD-POR schemes, the clients do nothing.

6.4.2. Computation on Server-Side.
In the RDC-NC scheme, each of the $k$ healthy servers is required to combine its $\alpha$ coded blocks, so the computation cost on the server side is $O(k\alpha)$. The cost on the new server is N/A because the RDC-NC scheme does not support direct repair. In the NC-Audit and MD-POR schemes, not only do the $k$ healthy servers combine their coded blocks ($O(k\alpha)$), but the new server also computes its $\alpha$ new coded blocks by combining the $k$ provided coded blocks ($O(\alpha k)$).
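The direct repair flow (healthy servers aggregate, the new server recombines) can be sketched with the same kind of toy field arithmetic. The sketch checks a useful invariant: every repaired block is still a linear combination of the original augmented blocks, and its trailing coordinates give the coefficients. The sizes, the modulus, and the helper names are hypothetical, and the pollution check performed by the new server is omitted.

```python
import random

Q = 2**61 - 1  # toy prime modulus


def combine(vectors, coeffs, q=Q):
    """Linear combination sum_i coeffs[i] * vectors[i] over F_q."""
    return [sum(c * v[t] for c, v in zip(coeffs, vectors)) % q
            for t in range(len(vectors[0]))]


def random_coeffs(n, rng, q=Q):
    return [rng.randrange(q) for _ in range(n)]


rng = random.Random(3)
m, z, alpha, k = 4, 5, 3, 3  # toy sizes; the evaluation uses alpha = 100 and k = 3
blocks = [[rng.randrange(Q) for _ in range(z)] for _ in range(m)]
augmented = [v + [1 if j == i else 0 for j in range(m)] for i, v in enumerate(blocks)]

# Each of the k healthy servers holds alpha coded blocks ...
healthy = [[combine(augmented, random_coeffs(m, rng)) for _ in range(alpha)]
           for _ in range(k)]
# ... and sends ONE aggregated block (a combination of its alpha blocks).
aggregated = [combine(stored, random_coeffs(alpha, rng)) for stored in healthy]

# The new server builds its alpha coded blocks from the k aggregated blocks.
new_blocks = [combine(aggregated, random_coeffs(k, rng)) for _ in range(alpha)]

# Each repaired block equals the combination of the original augmented blocks
# given by its own last m coordinates.
print(all(y == combine(augmented, y[-m:]) for y in new_blocks))  # True
```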

6.4.3. Computation on TPA-Side.
The RDC-NC scheme does not have a TPA. In the NC-Audit scheme, the TPA has to check for a pollution attack in the $k$ provided coded blocks ($O(k)$). In the MD-POR scheme, the TPA does nothing, because the new server, rather than the TPA as in the NC-Audit scheme, checks for a pollution attack. Therefore, the computation cost on the TPA side in the MD-POR scheme is $O(1)$.

6.4.4. Communication.
In the RDC-NC scheme, each of the $k$ healthy servers sends an aggregated coded block of size $|F|/m + m$ to the client ($O(k(|F|/m + m))$). After computing $\alpha$ new coded blocks, the client sends them to the new server ($O(\alpha(|F|/m + m))$). As a result, the communication cost in the RDC-NC scheme is $O((k+\alpha)(|F|/m + m))$. In the NC-Audit scheme, each of the $k$ healthy servers also sends an aggregated coded block to the new server ($O(k(|F|/m + m))$). The new server then sends to the TPA the linear coefficients it used to compute the $\alpha$ new coded blocks from the $k$ provided coded blocks ($O(\alpha k)$). Therefore, the communication cost in the NC-Audit scheme is $O(k(|F|/m + m) + \alpha k)$. In the MD-POR scheme, only the $k$ healthy servers each send an aggregated coded block, of size $|F|/m + \ell m$, to the new server. Therefore, the communication cost in the MD-POR scheme is $O(k(|F|/m + \ell m))$.
In summary, although the MD-POR scheme supports many demanding features, the cost of the whole scheme is still better than that of the previous schemes. Let   (),   (), and   () denote the whole computation costs of the RDC-NC, NC-Audit, and MD-POR schemes, respectively. Let   (),   (), and   () denote the whole communication costs of the RDC-NC, NC-Audit, and MD-POR schemes, respectively. Let   (),   (), and   () denote the whole storage costs of the RDC-NC, NC-Audit, and MD-POR schemes, respectively. In practice,  and  are far larger than  and ℎ (,  ≫ , ℎ),  ∈ {1, . . ., ℎ}, and  > . From Table 1, the following results are obtained.

Performance Evaluation
This section evaluates the computation and communication performance of the proposed MD-POR scheme to show that it is applicable to a real system. A program written in Python 2.7.3 is executed on a computer with a 2.4 GHz Intel Core i5 processor, 4 GB of RAM, and 64-bit Windows 7. The length of the prime $q$ is set to 160 bits. The number of clients is set to 5 ($\ell = 5$), which is also the value used in the performance evaluation of InterMac [16]. The number of servers is set to 10 ($h = 10$). The number of coded blocks stored in each server is set to 100 ($\alpha = 100$). The number of healthy servers used for repairing is set to 3 ($k = 3$). The size of each file block is set to $2^{23}$ bits (1 MB). Each result is the average of 100 runs.
The experiments measure three sets of computation performance and one set of communication performance while varying the file size of each client. The computation results are depicted in Figure 4 (encode), Figure 5 (check), and Figure 6 (repair). The communication results are depicted in Figure 7 (encode, check, and repair).
Computation Performance. The experiment results reveal that the computation time increases almost linearly with the file size, with a different slope for each graph. Only the computation time on the TPA side in the check phase is almost constant. In the encode phase, the slopes (in seconds per MB) of the client-side and server-side graphs are approximately 0.04 and 0.002, respectively. Therefore, if the file size is 1 GB, the computation time on the client side and the server side is estimated at 41 seconds and 2 seconds, respectively. Note that the encode phase is executed only once at the beginning, whereas the check phase is executed many times during the system lifetime and the repair phase is executed whenever a corruption is detected in the check phase. Consequently, the check and repair phases matter more than the encode phase. In the check phase, the slopes of the server-side and TPA-side graphs are approximately 0.0005 and 0, respectively. Therefore, if the file size is 1 GB, the computation time on the server side and the TPA side is estimated at 0.52 seconds and 0.02 seconds, respectively. Similarly, in the repair phase, the slopes of the healthy-server-side and new-server-side graphs are approximately 0.0005 and 0.0014, respectively. Therefore, if the file size is 1 GB, the computation time on the healthy server side and the new server side is estimated at 0.52 seconds and 1.47 seconds, respectively.
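The 1 GB estimates above follow from multiplying each slope by the file size, reading the slopes as seconds per MB and taking 1 GB as 1024 MB (our reading of the figures). The sketch below reproduces that arithmetic with a hypothetical helper `estimate_seconds`. Because the slopes are rounded, its outputs (for example, about 1.43 s instead of 1.47 s for the new server) differ slightly from the reported estimates, and any intercept (such as the roughly constant 0.02 s on the TPA side) is ignored.

```python
def estimate_seconds(slope_s_per_mb: float, file_size_mb: float) -> float:
    """Linear extrapolation: time ~ slope * file size (the intercept is ignored)."""
    return slope_s_per_mb * file_size_mb


ONE_GB_IN_MB = 1024
SLOPES = {  # approximate slopes reported for the computation time
    "encode, client side": 0.04,
    "encode, server side": 0.002,
    "check, server side": 0.0005,
    "repair, healthy server side": 0.0005,
    "repair, new server side": 0.0014,
}

for phase, slope in SLOPES.items():
    print(f"{phase}: {estimate_seconds(slope, ONE_GB_IN_MB):.2f} s")
```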
Communication Performance. The MD-POR scheme is run with a bandwidth of 300 Mbps. The experiment results reveal that the communication time increases almost linearly with the file size, with a different slope for each graph in Figure 7. The slopes of the encode-phase, check-phase, and repair-phase graphs are approximately 0.048, 0.008, and 0.006, respectively. Therefore, if the file size is 1 GB, the communication time of the encode, check, and repair phases is estimated at 49.27 seconds, 7.86 seconds, and 5.83 seconds, respectively. In addition, the response size from each server is 13 KB, 19 KB, 26 KB, 32 KB, and 38 KB for file sizes of 50 MB, 75 MB, 100 MB, 125 MB, and 150 MB, respectively. Therefore, if the file size is 1 GB, the response size is estimated at 264.87 KB. These results indicate that the computation and communication remain fast even when the file size is 1 GB.

Conclusion and Future Work
In this paper, a network coding-based POR scheme named MD-POR has been proposed. The MD-POR scheme supports the multiclient, symmetric key-based direct repair, and public authentication features. Moreover, the MD-POR scheme can protect against a strong adversary who can perform mobile attacks, curious attacks, response forgery, and pollution attacks. Furthermore, the efficiency analysis based on complexity theory shows that although the MD-POR scheme supports many features, its costs remain competitive with those of the previous schemes. The experiment results reveal that the computation time increases with the file size; however, the graphs show that the slopes for the MD-POR scheme are small. Future work will implement the two previous schemes, RDC-NC and NC-Audit, in order to compare them experimentally with the MD-POR scheme; this paper has implemented only the MD-POR scheme to show that its computation cost is practical for a real system.

Figure 4: The computation time performance of the encode algorithm.
Figure 5: The computation time performance (check phase).
Figure 6: The computation time performance (repair phase).
Figure 7: The communication time performance.