A Low-Complexity Block Diagonalization Algorithm for MU-MIMO Two-Way Relay Systems with Complex Lattice Reduction

We design a precoding scheme for two-way multiuser relay systems in which a multiantenna relay station (RS) operating in amplify-and-forward mode simultaneously receives information from all multiple-antenna users. To keep the mathematical analysis tractable, users are distributed in two symmetrical groups. To reduce the complexity of the proposed precoding scheme, we introduce a combined channel inversion to eliminate the multiuser interference (MUI) and employ the QR decomposition and the complex lattice reduction (CLR) transform in place of the two singular value decompositions (SVDs) of the conventional BD-based precoding algorithm. Simulation and performance analysis demonstrate that the proposed LR-MMSE algorithm not only achieves a better bit error rate (BER) performance, a higher sum-rate, and a simpler architecture, but also has 89.8% and 35.5% less complexity than the BD- and MMSE-based schemes, respectively.


Introduction
Two-way relaying (TWR) cooperation has attracted considerable attention due to its high spectral efficiency. A typical model of the TWR protocol with a multiple access channel (MAC) phase and a broadcast channel (BC) phase has been investigated in [1]. In the MAC phase, two source nodes transmit their messages to the relay node simultaneously. After the relay processes the mixed received signal, a combined version of the received signal is broadcast to each source node in the BC phase.
Among the numerous two-way relaying protocols, such as amplify-and-forward (AF) and decode-and-forward [2], the AF protocol is favored because one or more low-complexity relay nodes can assist the communication between sources and destinations without decoding the signals.
In wireless sensor networks (WSNs), a large number of static or mobile sensors cooperate to sense, compute, and transmit messages with the assistance of the relay [3]. Since sensor nodes are usually powered by lightweight batteries that are difficult to replace or recharge, energy is one of the most crucial resources in WSNs. Moreover, experimental measurements have shown that the energy consumption of a sensor node is dominated by communications [4]. Thus, a simple, low-complexity transceiver model for sensors is an effective way to address this problem. Recently, the work in [5] has proposed a multilevel physical layer network coding to control the rate of sensors, which can potentially save network power.
As is well known, multiple-input multiple-output (MIMO) relays can obtain both spectral efficiency and link reliability by exploiting multiple antennas, and combining them with the TWR technology can improve the system performance dramatically. Recently, the precoding design for TWR MIMO systems has been extended to multiuser cases, which can be roughly divided into two categories: symmetric systems and asymmetric systems [6][7][8][9][10][11]. In symmetric systems, all users are assumed to be distributed in two groups in the form of pairings [6][7][8]. In asymmetric systems, a base station exchanges messages with multiple users [9][10][11], which is a typical scenario in cellular networks.
Unlike the received signals in single-user MIMO (SU-MIMO) systems, the received signals of different users in multiuser MIMO (MU-MIMO) systems not only suffer from noise and inter-antenna interference but are also disturbed by the multiuser interference (MUI). Moreover, in two-way relaying multiuser MIMO (TWR MU-MIMO) systems, the MUI becomes more complicated, since the signals of two groups of users are mixed in the received signals.
International Journal of Distributed Sensor Networks
Channel inversion strategies based on zero forcing (ZF) and minimum mean squared error (MMSE) can be used to cancel the MUI but amplify the noise [12,13]. Block diagonalization (BD) has been proposed in [14,15] to improve the system throughput and simplify power control. The work in [16] studies a precoding scheme based on the singular value decomposition (SVD) and employs the channel alignment technique of [17]. However, the complexity of the SVD is very high when the number of users and the number of antennas are large. In order to reduce the complexity, the generalized singular value decomposition (GSVD) has been applied in [18] to design the downlink of MIMO systems. Unfortunately, there is also a great performance loss at low signal-to-noise ratios (SNRs), where the noise is the dominant factor. More recently, a systematic scheme has been proposed in [7], which imposes a power constraint at the relay node and obtains an approximate result by first fixing the transmitting and receiving matrices at the user nodes. In [19], a low-complexity precoding algorithm is proposed to reduce the condition number of the effective channel by introducing a complex lattice reduction (CLR) transform in one-way relay MU-MIMO systems.
In this paper, we focus on AF TWR MU-MIMO systems and seek a precoding scheme with lower complexity and better performance. By employing a combined channel inversion and replacing the SVD with the QR decomposition, we can largely cancel the MUI. Then, the CLR transform is used to design the precoding matrices of the users that are located in the other group and are paired with the receiving users.
The rest of the paper is organized as follows. Section 2 describes the conventional system model. Section 3 introduces the BD-based precoding algorithm. The proposed LR-MMSE scheme is presented in Section 4. Performance analysis and simulation results are given in Sections 5 and 6. Finally, we conclude this work in Section 7.
Notation. Throughout this paper, for a matrix $A$, $\mathrm{Tr}\{A\}$, $A^{T}$, $A^{H}$, $A^{-1}$, and $A^{\dagger}$ denote the trace, transpose, complex conjugate transpose, inverse, and pseudoinverse, respectively. $E[\cdot]$ stands for the expectation of a random variable; $\mathbb{C}^{M \times N}$ represents the $(M \times N)$-dimensional space with complex-valued elements. The notations $\|A\|_{F}$ and $\|A\|_{2}$ denote the Frobenius norm and the 2-norm of the matrix $A$. $I_{N \times N}$ is an $N$-by-$N$ identity matrix; $0_{N}$ is an $N$-dimensional zero matrix.

General System Model
Consider an uncoded TWR MU-MIMO system with $K$ pairs of user equipment (UE) and one two-way relay station (RS) operating under the AF protocol. For simplicity, we consider a symmetric model in which the users are divided into two groups, indexed by $i \in \{a, b\}$. The $k$th user in group $i$ and the relay are equipped with $N_{i,k}$ and $N_{R}$ antennas, respectively. A block diagram of such a system is shown in Figure 1.
We assume that there is no direct link between any two users and that all channels experience independent flat MIMO fading. Let $d_{i,k}(t)$ denote the data symbol at time $t$ for the $k$th user in group $i$ (UE$_{i,k}$). UE$_{i,k}$ performs transmit precoding (beamforming) with the matrix $P_{i,k} \in \mathbb{C}^{N_{i,k} \times N_{i,k}}$ and transmits the signal

$$x_{i,k}(t) = P_{i,k}\, d_{i,k}(t), \qquad (1)$$

where $d_{i,k}(t) \in \mathbb{C}^{N_{i,k} \times 1}$ satisfies $E[d_{i,k}(t)\, d_{i,k}^{H}(t)] = \sigma_{n_{i,k}}^{2} \cdot I_{N_{i,k}}$. We assume that the average power of the transmit signal is 1. The users UE$_{a}$ and UE$_{b}$ then exchange their signals with the assistance of the relay node RS in two phases. First, in the multiple access channel (MAC) phase, UE$_{a}$ and UE$_{b}$ transmit their signals simultaneously to the relay, and the received signal at the RS is given by

$$r(t) = \sum_{i \in \{a,b\}} \sum_{k=1}^{K} H_{i,k}\, x_{i,k}(t) + n_{R}(t), \qquad (2)$$

where $H_{i,k} \in \mathbb{C}^{N_{R} \times N_{i,k}}$ is the channel matrix from UE$_{i,k}$ to the RS and $n_{R}(t) \in \mathbb{C}^{N_{R} \times 1}$ is zero-mean additive white Gaussian noise (AWGN) at the relay with $E[n_{R}(t)\, n_{R}^{H}(t)] = \sigma_{n}^{2} I_{N_{R}}$. In the broadcast channel (BC) phase, the RS forwards the mixed signal $s(t) \in \mathbb{C}^{N_{R} \times 1}$ to all users after multiplying by a forward matrix $f \in \mathbb{C}^{N_{R} \times N_{R}}$, where

$$s(t) = f\, r(t). \qquad (3)$$

It is worth noting that $f$ is an assisting precoding matrix for decoding the signals at the receivers. Since channel reciprocity is assumed to hold within the two phases (this assumption is reasonable for time division duplex (TDD) systems, in which the uplink and downlink channels are static and identical in the frequency domain), the received signal $y_{i,k}(t) \in \mathbb{C}^{N_{i,k} \times 1}$ at the user UE$_{i,k}$ is

$$y_{i,k}(t) = H_{i,k}^{T}\, s(t) + n_{i,k}(t), \qquad (4)$$

where $n_{i,k}(t) \in \mathbb{C}^{N_{i,k} \times 1}$ is also zero-mean AWGN at UE$_{i,k}$ with $E[n_{i,k}(t)\, n_{i,k}^{H}(t)] = \sigma^{2} I_{N_{i,k}}$. UE$_{i,k}$ combines its own received signal (4) by using a receive decoding vector $g_{i,k} \in \mathbb{C}^{1 \times N_{i,k}}$ to get the estimate

$$\hat{d}_{\hat{i},k}(t) = g_{i,k}\, y_{i,k}(t), \qquad (5)$$

where $\hat{d}_{\hat{i},k}(t)$ denotes the information coming from UE$_{\hat{i},k}$, the subscripts $i, \hat{i} \in \{a, b\}$, and $i \neq \hat{i}$ represents the user pairing. For notational convenience, the time index is omitted henceforth.
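The two-phase signal flow described above can be sketched numerically. The following is a minimal illustration, not the authors' simulation code: it assumes NumPy, borrows the sizes of the later complexity example (3 pairs, 2 antennas per user, 12 relay antennas), uses identity precoders and a scaled-identity forward matrix as placeholders, picks arbitrary noise levels, and models the reciprocal downlink channel as the transpose of the uplink channel.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

K, N_u, N_r = 3, 2, 12            # pairs, antennas per user, relay antennas (illustrative)
sigma_r, sigma_u = 0.1, 0.1       # noise standard deviations (assumed values)
groups = ("a", "b")

# Uplink channels H[i, k] (N_r x N_u), placeholder identity precoders, random data
H = {(i, k): crandn(N_r, N_u) for i in groups for k in range(K)}
P = {key: np.eye(N_u) for key in H}
d = {key: crandn(N_u, 1) for key in H}

# MAC phase: the relay observes the superposition of all precoded user signals
x = {key: P[key] @ d[key] for key in H}
r = sum(H[key] @ x[key] for key in H) + sigma_r * crandn(N_r, 1)

# BC phase: amplify-and-forward with a forward matrix f (scaled identity here)
f = np.eye(N_r) / np.linalg.norm(r)
s = f @ r

# Reciprocity assumption: the downlink channel of UE(i, k) is H[i, k] transposed
y = {key: H[key].T @ s + sigma_u * crandn(N_u, 1) for key in H}
```

With these placeholder matrices every user still sees its own signal and all other users' signals in `y`; removing that interference is the job of the relay and user matrices designed in the following sections.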
Figure 2(a) shows the overall procedure for transmit and receive beamforming and relaying in multiuser two-way systems. Next, the precoding matrices are designed under the BD, ZF, and MMSE criteria. Further, lattice reduction is applied to reduce the bit error rate (BER) of the system. As a result, the decoding matrix can be omitted to obtain a simple structure.

Generalized Design of BD-Based Precoding Algorithm
The BD- and SVD-based precoding algorithms are proposed in [10] and [12], respectively. Combining the two ideas, the BD-based scheme can be given as follows.
The received signals at the users UE$_{i,k}$ and UE$_{\hat{i},k}$ can be written in terms of the estimate $\hat{d}_{\hat{i},k}$, whose detailed components are given in (7). The first term of (7) can be removed by perfect self-interference suppression, and the key task in the remaining processing is to eliminate the MUI and obtain a good signal estimate.
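The displayed expansion is not legible in this copy, so the following is only a plausible reconstruction obtained by chaining the system-model steps (transmit precoding, MAC phase, relay forwarding, BC phase, receive combining). The symbols $P$, $H$, $f$, $g$, $n_R$, and $n_{i,k}$ follow the system model; the transpose on the downlink channel and the ordering of the two MUI terms are assumptions.

```latex
\begin{aligned}
\hat{d}_{\hat{i},k}
  &= g_{i,k} H_{i,k}^{T} f H_{i,k} P_{i,k} d_{i,k}
     && \text{(1st term: self-interference)} \\
  &\quad + g_{i,k} H_{i,k}^{T} f \sum_{l \neq k} H_{i,l} P_{i,l} d_{i,l}
     && \text{(2nd term: MUI from the user's own group)} \\
  &\quad + g_{i,k} H_{i,k}^{T} f \sum_{l \neq k} H_{\hat{i},l} P_{\hat{i},l} d_{\hat{i},l}
     && \text{(3rd term: MUI from the other group)} \\
  &\quad + g_{i,k} H_{i,k}^{T} f H_{\hat{i},k} P_{\hat{i},k} d_{\hat{i},k}
     && \text{(4th term: desired signal of the paired user)} \\
  &\quad + g_{i,k} H_{i,k}^{T} f\, n_{R} + g_{i,k} n_{i,k}
     && \text{(forwarded relay noise and local noise)}
\end{aligned}
```

Read this way, the first term depends only on data the user already knows, the second and third terms are what the relay matrix must null, and the fourth term defines the effective channel between the paired users.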
Based on the system model, the combined channel matrix $H$ collects the channels of all users. In order to eliminate the MUI in (7), we impose the BD constraint $\tilde{H}_{\hat{i},k}\, f_{\hat{i},k} = 0$, $\forall k \in \{1, \ldots, K\}$.
Then, the second SVD operation is used to obtain the precoding and decoding matrices, as in (11). Finally, the precoding matrix of the user UE$_{\hat{i},k}$ is $P_{\hat{i},k} = V_{\hat{i},k}\, \Omega_{\hat{i},k}$ and the decoding matrix is obtained as $g_{i,k} = U_{\hat{i},k}^{H}$, where $\Omega_{\hat{i},k}$ is a diagonal matrix whose elements represent the transmit power allocation. To guarantee that $V_{i,k}$ is not a null matrix, the antenna numbers must satisfy the corresponding dimension inequality. Note that we omit the dimension matching matrices in (11), which will be discussed in the next section.
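The null-space step behind the BD constraint can be illustrated with a short sketch. It is a generic block-diagonalization example rather than the paper's full two-way design: it assumes NumPy, column-stacked channels of size `N_r x N_u`, and hypothetical helper names (`crandn`, `bd_null_filter`). The filter for user `k` is taken from the left singular vectors of the other users' channels that belong to zero singular values.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N_u, N_r = 3, 2, 8              # users, antennas per user, relay antennas (illustrative)

def crandn(rows, cols):
    """Unit-variance circularly symmetric complex Gaussian matrix."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

H_blocks = [crandn(N_r, N_u) for _ in range(K)]

def bd_null_filter(H_blocks, k, tol=1e-10):
    """Rows spanning the left null space of all channels except user k (first SVD of BD)."""
    H_tilde = np.hstack([H_blocks[j] for j in range(len(H_blocks)) if j != k])
    U, sv, _ = np.linalg.svd(H_tilde)            # full SVD, U is N_r x N_r
    rank = int(np.sum(sv > tol * sv[0]))
    return U[:, rank:].conj().T                  # (N_r - rank) x N_r, orthonormal rows

F = [bd_null_filter(H_blocks, k) for k in range(K)]
```

Each `F[k]` passes user `k` and nulls every other user; a second SVD of `F[k] @ H_blocks[k]` would then give the per-user precoder and decoder. The null space is nonempty here only because `N_r` exceeds the number of interfering antennas, which is why this step needs one SVD per user on a large matrix.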

The Proposed LR-MMSE Precoding Algorithms
In this section, we describe in detail the proposed precoding algorithms, which are based on a combined channel inversion method, QR decompositions, and lattice reductions. The design can be carried out under the ZF and MMSE criteria. To obtain a better BER performance, complex lattice reduction is then applied to improve the MMSE design.

ZF-Based Design.
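The ZF-based design of the precoding matrices, termed ZF, is performed in two steps.
Step 1. Obtain the relay forward matrix $f_{i,k,\mathrm{zf}}$, which eliminates the MUI completely, by a QR decomposition of $\bar{H}_{\hat{i},k,\mathrm{zf}}$.
First, we apply the ZF inversion to the combined channel matrix $H$, where $H_{\mathrm{zf}}^{\dagger}$ is the pseudoinverse of the matrix $H_{\mathrm{zf}}$ and $\bar{H}_{\hat{i},k,\mathrm{zf}} \in \mathbb{C}^{N_{\hat{i},k} \times N_{R}}$ is a submatrix of $H_{\mathrm{zf}}^{\dagger}$. Note that $H_{\mathrm{zf}} H_{\mathrm{zf}}^{\dagger} = I$ [12]. Thus, the off-diagonal blocks of $H H_{\mathrm{zf}}^{\dagger}$ are zero and $\bar{H}_{\hat{i},k,\mathrm{zf}}$ is in the null space of $\tilde{H}_{\hat{i},k}$; that is, $\bar{H}_{\hat{i},k,\mathrm{zf}}\, \tilde{H}_{\hat{i},k} = 0$. Imposing the QR decomposition $\bar{H}_{\hat{i},k,\mathrm{zf}} = Q_{\hat{i},k,\mathrm{zf}}\, R_{\hat{i},k,\mathrm{zf}}$, we obtain the relay matrix $f_{i,k}^{1}$ from $Q_{\hat{i},k,\mathrm{zf}} \in \mathbb{C}^{N_{\hat{i},k} \times N_{R}}$, which is an orthogonal matrix. Then, substituting $f_{i,k}^{1}$ into (7), the 2nd and 3rd terms become zero.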
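A small numerical sketch of Step 1 follows. It is a generic K-user illustration under assumptions, not the paper's exact two-group construction: it assumes NumPy, a column-stacked combined channel with full column rank, and hypothetical helper names (`crandn`, `zf_qr_filter`). One pseudoinverse is computed for all users, and each user's filter comes from a QR decomposition of its block of that pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(2)
K, N_u, N_r = 3, 2, 8              # users, antennas per user, relay antennas (illustrative)

def crandn(rows, cols):
    """Unit-variance circularly symmetric complex Gaussian matrix."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

# Column-stacked combined channel H = [H_1 ... H_K], each block N_r x N_u
H_blocks = [crandn(N_r, N_u) for _ in range(K)]
H = np.hstack(H_blocks)
H_pinv = np.linalg.pinv(H)         # (K*N_u) x N_r, computed once for all users

def zf_qr_filter(H_pinv, k, N_u):
    """Orthonormal filter for user k from the k-th row block of the pseudoinverse."""
    H_bar = H_pinv[k * N_u:(k + 1) * N_u, :]     # H_bar @ H_j = 0 for every j != k
    Q, _ = np.linalg.qr(H_bar.conj().T)          # reduced QR, Q is N_r x N_u
    return Q.conj().T                            # orthonormal rows, same row space as H_bar

filters = [zf_qr_filter(H_pinv, k, N_u) for k in range(K)]
```

Because the triangular factor is invertible, the orthonormal filter inherits the nulling property of the pseudoinverse block, so the per-user SVD of the interfering channels is replaced by one shared inversion plus one small QR decomposition per user.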
Step 2. Obtain the user precoding and receive decoding matrices $p_{\hat{i},k,\mathrm{zf}}$ and $g_{i,k,\mathrm{zf}}$ by an SVD of the effective channel.
Since the dimensions of $H_{i,k}$ and $Q_{\hat{i},k,\mathrm{zf}}$ do not match, we match them with the help of $\Pi_{\hat{i},k} \in \mathbb{C}^{N_{R} \times N_{\hat{i},k}}$, with $\|\Pi_{\hat{i},k}\|_{2} = 1$. Then, substituting $f_{i,k}^{1}$ and $\Pi_{\hat{i},k}$ into the 4th term of $\hat{d}_{\hat{i},k}$, we have the effective channel $H_{i,k,\mathrm{zf}}^{\mathrm{eff}} = H_{i,k}^{T}\, \Pi_{\hat{i},k}\, Q_{\hat{i},k,\mathrm{zf}}\, H_{\hat{i},k}$, where $H_{i,k,\mathrm{zf}}^{\mathrm{eff}} \in \mathbb{C}^{N_{i,k} \times N_{\hat{i},k}}$. The ZF algorithm is completed by applying the SVD to the effective channel, $H_{i,k,\mathrm{zf}}^{\mathrm{eff}} = U_{\hat{i},k,\mathrm{zf}}\, \Lambda_{\hat{i},k,\mathrm{zf}}\, V_{\hat{i},k,\mathrm{zf}}^{H}$. The user precoding and receive decoding matrices are then designed as $p_{\hat{i},k,\mathrm{zf}} = V_{\hat{i},k,\mathrm{zf}}\, \Omega_{\hat{i},k}$ and $g_{i,k,\mathrm{zf}} = U_{\hat{i},k,\mathrm{zf}}^{H}$, respectively. (Here $p_{\hat{i},k,\mathrm{zf}}$ denotes the precoding at UE$_{\hat{i},k}$ and $g_{i,k,\mathrm{zf}}$ denotes the receive processing at UE$_{i,k}$; this subscript convention distinguishes the two matrices.) Finally, we obtain the complete ZF precoding scheme for the users.
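Step 2 can be checked numerically on a stand-in effective channel. The sketch below assumes NumPy, a random 2 x 2 matrix in place of the true effective channel, and equal power allocation with unit total power; it only illustrates that precoding with the right singular vectors and decoding with the conjugate transpose of the left singular vectors yields parallel streams.

```python
import numpy as np

rng = np.random.default_rng(3)
N_rx, N_tx = 2, 2                  # antennas of the receiving and the paired user (illustrative)

# Stand-in for the interference-free effective channel between a user pair
H_eff = (rng.standard_normal((N_rx, N_tx))
         + 1j * rng.standard_normal((N_rx, N_tx))) / np.sqrt(2)

U, lam, Vh = np.linalg.svd(H_eff, full_matrices=False)

total_power = 1.0                  # assumed power budget
Omega = np.sqrt(total_power / N_tx) * np.eye(N_tx)   # equal power allocation (assumed)

P = Vh.conj().T @ Omega            # precoder at the paired (transmitting) user
G = U.conj().T                     # decoder at the receiving user

end_to_end = G @ H_eff @ P         # equals diag(lam) @ Omega: parallel scalar channels
```

The end-to-end matrix is diagonal with entries equal to the singular values scaled by the power allocation, so each stream can be detected independently.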

MMSE-Based Design.
Here, $\bar{H}_{\hat{i},k,\mathrm{mmse}} \in \mathbb{C}^{N_{\hat{i},k} \times N_{R}}$ is a submatrix of the regularized pseudoinverse matrix $H_{\mathrm{mmse}}^{\dagger}$. Note that the regularization factor approaches zero when the SNR is high, and thus we have $H H_{\mathrm{mmse}}^{\dagger} \approx I$ [12]. This means that the off-diagonal blocks of $H H_{\mathrm{mmse}}^{\dagger}$ converge to zero as the SNR increases. Hence, the matrix $\bar{H}_{\hat{i},k,\mathrm{mmse}}$ is approximately in the null space of $\tilde{H}_{\hat{i},k}$; that is, $\bar{H}_{\hat{i},k,\mathrm{mmse}}\, \tilde{H}_{\hat{i},k} \approx 0$.
Then, the MMSE-BD algorithm can likewise be completed by applying the SVD to the effective channel matrix. The user precoding and receive decoding beamforming matrices of MMSE are obtained as $P_{\hat{i},k,\mathrm{mmse}} = V_{\hat{i},k,\mathrm{mmse}}\, \Omega_{\hat{i},k}$ and $g_{i,k,\mathrm{mmse}} = U_{\hat{i},k,\mathrm{mmse}}^{H}$.
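The claim that the residual interference of the regularized inversion vanishes at high SNR can be illustrated as follows. This is a sketch under assumptions: NumPy, a column-stacked random channel, unit transmit power, and a regularization factor equal to the total number of user antennas times the noise variance (the paper's exact definition is not legible here). The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
K, N_u, N_r = 3, 2, 12             # users, antennas per user, relay antennas (illustrative)

H = (rng.standard_normal((N_r, K * N_u))
     + 1j * rng.standard_normal((N_r, K * N_u))) / np.sqrt(2)

def mmse_pinv(H, alpha):
    """Regularized inverse (H^H H + alpha I)^{-1} H^H; alpha = 0 gives the ZF pseudoinverse."""
    n = H.shape[1]
    return np.linalg.solve(H.conj().T @ H + alpha * np.eye(n), H.conj().T)

def leakage(H, H_inv, N_u):
    """Frobenius norm of the off-diagonal blocks of H_inv @ H (residual MUI)."""
    G = H_inv @ H
    block_mask = np.kron(np.eye(G.shape[0] // N_u), np.ones((N_u, N_u)))
    return np.linalg.norm(G * (1 - block_mask))

snr_db = [0, 20, 40]
leak = []
for snr in snr_db:
    sigma2 = 10 ** (-snr / 10)     # noise variance for unit signal power
    alpha = K * N_u * sigma2       # assumed regularization rule
    leak.append(leakage(H, mmse_pinv(H, alpha), N_u))
```

At low SNR the regularization leaves noticeable off-diagonal blocks, which is the price paid for less noise amplification; at high SNR the inverse approaches the ZF solution and the leakage becomes negligible.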

LR-MMSE-Based Design.
In this subsection, LR is applied to the effective channel, replacing the SVD in the MMSE algorithm. For convenience, we term the resulting scheme LR-MMSE (see Figure 2(b)). A powerful and well-known reduction criterion for arbitrary lattice dimensions was introduced by Lenstra et al. in [20], and the algorithm they proposed is known as the LLL (or $L^{3}$) algorithm [21,22]. In order to reduce the complexity, a complex LLL (CLLL) algorithm was proposed in [23], which reduces the overall complexity of the LLL algorithm by nearly half without sacrificing any performance. Through the size reduction of the channel basis, CLLL can obtain a better BER performance. In this paper, we employ the CLLL algorithm to implement the LR transform.
After the first MMSE precoding, we transform the MU-MIMO channel into parallel or approximately parallel SU-MIMO channels, and the effective channel matrix for UE$_{i,k}$ is $H_{i,k,\mathrm{mmse}}^{\mathrm{eff}} = H_{i,k}^{T}\, \Pi_{\hat{i},k}\, Q_{\hat{i},k,\mathrm{mmse}}\, H_{\hat{i},k}$.
We perform the LR transformation on $H_{i,k,\mathrm{mmse}}^{\mathrm{eff}}$ in the precoding scenario [24], where $T_{\hat{i},k}$ is a unimodular matrix with $|\det(T_{\hat{i},k})| = 1$ whose elements are all complex integers. The unimodularity of $T_{\hat{i},k}$ guarantees that the energy does not change through the LR transform. The MMSE precoding is actually equivalent to the ZF precoding with respect to an extended system model [25], for which an extended channel matrix is defined. By introducing the MMSE method, a trade-off between the level of MUI and the noise is introduced (see also [12]). The LR-MMSE precoding filter is then designed using the unimodular matrix $T_{\hat{i},k}$ for $H_{i,k,\mathrm{mmse}}^{\mathrm{eff}}$, and the LR-MMSE relay forwarding filter is obtained accordingly. Since the lattice-reduced precoding matrix $P$ has nearly orthogonal columns, the required transmit power is reduced compared to the MMSE precoding algorithm. Thus, the proposed LR-MMSE precoding algorithm can achieve a better BER performance than the BD precoding algorithms. The remaining task at the receiver is to quantize the received signals to the nearest vectors. The LR-MMSE algorithm is summarized in Algorithm 2.
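To make the LR transform concrete, here is a compact sketch of a complex LLL reduction. It is a textbook-style illustration assuming NumPy, not the implementation of [23]: the function name `clll`, the parameter `delta = 0.75`, and the demo matrix are assumptions, and the QR factorization is recomputed in every iteration for clarity rather than speed. Which matrix the paper reduces (the effective channel or a transposed or extended version) is not recoverable from this copy, so the sketch simply reduces the columns of its input.

```python
import numpy as np

def cround(z):
    """Round a complex number to the nearest Gaussian integer."""
    return complex(np.rint(z.real), np.rint(z.imag))

def clll(H, delta=0.75):
    """Complex LLL reduction of the columns of H.

    Returns (B, T) with B = H @ T, where T has Gaussian-integer entries
    and |det(T)| = 1.
    """
    B = np.array(H, dtype=complex)
    n = B.shape[1]
    T = np.eye(n, dtype=complex)
    k = 1
    while k < n:
        _, R = np.linalg.qr(B)                   # B = QR, recomputed for clarity
        # Size-reduce column k against columns k-1, ..., 0
        for j in range(k - 1, -1, -1):
            mu = cround(R[j, k] / R[j, j])
            if mu != 0:
                B[:, k] -= mu * B[:, j]
                T[:, k] -= mu * T[:, j]
                R[:j + 1, k] -= mu * R[:j + 1, j]
        # Lovasz condition between columns k-1 and k; swap and step back if violated
        if delta * abs(R[k - 1, k - 1]) ** 2 > abs(R[k, k]) ** 2 + abs(R[k - 1, k]) ** 2:
            B[:, [k - 1, k]] = B[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            k = max(k - 1, 1)
        else:
            k += 1
    return B, T

# Demo on a channel with strongly correlated columns (illustrative data)
rng = np.random.default_rng(5)
G = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
mix = np.array([[1, 0.9, 0.9, 0.9], [0, 1, 0.9, 0.9], [0, 0, 1, 0.9], [0, 0, 0, 1]])
H_demo = G @ mix
B, T = clll(H_demo)
```

The returned `T` is built only from column swaps and Gaussian-integer column additions, which is why its determinant has unit magnitude; `B` spans the same lattice as `H_demo` with a size-reduced basis.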

Simulation and Performance Analysis
In this section, we analyze the proposed LR-MMSE algorithm in terms of computational complexity, architecture, BER, and achievable rates.

Computational Complexity Analysis.
In this section, we measure the computational complexity of the precoding algorithms introduced above by the total number of floating-point operations (FLOPs). Note that the LR algorithm operates on complex variables, and the average complexity of the CLLL algorithm is given in [26]. The numbers of FLOPs for the complex QR decomposition and the real SVD operation are given in [27]. Moreover, the number of FLOPs of an $m \times n$ complex SVD operation is equivalent to that of its extended $2m \times 2n$ real matrix. The total number of FLOPs required by the matrix operations is summarized below:
(i) multiplication of $m \times n$ and $n \times p$ complex matrices: $8mnp - 2mp$;
(ii) QR decomposition of an $m \times n$ ($m \le n$) complex matrix: as given in [27];
(iii) SVD of an $m \times n$ ($m \le n$) complex matrix obtaining only $\Lambda$ and $V$: $32(nm^{2} + 2m^{3})$;
(iv) SVD of an $m \times n$ ($m \le n$) complex matrix obtaining $U$, $\Lambda$, and $V$: $8(4n^{2}m + 8nm^{2} + 9m^{3})$;
(v) inversion of an $m \times m$ real matrix by Gauss-Jordan elimination: $4m^{3}/3$;
(vi) inversion of an $m \times m$ complex matrix by Gauss-Jordan elimination.
The FLOPs of the three types of precoding schemes are shown in Tables 1, 2, and 3, respectively. We assume that the system is composed of 6 users ($K = 3$), each equipped with 2 antennas, and one relay station equipped with 12 antennas. Table 1 shows that the complexity of the SVD is very large, which pushes the complexity of BD to the order of millions of FLOPs. In contrast, the combined channel inversion needs only 22805.4 FLOPs and is performed only once. Meanwhile, LR-MMSE saves 35% of the FLOPs of MMSE, while MMSE saves 89.8% of the FLOPs of BD.
Next, we fix the number of user antennas and illustrate the low complexity of LR-MMSE by increasing the number of users. Setting = , we obtain the complexity comparison in Figure 3. Figure 3 shows that the complexity of the BD-type precoding scheme grows rapidly as $K$ increases, while the complexity of MMSE and LR-MMSE grows slowly. This is due to the strong dependence of the SVD cost on the matrix dimensions. Note that, although the combined channel inversion is performed only once, the QR decomposition must still be performed repeatedly. However, the QR decomposition is simpler than the SVD.
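The closed-form counts listed above are easy to evaluate programmatically. The helper functions below are a sketch with hypothetical names; they encode only the four expressions that are legible in the list (items (i), (iii), (iv), and (v)), with `m`, `n`, `p` as the matrix dimensions, and deliberately leave out the QR and complex-inversion counts.

```python
def flops_matmul(m, n, p):
    """Product of an m x n and an n x p complex matrix: 8mnp - 2mp."""
    return 8 * m * n * p - 2 * m * p

def flops_svd_partial(m, n):
    """SVD of an m x n (m <= n) complex matrix, only Lambda and V: 32(n m^2 + 2 m^3)."""
    return 32 * (n * m ** 2 + 2 * m ** 3)

def flops_svd_full(m, n):
    """SVD of an m x n (m <= n) complex matrix, U, Lambda and V: 8(4 n^2 m + 8 n m^2 + 9 m^3)."""
    return 8 * (4 * n ** 2 * m + 8 * n * m ** 2 + 9 * m ** 3)

def flops_real_inverse(m):
    """Gauss-Jordan inversion of an m x m real matrix: 4 m^3 / 3."""
    return 4 * m ** 3 / 3

# Example: cost growth of the partial SVD when the smaller dimension doubles
growth = flops_svd_partial(8, 12) / flops_svd_partial(4, 12)
```

Because every SVD count contains a cubic term in the smaller dimension, doubling that dimension multiplies the cost several times over, which is consistent with the rapid growth attributed to the BD-type scheme.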

Performance Analysis in Architecture.
Compared to the BD and MMSE schemes, LR-MMSE has a simple architecture. At each user, the decoding matrix can be omitted, since the precoding processing in formula (31) has already decoded the effective channel. This advantage helps save significant power at the mobile users and even at the relay stations. Moreover, it is worth noting that we need not employ self-interference cancellation, because the combined channel inversion already includes the channel information of the receiver.

Table fragment (preceding table): 114432; Total: 206389.
Table 3: Computational complexity of LR-MMSE in case (6,2,12). Columns: Steps, Operations, FLOPS Number.

BER Performance Analysis.
Recall that, as the SNR increases, we have $Q_{\hat{i},k,\mathrm{mmse}}\, \tilde{H}_{\hat{i},k} \approx 0$ in the LR-MMSE algorithm. Thus, the MU-MIMO channel is approximately decoupled into equivalent SU-MIMO channels. Among schemes that take the channel noise into account, regularized block diagonalization (RBD) is a common method. The corresponding result shown in [28] resembles formula (33) but does not converge to zero. Comparing these two properties, LR-MMSE achieves more accurate reception. As is well known, the condition number is an indicator of reception errors. Using the condition number defined in [23], the orthogonality of the channel matrix can be assessed. Taking the system of linear equations $Hx = b$ as an example, there are three cases for different kinds of matrices: (i) when the channel matrix is an orthogonal matrix, the condition number is 1, and $x$ changes only slightly under a tiny change in $b$; (ii) when it is a singular matrix, the condition number is $\infty$, and $x$ changes significantly under a tiny change in $b$, or even when $b$ has not changed; (iii) when it is a nonsingular but not orthogonal matrix, the condition number is finite but greater than 1, and the sensitivity of $x$ lies between the above two cases.
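The three cases can be reproduced with small real matrices. The sketch below assumes NumPy; the specific matrices, the right-hand side, and the perturbation are illustrative choices, not values from the paper.

```python
import numpy as np

def relative_change(A, b, db):
    """Relative change of the solution of A x = b when b is perturbed by db."""
    x = np.linalg.solve(A, b)
    x_pert = np.linalg.solve(A, b + db)
    return np.linalg.norm(x_pert - x) / np.linalg.norm(x)

theta = 0.3
orthogonal = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])
ill_conditioned = np.array([[1.0, 1.0],
                            [1.0, 1.0001]])
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])

b = np.array([2.0, 2.0001])
db = np.array([0.0, 1e-4])          # tiny perturbation of the right-hand side

cond_orth = np.linalg.cond(orthogonal)        # equals 1 up to rounding
cond_ill = np.linalg.cond(ill_conditioned)    # large but finite
cond_sing = np.linalg.cond(singular)          # infinite or numerically huge

change_orth = relative_change(orthogonal, b, db)
change_ill = relative_change(ill_conditioned, b, db)
```

For the orthogonal matrix the relative change in the solution is as small as the relative perturbation, whereas for the nearly singular matrix the same perturbation moves the solution by an amount comparable to the solution itself. A lower average condition number of the effective channel therefore means less error amplification at the receiver.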
As can be seen in Figure 4, the LR-MMSE has a smaller average cond(H) of the effective channel compared to MMSE and BD. Thus, a significant power reduction and better BER performance are obtained.

Simulation Results
Considering an antenna configuration with = 8, = 10, and $N_{i,k} = 2$, six users are divided into two symmetrical groups. The perfect channel scenario is applied, in which = 1 and all channel elements are drawn from a complex Gaussian distribution. To highlight the performance of the proposed algorithm, we allocate the transmit power equally to each user. Figure 5 shows that the LR-MMSE algorithm achieves the best sum-rate performance, while MMSE and BD have similar achievable sum-rates. In particular, when the SNR is 30 dB, the proposed algorithm has a gain of 3.208 bit/s/Hz over the BD algorithm. Since a low SNR affects the diagonality after the precoding processing in the MMSE scheme, there is an inevitable performance loss. However, by virtue of the size reduction of the CLLL transform, LR-MMSE compensates for this loss.
Additionally, in the course of the simulations, we found that choosing the optimal matching matrix $\Pi$ can achieve a gain of nearly 1 bit/s/Hz. In this work, we did not study the optimization of the power allocation and the matching matrices.

Conclusion
In this work, we have proposed a low-complexity precoding scheme for two-way MU-MIMO relay systems that uses the QR decomposition and the complex LLL transform instead of two SVD operations. For simplicity, users are distributed in two symmetrical groups. The performance analysis shows that the LR-MMSE algorithm has not only a low complexity, but also a better BER, a simple structure at the receiving user, and a theoretical maximum. Finally, a higher achievable sum-rate of the system is confirmed by simulation.