Novel Convolutional Restricted Boltzmann Machine manifold learning inspired dynamic user clustering hybrid precoding for millimeter-wave massive multiple-input multiple-output systems

Millimeter-wave massive multiple-input multiple-output is a key technology in 5G communication systems. Hybrid precoding in particular has attracted growing attention because it is more power efficient and less expensive than full-digital precoding. Its effectiveness in simple systems has been well verified, but its performance in real deployments remains unclear because of practical problems such as interference from other users and base stations and constant user movement. In this article, we propose a dynamic user clustering hybrid precoding method for high-dimensional millimeter-wave multiple-input multiple-output systems, which uses low-dimensional manifolds to avoid complicated calculations when there are many antennas. We model each user set as a novel Convolutional Restricted Boltzmann Machine manifold, and the problem is transformed into cluster-oriented multi-manifold learning. The novel Convolutional Restricted Boltzmann Machine manifold learning seeks to learn embedded low-dimensional manifolds while users move within clusters. With proper user clustering, the hybrid precoding is designed for the sum-rate maximization problem using manifold quasi-conjugate gradient methods. The algorithm avoids the processing of high-dimensional channel parameters required by traditional methods, achieves a high signal-to-noise ratio, and reduces computational complexity. Simulation results show that the method achieves a near-optimal sum-rate and higher spectral efficiency than the traditional method.


Introduction
Millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) technology has become a key technology in 5G communication because of its rich spectrum resources. 1-3 Because of the high carrier frequency, mmWave signals suffer from high propagation loss, so large-scale antenna arrays are used to compensate for the path loss. 4 However, in a massive MIMO system the number of antennas at the transmitter and receiver is very large, 5 and configuring a radio frequency (RF) chain for each antenna, as in the traditional all-digital solution, incurs high hardware cost and power consumption. In response, hybrid schemes have emerged that reduce hardware requirements while balancing spectrum efficiency (SE) and energy efficiency (EE). [6][7][8] However, hybrid precoding in wideband channels remains a difficult problem.
Obtaining the optimal precoding matrix is the key issue of hybrid precoding. With the large-scale antenna arrays used in mmWave communication, large-scale matrix calculations are usually required. 8 The difficulty of hybrid precoding is to reduce this complexity. 9 Some advanced beam-space-based hybrid precoding algorithms have been studied. 10,11 Previous investigations [12][13][14][15][16] make full use of the sparsity of the beam-space channel through sparse signal processing. In the literature, 12,13 the problem is transformed into finding the optimal precoder with a hybrid structure, and an algorithm based on basis pursuit is proposed. A hybrid precoding scheme based on the Orthogonal Matching Pursuit (OMP) algorithm is proposed in the literature, 14,15 which makes full use of channel sparsity. In the multi-user scenario, low-complexity multi-user hybrid precoding for the mmWave system has been studied. 16 A Kronecker decomposition hybrid beamforming (KDHB) method for multi-cell multi-user massive MIMO systems based on sparse propagation paths has been proposed. 17 However, the resolution of the beam space is finite. Because of power leakage, the sparse channel is non-ideal, and there are many possible non-zero terms. Some papers consider hybrid precoding of interfering mmWave channels. 18,19 Dealing with interference is very challenging, because the number of antennas is large and the high-complexity precoding matrix is difficult to implement. 20 To address the high interference problem, a closed-form broadband hybrid precoding scheme was proposed in the literature. [21][22][23][24] An analytical framework of hybrid beamforming (AFHB) in multi-cell mmWave systems was also proposed, 25 in which a combination of analog and digital beamforming is adopted. The former is based on phase shifters, and the latter is based on a regularized zero-forcing method.
Recently, scholars have proposed the manifold learning in mmWave massive MIMO systems. Yu et al. 26 proposed a manifold optimization (MO)-based hybrid precoding algorithm with lower complexity. To replace the range of the constant envelope with a circular manifold, Chen 27 proposed a Riemannian conjugate gradient manifold algorithm. In Mai et al., 28 a Riemann vector perturbation manifold for a multi-user massive MIMO system was studied, in which the RF-baseband hybrid precoding was jointly arranged. A Riemannian trust-region Newton manifold (RTRNM) showed an improved method of beamforming in multi-cluster scenarios. 29 The optimization beamforming is utilized to mitigate inter-cell interference by dividing multi-users into multi-clusters with spatial correlation. However, multi-user high-dimensional channels are not mapped into low-dimensional subspaces to achieve dimensionality reduction. Learning a form of a double digital beamforming schemes optimizes the network resource allocation in massive MIMO networks. 30 Moe Thet et al. 31 analyze by greedy algorithm how fast-moving users in static and time-varying user clustering are executed according to the system sum-rate. The manifold learning algorithm is used to reduce the multi-user high-dimensional channels. It reduces the computational complexity while mitigating inter-cell interference-based fully digital beamforming. It focuses on the local linear spatial structure between user channels, and ignores the global spatial characteristics. And it is not possible to quickly analyze the global and local correlations between user channels in the case of moving users.
Traditional precoding methods optimize precoding using channel sparsity, but they are not well suited to multiple users. In the multi-user scenario, they suffer from high precoding complexity and high channel dimensionality, and they do not consider user mobility. Therefore, new algorithms are needed that take these practical communication problems into account.
In this article, we propose a low-complexity hybrid precoding algorithm for dynamic user clustering in mmWave massive MIMO systems. Specifically, the channels of a large number of antennas are embedded in a low-dimensional subspace. mmWave channel measurements show that mmWave signals undergo diffuse scattering on the surface of rough scatterers, and the scattering range increases as the wavelength decreases. 32 In dense-user scenarios, when there is not enough space between users, diffuse scattering may cause adjacent users to receive signals from the same path, which causes serious inter-user interference. Our goal is to design a hybrid precoding matrix that manages intra-cell and inter-cell interference with less channel knowledge and that can be realized with a low-complexity hybrid analog/digital architecture, that is, with a small number of RF chains compared with the number of antennas. To solve the set classification problem, manifold discriminant analysis (MDA) 33 has been proposed. By modeling each user set as a manifold, we formulate the problem as clustering-oriented multi-manifold learning. Manifold discriminative learning seeks to learn embedded low-dimensional manifolds in which manifolds with different user cluster labels are better separated and the local spatial correlation of the high-dimensional channels within each manifold is enhanced. Through discriminative manifold learning, the high-dimensional channels are mapped to low-dimensional manifolds, which makes it possible to fully exploit the latent spatial correlation of the high-dimensional channels. By transforming the non-linear problems of high-dimensional channels into global and local non-linearities, dimensionality reduction is achieved. In the low-dimensional manifolds, the intra-cluster channels become more clustered and the separability of embedded features is enhanced.
To handle the situation in which users move in different clusters, we introduce the novel Convolutional Restricted Boltzmann Machine (NCRBM) framework into our manifold learning. By continuously updating the manifold discriminations in clusters, we obtain the best state of manifold learning for the users in each cluster. With proper user clustering, the hybrid precoding is designed for the sum-rate maximization problem using manifold quasi-conjugate gradient methods. 34 To enhance the spectral efficiency of the system, the design of each cluster's analog RF precoder should balance optimizing its own transmission against the interference. The digital precoding matrix is obtained from the Karush-Kuhn-Tucker (KKT) conditions. [35][36][37] Compared with the traditional method, the proposed method does not require solving for large-scale channel parameters, and it achieves a high signal-to-noise ratio (SNR) while reducing the computational complexity. The results show that the algorithm obtains a near-optimal sum-rate and high spectral efficiency.
The rest of this article is organized as follows. Section ''System model and channel model'' introduces the system model and channel model. Section ''User clustering hybrid precoding scheme'' introduces the algorithm for dimensionality reduction and the hybrid precoding algorithm in multi-user high-dimensional channel scenarios. Section ''Simulation results'' presents the simulation results. Section ''Conclusion'' summarizes this article.

Notations
Upper- and lower-case boldface letters represent matrices and vectors, respectively. $(\cdot)^H$, $(\cdot)^{-1}$, $(\cdot)^T$, $(\cdot)^*$, $\mathrm{tr}(\cdot)$, and $\|\cdot\|_F$ are the Hermitian transpose, inverse, transpose, complex conjugate, trace, and Frobenius norm of a matrix, respectively. $E(\cdot)$ is the expectation. $\mathrm{diag}(\cdot)$ denotes the diagonal matrix. $|G|$ is the cardinality of the set $G$. $\otimes$ indicates the Kronecker product. $\mathcal{CN}(0, \sigma^2)$ represents the zero-mean complex Gaussian distribution with variance $\sigma^2$. $\mathrm{span}(Y)$ denotes the subspace spanned by the column vectors of $Y$. $\nabla(\cdot)$ indicates the gradient. Finally, $I_N$ denotes the $N \times N$ identity matrix.

System model
We consider a hybrid mmWave massive MIMO system model consisting of $B$ cells. We assume that a base station (BS) equipped with $N_t$ antennas and $N_{RF}$ RF chains ($N_t \ge N_{RF} \ge K$) serves $K$ single-antenna users, as shown in Figure 1. To manage the interference and improve the data rate for users, the users are partitioned into $L$ clusters $G_1, \ldots, G_L$ with $g_i = |G_i|$, $\sum_{i=1}^{L} g_i = K$, and $G_i \cap G_{i'} = \emptyset$, $\forall i \neq i'$. $G_i$ is the $i$th cluster, where $i = 1, \ldots, L$. The set $\{G_1, \ldots, G_L\}$ contains all user clusters.
Let $u_{b,i,k}$, $k = 1, \ldots, g_i$, denote the $k$th user of $G_i$ in the $b$th cell ($b = 1, 2, \ldots, B$). Hybrid precoding consists of two parts: baseband-domain digital precoding and radio-frequency-domain analog precoding. In the downlink, the transmitted symbol first passes through the digital precoder, and the resulting signal is fed to the RF chains. The output of the RF chains is analog precoded and then sent to the antenna elements. Specifically, the transmitted signal vector $x_{b,i,k}$ at the base station is first precoded with the digital precoder $W_{b,i,k}$, and the resulting signals are fed to the analog precoder $F_{b,i,k}$. In the expression for the received signal $y_{b,i,k}$ of user $u_{b,i,k}$, $h_{b,i,k} \in \mathbb{C}^{N_t}$ is the channel vector between the BS and user $u_{b,i,k}$, $x_{b,i,k} \in \mathbb{C}^{N_t}$ represents the transmit signal of user $u_{b,i,k}$, and $n_{b,i,k} \sim \mathcal{CN}(0, \sigma^2)$ is the spatially white additive Gaussian noise. $F_{b,i,k} \in \mathbb{C}^{N_t \times n_{RF,i}}$ is the analog precoding matrix that adaptively steers an $n_{RF,i}$-dimensional RF beamspace for the coverage of $G_i$, with $n_{RF,i} \ge g_i$.
The result of the hybrid method is more accurate than that of the statistical method, and the method yields faster and more general results, but it cannot provide enough accuracy in modeling the intra-cluster angle, which is necessary for beamforming and inter-cluster interference optimization. 28,38

Channel model
To take advantage of the unique spatial selectivity and scattering characteristics of the mmWave massive MIMO channel, this article adopts the Saleh-Valenzuela (SV) model. 35 In the SV expression for the channel matrix of a user in a cluster, $N_l$ indicates the number of paths, $\alpha_l$ is the complex gain of the $l$th path, and $a_r(\theta_{r,i,l})$ and $a_t(\theta_{t,i,l})$ are the user and base station array response vectors, respectively, where $\theta_{r,i,l}$ is the angle of arrival (AoA) at the user and $\theta_{t,i,l}$ is the angle of departure (AoD) at the base station. In the response vector of a simple $N$-element uniform linear antenna array, $\lambda$ is the wavelength and $d_{ULA}$ denotes the spacing between antennas. Because scattering is limited during mmWave propagation, the mmWave massive MIMO channel $h_{i,k}$ is low-rank. Therefore, we can use a limited number of RF chains to obtain near-optimal throughput.
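As a rough illustration of the channel model described above, the following sketch draws one channel vector for a single-antenna user as a sum of a few steered paths. The function names, the half-wavelength spacing, the unit-variance complex Gaussian path gains, and the normalization factor sqrt(N_t / N_l) are assumptions made for this example, not values taken from the article.

```python
import numpy as np

def ula_response(n_elements, angle, spacing_over_wavelength=0.5):
    """Unit-norm response of an N-element uniform linear array at `angle` (radians).

    spacing_over_wavelength is d_ULA / lambda; 0.5 is an assumed default.
    """
    n = np.arange(n_elements)
    phase = 2j * np.pi * spacing_over_wavelength * n * np.sin(angle)
    return np.exp(phase) / np.sqrt(n_elements)

def sv_channel(n_t, n_paths, rng):
    """Draw one channel vector as a sum of n_paths steered paths (assumed normalization)."""
    gains = (rng.standard_normal(n_paths) + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)
    aods = rng.uniform(-np.pi / 2, np.pi / 2, n_paths)  # angles of departure
    steering = np.stack([ula_response(n_t, a) for a in aods], axis=1)  # (n_t, n_paths)
    return np.sqrt(n_t / n_paths) * steering @ gains

rng = np.random.default_rng(0)
h = sv_channel(n_t=64, n_paths=3, rng=rng)  # lies in a 3-dimensional subspace of C^64
```

Because the vector is a combination of only `n_paths` steering vectors, channels generated this way occupy a subspace far smaller than the antenna dimension, which is the low-rank property the text relies on.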

User clustering hybrid precoding scheme
Our goal is to design a hybrid precoding matrix. First, we must deal with intra-cluster, inter-cluster, and inter-cell interference using limited channel knowledge; second, we need to use few RF chains to realize the hybrid analog/digital architecture, avoiding the high complexity of traditional methods. We therefore propose a hybrid precoding method based on manifold learning to achieve these goals.

NCRBM manifold learning for user clusters
With the increase of antennas and users in the mmWave massive MIMO system, inter-cell and intra-cell directional interference occurs during signal transmission. The high-dimensional channel matrix requires high-complexity hybrid analog/digital architectures. By modeling each user set as a manifold, we formulate the problem as clustering-oriented manifold discriminative learning. The undirected similarity graph of the users is represented by the graph embedding method. To represent each user set as a manifold, the user channel characteristic graphs $\{(h_{i,k}, m_{k,j})\}_{i=1}^{L}$ are constructed, as shown in Figure 2.
$m_{j,k,j}$ represents the intra-cluster channel weight function between users $k$ and $j$. $m_{z,k,j}$ represents the inter-cluster channel weight function between users $k$ and $j$. The set of the cluster channel weight functions is $M = \{m_{k,j} : k, j \in (1, \ldots, K)\}$.
The intra-cluster weight function $m_{j,k,j}$ and the inter-cluster weight function $m_{z,k,j}$ are defined with complementary supports. For the intra-cluster weight function, the weight is larger when users $k$ and $j$ are in the same cluster and is 0 when they are in different clusters. For the inter-cluster weight function, the weight is larger when users $k$ and $j$ are in different clusters and is 0 when they are in the same cluster. Manifold discriminative learning seeks to learn embedded low-dimensional manifolds in which manifolds with labels of different user groups can be separated more easily and the local spatial correlation of the high-dimensional channels within each manifold is enhanced. Some existing manifold learning algorithms, such as Locally Linear Embedding (LLE), 39 cannot retain the complete global non-linear channel structure of user clusters.
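The complementary structure of the two weight functions can be sketched as follows. The Gaussian (heat) kernel and the parameter names `sigma_intra` and `sigma_inter` are assumptions chosen for illustration; the only property taken from the text is that intra-cluster weights vanish across clusters and inter-cluster weights vanish within a cluster.

```python
import numpy as np

def cluster_graph_weights(dist, labels, sigma_intra=1.0, sigma_inter=1.0):
    """Build intra- and inter-cluster weight matrices from pairwise distances.

    dist   : (K, K) array of pairwise user distances
    labels : length-K cluster label per user
    The kernel exp(-d^2 / sigma) is an assumed form, not the article's exact definition.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    # Intra-cluster graph: non-zero only for distinct users in the same cluster.
    w_intra = np.where(same & off_diag, np.exp(-dist**2 / sigma_intra), 0.0)
    # Inter-cluster graph: non-zero only for users in different clusters.
    w_inter = np.where(~same, np.exp(-dist**2 / sigma_inter), 0.0)
    return w_intra, w_inter

dist = np.array([[0.0, 1.0, 2.0],
                 [1.0, 0.0, 3.0],
                 [2.0, 3.0, 0.0]])
w_intra, w_inter = cluster_graph_weights(dist, labels=[0, 0, 1])
```

Each user pair appears in at most one of the two graphs, so the intra-cluster and inter-cluster structures can be penalized or rewarded separately in the discriminative objective.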
We propose to perform the manifold discriminative learning for global dimensionality reduction. The highdimensional channels are mapped in the lowdimensional manifolds, as shown in Figure 3. In order to reveal the potential non-linear manifold structure of high-dimensional channels, intra-cluster graph and inter-cluster graph are constructed using the label information of user characteristics. In addition, it can make the low-dimensional channels more clustered, and enhance the separability of embedded low-dimensional channels. The radio frequency eigen-beamformer is   considered to be the best solution for user group transmission. The channel eigenvector learning corresponding to the maximum eigenvalue is taken as the spatial direction. In theory, the main direction learned is the beamforming. Multi-users of the same cluster have highly correlated transmission paths. We seek to learn a generic mapping A that is defined as where A is projection matrix, h 0 k is the kth user lowdimensional mapping of the high-dimensional channels h k . The original high-dimensional channels h k can be transformed into a low-dimensional channel h 0 k . The relative spatial relationship of neighboring users in high-dimensional channels remains unchanged in lowdimensional manifolds. In order to maintain the manifold structure of the high-dimensional channels, the optimization problem is the projection direction of manifold, that is, 8h k , h j (k 6 ¼ j) of the intra-cluster, the target function of the intra-cluster can be obtained as The projection can maximize the use of all users in the cluster of the intra-cluster as equation (8), where S j, local = H(D j À M j )H T is the local manifold structure of the intra-cluster, and D j is the diagonal matrix and According to the SV model, R i, k = E½h i, k h H i, k is the covariance matrix of the kth user in the ith cluster. 
The transmission covariance matrix of users in a cluster is the same, so,R i, k , that is where U i, k 2 C N t 3 r i is a matrix of eigenvectors corresponding to r i (r i = N t ) non-zero eigenvalues of R i, k . L i, k is the diagonal matrix whose elements are the nonzero eigenvalues of R i, k 2 C N t 3 N t corresponding to the non-zero eigenvalues, satisfying L k 2 C r 3 r . Since users in the same user cluster have similar spatial correlations, they have similar local scattering, R i = R i, k , 8k 2 G i . Measure of similarity between the user and the similarity criterion is a function of the distance function coefficients. Since span(U ) = UU T , 8U i, k , U i 0 , k 0 , the similarity measurement function between any two users based on the distance of subspace projection matrix can be expressed as where U k is the eigenvectors matrix of R k in any cluster, that is, is the symmetric positive semi-definite matrix that needs to be learned. The global manifold structure S j, global of intracluster is measured as To effectively utilize the global characteristics and local manifold structure of intra-cluster channels, we can get the intra-cluster dispersion h j by combining equations (9) and (11) where y is the constants. The weight functions m j, k, j of the intra-cluster can be obtained as where s 0 is the constants, and d k, j is the similarity measurement function between user k and user j.
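A minimal sketch of a subspace-based similarity measure of the kind described above: each user's covariance matrix is reduced to its dominant eigenvectors, and two users are compared through the distance between the corresponding projection matrices. The function names and the scaling by 1/sqrt(2) are assumptions for this example; the article's exact similarity function may differ.

```python
import numpy as np

def dominant_subspace(cov, rank):
    """Return the eigenvectors of a Hermitian covariance for its `rank` largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -rank:]

def projection_distance(u1, u2):
    """Distance between the projection matrices U1 U1^H and U2 U2^H (assumed 1/sqrt(2) scaling)."""
    p1 = u1 @ u1.conj().T
    p2 = u2 @ u2.conj().T
    return np.linalg.norm(p1 - p2, "fro") / np.sqrt(2)

# Two users whose dominant directions are orthogonal are maximally distant for rank 1.
e0 = np.array([[1.0], [0.0], [0.0]])
e1 = np.array([[0.0], [1.0], [0.0]])
d_orth = projection_distance(e0, e1)
d_same = projection_distance(e0, e0)
```

Working with projection matrices rather than the eigenvector matrices themselves makes the measure independent of the particular basis chosen for each subspace.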
In order to maintain the manifold structure of the inter-cluster user channels, the optimization problem is the projection direction of manifold, that is, 8h k , h j (k 6 ¼ j) of the inter-cluster, the objective function of the inter-cluster can be obtained as Therefore, the projection can maximize the use of all users of the inter-cluster, that is where S z, local = H(D z À M z )H T is the local manifold structure of the inter-cluster, and D z is the diagonal matrix and D z = P k6 ¼j M z (k, j). The global inter-cluster S z, global is measured as To effectively utilize the global characteristics and local manifold structure of inter-cluster channels, we can get the inter-cluster dispersion h z by combining equations (15) and (16) where } is the constants. The weight functions m j, k, j of the inter-cluster can be obtained as where s 00 is the constants. After getting the intra-cluster dispersion h j and the inter-cluster dispersion h z , the mobility of the user causes these two values to change continuously. To cope with this problem, we introduce the NCRBM model 40 to acquire optimal value. The NCRBM is a multi-layer network constructed by Convolutional Restricted Boltzmann Machines (CRBMs) stacked on top of each other. NCRBM is an extension of the standard Restricted Boltzmann Machine (RBM) that has inherited all its properties but with faster operation. NCRBM optimize the intra-cluster dispersion value over time. In ith(i = 1, :::, L) cluster of users, let us denote the intra-cluster dispersion of m time interval by , :::, h t 0 + mDt j g is a set of intra-cluster dispersions belonging to ith cluster. And t 0 is the start time, and Dt is the time interval. In addition, we defineh j as the optimal intra-cluster dispersion after optimization. The optimal value is computed by averaging across manifold structure feature over all the intra-cluster dispersions belonging to set h i j . 
An optimal value is a mean representation of a specific cluster in manifold structure feature space and must be calculated for each cluster separately. The energy of the joint configuration (h i j ,h j ) of the input intra-cluster dispersion and optimized units for an NCRBM with real-valued input units can be defined as follows where Ã stands for the convolution; w q , (h i j ) q m, n , and (h i j ) u, v denote the manifold structure feature detectors horizontally and vertically in filter q, the optimized unit on location (m, n), and the input intra-cluster dispersion unit on location (u, v), respectively. Also, b q is the shared bias among all units in feature map q, and c is the bias value for input intra-cluster dispersion units.
The network assigns a probability to every possible pair of input and optimized unit through this energy function as follows where Z is the partition function.
The optimization of the model parameters can be performed by minimizing the following objective function using the Contrastive Divergence The second term in equation (21) is the sparsity regularization proposed to prevent the model from being overcomplete. For each intra-cluster dispersion, h i j , we haveh i where Q is the number of manifold structure features. Then, the optimal value for each cluster can be computed as followsh NCRBM objective function is comprised of two main parts, for example, generative and optimized parts. Generative objective (the first two terms in equation (24)) is the same as sparsity regularized CRBM, while this way of optimization does not guarantee the best intra-cluster dispersion value. Optimization of the generative part is performed by minimizing Contrastive Divergence. The second two terms in equation (24) correspond to the optimized function, which can be optimized by following a gradient-based process. Minimizing the following objective function can get the optimal intra-cluster dispersion where L is the number of clusters. In equation (24), the third term tries to move the features of each intracluster dispersion in cluster r to its manifold structure M r , while the last term tries to maximize the distance between class-maps M u and M v for a better separation between clusters. The weighting parameters l dis and b dis are used to adjust the amount of contribution of the optimized terms in the overall process. The gradient of the optimized part can be computed exactly at each iteration. For each cluster, we summed the gradient contributions brought by the two components. Based on the definition in equation (23), update of intra-cluster dispersion in each cluster can be performed after optimizations. The limitation of this approach is that too long-time intervals can cause the obtained intra-cluster dispersion to be inaccurate. 
So, it needs to be run several times, which makes the overall computing time of the algorithm increase.
For inter-cluster dispersion, we use a similar approach to obtain the optimized inter-cluster values h z . In this way, if the users in a cluster move within a certain time, we use the NCRBM model to obtain optimal intra-cluster dispersion and inter-cluster dispersion while maintaining the manifold structure feature within a certain time. However, the overhead of manifold learning increases due to the constant updating of the dispersion values for intra-and inter-cluster users. This is mainly due to the fact that the introduced NCRBM model requires some iterations.
The discriminative function J (A) is transformed as According to equation (27), the low-dimensional mapping of the kth user channel matrix h 0 i, k is determined by the projection matrix A. The Lagrange multipliers are introduced to transform the original optimization problem into a Lagrangian function problem to find the optimal projection matrix A with intracluster and inter-cluster dispersion values. 41 By solving the generalized eigenvalues of the discriminative function, we can obtain the projection matrix A = ½A 1 , . . . , A n . n is the dimensionality reduction of user channel matrix. After user clustering, the channel correlation of users in the same cluster is enhanced.
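The generalized eigenvalue step mentioned above can be sketched with NumPy alone. The sketch assumes two Hermitian scatter matrices, here called `s_between` (to be maximized) and `s_within` (to be minimized and assumed positive definite after a small ridge); these names, the ridge term, and the mapping h' = A^H h are illustrative assumptions rather than the article's definitions.

```python
import numpy as np

def top_generalized_eigvecs(s_between, s_within, n_dims, ridge=1e-9):
    """Solve S_b v = lambda S_w v and return the n_dims largest eigenpairs.

    Uses Cholesky whitening: with S_w = L L^H, the problem becomes an ordinary
    Hermitian eigenproblem for C = L^{-1} S_b L^{-H}.
    """
    dim = s_within.shape[0]
    chol = np.linalg.cholesky(s_within + ridge * np.eye(dim))
    tmp = np.linalg.solve(chol, s_between)                  # L^{-1} S_b
    whitened = np.linalg.solve(chol, tmp.conj().T).conj().T  # L^{-1} S_b L^{-H}
    whitened = (whitened + whitened.conj().T) / 2            # enforce Hermitian symmetry
    eigvals, eigvecs = np.linalg.eigh(whitened)
    order = np.argsort(eigvals)[::-1][:n_dims]
    proj = np.linalg.solve(chol.conj().T, eigvecs[:, order])  # v = L^{-H} y
    return eigvals[order], proj

s_b = np.diag([4.0, 1.0, 9.0])
s_w = np.diag([1.0, 1.0, 3.0])
vals, proj_a = top_generalized_eigvecs(s_b, s_w, n_dims=2)

# Assumed low-dimensional mapping of a channel vector h: h' = A^H h.
h = np.array([1.0, 2.0, 3.0])
h_low = proj_a.conj().T @ h
```

Keeping only the leading eigenvectors retains the directions in which the between-cluster scatter is largest relative to the within-cluster scatter, which is what reduces the channel dimension from the number of antennas to `n_dims`.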
Then, according to the intra-cluster graph and inter-cluster graph constructed using the label information of user characteristics, the user clusters can be divided more accurately with lower complexity. Based on the maximum and minimum distances and the weighted likelihood similarity criterion, an optimized spatial fuzzy c-means clustering algorithm is proposed. The algorithm is an iterative optimization that minimizes a cost function $J(m_{i,k})$, in which $d_{i,k} = (1/\sqrt{2})\,\mathrm{tr}(c_{k,i} c_{k,i}^T)$ is the similarity measurement function between the $k$th user and the $i$th cluster center, $m_{i,k}$ represents the membership function of user $u_{i,k}$ in the $i$th cluster, and the fuzziness exponent is a constant. This exponent controls the ambiguity of the resulting partition; in this article, we set it to 2. The cost function $J(m_{i,k})$ is smallest when high membership values are assigned to users $u_{i,k}$ close to the cluster center and low membership values are assigned to users $u_{i,k}$ far from the cluster center. The membership function represents the probability that a user $u_{i,k}$ belongs to a specific cluster. The membership functions and cluster centers are updated iteratively; the update also involves the similarity measurement function between the $i$th cluster center and the $i'$th cluster center. In summary, to represent each user set as a manifold, the process of clustering-oriented manifold discriminative learning is as follows:
Step 1: construct the user channel characteristic graphs $\{(h_{i,k}, m_{k,j})\}_{i=1}^{L}$.
Step 2: find the two users $U_i$ and $U_{i'}$ with the farthest distance, and take their centers as the first user groups. The number of user clusters is then $i = 2$.
Step 3: using the distance criterion $d_{i,k} = (1/\sqrt{2})\,\mathrm{tr}(c_{k,i} c_{k,i}^T)$, gather all users into the $i$ user clusters.
Step 4: among the $i$ user groups that have completed clustering, find the point of weakest similarity in each user group to obtain the $i$ user clusters. Calculate in turn the sum distance $d_{i,k}$ for each user $k$ ($k = 1, 2, \ldots, K$), the membership functions $m^{(0)}_{i,k}$, and the center point $V^{(0)}_i$ ($i = 1, 2, \ldots, L$) of each user cluster.
Step 5: calculate the spatial membership function and update the center point $V^{(0)}_i$ ($i = 1, 2, \ldots, L$) of each user cluster. Then, find the maximum value among the $d_{i,k}$.
All users entering $(i + 1)$ are redivided into different user clusters.
Step 7: once the similarity coefficient is calculated, assign each user to the user cluster with the largest similarity coefficient.
Step 8: output the clustering result and the number of users in each cluster.
Step 9: calculate $m_{j,k,j}$ and $m_{z,k,j}$ according to equations (13) and (18); construct the intra-cluster graph and inter-cluster graph using the label information of user characteristics.
Step 11: calculate the inter-cluster dispersion h z and the intra-cluster dispersion h j according to equations (12) and (17).
Step 14: according to the obtained projection matrix, obtain the projection $h'_{i,k}$ in the low-dimensional subspace.
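The clustering core of the steps above can be illustrated with a plain fuzzy c-means loop. This is the standard algorithm with Euclidean distances, real-valued features, random initial memberships, and a fuzziness exponent of 2; it is a simplified stand-in, not the article's spatial variant with max-min initialization and subspace-based distances.

```python
import numpy as np

def fuzzy_c_means(points, n_clusters, fuzzifier=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means on real-valued feature vectors.

    points : (n, d) array of user features (assumed real-valued for this sketch)
    Returns the (n_clusters, n) membership matrix and the (n_clusters, d) centers.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    member = rng.random((n_clusters, n))
    member /= member.sum(axis=0, keepdims=True)  # memberships of each user sum to 1
    for _ in range(n_iter):
        weights = member ** fuzzifier
        # Center update: membership-weighted mean of the points.
        centers = weights @ points / weights.sum(axis=1, keepdims=True)
        dist = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero at a center
        # Membership update: inversely proportional to distance^(2 / (fuzzifier - 1)).
        inv = dist ** (-2.0 / (fuzzifier - 1.0))
        member = inv / inv.sum(axis=0, keepdims=True)
    return member, centers

rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 0.1, (20, 2))
group_b = rng.normal(10.0, 0.1, (20, 2))
member, centers = fuzzy_c_means(np.vstack([group_a, group_b]), n_clusters=2)
labels = member.argmax(axis=0)  # hard assignment: cluster with the largest membership
```

The final hard assignment mirrors Step 7, where each user is placed in the cluster with the largest similarity (here, membership) value.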

Manifold discriminative learning for hybrid precoding
On the basis of manifold discriminative learning for global dimensionality reduction and user clustering, we investigate the sum-rate maximization problem for hybrid precoding. To design the precoding matrix $F'_{G_i} W'_{G_i}$, we study sum-rate maximization for hybrid precoding such that intra-cluster and inter-cluster interference are managed. To improve the spectral efficiency of the system, the design of each cluster's analog precoder should balance optimizing its own transmission against interference. By representing each user set as a manifold, the received signal of the cluster can be written with an additional term $y'$, the inter-cluster interference after the low-dimensional mapping. To adapt to special scenarios and requirements, the hybrid precoding matrix can be determined by per-cluster processing (PCP). The goal of PCP is to balance performance and complexity by effectively separating the clusters in the RF beam domain.
In PCP mode, the analog precoding matrix F 0 G i of each cluster is calculated according to manifold quasi-conjugate gradient algorithm, while the digital precoding matrix W 0 G i is calculated by each user cluster according to their equivalent channel matrix. Let H 0 eq = H 0H F 0 denote the equivalent channel matrix after analog precoding, and it is an approximate block diagonal matrix, which can be expressed as where H 0 eq G i = H 0 G i HF 0 G i represents the diagonal elements of the matrix in equation (32), and off-diagonal elements of the matrix H 0 G i HF 0 G i 0 (i 6 ¼ i 0 ) represents the interference channel matrix between user clusters. After analog precoding, the inter-cluster interference is eliminated, that is, H 0 G i HF 0 G j ' 0. H 0 eq can be expressed as The conditional MSE in equation (31) is simplified as Therefore, the hybrid precoding based on interference leakage is jointly optimized with F 0 G i ,W 0 G i , and b G i . According to the literature, 19 is an unnormalized digital precoding matrix, which can be obtained by KKT conditions as where g À1 G i is the regularization factor, which depends on noise variance and base station transmit power. I G i is The optimal value given in Ayach et al. 12 is g À1 G i = P tol =Ks 2 . P tol is the total power of the transmitted signal. The optimal scaling factor b G i can be obtained from the base station transmission power with Accordingly, equation (38) can be expressed as We solve the following objective function with the choice of symmetric weight B L f, f 1 f 2 , where f 1 , f 2 2 G i are the users in the cluster L min X For the Lth cluster, B L u, f 1 f 2 incurs a penalty if the users are close to each other but in different clusters, whereas B L f, f 1 f 2 encourages users f 1 and f 2 map closer if users f 1 and f 2 are in the same cluster. 
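The regularized digital precoding step described above can be sketched for a single cluster as a regularized zero-forcing precoder on the equivalent channel, followed by a scaling that meets the transmit power budget. The regularization value K * sigma^2 / P_tol is a commonly used choice adopted here as an assumption, and the random phase-only analog precoder is a placeholder for the manifold-optimized one; neither is claimed to match the article's exact expressions.

```python
import numpy as np

def rzf_digital_precoder(h_eq, analog, total_power, noise_var):
    """Regularized zero-forcing on the equivalent channel with power normalization.

    h_eq   : (K, n_rf) equivalent channel after analog precoding, H^H F
    analog : (N_t, n_rf) analog precoder F
    """
    n_users = h_eq.shape[0]
    reg = n_users * noise_var / total_power  # assumed regularization factor
    gram = h_eq @ h_eq.conj().T + reg * np.eye(n_users)
    # Unnormalized precoder W = H_eq^H (H_eq H_eq^H + reg I)^{-1}; gram is Hermitian.
    w_unnorm = np.linalg.solve(gram, h_eq).conj().T
    # Scale so that the hybrid precoder F W uses exactly the total transmit power.
    scale = np.sqrt(total_power) / np.linalg.norm(analog @ w_unnorm, "fro")
    return scale * w_unnorm

rng = np.random.default_rng(0)
n_t, n_rf, n_users = 32, 6, 4
chan = (rng.standard_normal((n_t, n_users))
        + 1j * rng.standard_normal((n_t, n_users))) / np.sqrt(2)
# Placeholder analog precoder with constant-modulus entries (phase shifters).
analog = np.exp(1j * rng.uniform(0, 2 * np.pi, (n_t, n_rf))) / np.sqrt(n_t)
h_eq = chan.conj().T @ analog
w = rzf_digital_precoder(h_eq, analog, total_power=1.0, noise_var=1e-6)
effective = h_eq @ w  # close to a scaled identity at high SNR
```

At high SNR the regularization term is small, so the effective channel is nearly diagonal and intra-cluster interference is suppressed; at low SNR the regularization limits the noise enhancement of pure zero-forcing.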
The weights are defined in equation (47), where $N_f(F'_{f_1})$ contains the $G'_i$-nearest users sharing the precoding with $F'_{f_1}$, whereas $N_u(F'_{f_1})$ contains the $G'_i$-nearest neighbors having different precoding, and $t$ is the parameter of the kernel function, which approximately follows a Gaussian distribution. 42 Following some simple algebraic steps, equation (45) can be reduced to a compact form, and equation (46) can be simplified in the same way. Thus, the discriminant embedding space $C_L$ can be derived by maximizing the resulting objective function. This is equivalent to finding the largest $q$ eigenvalues of the generalized eigenvalue problem in equation (52). Let the eigenvectors $\{v_1, v_2, \ldots, v_q\}$ that correspond to the largest $q$ eigenvalues $\{a_1, a_2, \ldots, a_q\}$, with $a_1 \ge a_2 \ge \cdots \ge a_q$, be the solutions of equation (52); they are chosen as the basis of the discriminant embedding space $C_L$.
Once we have obtained the discriminant embedding space $C_L$, the most distinguishing precodings are preserved in the most suitable projection space. In the low-dimensional discriminative embedding space, neighboring users of the same cluster move closer to each other, while users of other clusters are prevented from entering the neighborhood.
Local discriminant matrix is defined as follows where a 0 is a scalar parameter. The local discriminant matrix is used as the input of kernel function.
In the general case, the distance on a Grassmann manifold (a particular class of manifolds) is the length of the shortest geodesic connecting two users in the linear subspaces $U_{f_1}$ and $U_{f_2}$, that is
where $Y = [t_1, t_2, \ldots, t_k]$ is the vector of principal angles, with $0 \leq t_1 \leq t_2 \leq \cdots \leq t_k \leq \pi/2$.
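To make the principal angles concrete, the sketch below computes them from two orthonormal basis matrices using the standard SVD-based construction, and then evaluates two common Grassmann distances: the geodesic (arc-length) distance, which is the 2-norm of the angle vector, and the projection distance, which is the 2-norm of the sines of the angles. The function names are hypothetical, and the textbook definitions are assumed here; they are not necessarily the exact expressions used in the paper.

```python
import numpy as np

def principal_angles(U1, U2):
    """Principal angles (ascending, in [0, pi/2]) between span(U1) and span(U2).

    U1 and U2 are assumed to have orthonormal columns.
    """
    # The singular values of U1^H U2 are the cosines of the principal angles.
    cosines = np.linalg.svd(U1.conj().T @ U2, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)   # guard against round-off
    return np.sort(np.arccos(cosines))

def geodesic_distance(U1, U2):
    """Arc-length distance: 2-norm of the principal-angle vector."""
    return float(np.linalg.norm(principal_angles(U1, U2)))

def projection_distance(U1, U2):
    """Projection distance: 2-norm of the sines of the principal angles."""
    return float(np.linalg.norm(np.sin(principal_angles(U1, U2))))

def orthonormal_basis(X):
    """Orthonormalize the columns of X with a reduced QR factorization."""
    Q, _ = np.linalg.qr(X)
    return Q
```

For two mutually orthogonal two-dimensional subspaces, every principal angle is $\pi/2$, so the geodesic distance is $\sqrt{2}\,\pi/2$ and the projection distance is $\sqrt{2}$.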
Having defined the projection distance over the Grassmann manifold, we also need to take into account the distance from user $f_1$ to user $f_2$ in the discriminant embedding space $C_L$. To cover the intrinsic linear subspace, the user-to-user distance is denoted by
When the two distances are formalized as above, we arrive at the following form of the user-to-user distance metric
In equation (55), the former term describes how far apart the origins of the two coordinate systems $U_{f_1}$ and $U_{f_2}$ are in the discriminant embedding space $C_L$, whereas the latter reflects the correlation between the two orthonormal basis matrices.
The local discriminant matrix $G_{local}$ is chosen as the input of the kernel function
where $t$ is the parameter of the kernel function, $a_{f_1}$ and $a_{f_2}$ are the mean vectors of the manifolds on which the users are located, and $b_{f_1}$ and $b_{f_2}$ are the vectors of free parameters of $F'_{f_1}$ and $F'_{f_2}$. By minimizing the objective function below, we can find a suitable mapping under which manifolds belonging to the same subspace move closer together and manifolds in different subspaces move further apart
The above formula can be simplified to
where $D$ is a diagonal matrix that establishes a natural metric between different subspaces. To limit $z$ to a fixed scale, we add the following constraint
Therefore, the objective function with the constraint is obtained as follows
Let $J(G)$ represent the objective function. The hybrid precoding optimization problem based on interference leakage under orthogonal constraints is
Finally, we can obtain the optimal $G$ from the minimum-eigenvalue solution of the generalized eigenvalue problem.
Therefore, solving the objective function can be transformed into a convex optimization problem. The optimal radio frequency precoding matrix $F'_{G_i}$ is the one that minimizes $e^{(1)}_{G_i}$ and $e^{(2)}_{G_i}$. The manifold algorithm for finding the optimal radio frequency precoding matrix $F'_{G_i}$ is as follows:
Step 1: initialize the analog precoding matrix $F'_{G_i,1}$, the error threshold $e \in (0, 1)$, and the discriminant embedding space $C_L$.
Step 2: derive the discriminant embedding space $C_L$ according to equations (45)-(51).
Step 3: find the $q$ largest eigenvalues according to equation (52).
Step 4: compute the local discriminant matrix $G_{local}$ according to equation (53).
Step 6: update $G$ subject to $G^{T} D_{kernel} D D^{T}_{kernel} G = 1$, and find the optimal radio frequency precoding matrix $F'_{G_i}$ according to equation (63). Update the analog precoding matrix until the error threshold condition is satisfied, at which point the algorithm ends.
Within a cluster, the channels of different users are correlated. Interference from users in non-adjacent clusters is much smaller than interference from users in adjacent clusters, so the impact of remote user clusters on the users in a cluster can be ignored. The SNR of user cluster $G_i$ in the $b$th cell is as follows
where $P_{G_i}$ is the transmit power of the $G_i$th cluster, and $P_{G_i,k'}$ and $P_{G_{i'}}$ are the transmit power of the $k'$th user in the $i$th cluster and the transmit power of the $G_{i'}$th cluster, respectively.
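As a rough illustration of how a cluster-level ratio of this kind feeds into a rate figure, the sketch below forms a signal-to-interference-plus-noise ratio from a desired-signal power, intra-cluster interference powers, adjacent-cluster interference powers, and a noise variance, and then applies the Shannon formula $\log_2(1 + \mathrm{SNR})$. The function names, the split of interference into these two groups, and the omission of remote clusters are illustrative assumptions that follow the discussion above; they do not reproduce the paper's exact expression.

```python
import numpy as np

def cluster_sinr(p_signal, p_intra, p_inter_adjacent, noise_var):
    """Signal power over (intra-cluster + adjacent-cluster interference + noise).

    Interference from remote (non-adjacent) clusters is neglected,
    which is an assumption of this sketch.
    """
    interference = np.sum(p_intra) + np.sum(p_inter_adjacent)
    return float(p_signal / (interference + noise_var))

def sum_rate(sinrs):
    """Sum of Shannon rates log2(1 + SINR) in bit/s/Hz."""
    return float(np.sum(np.log2(1.0 + np.asarray(sinrs, dtype=float))))
```

With a signal power of 3, total interference of 1, and unit noise variance, `cluster_sinr` gives 1.5. Two clusters with ratios 1 and 3 give a sum-rate of 3 bit/s/Hz.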
The capacity of the mmWave massive MIMO system can be expressed as
Equation (65) can be written as

Simulation results
In this section, we study the SE and bit error rate (BER) performance of the proposed hybrid precoder. We compare the method in this article with several traditional methods, namely OMP, KDHB, AFHB, MO, and RTRNM. We also consider the performance of this method for non-mobile and mobile users.
The basic simulation parameters are as follows.
The carrier frequency is 60 GHz. The AoAs and AoDs are uniformly distributed in $[0, 2\pi]$, with a common angle spread (AS) of $D = 8$. The complex gain of each path obeys the distribution $\mathcal{CN}(0, 1)$. A uniform linear array (ULA) is adopted in the simulations.26 The channel power azimuth spectra overlap in many places, which causes inter-cluster interference.
Figure 4 compares the sum-rate performance of the proposed hybrid precoding with that of the traditional schemes in the mmWave massive MIMO system. Let $N_t = 128$, $N_{RF} = 32$, and $K = 32$. In Figure 4, we can see that the proposed scheme achieves a higher sum-rate than the other existing precoding schemes at different SNRs. The reason is that the traditional methods have not solved the problem of nonlinear resolution of multi-user high-dimensional channels. In our proposed scheme, each user set is represented by a manifold, and the cluster-oriented multi-manifold learning scheme is used to solve this problem. The geometric model of the clustered users is studied for the high-density hotspot scenario of the cell. This method can better handle the interference within and between clusters. Through user clustering and hybrid precoding, the achievable sum-rate of the mmWave massive MIMO system is improved.
Figure 5 compares the BER performance of the different hybrid precoding schemes, where the channel parameters in the single-cell scenario are the same as above. From Figure 5, a conclusion similar to that of Figure 4 can be drawn for different SNRs. We can also see that the proposed manifold discriminative learning scheme achieves a better BER performance than the other schemes. The proposed scheme improves the beamspace resolution and reduces the influence of power leakage on the beamspace channel.
Figure 6 compares the average sum-rate of the scheme in this article with that of the traditional scheme and other existing precoding schemes for different numbers of users. We set $n_{RF,i} \geq g_i$ in each cluster.
It can be seen from Figure 6 that the proposed method outperforms the other methods. As the number of users keeps increasing, the traditional schemes still do not account for the non-linear resolution of multi-user high-dimensional channels, whereas the proposed scheme handles the interference within clusters, between clusters, and between cells. The solution proposed in this article can therefore significantly improve the average sum-rate of mmWave massive MIMO systems.
Figure 7 shows how the system average SE changes with the SNR, and Figure 8 shows the average SE as the number of BS antennas changes. We can learn from the figures that the proposed method achieves an average SE that is significantly higher than that of the other traditional schemes. Figure 7 indicates that, with the proposed manifold learning scheme, each user and its adjacent high-dimensional channels lie in a global and local non-linear neighborhood. The geometric model of the clustered users is studied for the high-density hotspot scenario of the cell. The proposed scheme manages the multi-user and inter-cell interference and improves the data rate for cell-edge users. Figure 8 shows that the proposed method can make effective and extensive use of the antennas in multiple low-dimensional manifolds.

Conclusion
A hybrid precoding scheme with user clustering is proposed, which solves the large-scale mmWave MIMO problem in multiple low-dimensional manifolds and avoids the high-dimensional complex operations of traditional schemes. At the BS, a low-dimensional channel matrix is first learned for the mmWave massive MIMO system through manifold learning. User clustering hybrid precoding is then studied for the transmitted signal based on this low-dimensional channel matrix. The manifold discriminative learning seeks to learn the embedded low-dimensional subspace, in which manifolds with different user group labels are easier to distinguish and the local spatial correlation of the high-dimensional channels in each manifold is enhanced. Through proper user clustering, the hybrid precoding is investigated for the sum-rate maximization problem by manifold quasi-conjugate gradient methods. The simulation results show that the method has good robustness while reducing the computational complexity of the mmWave massive MIMO system.
More realistic precoding for MIMO is expected in future research. In the manifold learning after user clustering, we use only the traditional method. The choice of the user cluster centers is crucial, and users moving at high speed make our method inaccurate, so more advanced research is needed. In addition, our current method applies to fully connected hybrid precoding; further research is still needed for sub-connected and dynamically connected structures.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work was supported by the Shanghai Capacity Building Projects in Local Institutions under Grant 19070502900.