Lower limb motor function assessment based on TensorFlow convolutional neural network and kernel entropy component analysis–local tangent space alignment

Motor function assessment of patients and the elderly is crucial to gait assessment and gait rehabilitation, but its accuracy is affected by the clinician's experience. To address this problem, this article proposes a motor function assessment index to assess the motor function of patients. A VICON system records video of the subjects while they walk. The original gait videos are pre-processed by the pixel-based adaptive segmenter, and features are extracted by a convolutional neural network. Kernel entropy component analysis and local tangent space alignment reduce the dimensions of the extracted features, and the motor function assessment index is obtained. Pearson correlation analysis shows that the motor function assessment index and the modified gait abnormality rating scale are significantly correlated, with a Pearson correlation coefficient of 0.92. These results demonstrate that the proposed method has considerable potential to promote the future design of automatic motor function assessment for clinical rehabilitation research.


Introduction
The decline of lower limb motor function greatly reduces the quality of life of the elderly and of patients. Traditionally, the estimation of motor function is mostly based on the international classification of functioning, disability and health (ICF) and the lower extremity motor score (LEMS). 1,2 However, these assessments are subjective and time-consuming, and their accuracy depends on the clinicians' experience. Therefore, it is necessary to develop a convenient and objective quantitative assessment of motor function.
In the past few years, advanced measuring techniques have made it easier to collect kinematic, kinetic, and gait video data during walking. 3,4 This greatly facilitates the development of motor function assessment. Rozumalski and Schwartz 5 and Schwartz and Rozumalski 6 used three-dimensional gait information to evaluate the lower limb motor function of cerebral palsy patients and established quantitative evaluation indexes. Gouelle et al. 7 collected eight spatiotemporal gait parameters from patients with Friedreich's ataxia and developed the gait variability index (GVI) to assess motor function. Ma and Liao 8 collected ground reaction forces and presented the abnormal evaluation index (AEI), which uses a semi-Markov process. Rahmati et al. 9 used lower limb surface electromyographic (sEMG) signals of cerebral palsy patients to establish the synergy comprehensive assessment (SCA) for quantitative evaluation of motor dysfunction. The modified gait abnormality rating scale (GARS-M) is a semisubjective, video-based assessment method that evaluates seven gait characteristics during walking, including general performance; lower limb movements; and trunk, head, and upper limb movements. 10 However, these assessments are usually calculated from one kind of data source (e.g. kinematic data), which summarizes a subject's gait characteristics from a single aspect only and may lead to a one-sided assessment result. Some researchers have built assessment models based on multisource data. Soran et al. 11 collected natural behavior videos of children with spinal muscular atrophy (SMA) and used a regression convolutional neural network (RCNN) to analyze them. D'Souza et al. 12 established a quantitative multiple sclerosis assessment system using deep video motion data, and the effectiveness of the system was verified by clinical trials.
In recent years, convolutional neural networks (CNNs) have achieved great results in medical image diagnosis, video target detection, and behavior recognition. [13][14][15] Deeper convolution layers extract more abstract features and yield more effective recognition. [16][17][18][19] Many previous studies have shown that CNNs have good feature extraction capabilities and reveal hidden and complicated features in the spatial domain. 20,21 To avoid subjective effects, a TensorFlow convolutional neural network (TF-CNN) is used to extract features from gait video. Kernel entropy component analysis (KECA)-local tangent space alignment (LTSA) removes redundant features and yields the motor function assessment index (MFAI). Compared with a traditional CNN, TF-CNN reduces the training time and improves the accuracy. KECA-LTSA is more efficient than linear discriminant analysis (LDA) and principal component analysis (PCA) in nonlinear data extraction.
The article is structured as follows. Section ''Materials and methods'' introduces the process of data collection and describes the new assessment based on the improved CNN and KECA-LTSA. Section ''Experimental results'' presents the experimental results of the proposed algorithm. The conclusion and future work are reported in section ''Conclusion and future work.''

Overview of the proposed method
As shown in Figure 1, the assessment system consists of data collection, data feature extraction, and motor function assessment. In the first stage, the VICON system collects video data. Then, the pixel-based adaptive segmenter (PBAS) extracts silhouette images from the video, and TF-CNN extracts features from the images. Finally, KECA-LTSA removes redundant features and the MFAI is obtained.
Classification of subjects. All subjects in this article are divided by a clinician into three groups: a middle-aged stroke group (MG, 25 subjects), an elderly stroke group (EG, 25 subjects), and a young healthy group (YG, 40 subjects). Clinicians first use GARS-M to make a preliminary assessment of the motor function of all subjects. GARS-M scores range from 0 to 21. Table 1 lists the characteristics of the three groups, including age, gender, physical condition, and GARS-M score. The MG group consists of subjects with motor dysfunction aged between 40 and 59 years with a GARS-M score of 3-9. The EG group consists of subjects with motor dysfunction aged between 60 and 70 years with a GARS-M score of 8-14. The EG and MG groups together form the test group in this study, and the YG group serves as the reference group. The YG group consists of healthy subjects without motor dysfunction aged between 20 and 39 years. Because the young healthy subjects show no gait abnormality, the GARS-M score in the YG group is taken as 0. All subjects could complete the data collection without any help or assistance.
Motor data collection. The VICON Motion Capture System (Oxford Metrics Limited, UK) is used to collect gait video. It consists of six cameras, signal capture equipment, and data generation software. The full-image measurement frequency of the cameras is 1000 Hz; in this study, the measurement frequency is set to 200 Hz. As shown in Figure 2, VICON captures video of the subjects while they walk, and the silhouette images of the video are shown in Figure 3.

Data feature extraction
As shown in Figure 4, feature extraction can be summarized in three steps:
1. The subjects' images are divided into a train set and a test set in a ratio of 8:2.
2. The CNN model parameters are pre-trained on ImageNet and used as the initialization parameters of the network. The trained CNN model extracts features from the gait silhouette images.
3. The gait features are fused serially to obtain the feature vector.
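Steps 1 and 3 can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names `split_train_test` and `fuse_serially`, the fixed random seed, the image identifiers, and the toy 256-dimensional vectors are all assumptions made for the example, and the source does not state whether the split is made per image or per subject.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into a train set and a test set (8:2 by default)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fuse_serially(feature_vectors):
    """Serial fusion: concatenate the feature vectors end to end into one vector."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused


# Hypothetical identifiers standing in for gait silhouette images.
image_ids = [f"frame_{i:03d}" for i in range(10)]
train_ids, test_ids = split_train_test(image_ids)

# Toy stand-ins for CNN outputs: three 256-dimensional feature vectors.
toy_features = [[float(i)] * 256 for i in range(3)]
fused = fuse_serially(toy_features)
```

With three 256-dimensional inputs the fused vector has 768 entries, which shows why serial fusion makes the feature dimension grow quickly and why a dimension reduction step follows.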
Structure of CNN. The trained CNN is used as a feature extractor to establish the motor function assessment model. VGG-16 has a simple structure but achieves high accuracy on ImageNet. Moreover, with transfer learning and fine-tuning, VGG-16 generalizes well from a large data set to a small one. As shown in Figure 5, the CNN structure is modified from VGG-16. This network has seven sequentially connected layers, including an input layer (input image size is 120 × 150 pixels), five convolution blocks, two fully connected layers, and an output layer. Each convolution block consists of two or three convolution layers and one max-pooling layer. The output of Block5 is flattened into a vector for the fully connected layers. The FC1 layer outputs a 256-dimensional vector, and the FC2 layer is used for gait recognition via the softmax function. Table 2 shows the structure of the CNN, where C represents a convolution layer, R represents the nonlinear activation function (ReLU), P represents a max-pooling layer, F represents a fully connected layer, D represents Dropout (probability 0.5), and S represents the activation function. The number of trainable parameters at each layer is also given in Table 2.
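The effect of the five max-pooling layers on the image size can be checked with a short calculation. The sketch below rests on assumptions that the source does not confirm: padded 3 × 3 convolutions that keep the spatial size, 2 × 2 max pooling with stride 2 and floor rounding, and 512 channels in Block5 as in standard VGG-16. The name `pooled_size` is hypothetical, and the authoritative parameter counts are those in Table 2.

```python
def pooled_size(height, width, n_blocks=5):
    """Spatial size after n_blocks max-pooling layers (2x2 window, stride 2, floor rounding)."""
    for _ in range(n_blocks):
        height, width = height // 2, width // 2
    return height, width


# Assumption: Block5 has 512 output channels, as in standard VGG-16.
BLOCK5_CHANNELS = 512

out_h, out_w = pooled_size(120, 150)
flattened_length = out_h * out_w * BLOCK5_CHANNELS

# Weights plus biases of a 256-unit fully connected layer fed by the flattened vector.
fc1_parameters = flattened_length * 256 + 256
```

Under these assumptions a 120 × 150 input shrinks to 3 × 4 after Block5, so the flattened vector has 3 × 4 × 512 = 6144 entries. A modified network may differ, which is why the figures should be read against Table 2.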
Transfer learning and fine-tuning. Transfer learning differs from the traditional CNN training method: it improves learning performance by transferring knowledge from several source domains to a target domain. 22 In this study, the CNN is first trained on the ImageNet data set, which includes one million images, to obtain its parameters. Then, the pre-trained parameters are used as the initial parameters of the Block1-Block5 layers, and the parameters of FC6 and FC7 are randomly initialized. During training, the gait image data set is used to fine-tune the network and train the parameters of FC6 and FC7. Although the images in the ImageNet database differ greatly from the gait video images of this study, pre-trained weights are preferable to random weights because a randomly initialized network has a high chance of overfitting the training data.
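The idea of keeping pre-trained layers fixed while training a randomly initialized head can be illustrated with a small NumPy toy. This is a sketch under stated assumptions, not the network of this study: a fixed random matrix stands in for the pre-trained convolution blocks, the data are synthetic, and all names (`W_backbone`, `W_head`, `frozen_features`) are hypothetical. In Keras the same effect is usually obtained by setting `trainable = False` on the layers to be frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pre-trained convolution blocks: fixed weights that are never updated.
W_backbone = rng.normal(size=(20, 8))
W_backbone_initial = W_backbone.copy()


def frozen_features(x):
    """Forward pass through the frozen part (linear map followed by ReLU)."""
    return np.maximum(x @ W_backbone, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Synthetic three-class data standing in for gait images of three groups.
labels = np.repeat(np.arange(3), 30)
class_centers = rng.normal(size=(3, 20))
inputs = class_centers[labels] + 0.5 * rng.normal(size=(90, 20))
one_hot = np.eye(3)[labels]

# The frozen part does not change, so its outputs are computed once and
# standardised to keep the gradient step size stable.
features = frozen_features(inputs)
features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

# Randomly initialised head: the only trainable parameters.
W_head = rng.normal(scale=0.01, size=(8, 3))
b_head = np.zeros(3)


def cross_entropy(W, b):
    probs = softmax(features @ W + b)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))


loss_before = cross_entropy(W_head, b_head)
for _ in range(200):
    probs = softmax(features @ W_head + b_head)
    grad_logits = (probs - one_hot) / len(labels)
    W_head -= 0.1 * features.T @ grad_logits  # gradient step on the head only
    b_head -= 0.1 * grad_logits.sum(axis=0)
loss_after = cross_entropy(W_head, b_head)
```

Only `W_head` and `b_head` change during the loop, so the loss falls while the stand-in backbone keeps its initial values. Fine-tuning in the full sense would additionally update some of the pre-trained layers, typically with a small learning rate.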

Motor function assessment
Feature normalization. Gaussian feature normalization is applied to the features so that indicators of different magnitudes are reduced to a uniform scale and the local influence of each type of feature is preserved. The extracted features are combined into a fusion matrix

$$G = [g_1, \cdots, g_i, \cdots, g_N]_{m \times N},$$

where $m$ is the dimension of the fused features, $N$ is the number of subjects, and $g_i$ is the fused feature vector of the $i$-th subject. With $\mu_i$ and $\sigma_i$ denoting the mean and standard deviation of each column of feature components in $G$, each $g_i$ is normalized using equation (1) so that most of the values lie in the range [0,1]. The normalized feature matrix is given by equation (2):

$$G^{FV} = [g_1^{FV}, \ldots, g_i^{FV}, \ldots, g_N^{FV}] \tag{2}$$

Kernel entropy component analysis. KECA maps data from a low-dimensional space to a high-dimensional space and selects features by their entropy contribution (EC). It can be described as

$$\hat{V}(p) = \frac{1}{N^2}\,\mathbf{1}^T K_{KECA}\,\mathbf{1} \tag{3}$$

where $\hat{V}(p)$ is the estimator associated with the Renyi entropy, $x$ is the process data from which the kernel matrix $K_{KECA}$ is computed, and $\mathbf{1} = [1, 1, \ldots, 1]^T$. $K_{KECA}$ can be represented by its eigenvalues and eigenvectors:

$$K_{KECA} = E D E^T \tag{4}$$

where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ is the eigenvalue matrix and $E = (e_1, \ldots, e_N)$ is the eigenvector matrix. So $\hat{V}(p)$ can be rewritten as

$$\hat{V}(p) = \frac{1}{N^2}\sum_{i=1}^{N}\left(\sqrt{\lambda_i}\, e_i^T \mathbf{1}\right)^2 \tag{5}$$

where $\left(\sqrt{\lambda_i}\, e_i^T \mathbf{1}\right)^2$ is the Renyi EC of the $i$-th component. The components that contribute most span the subspace.
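The selection by entropy contribution can be sketched with NumPy. This is a minimal sketch of standard KECA under assumptions, not the implementation used in this study: the RBF kernel is parameterized as exp(-||x_i - x_j||^2 / (2 width^2)), which the source does not specify, samples are stored as rows (one subject per row), and the function names and toy data are hypothetical.

```python
import numpy as np


def rbf_kernel(samples, width):
    """Kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * width^2)) (assumed form)."""
    sq_norms = np.sum(samples ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * samples @ samples.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * width ** 2))


def entropy_contributions(kernel):
    """Eigen-decompose K and return eigenvalues, eigenvectors, and entropy contributions."""
    eigvals, eigvecs = np.linalg.eigh(kernel)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off values
    n = kernel.shape[0]
    # (sqrt(lambda_i) * e_i^T 1)^2 / N^2 for every eigenpair
    contributions = eigvals * eigvecs.sum(axis=0) ** 2 / n ** 2
    return eigvals, eigvecs, contributions


def keca_transform(samples, width, n_components):
    """Project the samples onto the eigen-directions with the largest entropy contribution."""
    kernel = rbf_kernel(samples, width)
    eigvals, eigvecs, contributions = entropy_contributions(kernel)
    top = np.argsort(contributions)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(eigvals[top]), contributions[top]


rng = np.random.default_rng(0)
toy_samples = rng.normal(size=(40, 5))  # synthetic data, one sample per row
toy_kernel = rbf_kernel(toy_samples, width=2.0)
_, _, toy_contributions = entropy_contributions(toy_kernel)
projected, kept = keca_transform(toy_samples, width=2.0, n_components=3)
```

The contributions sum to the mean of the kernel matrix, which is the quadratic form (1/N^2) 1^T K 1 written out per eigenpair. Unlike KPCA, the components are ranked by this contribution rather than by eigenvalue alone, so an eigenvector nearly orthogonal to the ones vector can be dropped even if its eigenvalue is large.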
LTSA. LTSA uses the tangent space of the neighborhood of each sample to represent the local geometric features and uses those features to reduce the dimensions of the data. Given a sample set $X = [x_1, \ldots, x_N]$, LTSA first finds the neighborhood of each $x_n$ and forms a patch $X_n = [x_{n1}, \ldots, x_{nK}, x_n]$ $(n = 1, \ldots, N)$. $X_n$ is then centralized:

$$\hat{X}_n = [x_{n1} - \bar{x}_n, \ldots, x_{nK} - \bar{x}_n, x_n - \bar{x}_n] \tag{6}$$

where $\bar{x}_n = \frac{1}{K+1}\left(\sum_{k=1}^{K} x_{nk} + x_n\right)$ is the center of the patch. $\hat{X}_n$ can be rewritten as

$$\hat{X}_n = U_n P_n V_n^T = U_n O_n$$

where $O_n = P_n V_n^T$, the first $d$ column vectors of $U_n$ compose $U_{n,1}$, and $O_{n,1}$ are the eigenvectors of $U_{n,1}$. The matrix $T$ is then defined in terms of $\lambda_{max}$, the maximum value of $\lambda$, and the unit matrix $I$, and the kernel matrix $K_{LTSA}$ is obtained from it. The final kernel matrix $K_{KECA\text{-}LTSA}$ is formed from $K_{KECA}$ and $K_{LTSA}$ using the balance parameter $\alpha$. $K_{KECA\text{-}LTSA}$ can be represented by its eigenvalues $\lambda_d = [\lambda_1, \lambda_2, \ldots, \lambda_d]$ and eigenvectors $v_d = [v_1, v_2, \ldots, v_d]$. The deviation between the features of the other subjects and the standard features is then calculated.

MFAI
The MFAI of all subjects is obtained as follows. First, the gait deviation index of each subject in the reference group YG is calculated. MEAN(Z) and S.D.(Z) denote the mean and the standard deviation of group Z, and they are used to define the Z-score. The MFAI of all the subjects in the experiment is then obtained, and finally an evaluation index on a percentage scale is given.

Experimental results

CNN analysis
The training results of the CNN using transfer learning are shown in Figure 6. In the figure, acc and loss represent the accuracy and loss on the training set, and val_acc and val_loss represent the accuracy and loss on the test set. The results of TF-CNN are shown in Figure 7. They show that the CNN trained with transfer learning and fine-tuning converges faster than the CNN trained with transfer learning alone. As shown in Table 3, TF-CNN performs better than CNN, and its accuracy improves after fine-tuning. Therefore, this article uses transfer learning and fine-tuning to train the convolutional network.

Feature extraction analysis
The multi-feature fusion method used in this article avoids the randomness of single feature parameters, but the number of feature parameters increases exponentially. To solve this problem, the redundant information of the fusion matrix is removed by KECA-LTSA. The radial basis function (RBF) is used as the kernel function of KECA, with kernel width parameter δ = 0.3 and balance parameter α = 0.7. Kernel principal component analysis (KPCA) also uses the RBF kernel function, with kernel width parameter δ = 0.5.
It can be seen that when the EC is 97%, the principal subspace has 30 dimensions, whereas when the variance contribution (VC) is 97%, it has 45 dimensions. KECA-LTSA can therefore cover the original data information with fewer feature dimensions. Table 4 shows the eigenvalues and cumulative EC ratios of the first 10 principal

MFAI indicators and statistical analysis
Using the first 30 elements extracted by KECA-LTSA as the final feature vector, the subspace of the feature extraction is constructed, and the motor function assessment index (MFAI) is obtained.
The MFAI scores of each group extracted by KPCA are shown in Figure 8. YG scores range from 95 to 123, MG scores from 83 to 95, and EG scores from 87 to 94. The YG group can be distinguished from the MG and EG groups, but the MG and EG groups are difficult to separate from each other.
The MFAI scores extracted by KECA are shown in Figure 9. YG scores range from 90 to 116, MG scores from 50 to 90, and EG scores from 43 to 66. KECA performs better than KPCA, and the distance between the MG and EG groups is increased.
The MFAI scores extracted by KECA-LTSA are shown in Figure 10. YG scores range from 91 to 112, MG scores from 72 to 86, and EG scores from 40 to 61. This method gives the best classification, with a large distance between groups and a small distance within each group.
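A scoring step of this kind can be sketched as follows. The scaling used here is an assumption, not the equations of this article: it follows the style of the Gait Deviation Index, taking the logarithm of each subject's distance from the reference-group mean, converting it to a z-score relative to the reference group, and mapping it to 100 − 10·z. The function name `gdi_style_index` and the synthetic feature matrices are hypothetical.

```python
import numpy as np


def gdi_style_index(features, is_reference):
    """Score each subject by its log-distance from the reference-group mean.

    Assumed scaling (not taken from the source): 100 - 10 * z, where z is the
    z-score of the log-distance relative to the reference group.
    """
    reference_mean = features[is_reference].mean(axis=0)
    distance = np.linalg.norm(features - reference_mean, axis=1)
    raw = np.log(distance)
    z = (raw - raw[is_reference].mean()) / raw[is_reference].std(ddof=1)
    return 100.0 - 10.0 * z


rng = np.random.default_rng(0)
reference = rng.normal(size=(40, 30))          # synthetic stand-in for the YG group
impaired = rng.normal(loc=1.5, size=(25, 30))  # synthetic stand-in for a test group
features = np.vstack([reference, impaired])
is_reference = np.arange(len(features)) < len(reference)
scores = gdi_style_index(features, is_reference)
```

By construction the reference group averages 100, and subjects whose features lie farther from the reference mean receive lower scores. Each 10 points corresponds to one reference-group standard deviation under this assumed scaling.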

Correlation analysis between MFAI index and GARS-M
The GARS-M scores of the subjects with dysfunction (i.e. the MG and EG groups), as evaluated by physicians, and the proposed MFAI are analyzed with the Pearson correlation coefficient. Figure 11 shows the linear regression analysis of GARS-M and the MFAI scores extracted by KECA-LTSA. The Pearson correlation coefficient is r = 0.92, which supports the effectiveness of the MFAI.
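For reference, the Pearson coefficient can be computed directly from its definition. The sketch below uses invented toy scores, not the measurements of this study, and the function name `pearson_r` is hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    return float(np.sum(x_centered * y_centered)
                 / np.sqrt(np.sum(x_centered ** 2) * np.sum(y_centered ** 2)))


# Invented toy scores for illustration only.
toy_gars_m = [3, 5, 6, 8, 9, 11, 13, 14]
toy_mfai = [85, 80, 78, 70, 66, 58, 47, 42]
r = pearson_r(toy_gars_m, toy_mfai)
```

In this toy data the index falls as the GARS-M score rises, so r is negative; its magnitude measures the strength of the linear relation. The result agrees with `np.corrcoef(toy_gars_m, toy_mfai)[0, 1]`.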

Conclusion and future work
This article proposes a novel lower limb motor function assessment. First, the VICON system collects the subjects' gait video data. The CNN is pre-trained on ImageNet and extracts features from the gait images. Compared with the normal training method, transfer learning and fine-tuning make the CNN converge faster and reach higher accuracy. Then, the feature data are fused into a feature matrix, and KECA-LTSA is used to reduce its dimensions. Compared with KPCA and KECA, KECA-LTSA removes more dimensions while retaining the effective information of the original data, and the MFAI results show that the classification of KECA-LTSA is better than that of KPCA and KECA. The correlation coefficient between the MFAI and GARS-M scores is 0.92, which supports the effectiveness of the method.
In this article, we only studied how to evaluate dysfunction of the lower limb, which we think affects human motor function directly. However, some diseases and dysfunctions of other parts of the body, such as diabetes and obesity, could also affect lower limb function. In future work, this research can be extended to evaluate more people with complicated health conditions to further improve the model.

Author contributions
Y.Z. was involved in conceptualization, methodology, and writing-review and editing. S.L. was involved in data curation, investigation, and writing-original draft. Y.Z. was involved in writing-review and editing. J.L. was involved in data curation and writing-review and editing.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.