Fast battery capacity estimation using convolutional neural networks

Lithium-ion batteries have been widely used in electric vehicles, smart grids and many other applications as energy storage devices, for which aging assessment is crucial to guarantee safe and reliable operation. Battery capacity is a popular indicator for assessing battery aging; however, its accurate estimation is challenging due to a range of time-varying, situation-dependent internal and external factors. Traditional simplified models and machine learning tools have difficulty capturing these characteristics. As a class of deep neural networks, the convolutional neural network (CNN) is powerful in capturing hidden information from a huge amount of input data, making it an ideal tool for battery capacity estimation. This paper proposes a CNN-based battery capacity estimation method, which can accurately estimate the battery capacity using limited available measurements, without resorting to other offline information. Further, the proposed method only requires a partial charging segment of the voltage, current and temperature curves, making it possible to achieve fast online health monitoring. The partial charging curves have a fixed length of 225 consecutive points and a flexible starting point, so that short-term charging data of a battery charged from any initial state-of-charge can be used to produce an accurate capacity estimate. Employing a CNN for capacity estimation using partial charging curves is, however, not trivial; this paper presents a comprehensive approach covering time series-to-image transformation, data segmentation, and CNN configuration. The CNN-based method is applied to two battery degradation datasets and achieves root mean square errors (RMSEs) of less than 0.0279 Ah (2.54%) and 0.0217 Ah (2.93%), respectively, outperforming existing machine learning methods.


Introduction
Owing to continually falling costs and to their high energy density, low self-discharge rate and long lifespan relative to other battery types, lithium-ion batteries have been widely used as energy storage devices for electric vehicles (EVs), electric power grids, portable electronic devices, and many other applications (Zhang et al., 2016). However, undesirable side reactions and processes inside the batteries while in use continuously degrade their performance, leading to capacity loss and an increase in internal resistance (Couto et al., 2019). Therefore, battery capacity and internal resistance are two important indicators for assessing battery ageing and performance degradation, known as the battery state of health (SOH). For example, SOH can be defined as $\mathrm{SOH}_t = Q_t / Q_0$, where $Q_0$ is the rated capacity of a battery, and $Q_t$ is the battery's maximum available capacity at the current cycle $t$ (Liu et al., 2019a).
Accurate capacity estimation provides insights into the SOH, and thus plays a critical role in the battery management system, ensuring safe and reliable battery operation, preventing incipient failures and catastrophic hazards, and prolonging the battery service life. However, the battery capacity cannot be measured in real time, and a variety of estimation/prediction methods have been developed (Tang et al., 2019). These methods can generally be classified into three categories: model-based, differential analysis-based, and machine learning-based.
Model-based methods use battery electrochemical, electrical, or other empirical models to depict the battery dynamics, and estimate the battery capacity with a combination of observers or adaptive filtering algorithms (Garg et al., 2018; Ouyang et al., 2016). A comprehensive review of battery modelling methods including electrochemical models, reduced-order models, equivalent circuit models, empirical models and black-box models has been presented by Zhang et al. (2014). Liu et al. (2020a) have systematically evaluated the performance of three modelling techniques (i.e. electrochemical model, semi-empirical model and Gaussian process regression-based model) for calendar ageing prediction in terms of accuracy, generalization ability and uncertainty management. Zheng et al. (2016) propose to estimate the battery capacity by using proportional integral observers based on an accurate electrochemical model, which can capture the spatiotemporal dynamics of batteries based upon the electrochemical principles. The equivalent circuit model with online identified parameters is used by Yu et al. (2019), and based on this model, an adaptive H-infinity filter is applied to estimate the battery capacity. An empirical model that can reflect the dynamic capacity fading of the battery is proposed to predict the capacity degradation (Xu et al., 2016). However, the accuracy of the model-based capacity estimation methods depends on the quality of the estimation model. Unfortunately, it is difficult to build precise battery models due to the complex electrochemical reactions inside the battery under different operating conditions. Given the sheer complexity of the ageing mechanisms, simple lumped parameter models will lead to inaccurate estimation of the battery capacity.
The differential analysis-based methods correlate the features extracted from the differentiated curves of some electrical, thermal or mechanical parameters with battery capacity fade. For example, incremental capacity (IC) analysis and differential voltage (DV) analysis have been frequently used (Xiong et al., 2018). IC is calculated by differentiating the capacity change with respect to the terminal voltage (dQ/dV) while charging or discharging the battery at a small and constant current rate. The DV curve (dV/dQ) is defined as the inverse of IC. The voltage plateaus can be easily identified from the IC/DV curves (peaks/valleys) after the differential operation. The features extracted from the curves, such as IC peak position, peak shape, corresponding peak voltage/SOC, and peak area, are analyzed to estimate the battery capacity. For example, five different features have been extracted from the IC curves: the first two are peaks, the last two are valleys, and the remaining one is the shoulder of the IC curves. The capacity is estimated by analyzing the position, value and associated area changes of these features. As described in Weng et al. (2016), the IC peak values are tracked to estimate the capacity for single cells as well as battery packs. In Zheng et al. (2018), three corresponding SOC positions are extracted from the SOC-based IC and DV curves for battery capacity estimation, while Tang et al. (2018) use a regional voltage, which is calculated from the terminal voltage corresponding to the IC peak, for fast capacity estimation. However, IC/DV analysis is sensitive to measurement noise and to the operating temperature; further, it requires a very low current rate, so its applications are severely constrained.
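As a rough illustration of the IC computation described above, the following sketch approximates dQ/dV by finite differences over a constant-current charging record. The function names, the minimum voltage step `dv_min` used to limit noise amplification, and the synthetic data are illustrative assumptions and are not taken from the cited works.

```python
def incremental_capacity(voltage, capacity, dv_min=0.005):
    """Approximate IC = dQ/dV with finite differences.

    voltage  : terminal voltages (V), increasing during the charge
    capacity : cumulative charged capacity (Ah) at the same instants
    dv_min   : hypothetical minimum voltage step (V); smaller steps are
               merged so that noise is not amplified by the division
    Returns a list of (midpoint voltage, dQ/dV) pairs.
    """
    if len(voltage) != len(capacity):
        raise ValueError("voltage and capacity must have the same length")
    ic = []
    last = 0  # index of the sample used as the current difference anchor
    for i in range(1, len(voltage)):
        dv = voltage[i] - voltage[last]
        if dv >= dv_min:
            dq = capacity[i] - capacity[last]
            ic.append(((voltage[i] + voltage[last]) / 2.0, dq / dv))
            last = i
    return ic


def ic_peak(ic):
    """Return the (voltage, dQ/dV) pair with the largest IC value."""
    return max(ic, key=lambda pair: pair[1])


# Synthetic data (assumed): a voltage plateau near 3.3 V appears as an IC peak.
volts = [3.00, 3.10, 3.20, 3.28, 3.30, 3.32, 3.40, 3.50]
caps = [0.00, 0.05, 0.10, 0.20, 0.60, 1.00, 1.05, 1.10]
peak_v, peak_ic = ic_peak(incremental_capacity(volts, caps))
print(peak_v, peak_ic)
```

Because the division by a small voltage difference amplifies any measurement error, the sketch also hints at why such analysis is sensitive to noise and needs low current rates.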
With the unprecedented progress of machine learning (ML) techniques and the documentation of a large volume of battery test data worldwide, ML techniques have shown great potential for battery capacity estimation. These methods are model-free and do not need prior knowledge of the complex working principles of the battery. Various ML techniques have been applied to estimate the battery capacity fade, such as neural networks (NNs) (Dai et al., 2018; You et al., 2016; Zhang et al., 2019), recurrent neural networks (RNNs) (Chaoui and Ibe-Ekeocha, 2017; Eddahech et al., 2012), support vector machines (SVMs) (Liu et al., 2018), support vector regression (SVR) (Weng et al., 2013), and relevance vector machines (RVMs) (Hu et al., 2015), to name just a few. In You et al. (2016), an NN with various optimization strategies is combined with the k-means clustering algorithm for capacity estimation, achieving an RMSE of less than 2.44%. The inputs fed into this NN are features manually extracted from the raw data. An RNN is used to predict the battery performance degradation in Eddahech et al. (2012); the mean square errors for capacity and resistance prediction are 0.462 and 0.296, respectively. In Liu et al. (2018), the nonlinear relationship between the extracted battery degradation features and battery capacity is established using the least squares SVM method. The mean error for the capacity estimation is less than 5%. In Weng et al. (2013), a linear programming-based SVR is proposed to correlate the IC peaks with the faded battery capacity; the model developed using the data of one cell is able to estimate the capacity fade of other cells with an absolute error of less than 1%. In Guo et al. (2019), an RVM based on particle swarm optimization is used to predict the battery capacity by modelling the relationship between the health feature and capacity; relative errors of less than 5% and 10% are achieved for single and multiple battery experiments, respectively.
In Eleftheroglou et al. (2019), three ML-based methods are used for battery health prediction, and the uncertainty associated with each point prediction is quantified. Liu et al. (2020b) have proposed a hybrid method for battery capacity and remaining useful life prediction, where the long short-term memory model is used to capture the long-term capacity degradation dynamics and the Gaussian process regression model is used to quantify the uncertainty caused by the capacity regeneration phenomena. Further, the convolutional neural network (CNN) is applied to estimate the battery capacity using the measured voltage, current and the calculated cumulative capacity as inputs, with overall RMSEs of less than 2% on the NASA dataset (Shen et al., 2019).
However, the aforementioned ML-based estimation methods require either a non-trivial health feature extraction process or an extra cumulative capacity calculation process, rather than directly using the measurements (e.g. current, terminal voltage, surface temperature). In summary, battery capacity estimation is still a challenging topic due to a range of time-varying, situation-dependent internal and external factors. Traditional simplified models and ML tools have difficulty capturing these characteristics. As a class of deep NNs, the CNN is powerful in capturing hidden information from a huge amount of input data, making it an ideal tool for battery capacity estimation. In order to make full use of the information embedded in the direct measurements, while eliminating the need to manually extract features or to fully charge a battery from a pre-defined state-of-charge, this paper proposes a CNN-based battery capacity estimation method using a partial charging segment with a flexible starting point. The paper makes the following four contributions. Firstly, the CNN-based method eliminates the need for prior knowledge and an accurate physical battery model, making the method intelligent and adaptive for real-time capacity estimation. Secondly, the proposed method can deal with raw signals directly, mapping measurements such as the terminal voltage, current, and surface temperature to the battery capacity, instead of relying on pre-extracted health features. The representative features are automatically learnt from the raw data. Thirdly, the paper introduces a novel data segmentation and time series-to-image transformation method which makes it feasible to use a CNN for battery capacity estimation. Further, the proposed method only requires a flexible partial charging segment of the voltage, current and surface temperature curves, allowing fast and accurate capacity estimation, a key issue in real-time battery management.
Finally, the proposed CNN-based method can self-learn its parameters and weights by using optimization algorithms such as Adam. Once the parameters are properly learned offline, the model can be directly applied for fast online estimation.
The remainder of this paper is organized as follows. In Section 2, a brief introduction to the CNN is presented. Section 3 details the proposed CNN-based battery capacity estimation method, including the signal-to-image transformation method and the proposed CNN architecture. Section 4 validates the proposed method on two battery degradation datasets, and the experimental results are presented and analyzed. Finally, Section 5 concludes the paper.

Overview
The CNN is probably one of the most popular NNs in recent years. Compared with traditional deep neural networks (DNNs) with the same number of layers, the number of parameters (weights) that a CNN requires to maintain the accuracy is significantly reduced, due to its sparse connectivity, shared weights, and pooling architectures. The sparse connectivity is achieved by making the size of the filter smaller than the input, and enforcing a local connectivity pattern among neurons of adjacent layers. This architecture can reduce the overfitting risk, because the number of parameters is dramatically reduced. Shared weights refers to using the same weights for more than one activation function in a model, that is, each filter is used across the whole visual field. The shared-weight architecture endows the CNN with a property called equivariance, meaning that the output changes in the same way as the input changes (Liu et al., 2017). Finally, the pooling architecture replaces the outputs of the convolutional layer with a summary statistic, and this subsampling operation makes the output insensitive to small translations of the input.
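To make the parameter saving from sparse connectivity and shared weights concrete, the sketch below counts the trainable parameters of one fully connected layer and one convolutional layer that map the same input to an output of the same size. The 15 × 15 × 3 input matches the sample size used later in the paper; the 3 × 3 filter size and the 16 feature maps are hypothetical values chosen only for illustration.

```python
def dense_params(n_inputs, n_outputs):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def conv_params(filter_h, filter_w, in_channels, n_filters):
    """Weights plus biases of one convolutional layer.

    The same filter is reused at every position, so the count does not
    depend on the spatial size of the input.
    """
    return filter_h * filter_w * in_channels * n_filters + n_filters


# A 15 x 15 x 3 input mapped to 16 feature maps of size 15 x 15.
n_in = 15 * 15 * 3
n_out = 15 * 15 * 16
fc_count = dense_params(n_in, n_out)    # every output sees every input
cv_count = conv_params(3, 3, 3, 16)     # assumed 3 x 3 filters, 16 of them
print(fc_count, cv_count)               # 2433600 448
```

Under these assumptions the convolutional layer needs several thousand times fewer parameters than the dense layer for the same output size.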
CNNs are effective tools for extracting features from high-dimensional data, and have been widely used in a range of fields, such as image processing, text classification, and speech recognition. These high-dimensional signals usually have high spatial or temporal correlations between adjacent variables, which can be effectively extracted through the convolution operations. Because time series data are ubiquitous and constantly generated in many engineering processes and in daily life, there is a pressing need to develop efficient techniques to extract useful information from them. Considering the merits of CNNs in terms of automatic feature extraction and low overfitting risk, their application to large amounts of time series signals has also been investigated, and several reports have confirmed the potential of CNNs in extracting representative features from time series data. In Yang et al. (2015), a CNN is used to solve a human activity recognition problem where the inputs of the network are multichannel time series signals collected from inertial sensors, and the outputs are the related human activities. In this application, the filters in the CNN move along the temporal dimension for each sensor (each sensor corresponds to a row in the two-dimensional (2D) input). In Cui et al. (2016), a multi-scale CNN is used for time series data classification problems. The CNN architecture has multiple branches in its first layer that can extract features of different frequency and time scales. Further, CNNs have also been used for time series forecasting and estimation, and fault diagnosis.

CNN architecture
A three-layer fully connected feedforward neural network and a simple CNN are compared in Figure 1. To illustrate the differences in neuron connection between conventional neural networks and CNNs, Figure 1(b) reformulates the 2D input into a column; each output node in a convolutional layer is connected only to a small subset of the inputs. This sparse connectivity differs from that of fully connected NNs, and the sparsity is achieved by replacing the matrix multiplication in NNs with convolutions (Borovykh et al., 2017). The filter (also called weight matrix) slides over the input space and generates a set of output nodes, and each output node is calculated by convolving the input with the filter. The number of inputs involved in one output node depends on the filter size. All the output nodes produced by the same filter form a feature map, which is a matrix, while the number of feature maps is decided by the number of filters. In other words, all the nodes in one output feature map share the same weights. For the $r$-th feature map in layer $l$, the node $C^{a,t}_{r,l}$ at the $a$-th row and $t$-th column can be calculated by
$$C^{a,t}_{r,l} = f\left( \sum_{v} \sum_{i=1}^{m_l} \sum_{j=1}^{n_l} x^{v,l}_{(a-1)s_l+i,\,(t-1)d_l+j}\, \omega^{v,r,l}_{i,j} + b_{r,l} \right)$$
where $x^{v,l} \in \mathbb{R}^{p \times q \times v}$ denotes the input of the $l$-th layer, of which the number of channels (also called depth) is $v$, and each channel has a size of $p \times q$; the outer sum runs over these channels. The inputs can be the initial input signal or the output of the preceding layer. For the output of the preceding layer, $v$ refers to the number of feature maps in the $(l-1)$-th layer. $\omega^{v,r,l} \in \mathbb{R}^{m_l \times n_l}$ refers to the $v$-th channel of the $r$-th filter in layer $l$, with size $m_l \times n_l$ and stride set to $(s_l, d_l)$. $b_{r,l}$ is the bias for the $r$-th feature map. $f(\cdot)$ is the activation function that endows the network with the ability to learn complex nonlinear relationships in the data.
The activation function used in this paper is the rectified linear unit (ReLU), which is given by
$$f(x) = \max(0, x)$$
The pooling layer is a down-sampling process which reduces the size of the feature maps extracted in a convolutional layer, as well as the number of parameters introduced to the following layers, by either the max pooling strategy
$$P^{a,t}_{r,l+1} = \max_{1 \le i \le m_{l+1},\; 1 \le j \le n_{l+1}} C^{(a-1)s_{l+1}+i,\,(t-1)d_{l+1}+j}_{r,l}$$
or the average pooling strategy
$$P^{a,t}_{r,l+1} = \frac{1}{m_{l+1}\, n_{l+1}} \sum_{i=1}^{m_{l+1}} \sum_{j=1}^{n_{l+1}} C^{(a-1)s_{l+1}+i,\,(t-1)d_{l+1}+j}_{r,l}$$
where $(m_{l+1}, n_{l+1})$ is the size of the pooling region, while $(s_{l+1}, d_{l+1})$ are the strides of the pooling filter in layer $l+1$.
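The convolution, ReLU and pooling operations described above can be sketched for a single channel in plain Python. This is a minimal illustration rather than the authors' implementation: the function names, the 4 × 4 input, the filter weights and the bias are all assumed values, and the filter is applied without flipping, as is common in CNN libraries.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0


def conv2d(image, kernel, bias=0.0, stride=(1, 1)):
    """Single-channel convolution without padding, followed by ReLU."""
    m, n = len(kernel), len(kernel[0])
    s, d = stride
    rows = (len(image) - m) // s + 1
    cols = (len(image[0]) - n) // d + 1
    out = []
    for a in range(rows):
        row = []
        for t in range(cols):
            acc = bias
            for i in range(m):
                for j in range(n):
                    acc += image[a * s + i][t * d + j] * kernel[i][j]
            row.append(relu(acc))
        out.append(row)
    return out


def pool2d(fmap, size, stride=(1, 1), mode="max"):
    """Max or average pooling over regions of the given size."""
    m, n = size
    s, d = stride
    rows = (len(fmap) - m) // s + 1
    cols = (len(fmap[0]) - n) // d + 1
    out = []
    for a in range(rows):
        row = []
        for t in range(cols):
            region = [fmap[a * s + i][t * d + j]
                      for i in range(m) for j in range(n)]
            if mode == "max":
                row.append(max(region))
            else:
                row.append(sum(region) / len(region))
        out.append(row)
    return out


# Assumed toy data: 4 x 4 input, one 2 x 2 filter, stride (1, 1).
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[0, 1], [1, 0]]
feature_map = conv2d(image, kernel, bias=-10.0)          # 3 x 3 feature map
pooled = pool2d(feature_map, size=(1, 3), mode="max")    # 3 x 1 output
print(feature_map)
print(pooled)
```

With a 2 × 2 filter and a 1 × 3 pooling region, the 4 × 4 input yields a 3-by-3 feature map and then a 3-by-1 pooled output; the negative bias is there only so that ReLU clips some entries to zero.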
In the example shown in Figure 1(b), only one filter is used. The filter size is 2 × 2 with the stride set to (1, 1), which means that four neurons of the input generate one output node. The pooling size is 1 × 3 with the stride set to (1, 1), so a 3-by-3 matrix becomes a 3-by-1 matrix after the pooling stage. After further convolution, pooling, flatten and fully connected layers are added as required, the final output $O$ is computed from $x$, $\omega$ and $b$, where $x$ denotes the input of the output layer, and $\omega$ and $b$ are the weights and bias that connect $x$ and the final output, respectively. In this paper, the output $O$ is the estimated battery capacity. To evaluate the accuracy of the estimation results, the estimated capacity $O(i)$ is compared with the reference value $Q(i)$ using the following measures: mean-square error (MSE) $E_{mse}$, root mean-square error (RMSE) $E_{rmse}$ and mean absolute error (MAE) $E_{mae}$, which are defined as
$$E_{mse} = \frac{1}{N} \sum_{i=1}^{N} \big(Q(i) - O(i)\big)^2, \qquad E_{rmse} = \sqrt{E_{mse}}, \qquad E_{mae} = \frac{1}{N} \sum_{i=1}^{N} \big|Q(i) - O(i)\big|$$
where $N$ is the sample size. Here one sample refers to one image input to the CNN. All the weights and biases are tunable parameters ($\theta$), which are updated by minimizing the loss function $J(\theta)$ through an optimization algorithm. For prediction problems, it is common to use the MSE loss function (Reed and Marks, 1999), that is, $J(\theta) = E_{mse}$. To update the parameters, the Adam algorithm (Kingma and Ba, 2014) is used in this work, which has been suggested as the default optimization method for deep learning applications (Ruder, 2016):
$$g_t = \nabla_{\theta} J(\theta_{t-1})$$
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$
$$\tilde{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \tilde{v}_t = \frac{v_t}{1 - \beta_2^t}$$
$$\theta_t = \theta_{t-1} - \alpha\, \frac{\tilde{m}_t}{\sqrt{\tilde{v}_t} + \epsilon}$$
where $g_t$ is the gradient of the loss function $J(\theta)$ at the $t$-th training iteration. $m_t$ and $v_t$ are the estimated first moment (the mean) and second moment (the uncentered variance) of the gradient, respectively, and $\tilde{m}_t$ and $\tilde{v}_t$ are their bias-corrected values. $\beta_1$ and $\beta_2$ are exponential decay rates, while $\beta_1^t$ and $\beta_2^t$ are $\beta_1$ and $\beta_2$ to the power $t$. $\alpha$ is the learning rate, $\epsilon$ is a small constant that prevents division by zero, and $\theta_t$ denotes the updated parameters.
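The error measures and the Adam update can be sketched as follows. This is a hedged illustration, not the training code of the paper: the toy problem (fitting a single constant to three made-up reference capacities), the decay rates 0.9 and 0.999 (the defaults suggested by Kingma and Ba), and the learning rate of 0.01 used in the toy run are assumptions.

```python
import math


def mse(reference, estimate):
    """Mean-square error between reference and estimated capacities."""
    return sum((q - o) ** 2 for q, o in zip(reference, estimate)) / len(reference)


def rmse(reference, estimate):
    """Root mean-square error."""
    return math.sqrt(mse(reference, estimate))


def mae(reference, estimate):
    """Mean absolute error."""
    return sum(abs(q - o) for q, o in zip(reference, estimate)) / len(reference)


def adam(grad_fn, theta, steps, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a loss with Adam, given a function returning its gradient."""
    theta = list(theta)
    m = [0.0] * len(theta)   # first-moment estimates
    v = [0.0] * len(theta)   # second-moment estimates
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for k in range(len(theta)):
            m[k] = beta1 * m[k] + (1.0 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1.0 - beta2) * g[k] ** 2
            m_hat = m[k] / (1.0 - beta1 ** t)   # bias-corrected mean
            v_hat = v[k] / (1.0 - beta2 ** t)   # bias-corrected variance
            theta[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy problem (assumed data): fit one constant to three reference capacities.
reference = [1.10, 1.05, 1.00]


def grad_mse_constant(theta):
    """Gradient of mean((q - theta)^2) with respect to theta."""
    return [-2.0 * sum(q - theta[0] for q in reference) / len(reference)]


fitted = adam(grad_mse_constant, [0.0], steps=2000, lr=0.01)
estimate = [fitted[0]] * len(reference)
print(fitted[0], rmse(reference, estimate), mae(reference, estimate))
```

In the toy run the MSE-optimal constant is the mean of the reference values, 1.05 Ah, so the fitted parameter should settle close to that value.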

Methodology
In this section, the proposed CNN-based battery capacity estimation method is described in detail. First, the method to transform the measured time series signals consisting of battery current, terminal voltage and cell temperature to a 3D image representation is introduced. Then the CNN is designed based on the classical LeNet-5 configuration (LeCun et al., 1998).

Time series signal transformation
In other popular capacity estimation methods, the measurement data are not directly used for capacity estimation, and some features need to be extracted from the data first. For these methods, the estimation performance depends on both the number of extracted features and the way they are combined (Cai et al., 2019). However, it is not easy to extract features from the raw data effectively and efficiently. To make full use of the large volume of historic measurements, the correlations among different measured variables at different sampling periods have to be investigated, which is not a trivial task to handle manually. CNNs can overcome this difficulty, but to apply a CNN for capacity estimation, a transformation stage is first required, which is elaborated below.
As illustrated in Figure 2, which shows one complete charging and discharging cycle, $n$ continuous data points are extracted from each measured variable (e.g. current, voltage, cell temperature) in one charging and discharging cycle, and this operation is executed $M$ times. That is, $M$ data chunks are generated for each variable in a cycle. Then the $n$ points in each chunk are converted into a $\sqrt{n} \times \sqrt{n}$ matrix, and the matrix represents a 2D image. The three variables together form 3D images ($\sqrt{n} \times \sqrt{n} \times 3$) as the input samples to the CNN, with each variable being associated with one channel of the image. In this signal-to-image transformation, the number of data chunks $M$ for each variable in a cycle, which contains $L$ data points in total, is determined by the segmentation length $n$ and the overlap size $c$ between two adjacent data chunks:
$$M = \mathrm{floor}\left(\frac{L - c}{n - c}\right)$$
The function floor(.) gives the greatest integer less than or equal to the input parameter. The samples generated from the same cycle correspond to the same capacity value. Since each sample intercepted from the full charging and discharging cycle corresponds to a part of the charging/discharging process, a model trained with such samples can estimate the capacity of a battery using only a part of the charging/discharging data. Besides, the part of the charging/discharging curve intercepted from the whole cycle may start at any point, meaning that the trained model can estimate the capacity of a battery charged/discharged from any unknown initial SOC. As shown in Figure 2, these measurements have different scales, which may slow the training process and degrade the estimation accuracy. Thus, data normalization is applied to the signals before feeding them into the network.
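The segmentation and time series-to-image transformation described above can be sketched as follows. The function names, the row-by-row filling of each matrix, the channel-last layout and the overlap of 100 points in the usage lines are assumptions made for illustration; the paper does not specify them at this point.

```python
import math


def count_chunks(total_len, n, c):
    """Number of complete chunks of length n with overlap c in L points."""
    if not 0 <= c < n:
        raise ValueError("overlap must satisfy 0 <= c < n")
    if total_len < n:
        return 0
    return (total_len - c) // (n - c)


def segment(series, n, c):
    """Cut a series into chunks of n points that overlap by c points."""
    step = n - c
    return [series[s:s + n] for s in range(0, len(series) - n + 1, step)]


def to_matrix(chunk):
    """Reshape n points into a sqrt(n) x sqrt(n) matrix, row by row (assumed)."""
    side = math.isqrt(len(chunk))
    if side * side != len(chunk):
        raise ValueError("chunk length must be a perfect square")
    return [chunk[r * side:(r + 1) * side] for r in range(side)]


def to_images(current, voltage, temperature, n=225, c=0):
    """Build sqrt(n) x sqrt(n) x 3 samples, one channel per variable."""
    per_variable = [segment(s, n, c) for s in (current, voltage, temperature)]
    images = []
    for chunks in zip(*per_variable):   # chunks from the same time segment
        mats = [to_matrix(chunk) for chunk in chunks]
        side = len(mats[0])
        images.append([[[mats[ch][r][k] for ch in range(3)]
                        for k in range(side)]
                       for r in range(side)])
    return images


# Synthetic cycle with L = 1000 points per variable (assumed values).
cur = [1.0] * 1000
vol = [float(i) for i in range(1000)]
tmp = [30.0] * 1000
images = to_images(cur, vol, tmp, n=225, c=100)
print(len(images), len(images[0]), len(images[0][0]), len(images[0][0][0]))
```

With $L = 1000$, $n = 225$ and $c = 100$, the chunk count is floor(900 / 125) = 7, and every sample has shape 15 × 15 × 3.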
In this work, the min-max normalization strategy is adopted, which retains the original distribution of the data, and all transformed data fall into the range of [-1, 1], reflecting both the charging and discharging phases. The normalized value $z$ of the set of measurements $x^k = \{x^k_1, x^k_2, \ldots, x^k_n\}$ is calculated by
$$z^k_i = \frac{x^k_i - \min(x)}{\max(x) - \min(x)} \times 2 - 1, \quad i \in 1, \ldots, n$$
where $x^k$ refers to the measurements of the $k$-th data chunk, $x$ denotes the collection of all data points used in the training, and $n$ is the number of data points per chunk. After the data normalization and time series-to-image transformation steps, the final input of the CNN is illustrated in Figure 3. This data transformation method is simple to use because no predefined parameters are required, and it is an enabling block for applying CNNs to time series signals.
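A minimal sketch of this min-max scaling is given below. The function names and the example voltages are illustrative assumptions; the only point taken from the text is that the minimum and maximum come from the training data and that the result lies in [-1, 1].

```python
def fit_min_max(training_values):
    """Minimum and maximum over all training points of one variable."""
    return min(training_values), max(training_values)


def normalize_chunk(chunk, x_min, x_max):
    """Scale a data chunk to [-1, 1] using the training minimum and maximum."""
    if x_max == x_min:
        raise ValueError("training data must not be constant")
    return [(x - x_min) / (x_max - x_min) * 2.0 - 1.0 for x in chunk]


# Assumed terminal voltages (V) seen in training, and one chunk to scale.
train_voltage = [2.0, 2.8, 3.3, 3.6]
v_min, v_max = fit_min_max(train_voltage)
scaled = normalize_chunk([2.0, 2.8, 3.6], v_min, v_max)
print(scaled)
```

Each variable would be scaled with its own minimum and maximum, so that current, voltage and temperature end up on a comparable scale before they are stacked into one image.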

Model construction
With the transformed 3D data, the CNN can then be trained to estimate the battery capacity. Considering that the size of the input sample is relatively small (15 × 15 × 3), a relatively simple CNN structure is adopted. In this study, the architecture of the proposed model is designed based on LeNet-5 (LeCun et al., 1998), a classical and effective CNN structure. The proposed CNN architecture consists of two alternating convolutional and pooling layers, followed by two convolutional layers, and finally a flatten layer and two fully connected layers. The zero-padding method (Li et al., 2015) is used in the last two convolutional layers to control the size of the feature maps. This architecture is graphically illustrated in Figure 4, and the output shape and number of parameters for each layer are summarized in Table 1. The input image has three dimensions: width, height and depth, where the width and height are determined by the $n$ data points in each data chunk, and the depth is determined by the number of variables. Taking $n = 225$ as an example, the width and height of the input both equal 15, and the depth is 3 (the three variables are current, voltage and cell temperature). When the input is fed into the CNN model, it is first convolved with the filters. The filter design, denoted as $M@w \times h \times d$, means that there are $M$ filters with the size of $w \times h \times d$ for a particular convolutional layer, where $d$ is determined by the depth of this layer's input. The $M$ feature maps generated by the convolutional layer are then passed through the pooling layer with a 2 × 2 pooling region, which outputs smaller feature maps. After the last convolutional layer, the flatten layer transforms the $i \times j \times k$ features into a vector with $i \times j \times k$ neurons. Finally, the vector is fed into the fully connected (FC) layers to calculate the final output, which is the capacity value.
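The layer sequence described above can be traced with a small shape and parameter calculator. The filter counts (16, 32, 64, 64), the 3 × 3 filter size, the pooling stride of 2 and the 64-neuron first FC layer are hypothetical placeholders; they are not the values of Table 1, which is not reproduced in the text, so the totals below only illustrate the bookkeeping.

```python
def conv_layer(shape, n_filters, k, same_padding):
    """Output shape and parameter count of a k x k convolution, stride 1."""
    h, w, d = shape
    if not same_padding:          # no zero padding: the map shrinks
        h, w = h - k + 1, w - k + 1
    params = k * k * d * n_filters + n_filters
    return (h, w, n_filters), params


def pool_layer(shape, k=2):
    """Output shape of k x k pooling with stride k (assumed); no parameters."""
    h, w, d = shape
    return (h // k, w // k, d), 0


def dense_layer(n_in, n_out):
    """Output size and parameter count of a fully connected layer."""
    return n_out, n_in * n_out + n_out


def trace_model(input_shape=(15, 15, 3)):
    """Return (name, output shape, parameters) for each layer."""
    plan = [("conv1", 16, 3, False), ("pool1",),
            ("conv2", 32, 3, False), ("pool2",),
            ("conv3", 64, 3, True), ("conv4", 64, 3, True)]
    summary = []
    shape = input_shape
    for layer in plan:
        if len(layer) == 1:
            shape, params = pool_layer(shape)
        else:
            shape, params = conv_layer(shape, layer[1], layer[2], layer[3])
        summary.append((layer[0], shape, params))
    flat = shape[0] * shape[1] * shape[2]
    summary.append(("flatten", (flat,), 0))
    n_fc1, params = dense_layer(flat, 64)
    summary.append(("fc1", (n_fc1,), params))
    n_out, params = dense_layer(n_fc1, 1)   # single output: the capacity
    summary.append(("fc2", (n_out,), params))
    return summary


summary = trace_model()
total_params = sum(p for _, _, p in summary)
for name, shape, params in summary:
    print(name, shape, params)
print("total", total_params)
```

Zero padding in the last two convolutional layers keeps the small 2 × 2 maps from shrinking to nothing, which is the role the text assigns to it.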

Experiment and analysis
In this section, the proposed CNN-based capacity estimation method is applied to two battery experimental datasets. The first is sourced from 124 commercial lithium-ion batteries cycled to failure under fast-charging conditions (Severson et al., 2019), and the other is the Oxford Battery Degradation Dataset (Birkl, 2017). During the training process, the maximum number of training epochs is set to 80 and the minibatch size is set to 128 samples. Early stopping with a patience of 4 is used to avoid overfitting. Further, the learning rate is set to 0.001.
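The early stopping rule can be sketched as below. It is assumed here that the monitored quantity is the validation loss and that only a strict improvement resets the patience counter; the function name and the loss values are illustrative.

```python
def early_stopping_epoch(val_losses, max_epochs=80, patience=4):
    """Return (last epoch run, best epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs, or when `max_epochs` is reached.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return epoch, best_epoch


# Assumed validation losses: no improvement after epoch 4.
losses = [0.90, 0.50, 0.30, 0.25, 0.26, 0.27, 0.25, 0.28, 0.30, 0.20]
stopped, best_epoch = early_stopping_epoch(losses)
print(stopped, best_epoch)   # 8 4
```

In this toy sequence the loss at epoch 10 would have been lower, which shows the trade-off of a small patience value: training time is saved at the risk of stopping before a late improvement.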

Case 1: 124 commercial cells
In this publicly available dataset, the 124 lithium iron phosphate (LFP)/graphite cells are manufactured by A123 Systems (APR18650M1A), with a nominal capacity of 1.1 Ah and a nominal voltage of 3.3 V. All the cells in this dataset are charged at a constant temperature of 30 °C with the fast-charging policy, namely ''C1(Q1)-C2''. In this charging scheme, the cell is first charged at a constant current (CC) C1, and when the SOC reaches Q1, the CC switches to C2. This CC step ends at 80% SOC, after which the cells are charged at 1C until the battery voltage reaches its upper cutoff potential of 3.6 V. Then a constant voltage (CV) mode continues until the charge current falls to 22 mA. All the cells are discharged under a CC-CV protocol, discharging at a CC of 4C until the cell voltage falls to 2.0 V, with a current cutoff of 22 mA. In this work, data of the first 16 batteries in dataset 'batch3' are used. These 16 batteries are divided into four groups, and each group contains four different batteries. The detailed policies applied to charging these 16 cells from 0% to 80% SOC are summarized in Table 2, where the test cells in each trial are given in detail. In each trial, samples generated from three of the four groups are first shuffled and randomly split into a training set and a validation set with the ratio of 7:3, which are then used to train the CNN model. The remaining group is finally used for testing the performance of the trained CNN model. The size of one sample input to the CNN is 15 × 15 × 3, and the training samples are shuffled before they are fed into the network to train the model. Once trained, the model is used to estimate the capacity on the test group, some cells of which may have slightly different charging policies from the training dataset. For a test cell, 225 consecutive data points are selected for each variable (i.e.
current, voltage and temperature) in each cycle, and the selected data chunks from the three variables are then transformed to a 3D image as an input to the network. It should be noted that the selected data chunks for the three variables are from the same time segment, though the starting point can be randomly chosen. The training and testing procedures are repeated 100 times, and the estimation results with the lowest RMSE on the test dataset among the 100 runs in each trial are illustrated in Figures 5-8 and listed in Table 2. In Figures 5-8, the estimated capacities are compared with the reference values for all the test cells, where the blue region represents the 5% error boundaries of the actual capacities. In Table 2, the RMSE, max error (MaxE) and MAE of the estimation results are all listed. It is shown that the capacity degradation trend is well traced, and the RMSE, MaxE and MAE of the capacity estimations for the test batteries are less than 0.0279 Ah, 0.0896 Ah and 0.0206 Ah (2.54%, 8.15% and 1.87% of the rated capacity), respectively. It is noted that the estimation errors of battery 7 are slightly larger than those of the other battery cells. This is because the charging policy of battery 7 is not included in the training dataset. However, as shown in Table 2, satisfactory results can still be achieved when the proposed method is applied to estimate the capacity of a battery whose charging policy differs from those in the training dataset.
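The data partition used in each trial (three groups shuffled and split 7:3 into training and validation sets, one group held out for testing) can be sketched as follows. The function name, the fixed random seed and the placeholder samples are assumptions made for illustration.

```python
import random


def split_trial(groups, test_group, val_fraction=0.3, seed=0):
    """Split samples for one trial.

    groups     : dict mapping a group id to its list of samples
    test_group : id of the group held out entirely for testing
    Returns (training samples, validation samples, test samples).
    """
    pool = [sample
            for group_id, samples in groups.items()
            if group_id != test_group
            for sample in samples]
    rng = random.Random(seed)       # fixed seed only for reproducibility
    rng.shuffle(pool)
    n_val = int(round(len(pool) * val_fraction))
    return pool[n_val:], pool[:n_val], list(groups[test_group])


# Placeholder samples: (group id, sample index) instead of real images.
groups = {g: [(g, i) for i in range(10)] for g in range(1, 5)}
train, val, test = split_trial(groups, test_group=4)
print(len(train), len(val), len(test))   # 21 9 10
```

Holding out a whole group, rather than random samples, keeps every cell of the test group unseen during training, which is what allows the test to probe charging policies absent from the training data.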

Case 2: Oxford dataset
In this dataset, aging experiments are applied to eight commercial Kokam pouch cells with a nominal capacity of 0.74 Ah. The dynamic driving profile used to degrade these cells is the Artemis urban drive cycle, and a characterization cycle is carried out every 100 dynamic cycles. The data collected from the characterization cycles, in which the cells are charged and discharged under a CC profile (1C) with the thermal chamber set at a constant temperature of 40 °C, are used for capacity estimation (Birkl, 2017). Each time, data of seven cells are used to train the model, with the generated samples shuffled and split into training and validation sets with the ratio of 7:3, while data from the remaining cell are used for testing. The capacity estimation procedure in this case is the same as in Case 1. The whole training and testing procedure is executed 100 times; the best estimation results on the testing dataset out of the 100 runs are shown in Figure 9, and the related RMSE, MaxE and MAE are summarized in Table 3. The RMSE is less than 0.0217 Ah, which is 2.93% of the rated capacity.

Analysis
To investigate the performance of the CNN model with different numbers of convolutional layers in both cases, identical training and testing datasets are used for all tests. The training and testing procedures are executed 100 times for each CNN configuration, and the average RMSE, MaxE and MAE of the 100 runs are summarized in Table 4. Further, Figure 10 shows RMSE bar charts for different CNN models. It is revealed that the CNN model with two convolutional layers can achieve satisfactory results in Case 1, while four convolutional layers are required in Case 2. This is because Case 2 has fewer samples for training, and therefore requires a deeper architecture than Case 1 to extract more detailed information from the limited training samples. Comparing the results of networks with four, five and six convolutional layers, and considering the total number of parameters involved in each configuration (as shown in Table 4), the CNN with four convolutional layers is the best trade-off, achieving satisfactory estimation results with relatively few parameters.
In this paper, the length of consecutive data points cut from the charging and discharging curves is chosen to be 225 for each variable, and the three data chunks for current, voltage and temperature are fused to generate a 15 × 15 × 3 input sample (image) to the CNN, corresponding to a partial charging/discharging segment with unknown initial and final SOC. This is again the best trade-off between the number of generated samples and the information embedded in each sample. To illustrate this, experiments are conducted with different segmentation lengths, where the training and testing procedures are repeated 100 times on identical training and testing datasets in Case 1. Figure 11 shows the average estimation error with respect to the segmentation length. It is clear that 225 continuous data points provide relatively small estimation errors, while segments that are too short or too long both produce unsatisfactory estimation results. This is because shorter segments contain less useful information to describe the features, while longer segments generate too few samples to train a proper model. In addition, the larger the segment size, the more parameters are involved, leading to longer training time.
Further, Table 5 compares the capacity estimation results of the CNN, ANN and DNN (with different numbers of hidden layers, each layer having 40 neurons) using the average RMSE, MaxE and MAE of 100 runs in Case 1, and the error ratios against the rated capacity are given in parentheses. The CNN model achieves the best results while involving far fewer parameters.
In summary, the normalized RMSEs are less than 2.54% and 2.93%, respectively, on the two datasets, outperforming other machine-learning-based estimation methods.
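The normalized figures quoted above follow from dividing each absolute error by the nominal capacity of the corresponding cells (1.1 Ah in Case 1 and 0.74 Ah in Case 2). The short check below reproduces that arithmetic; the function name is an illustrative choice.

```python
def percent_of_rated(error_ah, rated_capacity_ah):
    """Express an absolute capacity error (Ah) as a percentage of rated capacity."""
    return 100.0 * error_ah / rated_capacity_ah


case1_rmse = percent_of_rated(0.0279, 1.1)    # Case 1, 1.1 Ah cells
case2_rmse = percent_of_rated(0.0217, 0.74)   # Case 2, 0.74 Ah cells
case1_maxe = percent_of_rated(0.0896, 1.1)
case1_mae = percent_of_rated(0.0206, 1.1)
print(round(case1_rmse, 2), round(case2_rmse, 2))
```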

Conclusions
This paper has proposed a novel CNN-based battery capacity estimation method using only a partial charging segment of the direct measurements (e.g. current, voltage, and cell surface temperature). Compared to other ML-based methods, the proposed method is easy to implement, and can achieve fast online capacity estimation without extra health feature extraction or cumulative charge calculation processes, while only raw data of a partial charging process are required. The CNN has demonstrated the capability of handling a massive amount of data to learn representative features, and the feature extraction and capacity estimation are automatically executed in one framework. To apply the CNN for capacity estimation using measurable variables, a transformation method is developed to convert the time series to image representations that are acceptable to CNNs, and the converted 3D images embed the spatially and temporally correlated information among these variables. The data segmentation method performed prior to the transformation stage not only increases the number of samples, but also makes it possible to achieve fast online capacity estimation using only a partial charging segment of direct measurements with a flexible starting point. The proposed method is evaluated on two battery degradation datasets, and the estimation results confirm that the proposed CNN-based method can achieve satisfactory results and can be used for fast online capacity estimation once the model is properly trained offline. The CNN model developed in this paper has a large number of parameters to tune, and reducing the size and number of tunable parameters in the CNN model will be our future work.