Content-Aware Fast Motion Estimation Algorithm for IoT Based Multimedia Service

Various applications based on IoT real-time multimedia are under the spotlight. To implement real-time multimedia service, motion estimation in video compression service has a high computational complexity. In this paper, an efficient motion search method based on content awareness is proposed consisting of three steps. The first step is motion classification using the center position cost distribution. The second step is calculation of a predictor based motion classification. The third step is setting the arm size of the search pattern based on adaptive use of the distance between the predictor and the center position. Experimental results show that the proposed algorithm achieves speed-up factors of up to 48.57% and 16.03%, on average, with good bitrate performance, compared with fast integer-pel and fractional-pel motion estimation for H.264/AVC (UMHexagonS), and an enhanced predictive zonal search for single and multiple frame motion estimation (EPZS) methods using JM 18.5, respectively. In addition, the proposed algorithm achieves a speed-up factor of up to 42.61%, on average, with negligible bitrate degradation, compared with the TZ search motion estimation algorithm for the multiview video coding (TZS) method on HM 10.0.


Introduction
The rapid development of Internet of Things (IoT) technology makes it possible for connecting various smart objects together through the Internet and providing more data interoperability methods for application purpose.Recent research shows more potential applications of IoT in information intensive industrial sectors such as healthcare services [1].Also, with the vigorous development of the Internet of Things (IoT) technology, the wave of the Internet of Things applications followed by applications based on IoT real-time multimedia is under the spotlight [2].
Most viewers prefer digital television delivered via terrestrial, cable, satellite, or the Internet over analog service due to a greater choice of channels, electronic program guides, and high definition services.Analog TV has been switched off in many countries.Many TV programs can be provided via the Internet.However, due to the huge size of data for a video signal, in practical scenarios it is not possible to transmit raw data and video compression is required.To generalize the video data compression technique for use with different kinds of media, a global standard for video compression has been developed by the ISO/IEC motion picture experts group (MPEG) and the ITU-T video coding experts group (VCEG).
The ITU-T video coding expert group and the ISO/IEC moving picture experts group together proposed the H.264/AVC video coding standard in 2003 [3].This standard is widely used for many applications, including broadcast of HD TV signals, video content acquisition, camcorders, and security applications.However, due to increasing demands for ultrahigh definition over and above HD video formats VCEG and MPEG established a Joint Collaborative Team on Video Coding (JCT-VC) January 2010 to implement the next-generation video compression standard, named high efficiency video coding (HEVC) [4].The aim of this new standard is to further reduce the bitrate by 50% with the same reconstructed video quality.With these standards, Liu and Yan have introduced the multiview video technology, explained the principle and coding structure of multiview video, and discussed two kinds of video monitoring applications mode in Internet of Things, focusing on the application of the intelligent security systems in the IoT environment [5].Both the H.264/AVC and HEVC video coding standards use both temporal and spatial redundancy to achieve compression.In the temporal domain, there is usually a high degree of correlation or similarity between video frames that were captured at nearly the same time.Temporally adjacent frames (successive frames in time order) are often highly correlated, especially if the temporal sampling rate or frame rate is high.In the spatial domain, there is usually a high degree of correlation between pixels that are close to each other as values of neighboring pixel are often similar.
In general, a video system comprises the two major modules of intraframe coding and interframe coding.The block based motion estimation (BME) algorithm plays a key role in interframe coding.In the H.264/AVC video coding standard, BME has 7 different block sizes that are used for interframe motion estimation and compensation [6].These different block sizes actually form a two-level hierarchy inside a macroblock (MB).The first level comprises block sizes of 16 × 16, 16 × 8, and 8 × 16 that is called a large MB type.In the second level, the MB is specified as a P8 × 8 type block size where each 8 × 8 block can be one of the subtypes 8 × 8, 8 × 4, 4×8, or 4×4.These are referred to as the submacroblock type.The relationship between these different block sizes is shown in Figure 1.
For HEVC, sizes of coding blocks are not fixed, as are macroblocks in H.264/AVC.A flexible coding block (CB) and a novel quadtree based coding tree unit (CTU) have been proposed for HEVC.Prediction of each block is performed in the prediction unit (PU).Illustration of the CTB structure for a CTU is shown in Figure 2(a) where 64 × 64 pixel CTU block is divided into smaller CU blocks.Upon calculation of the RD cost for every combination, the CUs that are under the red dotted area of Figure 2(a) give minimum RD values.The corresponding CTU partitioning and CTB structure for this particular (best) combination are shown in Figure 2(b).However, the interprediction technique is similar to H.264/AVC interprediction, for which the most computational time is needed for motion estimation, which is common for both standards.
Among traditional motion estimation (ME) algorithms, a full search (FS) algorithm requires much time due to evaluation of all candidate macroblocks to achieve optimal performance.However, the FS algorithm is not suitable in real-time video coding applications because high degree of computational complexity is involved.
To overcome this problem in use of the FS algorithm, variable fast block-matching algorithms (FBMA) have been proposed to increase the speed of the motion estimation process based on reduction of the number of search points using three different approaches.The first approach uses coarse-to-fine searching as a three-step search algorithm (TSS) [7], a 2D logarithmic search algorithm [8], and a fourstep search algorithm (4SS) [9].The second approach uses center-biased characteristics of the error distortion as a block based gradient descent search algorithm (BBGDS) [10] and a diamond search algorithm (DS) [11].BBGDS is difficult to apply fast motion and simply falls to a local minimum position.DS is efficient for use in midlevel motion.However, many search points are required in slow and fast motion.The third approach uses temporal and/or spatial correlation for calculation of a predicted motion vector (MV).An adaptive rood pattern search (ARPS) and an enhanced predictive zonal search for single and multiple frame motion estimation (EPZS) are also used.These algorithms use a set pattern size or start position from a previous frame and/or a neighboring current block MV.The EPZS and ARPS approaches also preserve the peak signal-to-noise ratio (PSNR), as does FS, while reducing the time required and maintaining the bitrate.
An adaptive FBMA has also been proposed based on pattern switching among TSS, DS, and BBGDS.However, some proposed algorithms only consider a fixed size MB [12].In [13], a clustering-based approach has been proposed in which an algorithm periodically counts the motion vectors of past blocks to accumulate progressive clustering statistics; then it uses the clusters as MV predictors for following blocks.
To reduce the computational complexity of the motion estimation module, an efficient algorithm is herein proposed for classification of video motion content using an adaptive size for a search pattern.The proposed algorithm reduces computational complexity by use of a predictor.
The rest of the paper is organized as follows: Section 2 presents related motion estimation algorithms.The proposed algorithm is described in Section 3. Experimental results are presented in Section 4, and conclusions are drawn in Section 5.

Related Research
Block based encoding structure is used in the H.264/AVC and HEVC video standard.For interprediction, a motion estimation technique is at the core of video compression and for different video processing applications that extract motion information from a video sequence.Typically using motion estimation, a motion vector is generated for a block (MB or CU) in the video compression standard.The motion vector indicates displacement of a block of pixels from a current location due to motion of an object or the camera.This information is used to determine the best matching block in the reference frame that minimizes the rate-distortion cost.This technique is known as the block-matching algorithm (BMA).Based on study of different motion estimation algorithms used in the H.264/AVC and HEVC standards, existing BMAs can be classified into the three categories of (1) fixed search patterns, (2) search patterns based on block correlation, and (3) search patterns based on motion classification.

Fixed Search Patterns.
Most methods are based on the assumption that the ME matching error decreases monotonically as search moves toward the position of the global minimum error.The motion vector of each block is searched independently using fixed search patterns.Examples are displacement measurement and applications in interframe image coding (2-LOGS), motion compensated interframe coding for video conferencing (TSS), a novel four-step search algorithm for fast block motion estimation (4SS), a block based gradient descent search algorithm for block motion estimation in video coding (BBGDS), a hexagon-based search pattern for fast block motion estimation (HEXBS) [14], a new diamond search algorithm for fast block-matching motion estimation (DS), and fast integer-pel and fractionalpel motion estimation for H.264/AVC (UMHexagonS) [15].These algorithms reduce the number of search points with a tradeoff between complexity reduction and image quality.
Both 4SS and TSS are efficient for fast motion video sequences because MVs in fast motion sequences are far removed from the center point of a macroblock.However, in other cases, such as medium and slow motion sequences, the MV can be trapped in a local minimum.TSS also uses a constantly allocated checking point pattern in the first step, which becomes inefficient for estimation of slow motion.A new three-step search for block motion estimation (NTSS) [16], an efficient three-step search algorithm for block motion estimation (ETSS) [17], and a simple and efficient search algorithm for block-matching motion estimation (SES) have been proposed in order to improve the performance of simple fixed search pattern algorithms [18].

Search Patterns Based on Block
Correlation.Instead of using predetermined search patterns, methods for exploitation of the correlation between the current block and an adjacent block in the spatial and/or temporal domains for prediction of candidate MVs are being considered.Predicted MVs are obtained based on calculation of statistical average (such as a mean, a median, or weighted mean/median) for neighboring MVs or selection of one of the neighboring MVs according to predetermined criteria [19].One such candidate method, named accelerator MVs, is based on differentially increased or decreased MVs after consideration of the motion vector of the collocated frame in the previous frame and also of the frame before that.
The concept behind selection of such predictor is that a block may not follow a constant velocity but may be accelerated.This kind of approach uses spatial or/and temporal correlation to calculate the predictor as the ARPS and EPZS.These types of algorithms set pattern sizes or estimate positions from a previous frame or/and a neighboring current block MVs.Both EPZS and ARPS preserve the peak signalto-noise ratio (PSNR) as does FS, and the time required is reduced with a similar bitrate.However, these procedures create much overhead in memory resources since they use Spatio-temproal information.

Search Patterns Based on Motion Classification.
Apart from fixed or variable search patterns, the motion activity of a video sequence has been used for a block-matching algorithm.Video sequences can be broadly divided into the three categories of frame slow, medium, and fast sequences based on motion activity in successive frames.Algorithms use different schemes to classify video sequences.
A proposed search patterns switching algorithm for block motion estimation (SPS) [20] has combined two approaches of motion estimation.The first approach uses a coarse-tofine technique for reduction of the number of search points, similar to 2-DLG and TSS.This approach is efficient for fast motion video sequences because, in these sequences, search points are evenly distributed over the search window and, thus, global minima far distant from window centers can be located more efficiently.The second approach uses the centerbiased characteristic of MVs.Algorithms such as N3SS, 4SS, BBGDS, and DS use center-biased search patterns to use the center-biased global minima distribution.Compared with the first approach, a substantial reduction in the number of search points can be achieved for slow motion sequences.SPS algorithms combine the advantages of these two approaches by use of different search patterns according to the motion content of a block.The performance of an adaptive algorithm depends on the accuracy of motion content classification.
In real video sequences with slow, medium, and fast motion, different motions frequently occur together.The adaptive fast block-matching algorithm using switching search patterns for sequences with a wide-range of motion (A-TDB) can efficiently remove temporal redundancy of sequences containing a wide range of motion.Based on characteristics of a predicted profit list, A-TDB can adaptively switch search patterns among TSS, DS, and BBGDS according to the motion content [21].
An adaptive motion estimation scheme for video coding (NUMHexagonS) using statistics of the MV distribution has been developed.The algorithm uses method for prediction of the MV distribution and makes full use of MV characteristics.A combined MV distribution prediction with new search patterns also makes the search position more accurate [22].

Proposed Work
Most real world video sequences contain a substantial amount of background and many motionless objects that have a high temporal correlation between successive frames.In order to estimate movement accurately, different pattern sizes are used based on the type of motion content.

Motion Classification Based on Video Content.
There are several measurements for classifying the motion degree of video content: RD cost, sum of absolute difference (SAD), and the mean of absolute difference (MAD).Usually, analysis of MAD value is robust to some noise components for separating the motion degree.Based on this, we also employ the MAD value to classify the motion degree.
We have analyzed the distribution of the mean absolute difference (MAD) of the center position for the Akiyo, Foreman, and Football sequences, typical representatives of slow, medium, and fast motion, respectively.Mean and standard deviation values were calculated and used to model Gaussian distribution as shown in Figure 3.
Only intermotion estimation was decomposed.Calculated MAD values are represented as where Cur  and Ref  are the macroblocks of the current and reference frames, respectively, with a size of , and (, ) are the spatial horizontal and vertical positions of the current and reference frames, respectively.Gaussian plots of slow, medium, and fast motion are shown in Figure 3. Video sequences are shown in which two intersection points of these distributions can be seen.These points of intersections are considered as thresholds Th 1 and Th 2 .For motion classification, there should be two points of intersection as Threshold 1 = Th 1 and Threshold 2 = Th 2 .The Th 1 value is a classification of the slow motion content between slow and medium motion, and Th 2 is classification of the fast motion content.(2) To effectively locate the global minimum, predictor is designed to use the motion content.Predictor designs in many ME algorithms have been proposed and most use temporal and spatial information [23,24].While accurate predictors can be located, resource overhead and much memory are required.In the next section, calculation of the predictors to solve this problem is explained.

Calculation of Predictors.
Based on a unimodal error surface assumption, block distortion monotonically decreases towards the global minimum.A predictor is calculated using three steps.First, distortion of the center point is compared to the threshold value.If the determined motion is slow, calculation of the predictor is not performed.Otherwise, distortions at the center and four adjacent center points are calculated.Among these five points, the one with the minimum distortion is recognized as the best point and not Min 1 .Therefore, calculation of the distortion values of the five points is performed exclusively of Min 1 to locate the second minimum as Min 2 .The Min 1 and Min 2 values have the direction of MVs.
From these Min 1 and Min 2 values a minimum of two positions is obtained.In the second step, if motion is classified as medium, the jumping distance is set to 2. If motion of the current block is classified as fast, then the jumping distance is set to 3. To justify the jumping distances of different sequences, analysis of the 3 representative video sequences Akiyo, Foreman, and Football was performed, and results are shown in Table 2 in which jumping distances of 2 and 3 were sufficient.
For the Foreman and Football sequences, two minima are calculated initially for a candidate point where the candidate point is shown as the center point and in black color in Figure 4, two separate loops are generated for each minima (Min 1 and Min 2 ) as shown in Figure 4.
From Min 1 , in the vertical direction a new point  is considered and the SAD value of this point is calculated.The  distance between point  and point  is the jumping distance of the sequence.If the SAD value of point  is smaller than for point  , another point in the same direction is considered (point  in the example).This procedure will terminate when the SAD value of the new point is less than the SAD value of the previous point.For this example, point  has the minimum SAD value in this direction and is considered as the candidate point in the vertical direction.A similar procedure is also applied for point  (Figure 4).Considering point  as the candidate minimum point in the horizontal direction we identify two candidate points in two different directions.Using these two candidate points a third predictor point is calculated.The predictor point is shown as point  in Figure 4.The predictor point is calculated using (3).On the other hand, after calculation of the predictor, an adaptive diamond search is performed using the calculated predictor as the center point (4)

Selection of a Search Pattern Size.
A pattern size decision algorithm is introduced (Figure 5).Initially, the maximum distance from the predictor to the center is calculated, which is 3 in Figure 5.Using this maximum distance, a square region is considered with 4 points (Figure 5, marked as point  ).SAD values of these 4 new points are calculated and centers are checked.If the minimum SAD value is the center of the diamond, then the large search pattern size decision algorithm terminates.For this case, a small diamond search region is considered again with 4 new points (shown in Figure 5 as point  ).In this second step, SAD values of the 5 points are checked.As in the previous decision, if the minimum value lies in the center, then the search terminates.
Other conditions are shown in Figure 6 and discussed in detail in the next section.

Overall Procedure.
The proposed algorithm can be divided into the steps shown in Figure 6: (1) Initial decision for predictor candidates.
(3) Setting a search pattern size.
Initially, values Min 1 and Min 2 are calculated.If Min 1 is the center, then the block has zero motion and no additional investigation is needed.Otherwise, the MAD value of the center is calculated as MAD  , the value of which is compared with Th 1 and Th 2 for calculation of the jumping distance.
(5) For slow motion, only a small diamond search is considered in the proposed algorithm.On the other hand, medium and fast motion based blocks require calculation of the predictor.The overall procedure is shown in Figure 6.

Experimental Results
The proposed algorithm was implemented using JM 18.5 [25] and HM 10.0 [26] where ΔBit indicates the total bitrate change (percentage), ΔPSNR  indicates ΔPSNR  changes, and Δ is the time saving factor (percentage).A positive value for the bitrate indicates an increase in the number of bits and a negative value indicates a decrease.An improvement in the ΔPSNR  parameter gives a positive value and degradation gives a negative value.A negative value for the time saving factor indicates that the proposed algorithm consumes less time than the original reference software.We performed all simulation experiments on a PC with an Intel i5-3470 processor with a 3.2 GHz clock speed and 4 GB of RAM running on Windows 7. The proposed algorithm was tested using both H.264/AVC and HEVC.

Performance Evaluation Using H.264/AVC Standard.
The experimental conditions were as follows: (1) Frames To Be Encoded 100.
(5) Entropy coding method CABAC.Experimental results are shown in Table 3 in which the proposed algorithm is compared with the full search (FS), UMHexagon search, and EPZ search algorithms.Deviations between the proposed algorithm and the comparison algorithms are shown.The proposed algorithm achieved reductions of 97.10% for total time, on average, compared with FS and achieved, on average, 29.43% and 8.28% more total time reductions than for UMHexagonS and EPZS, respectively.Considering the motion estimation time (ΔME time) the proposed algorithm provided 98.88%, 48.57%, and 16.03%, on average, time reductions compared with FS, UMHexagonS, and EPZS, respectively.
Quality degradation of the proposed algorithm is also shown based on the ΔPSNR and ΔBit metrics.The proposed algorithm provided a negligible loss of quality, compared with the FS, UMHexagonS, and EPZS algorithms, all with larger time reductions.
To show PSNR deviation from a low to a high bitrate, RD curves of the proposed and compared algorithms are plotted in Figure 7 for all sequences.All of the algorithms provided almost the same RD curve.For the Forman sequence, the proposed and UMHexagonS algorithms produced similar    RD performance.The proposed algorithm achieved a better result than UMHexagonS for the Tractor sequence and almost the same performance as FS for the same sequence.

Performance Evaluation Using HEVC Standard.
The proposed algorithm was tested in HM 10.0 reference software using the HEVC encoder.The test environment using the same hardware as for the H.264/AVC comparison was based on the following configuration: (1) QP values are 24, 28, and 32.
(2) The first 30 frames of each sequences were considered.
(3) All analyses were performed in low-delay main mode.(4) Fast modes were set in the order of FEN: 1, ECU: 0, FDM: 1, CFM: 0, ESD: 0. Results are shown in Table 4 in which the proposed algorithm was compared with the FS and the TZS algorithms [27].The proposed algorithm reduced the overall encoding time, on average, 95.57% and 14.27%, compared with FS and TZS, respectively.Considering only the motion estimation time, the proposed algorithm achieves 99% and 42.61% time reductions, compared with the FS and TZS algorithms.
To evaluate quality degradation, ΔPSNR and ΔBit values are also shown in Table 4.The proposed algorithm provided a similar PSNR loss with a marginal bit reduction.
RD curves of all sequences are shown in Figures 8, 9, and 10 for Class B, Class C, and Class D. The proposed algorithm achieved almost the same result as the FS and TZS algorithms.For the BasketballDrive sequence, the proposed method achieved a degradation of the bitrate, compared with FS, due to the complexity of the sequence.
In IoT environment, multimedia contents can be also generated locally or downloaded from the Internet through any WiFi, DSRC, WiMAX, 4G LTE connections available to the mobile/smart devices and enable the end-user or end-system to take appropriate actions and be aware of the environmental conditions based on rich visual information.In view of this, real-time content delivery is very important, especially based on HEVC video standard.The proposed algorithm can provide a scheme for supporting the real-time UHD video content distribution system.

Conclusions
A video content-based fast motion estimation algorithm is herein proposed.Based on motion classification, a predictor is used and an adaptive search pattern size is set.Using H.264/AVC, experimental results showed that the proposed algorithm achieved speed-up factors of up to 48.57% and 16.03%, on average, compared with UMHexagonS and EPZS, respectively, while maintaining good bitrate performance.
For HEVC, the proposed algorithm achieved a speedup factor of up to 42.61%, on average, compared with TZS.Also, for medium and fast motion sequences, the proposed algorithm substantially reduced the consumed time for motion estimation.The proposed scheme will be helpful for implementation of a real-time video encoding system because of no need for large memory comparing with EPZS and UMHexagonS algorithms.It means that the proposed algorithm can give easy hardware implementation of HEVC encoder on portable (embedded) devices.

Figure 4 :
Figure 4: Candidates with consideration of directional prediction.

Figure 5 :
Figure 5: An adaptive diamond search pattern and a small diamond search pattern.

Foreman
and Tractor are medium motion, and fast motion representative sequences are Soccer and Stefan.The frame size of Tractor was 1920 × 1080 (FHD).The other sequences used a frame size of 352 × 288 (CIF).

Figure 8 :
Figure 8: Rate-distortion (RD) curves for (a) Cactus and (b) BasketballDrive sequences for Class B in low-delay-p, main condition.

Table 1 :
Change in threshold points for different QP values.Analysis was performed for a particular QP value that varies from 20 to 36.Corresponding Th 1 and Th 2 values are calculated for each QP value.Different threshold points for changing QP values are shown in Table 1 for the combined Akiyo, Foreman, and Football distributions.The values of Th 1 and Th 2 change linearly with the QP value.Hence, it is natural to use a linear model for these thresholds for different QP values.The corresponding linear model is

Table 2 :
Experimental results for determination of the jumping distance.
of HEVC.Measurement of ΔBit, ΔPSNR  , and Δ was defined as

Table 3 :
The performance comparison of the proposed algorithm with the FS, UMHexagonS, and the EPZS on the JM 18.5.