Prediction and optimization of sharing bikes queuing model in grid of Geohash coding

Dockless bike-sharing systems provide parking anywhere feature and environment-friendly approach for commuter. It is booming all over the world. Different from dockless bike-sharing systems, for example, previous studies focus on rental mode and docking stations planning. Yet, due to the fact that human mobility patterns of temporal and geographic lead to bike imbalance problem, we modeled human mobility patterns, predicted bike usage, and optimized management of the bike-sharing service. First, we proposed adaptive Geohash-grid clustering to classify bike flow patterns. For simplicity and rapid modeling, we defined three queuing models: over-demand, self-balance, and over-supply. Second, we improved adaptive Geohash-grid clustering-support vector machine algorithm to recognize self-balance pattern. Third, based on the result of adaptive Geohash-grid clustering-support vector machine, we proposed Markov state prediction model and Poisson mixture model expectation-maximization algorithm. Based on data set from Mobike and OFO, we conduct experiments to evaluate models. Results show that our models offer better prediction and optimization performance.


Introduction
With the development of technology, dockless bikesharing systems (BSSs) have been solved the last mile problem in intelligent city life. 1 BSSs are booming all over the world, especially in large cities. In the traditional self-service mode, users have to rent or return bike sharing at fixed stations. Based on mobile Internet, global positioning system (GPS), and location-based service (LBS), BSSs allow users to start or end service in community curbside, subway stations, and central business district (CBD) parking zone.
Since about 2015, the central problems for municipal administration to solve include acquiring space to park the bikes and achieve efficient use of the bikes. According to bike sharing park-anywhere feature, the core of the issue is focused on two factors: 1. Attribute to the human mobility patterns and spatiotemporal factor, the imbalance problem is difficult to model and predict in dockless BSS. 2. The phenomenon of ''bike-sharing graveyard'' takes place anywhere, for example, curbsides, which blocks the path of pedestrian. This is a supply and demand planning problem that changes with temporal and geographic. 2 The truckbased 3 and the user-based approaches are two baseline approaches to solve the bike imbalance issue.
However, the truck recycling approach depends on demand prediction and manual intervention.

Motivation and incitement
In dockless BSSs, bikes are widely used in our daily life. In recent study, many previous methods focus on demand prediction in dock BSSs. The researcher had implemented different strategies to address the occurrence of rebalance, such as sending cargo trucks to relocation bikes before rush hours. Due to lack of supervision and control strategy, the BSS is toward extreme phenomenon. That is too many illegal parking 1 School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China 2 at curbside, bus station, community, and so on. The imbalanced usage pattern of bikes causes over-demand and over-supply issues not only to commuter but also to cities. Motivated by the aforementioned challenge, we examine three methods: station-centric model with global features, 4 demand prediction, 5,6 and freefloating bike-sharing model. 7 Based on the limitations of approach, our prediction model by adaptive Geohash-grid clustering (AGC) technology to preprocess parking coordinates data. We describe the stage of imbalance changed in different Geohash grids. Then, we propose a dockless sharing bike demand modeling: over-demand model, oversupply model, and self-balance model. Expectationmaximization (EM) algorithm is derived to learn the parameters of Poisson mixture model (PMM). According to queuing theory, we modeled bike usage patterns and human mobility patterns, predicted demand, and optimized the parameters.
We also showed that the prediction and optimization algorithms improve convergence and achieve a better performance compared with existing algorithms.

Literature review
This section summarizes research work on modeling, demand prediction, and parameter optimization. Mathematical modeling is the first step in solving prediction and optimization problems. From the viewpoint of BSS designers, route determination and demand prediction are relocated. Parameter value is a critical decision indicator for managers and controllers to optimize BSS. In our literature review, we concentrate on spatiotemporal, demand, and rebalance problems.
There are many researchers focusing on BSS modeling. Mathematical modeling 8 have been used in recent work such as planning model, 9 probability model, 10 clustering algorithms, 11 loss function, 3 and so on. Especially, the auto-regressive integrated moving average models 12 and auto-regressive moving average (ARMA) 13 models are widely used in modeling of human mobility patterns. According to BSS stations usage profile, Sayarshad et al. 9 proposed a multiperiodic optimization formulation for planning problems. Crisostomi et al. 10 propose a Markov chain model. In the BSS, Markov decision process was a tool to solve commuter mobility patterns. Based on the discriminative functional mixture (DFM) model, Bouveyron et al. 11 proposed FunFEM methodology. To minimize the total cost, Hu and Liu 3 proposed allocation model to solve bike rental stations and truck dispatching depot problems. According to travel patterns, MY Du et al. 14 proposed a multinomial logit (MNL) model. In free-floating bike-sharing model of MNL consists of three categories, such as origin to destination pattern (ODP), travel cycle pattern (TCP), and transfer pattern (TP).
In order to redistribute the number of bikes, spatiotemporal analysis method is widely used in prediction algorithm. It is help to analyze a strategic design model for BSS. According to temporal and spatial factors, Yang and Hu 15 proposed a spatiotemporal bicycle mobility model. The temporal and geographic mobility patterns are applied to demand prediction. 16 Based on spatiotemporal analysis, Froehlich et al. 17 proposed clustering technique in Barcelona's BSSs.
From the viewpoint of bike-sharing operators, demand prediction is a critical performance indicator. We can predict the number of rental bikes according to cluster algorithm. Ciancia et al. 2 propose station occupancy predictor. This is a data mining framework to predict the occupancy levels of the stations by Bayesian and associative classifiers. Based on station usage, long-term stability, and short-term volatility, Yao et al. 18 proposed three-step demand estimation model. However, tensor factorization is widely used in routing prediction of BSS. YX Li et al. 19 proposed a hierarchical prediction model by tensor factorization to extract latent user activity patterns. Based on clustering algorithm to forecast bikes' and docks' availability for each station, 20 Gaussian mixture model (GMM) 21,22 and PMM 23 are common prediction schemes. LB Chen et al. 24 proposed dynamic cluster-based framework for demand prediction.
Optimization of resource allocation is necessary to improve system performance (e.g. bike usage patterns and rebalancing problem). Based on BSS optimization configuration, Ling et al. 25 developed a novel deep reinforcement learning algorithm called hierarchical reinforcement pricing (HRP) for optimized rebalance problem in BSS. Based on optimal facility allocation and pool sizing for BSSs, [26][27][28] Guoming and Lukasz 29 propose two bikeshare pool sizing techniques which guarantee bike availability with high probability. Based on resource of bike-sharing optimization, similar problems are single vehicle one commodity capacitated pickup and delivery problem (SVOCPDP), 30 one commodity pickup and delivery traveling salesman problem (1PDTSP), 31 Swapping Problem, 32 and split delivery problem. 33 These optimization algorithms above mentioned aim to find a minimum cost route for users to renting and returning bicycles. Mostly, split delivery problem and branch-and-cut algorithm are solved through a tabu search algorithm. 33 The SVOCPDP gathers aspects from both the Swapping Problem 32 and the 1PDTSP. 31 The objective of optimization is to find the costless function by heuristics algorithm. In the literature, artificial immune systems support vector machine (AIS-SVM), 34 artificial neural network (ANN)-SVM, 35 and particle swarm optimization methods have been widely applied in optimization and classification problems.
Currently, some papers adopt Monte Carlo simulation to predict the demand of cluster. 29 Simulation results are different from real results. In our review of the related literature, there is little research available regarding modeling and predicting user behaviors in dockless systems. 2 In this paper, we focus on predicting imbalance stage and optimized parameter in dockless BSSs. Based on the real-world bike-sharing data set from Beijing city, we combined Geohash coding and queuing theory approaches to improve the SVM and to optimize the parameters of the EM algorithm to adjust the weight value in PMM.

Contribution and paper organization
As is evident from the review, there is abundant research regarding BSSs. To tackle our problem, we carefully design solutions to overcome the above drawbacks of literature. Based on geographic-grid clustering, we proposed AGC approach preprocessing checkin/out from parking data set. Starting from parking anywhere and mobility point, we research on improved SVM (ISVM) classification and optimized EM parameter in queuing models.
Our main contributions are as follows: 1. For bike flow pattern parking-anywhere problem, we formally define the state of overdemand, over-supply, and self-balance by processing coordinate data, transition state modeling by queuing theory. 2. For demand prediction, we improved biclassification algorithm to solve three-stage classification problem. Based on Markov state prediction (MSP), we propose AGC-SVM cope with dynamic demand. 3. For rebalance systems, we proposed PMM-EM model by PMM-EM algorithm.
The optimized parameters p L and l L guarantee high probability of bike system running. 4. In order to validate our methods, we use realworld data under eight baseline methods, such as perceptron, decision tree, gradient boosting regression tree (GBRT), k-nearest neighbor (k-NN), k-means, ARMA, Kalman filter, and hidden Markov model (HMM). We adopt measure accuracy of classification and regression algorithm through root mean squared error (RMSE), root mean squared logarithmic error (RMSLE), error rate, precision, recall, and F1 to evaluate that our models outperform significantly.
This paper is organized as follows: in section ''Overview,'' we definitions our models, in subsection ''framework,'' we discussed bike flow pattern models in BSS. In section ''Methodology,'' we defined AGC to solve check-in/out dynamic demand. Based on threestage classification problem, we improved SVM algorithm. We proposed PMM-EM model for imbalance stage prediction and rebalance parameter optimization. In section ''Experiments,'' we present experimental results to validate our method and discuss the merits and potential limitations of approaches, which give us guidance on when to use which model in practice. In section ''Conclusion and future works,'' we outline future research directions.

Overview
This section defines the notations (see Table 1) and terminologies used in this paper.

Preliminary and problem definition
Then, we define the terms used in this paper as follows.
Definition 1: Geohash coding. Geohash is a public domain geocode system invented in 2008 by Gustavo Niemeyer. The Geohash encoding generated by latitude is stored in List 1, whereas that generated by longitude is stored in List 2. Lists 1 and 2 encodings are merged. The odd number positions denote the latitude, whereas the even number positions indicate the longitude. A total of 32 characters, namely, 029 and b2z (remove a, i, l, o), are used for the base 32 encoding, that is, List 3. The purpose of privacy protection is to publish different encoding lengths. For example, the 6-bit code can represent a range of approximately 0.34 km 2 . A length of 7 bit can represent the range of 76 3 76 m 2 , as shown in Table 2.
Definition 2: Geohash grid. Geospatial index technique is a search method that efficiently deals with the roads, streets, and districts data. The grid index uses a hash data structure. Each grid corresponds to a bucket of the hash map ( Figure 5).
Definition 3: bike flow patterns of check-in/out. In a given time window, [t, t + Dt] is defined as a tuple X Dt = {X in , X out }, where X in i and X out i are the number of bikes' start and end services from Geohash grid during [t, t + Dt], respectively. We define X in and X out as the start and end bike services, shown as X in fx in 1 , x in 2 :::x in n g and X out fx out 1 , x out 2 :::x out n g. Check-in/out values mean the number of bikes that activate or terminate sharing bike service in Dt, denoted as x out iDt and x in iDt Definition 4: grid state. Sharing bikes is used by commuters who pick it up anywhere and anytime. According to statistic consequent from data set X in i and X out i , we find two states and one quasi-state. We define R A grid state as over-demand, R B grid state as over-supply, and R C grid state as self-balance.
Definition 5: queuing theory. Queuing theory is the mathematical study of waiting lines or queues. In this paper, we explore queuing theory by modeling and analyzing the number of sharing bikes for service, waiting times, and so on. In Geohash gird, sharing bikes are considered customers and parking be defined as entering the queuing system. Queue represents customers or sharing bikes waiting for service. We propose three queuing models, described by Kendall's notation (the standard system used to describe and classify a queuing node).

Framework
Based on bike flow pattern, we simulate the three classification results by queuing theory. Based on the fact that bike pattern of return and rental obeys Poisson process, we define three states convert to each other by bikes flow, as shown in Figure 1.

Label A: over-demand stage
This type of state in Geohash grid is named R A . For example, bus station region, office building. Label R A describes the model in over-demand stage. This stage means a little sharing bikes arriving at the parking area in a period of time, shown in the following equation The capacity is N in the sharing bike queuing system. The time interval of commuters arriving at R A grid follows negative exponential distribution. We proposed over-demand R A stage by M/M/1/N queuing theory. The working of the model is shown in Figure 2. Figure 2 shows how to model works.
In the model, the idle probability is Service rate in queuing model l Arrival rate in queuing model p Utilization of the server = l/m L The number of sharing bike that waiting for be serviced in line Vector data p Probability of PMM When K 6 ¼ 1 and K 6 ¼ 0. The number of customers waiting in line for bikes is obtained as Shown as (8) in Dt, as shown in the following equation For example, in subway station region, we named over-supply stage as R B . The capacity of the R B is K. we proposed over-supply R B stage by M/M/1/K queuing theory, as shown in Figure 3.
In this model, the idle probability of sharing bikes is Shown as (9):It means that the systems are kept at over-supply stage. Shown as (2): Following the changed bikes' flow pattern, arrow (2) means x out iDt . x in iDt . The state is transformed from B stage to C stage. Shown as (6): Following the growth of commuter, arrow (6) means x out iDt . x in iDt in BSS. The state is transformed from B stage to A stage. Label C: Self-balance stage R C stage is self-balance. It means x out iDt 'x in iDt , as shown in the following equation Different from defined R A and R B before, this queuing system have several service centers. The capacity of the R C stage is K. We named self-balance stage as R C stage by M/M/S/K queuing theory. In self-balance stage, the gird of R C has high user density and high demand for sharing bikes. The working of the model is shown in Figure 4.
In this model, the number of supplemented shared bikes is The probability of n bikes in the platform is r n s!s nÀs p 0 (n5s) The idle probability of sharing bikes is The mean number of waiting users is denoted by Shown as (4): It means the systems are kept at self-balance stage. Shown as (7): Following the changed bikes' flow pattern, arrow (7) means x out iDt . x in iDt in BSS. The state is transformed from C stage to A stage. Shown as (5): Following the growth of commuter, arrow (5) means x out iDt \ x in iDt in BSS. The grid state is transformed from C stage to B stage.

Methodology
We formulate bike flow pattern as a Poisson process and modeled by queuing system. Based on predicted imbalance stage and optimized rebalance parameters, we improved AGC and AGC-SVM approaches. We proposed PMM-EM model in which key components of problems are described in the following subsection.

AGC
Geohash coding means longitude and latitude point data transform algorithm. It is a hierarchical spatial data structure that subdivides space into buckets with a grid shape. In order to find the statistic of bike, we proposed AGC. First of all, we use the Geohash coding algorithm to process parking data. Latitude and longitude coordinate data are transformed into grid layers on the map, as shown in Figure 5. For Geohash code tagging, we statistics x out iDt and x in iDt from every Geohash grid during Dt. Bike flow pattern is denoted as P n i = 1 x out i and P n i = 1 x in i in the Geohash grid. We process the training data set as follows A cluster consists of all density of parking points, for example, WX4EQY tagging over-supply stage and WX4EQV tagging over-demand stage.
With the same Geohash coding length of the prefix letter, we evaluated the statistical result by threshold parameters b and P n i = 1 DX i relationship. Therefore, the statistical bikes' flow problem and pattern recognition are done by AGC approach.
According to the number of bikes, we adjust Geohash-grid size by choosing suitable prefix lengths. Different periods have different usage stages in the same Geohash grid. The adaptive Geohash grid not only predicted imbalance stage but also protect privacy. In next subsection, we introduce stage label classification by ISVM algorithm.

ISVM label classification
In this subsection, we describe how to create stage label classification in training data. To predict three states of Geohash-grid bike flow pattern, we adopt three label stages in queuing models. Because of self-balance, it is a fuzzy stage. Classification of bike flow pattern is convex quadratic programming. Support vector machine is good at bi-classification problem, as the method can significantly reduce the need for labeled training instances. Intuitively, a good separation is achieved by hyperplane that has the largest distance to the nearest point in any labels classification from training data. The details of the ISVM classification algorithm are illustrated as follow: Geometric Because of fuzzy stage boundaries and label classification, we improved SVM algorithm. This solution can be distinguished as ''over-demand and self-balance'' and ''over-supply and self-rebalance'' stages. The fundamental idea behind SVM is to choose the hyperplane with the maximum margin b, that is, the optimal canonical hyperplane. The geometric margin problem has become a convex minimization problem, as shown in To do this, one needs to find the weight vector v and the bias b that yield the maximum margin among all possible separating hyperplanes, the state is come from the hyperplane that maximizes Predicted imbalance stage Based on Markov non-aftereffect and Bayes conditional probability formula, Markov transition matrix is a square matrix describing the probabilities of moving from one state to another in a dynamic system. During the whole day, we divide up several time windows Dt, Markov transient evolution is For simplification and calculation of modeling, we named AGC-MSP. Different lengths of Geohash coding have different bike flow patterns toward AGC. AGC-MSP is a soft clustering method. It was used to predict imbalance stage and deployed bikes in advance. In application, these two scenarios for bike flow pattern prediction can be used complementarily as temporary stage scenarios and permanent state scenarios. We obtain the number of bike that come in and come out from Geohash grid. We process training data set that is recorded in the parking.

Rebalance parameter optimization
The bike flow pattern can be predicted by multiple times of Markov chain iteration in Geohash grid. In grid of R A state, the strategy is allocation bike in the morning. In grid of R B state, the strategy was recycled bike at night. Depending on operational experience and statistical summary, the deployed and recycled problems are uncertain.
Because of sharing the mobility of bikes, the three states of the grid alternate every day. To answer the second questions ''how to cope with rebalance issue,'' we proposed PMM-EM model. In different time, the grid state obeys Poisson distribution with different parameters. Therefore, how to optimize resource of deployed and recycled is rebalance issue. From observed parking data set, we can hardly point out the distribution exactly. Throughout the day, we define that the bike usage obeys Poisson mixed distribution. So that, the probability of sample X i is Among them, p L is the coefficient, where p L 50, and P n L = 1 p L = 1, where L is the l-th Poisson distribution.

Objectives function.
Maxlog L(x i , p, l) In Poisson mixture distribution, estimated variables p and l is When parameter p A and l A in Poisson distribution Likelihood function Log-likelihood function: If bike flow patterns obeys just one l Poisson distribution in BSS Hidden variable g. In the log-likelihood function, there is a sum in the logarithm. So add hidden variable g; if x t from type of sample A, then g t,A = 1, g t,B = 0..., g t, C = 0 means (y t , 1, 0, ..., 0) 3(p C B(x i ; l)) g i, C = (p A B(x i ; l)) 1 3(p B B(x i ; l)) 0 3(p C B(x i ; l)) 0 ð38Þ we define implicit variable g as a K-dimensional binary random variable with only a specific value in its Kdimensional value. When K value is 1, the other elements' value is 0 when p(x 1 ) = Expectation step. Estimation of from which part of Poisson distribution the data x i come from is as follows Expectation Maximization. Estimate the parametric proportions of each, as we know various possibilities of g. In order to find out the proportion of Poisson distribution, we analyze and iterate maximizing Q function, and the most suitable parameter values are obtained to construct the optimization model Derivative of Q function and equal to zero. Bike flow patterns when l (i + 1) when The optimal value of parameter is found, as shown in Figure 8.
Set u(p A , p B , p C , l A , l B , l C ) by EM algorithmŝ Experiments Based on historical parking data, we predicted grid of state and optimized the parameters.

Settings
Our experiment data come from real-world database in Mobike and OFO technology Co., Ltd. The experiments were executed on a computer running Windows 7, MATLAB 2014a, on a Pentium IV, IntelÒ, 1.84 GHz CPU, 4 GB of RAM. First, we use a statistics Spatiotemporal approach via time series analysis and get X Dt = {X in , X out }. Second, we tagged the classification of the BSSs' training data set. Finally, we predicted the cluster of Geohash grid.

Achievements
In this section, we experimentally evaluate the performance of AGC-MSP and PMM-EM in Geohash grid. AGC-MSP is predicted in Geohash-grid stage in next time slot. Multi-grid scenarios. At first, we choose six district of Beijing to divide grid of Geohash. Dongcheng district is divided into 320 Geohash grids, Xicheng district is divided into 355 Geohash grids, Fengtai district is divided into 471 Geohash grids, Haidian district is divided into 1035 Geohash grids, Chaoyang district is divided into 1344 Geohash grids, and Changping district is divided into 1833 Geohash grids. We proposed AGC-MSP in stage prediction, as shown in Figure 9.
Single-grid scenarios. The number of bikes is calculated by the function X in À X out , where positive value is demand of deployed and negative value is redundancy of recycle, as shown in Tables 5 and 6.
Recommended solution strategies R A solution. K is the capacity in R A queuing model system. When the capacity of the queues is K = 1, it means that one commuter searches for sharing bikes for traveling. Under over-demand stage, commuters could choose any transport tool in Geohash grid. In this system, the service rate is m = 30 min/7 bikes = 4.2 min/bike. The bikes arriving rate is l = 5 bikes/h and the service intensity is r = l/m = 1.19.
When the capacity of queues is K 6 ¼ 1, for example, users waiting for a bike, the parameters are shown in Table 7.
In the over-demand stage, queuing system service intensity is r A . 1 in grid R A , the idle probability of sharing bikes is p (A) 0 ' 20%, the probability of the users loss is p (A) k-lost ' 40% (K = 1, 2, 3...), the average of commuters is L A ' 3, the wait time is W A ' 1 h, the number of bikes cannot match the demand of users based on the strong paroxysmal behavior of commuters.

Initial value in WX4G7PT
Stage Þ r 2 + 1 1Àr 2 + 1 = 3.57 r 1Àr À (K + 1)r K + 1 1Àr K + 1 = 2.55 r 1Àr À (K + 1)r K + 1 Thus, commuter should not wait. In R B state, there are several bikes offered to commuters to pick up in grid parking zone. The resources for bike sharing are in imbalance state. Thus, it is phenomenon of ''bikes graveyard'' by social news report.
In R B state, the utilization rate of sharing bikes is inefficient. After several iterations, Markov transition matrix becomes steady-state. Based on the arrival rate l according to a Poisson process, the number of check-in of bikes is m B = 20 bikes/h by the historical data. The remainder bikes are in idle state in R B Geohash grid According to PMM-EM, the optimized parameter is l B = 8 bikes/h. The remainder bikes will be recycled by truck dispatching strategy. 3 The intensity of the queuing model system is r B = 0.4 \ 1. The p (B) 0 value is larger than the p (B) 3 value. The result shows that the users are not required to wait, as shown in Table 8. Table 9 shows that the bikes arrive at the parking platform following Poisson distribution.
In self-balance Geohash grid, the parameters are described as follows: the idle probability is p (C) 0 = 17%, the waiting time for service is 0.09 h, the arrival rate l C = 7 bikes/h, the completed services rate m C ' 5.67, the intensity is r C = 1.23, the number of users waiting in line is L C = 1.24. When the number of counter is S counter = 3/5/7, the parameter indicators are presented in Table 10.
The occasional idle situation of sharing bikes in R C causes the behavior of customer selection to become stable. No case of over-demand or over-supply was observed. The number of bikes can maintain stable selfbalance.

Systems solution
Bike rebalance strategy in grid R A where N A is the optimized number of bike put in R A and N 0 A is the initial quantity in R A . Bikes' rebalance strategy in grid R B where N B is the optimized recycle bikes from R B , N 0 B is the initial quantity in R B , and u is the penalty factor.
BSS rebalance optimized object function is

Result
In the experiments, two models are designed as the classification and prediction frameworks. For the result on bike check-in/out, we compared famous baseline clustering and classify method perceptron, decision tree, k-NN, and k-means in bike-sharing service region. For the result on imbalance problem, we compared AGC-MSP with baseline method: ARMA and GBRT in predicted demand. For the result on rebalance problem, we compared PMM-EM with Kalman filter and HMM in parameter optimization.
Baselines and evaluation method. The models are proposed in our work to solve imbalance and rebalance problems by bike flow patterns prediction and parameter optimization. The methods are ISVM algorithm to solve multi-classification problem and AGC algorithm, based on AGC-MSP and PMM-EM. In order to confirm our models, there are eight approaches that can be compared with the proposed method as follows: Perceptron: the perceptron is an algorithm for supervised learning of binary classifiers.
Decision tree: rules based on variable values are selected to get the best split to differentiate observations based on the dependent variable. Tree models where the target variable can take a discrete set of values are called classification trees. GBRT: gradient boosting regression tree. k-NN: k-nearest neighbor algorithm output depends on whether k-NN is used for classification or regression.
In k-NN regression, the output is the property value for the object. This value is the average of the values of k-NNs. Geographical grid: It means that the city was divided into several grids. k-means: k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. ARMA: This method is used to predict the stage of bikes in future time series. It helps to find imbalance stage based on the result. Kalman filter: It is an algorithm that uses a series of measurements observed over time.
HMM: In HMMs, the state is directly visible to the observer, and therefore, the state transition probabilities are the only parameters. The state is not directly visible, but the output dependent on the state is visible.
In order to evaluate the realistic scenario of BSS, the test data set is divided two parts: sequential hour slots and anomalous time slots.
In this paper, we proposed AGC, ISVM, AGC-MSP, and PMM-EM approaches compared with six baselines method. AGC is based on density-based clustering, ISVM is based on classification algorithm, AGC-MSP is based on classification and regression analysis, and PMM-EM is based on machine learn.
Evaluation metrics: To evaluate the performance of method, we adopt RMSE, RMSLE, error rate, precision, recall, and F1 which are widely used to measure accuracy of classification and regression algorithm F1 À score = 2 precision3recall precision + recall ð61Þ x 0 in and x 0 out are the bikes of the check-in and check-out from Geohash-grid cluster G i during Dt where X G i are the bikes of the check-in or check-out from Geohash-grid cluster G i during Dt.
In machine learning, clustering is unsupervised learning approach. It is difficult to evaluate the result of clustering. What is clusters number n? We usually chose the number of clusters by knowledge and experience. In our experiments, we cluster all parking points separately in Mobike and OFO BSSs.

Result on clustering
For simplicity and rapid modeling, we have been inspired by the most popular method of density-based spatial clustering of applications with noise (DBSCAN) and OPTICS. In the bike flow pattern clustering, we apply the idea of AGC in state label renewal. According to the number of bikes' check-in/out, we compare k-means (k = 3) and geographically constrained label propagation (GCLP) with the geographical grid. 36 When the demand of imbalance problem in BSS is predicted, we should be able to cluster bikes' checkout/in regions. k-means is the baseline clustering algorithm by distance of object. But in dockless pattern of bike sharing, the point of parking is changing with Dt. GCLP considers geographic constraint and label propagation based on popular community detection algorithm. Table 11 reveals that the time complexity is better than GCLP. Therefore, we adopted AGC for the next step.

Result on label classification
In order to figure out the stage of bike flow pattern for each region (grid or dock station), the initial parking data will be transformed to {x out iDt ,x in iDt } by Dt. After preprocess, we adopt simply bi-classification method. To make it fair, we change multi-classifiers to binary classifiers. To evaluate the rebalance by parameter optimization, RMSE is evaluated as the prediction performance. In this paper, k = 2 is (R A and R C ) or (R B and R C ) binary classifiers problem and k = 3 is multi-classifiers problem. We used ISVM to compare with regression tree, k-NN, perceptron, and k-means. Figure 10 shows that when k = 2, perceptron is famous with high accuracy and when k = 3, our ISVM is very suitable for dealing with multi-classification problems.

Result on predicted patterns
We proposed AGC-MSP compared with ARMA, ANN, and Kalman filter predicted algorithm.
In other words, the proposed AGC-MSP based on density-based clustering and Markov models can improve the prediction performance. The predicted over-demand state results are shown in Table 12.
Based on the AGC-MSP results, we predict the check-out/in stage by ARMA and GBRT. We choose two type of stages for AGC in our experiment: one is over-supply stage and the other is over-demand stage. The performances of ARMA are much better than GBRT. In addition, in all the hours, GBRT is less affected by time factor. We proposed that AGC-MSP is more accurate than ARMA and GBRT obviously, as shown in Figures 11-13. Because of sample of time window, the accuracy of AGC-MSP depends on stable Dt and prior knowledge for historical data.

Result on parameter optimization
We proposed PMM-EM's parameter optimization compared with HMM and Kalman filter. In anomalous time series, PMM-EM performance was calculated using RMSLE and error rate.
In summary, PMM-EM based on the result of AGC, ISVM, and Markov state transition matrix is much better than baseline method. In time complexity, the PMM-EM model need to achieve stable transition by initialized values iterated as p L and l L . Like pre-classified items in clustering, these sets are often created by expert human. Fortunately, we proposed queuing model, in which the bikes' flow obey Poisson distribution. We can choose l L by historical data. Based on prior knowledge, we can find the optimal parameter in the convergence state as quickly as possible. Base on PMM-EM algorithms, the performance of RMSLE is 0.349 in OFO dataset. The result of Mobike dataset 0.371>0.349, Therefore, in OFO dataset, PMM-EM performs is better than Mobike dataset result, show as in Tables 13 and 14.

Conclusion and future works
Bike sharing is a means of transportation that provides services to residents through mobile Internet, LBS, ecommerce, and other technologies. With the development of the market and the increasing number of customer groups, a huge amount of data has been generated. Different researcher have different attitude toward BSSs. With the development of deep learning and ANN, the traffic dispatching that has become hot spot in International conference. However, those algorithms are not suitable for our framework. The main reason is perspective of feature and factor. For instance, genetic algorithm, simulated annealing, and the heuristic algorithm are well performance in nondeterministic polynomial time (NP)-hardness problem, but our approach focus on global features. In future research, we should address the problem of heuristic algorithm, such as particle swarm optimization optimized BSS. Besides the optimization, deep learning differential privacy is a newest research area. Therefore, commuter should be prevented from being tracked and intercepted. Largescale data sets are shared, and the law of user behavior is examined. The overall arrangement of sharing bikes is continuously optimized. It provides a reference for traffic planning of urban public service and alleviates urban congestion. It provides a business siting scheme which pushes media advertising and recommends commodities.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.