Delay-reliability-aware protocol adaption and quality of service guarantee for message queuing telemetry transport-empowered electric Internet of things

Message queuing telemetry transport has emerged as a promising communication protocol for resource-constrained electric Internet of things due to its high bandwidth utilization, simple implementation, and multiple quality of service levels. Enabled by message queuing telemetry transport, electric Internet of things gateways adopt dynamic protocol adaptation, conversion, and quality of service level selection to realize bidirectional communication with massive devices and platforms based on heterogeneous communication protocols. However, protocol adaptation and quality of service guarantee in message queuing telemetry transport-empowered electric Internet of things still face several challenges, such as the need for a unified communication architecture, differentiated quality of service requirements, the lack of quality of service metric models, and incomplete information. In this paper, we first establish a unified communication architecture for message queuing telemetry transport-empowered electric Internet of things for adaptation and conversion of heterogeneous protocols. Second, we formulate the quality of service level selection optimization problem to minimize the weighted sum of packet-loss ratio and delay. Then, a delay-reliability-aware message queuing telemetry transport quality of service level selection algorithm based on upper confidence bound is proposed to learn the optimal quality of service level through dynamic interaction with the environment. Compared with single and fixed quality of service level selection strategies, delay-reliability-aware message queuing telemetry transport quality of service level selection can effectively reduce the weighted sum of delay and packet-loss ratio and satisfy the differentiated quality of service requirements of electric Internet of things.


Introduction
Electric Internet of things (EIoT) can provide significant support for the intelligence, digitalization, and transparency of the power grid by collecting operation parameters, including voltage, current, and active and reactive power, in a timely manner and transmitting them to the cloud platform for processing and analysis [1]. In EIoT, the communication devices produced by different manufacturers utilize multiple communication protocols for data transmission and information interaction [2]. Typical communication protocols in EIoT include message queuing telemetry transport (MQTT), data distribution service (DDS), constrained application protocol (CoAP), and hypertext transfer protocol (HTTP). DDS is commonly used for state monitoring in EIoT [3]. CoAP is particularly suitable for services like meter reading management and load forecasting [4]. HTTP is applicable for high-performance devices with large computing and storage resources in EIoT [5]. MQTT is suitable for lightweight data transmission of gateways due to its high bandwidth utilization and simple implementation [6]. The gateway can achieve the adaptation and conversion of different protocols to MQTT. Through the information interaction between gateways based on MQTT, the connectivity and interoperability among different devices can be achieved, which shields the differences among various protocols.
Quality of service (QoS) guarantee is of vital importance in the process of data transmission between gateway and platform in EIoT [7,8]. MQTT provides three QoS levels, that is, at most once (QoS0) level, at least once (QoS1) level, and exactly once (QoS2) level [9], which provide different QoS guarantees in terms of transmission delay and packet-loss ratio. Specifically, the transmission delay of QoS0 is relatively lower but the packet-loss ratio is higher, while QoS1 and QoS2 achieve no packet loss at the expense of increased transmission delay. Moreover, QoS1 guarantees that the data packet is successfully transmitted at least once, and QoS2 ensures that the data packet is successfully transmitted exactly once by leveraging a more complicated retransmission mechanism. Therefore, it is necessary to dynamically and intelligently select MQTT QoS levels for data transmission between gateway and platform according to the time-varying network state and QoS requirements in EIoT [10]. However, dynamic MQTT QoS level selection still faces some challenges, which are summarized as follows. First, the QoS requirements of control services and acquisition services differ in terms of delay and reliability [11][12][13]. However, the different metrics are contradictory, for example, adopting a retransmission mechanism ensures a lower packet-loss ratio but greatly increases transmission delay. Therefore, it is a critical challenge to achieve a balanced trade-off among different QoS metrics [14]. Second, the current delay and packet-loss ratio models do not take the impact of the protocol-specific QoS guarantee mechanism on the physical-layer performance into consideration. Therefore, deriving accurate closed-form models of delay and packet-loss ratio that adapt to MQTT-specific QoS levels is challenging. Last but not least, due to network resource limitation and prohibitive signaling overhead, the global state information (GSI), for example, channel gain, is uncertain [15][16][17]. Therefore, it is necessary to intelligently optimize MQTT QoS level selection under incomplete information [18].
There exist some works that have addressed MQTT QoS level selection problems in IoT. Sadeq et al. [19] proposed a QoS approach for IoT environment utilizing MQTT and designed a flow control mechanism to minimize the transmission delay. Niruntasukrat et al. [20] proposed an authorization mechanism for MQTT-based IoT service platform to minimize delay and message overhead. However, these works have not considered the joint optimization of delay and packet-loss ratio. Lee et al. [21] proposed a push notification service network utilizing MQTT protocol to minimize the packet loss and delay by selecting an appropriate QoS level according to different payloads. Nurwarsito et al. [22] proposed a communication architecture using MQTT protocol for emergency vehicles which aims to minimize the packet loss and average delay. However, the above-mentioned works have not considered uncertain GSI in practical EIoT application scenarios. Weerasinghe et al. [23] proposed an MQTT-based localization mechanism for wireless sensor network by utilizing supervised learning. Ahmadon et al. [24] proposed a machine learning-based anomaly detection method for MQTT-based network. However, these works need offline scene data, which cannot adapt to the complex environment in EIoT [25].
Reinforcement learning provides a powerful tool to deal with sequential decision problems under incomplete information [26][27][28]. Among various reinforcement learning algorithms, upper confidence bound (UCB), originally developed for the multi-armed bandit (MAB) problems, has rapid convergence speed and a well-balanced trade-off between exploitation and exploration. Zhou et al. [29] proposed an energy-aware and data backlog-aware UCB-based channel selection algorithm, which can improve energy efficiency and throughput. However, the delay and reliability are not taken into account. Endo et al. [30] proposed a distributed QoS-UCB channel selection algorithm considering channel rating quality, which can improve the reliability and reduce the delay while avoiding congestion. However, this work has not considered the complex communication environment in EIoT and MQTT-specific QoS level selection optimization.
Motivated by the aforementioned challenges, we propose a delay-reliability-aware protocol adaption and QoS guarantee method for EIoT based on reinforcement learning. First, considering the adaptation and conversion of heterogeneous protocols, we establish a communication architecture of EIoT based on MQTT. Second, we propose a delay-reliability-aware MQTT QoS level selection (DR-MQLS) algorithm based on UCB to minimize the weighted sum of packet-loss ratio and delay. Last but not least, simulations are carried out to validate the effectiveness of DR-MQLS. Compared with single and fixed QoS level selection strategies, DR-MQLS can effectively reduce the weighted sum of packet-loss ratio and delay and satisfy the differentiated QoS requirements in EIoT. We summarize the main contributions of this work as follows:
Intelligent QoS Guarantee under Incomplete Information: DR-MQLS enables the gateway to interact with the environment and learn the optimal QoS level selection based on UCB under incomplete information. DR-MQLS can realize intelligent QoS guarantee with only local information.
Delay and Reliability Awareness: The closed-form models of delay and packet-loss ratio for the three MQTT-specific QoS levels are derived. The optimization objective is defined to minimize the weighted sum of packet-loss ratio and delay. DR-MQLS can achieve delay and reliability awareness by selecting the MQTT QoS levels according to the specific QoS requirements of EIoT services.
Extensive Performance Evaluation: Extensive simulations are carried out to demonstrate the effectiveness and reliability of DR-MQLS. Specifically, the effects of various parameter settings, such as the signal-to-noise ratio threshold and the weight of delay in the optimization objective, are illustrated to provide guidance for practical application.
The remainder of this article is organized as follows. In section "System model and problem formulation," we describe the system model and problem formulation in detail. The proposed DR-MQLS algorithm is introduced in section "Delay-reliability-aware MQTT QoS level selection in EIoT." Section "Simulation results" provides simulation results. In section "Conclusion," we summarize this article.

System model and problem formulation
The considered communication architecture of EIoT based on MQTT is shown in Figure 1 [31,32], which consists of an MQTT broker server, a cloud platform, multiple EIoT devices, and multiple gateways. The gateways with protocol adaption and conversion functions adopt the publish/subscribe pattern for information interaction with the cloud platform and can act as both publishers and subscribers. The broker server, which is deployed on the cloud platform, acts as an intermediary for data transmission between publishers and subscribers. The publisher notifies the broker server of the topics which it intends to publish. Then, the broker server keeps the topics and pushes them when subscribers ask for relevant topics. Multiple communication protocols are used for data transmission between EIoT devices and gateways, for example, HTTP, CoAP, and DDS. Through parsing and repackaging protocol messages, the gateway achieves the conversion between these protocols and the MQTT protocol. An example is shown in Figure 1. The broker server pushes the subscribed topic and transmits the related data to the gateway based on the transmission mechanism specified by MQTT QoS1 level. Then the gateway executes protocol adaption and conversion to repackage protocol messages based on DDS, CoAP, and HTTP and transmits the data to the corresponding EIoT devices.
We assume that there are $I$ large packets to be transmitted, and the set is $\mathcal{I} = \{1, \ldots, i, \ldots, I\}$. Each large packet consists of $J$ small packets, and the set is $\mathcal{J} = \{1, \ldots, j, \ldots, J\}$. The selected QoS level remains unchanged within a large packet but varies across different large packets. Denote the three QoS levels as $m = 0, 1, 2$, respectively. Define $x_i^m \in \{0, 1\}$ as the MQTT QoS level selection variable, where $x_i^m = 1$ represents that the mth QoS level is selected by the ith large packet, and $x_i^m = 0$ otherwise. We assume that the channel state remains unchanged during a small packet data transmission process but varies across different small packets [33]. In particular, each retransmission is considered as a small packet transmission process for QoS1 and QoS2, which adopt retransmission mechanisms. The channel gain [34,35] of the nth transmission of the jth small packet of the ith large packet is given by
$$G_{i,j,n} = \frac{|H_{i,j,n}|^2}{N_0 + e_{i,j,n}},$$
where $H_{i,j,n}$ represents the channel frequency response, $N_0$ represents the noise power, and $e_{i,j,n}$ is the electromagnetic interference power. Since each message in QoS0 level is only transmitted once, we define $n = 1$ in QoS0 level. Figure 2 shows the MQTT data transmission processes of the three QoS levels. The packet-loss ratio and delay models of the three QoS levels are elaborated in the following.

QoS0 level
QoS0 provides best-effort delivery of the PUBLISH packet. After the gateway sends the PUBLISH packet to the broker server, the transmission process is completed immediately, regardless of whether the broker server receives the packet. Therefore, although the transmission delay of QoS0 is low, the packet-loss ratio is relatively high under poor channel states.
Packet-loss ratio model. QoS0 level for data transmission has only one PUBLISH packet transmission process. Therefore, the packet-loss variable of the jth small packet of the ith large packet in QoS0 level is given by
$$a_{i,j}^{0} = \begin{cases} 1, & P G_{i,j,1} < G_{\mathrm{th}}, \\ 0, & \text{otherwise}, \end{cases}$$
where $G_{\mathrm{th}}$ represents the signal-to-noise ratio threshold and $P$ represents the transmission power. If the current signal-to-noise ratio $P G_{i,j,1}$ is lower than the threshold $G_{\mathrm{th}}$, the PUBLISH packet of the jth small packet of the ith large packet is lost, that is, $a_{i,j}^{0} = 1$. Otherwise, $a_{i,j}^{0} = 0$. Therefore, in QoS0 level, the packet-loss ratio of the ith large packet is given by
$$Q_i^{0} = \frac{a_i^{0}}{J}.$$
Here, $a_i^{0}$ represents the number of lost small packets of the ith large packet, which is given by
$$a_i^{0} = \sum_{j=1}^{J} a_{i,j}^{0}.$$
Delay model. The transmission delay of the jth small packet of the ith large packet in QoS0 level is given by $U(i) / \big(B \log_2(1 + P G_{i,j,1})\big)$, where $U(i)$ represents the packet size of each small packet of the ith large packet and $B$ represents the bandwidth. The total delay of the ith large packet is the sum of the transmission delays of its $J$ small packets.

QoS1 level
QoS1 adopts a PUBACK packet to acknowledge the reception of the PUBLISH packet. If the PUBACK packet is not received by the gateway within a certain time, the PUBLISH packet is retransmitted. In this case, the PUBLISH packet is received at least once at the broker server. A data deduplication process is required to delete the duplicate packets at the expense of a certain data processing delay [36]. Therefore, the packet-loss ratio in QoS1 level is zero, but the transmission delay and data deduplication delay are relatively high.
Packet-loss ratio model. Since QoS1 adopts retransmission to ensure successful data transmission, the packet-loss ratio of the ith large packet is $Q_i^{1} = 0, \forall i \in \mathcal{I}$.
Delay model. There are two transmission processes in QoS1 level, that is, PUBLISH packet transmission and PUBACK packet feedback. When both processes are successful, the transmission process of a small packet is completed. Define $a_{i,j,n}^{1} \in \{0, 1\}$ as the nth transmission result variable of the PUBLISH packet of the jth small packet of the ith large packet in QoS1 level. If the current signal-to-noise ratio $P G_{i,j,n}$ is lower than $G_{\mathrm{th}}$, the nth transmission of the PUBLISH packet is lost, which is denoted as $a_{i,j,n}^{1} = 1$. Otherwise, $a_{i,j,n}^{1} = 0$. Therefore, $a_{i,j,n}^{1}$ is given by
$$a_{i,j,n}^{1} = \begin{cases} 1, & P G_{i,j,n} < G_{\mathrm{th}}, \\ 0, & \text{otherwise}. \end{cases}$$
We define $b_{i,j,n}^{1} \in \{0, 1\}$ as the nth transmission result variable of the PUBACK packet of the jth small packet of the ith large packet in QoS1 level. $P_{\mathrm{back}}$ is defined as the transmission power of the feedback PUBACK packet. $G_{i,j,n,\mathrm{back}}$ is defined as the channel gain of packet feedback. If the feedback signal-to-noise ratio $P_{\mathrm{back}} G_{i,j,n,\mathrm{back}}$ is lower than $G_{\mathrm{th}}$, the nth transmission of the PUBACK packet is lost, which is denoted as $b_{i,j,n}^{1} = 1$. Otherwise, $b_{i,j,n}^{1} = 0$. Therefore, $b_{i,j,n}^{1}$ is given by
$$b_{i,j,n}^{1} = \begin{cases} 1, & P_{\mathrm{back}} G_{i,j,n,\mathrm{back}} < G_{\mathrm{th}}, \\ 0, & \text{otherwise}. \end{cases}$$
Then, the transmission delay of the jth small packet of the ith large packet in QoS1 level is given by a four-term expression, where $N_{i,j}$ represents the total number of transmissions for delivering the jth small packet of the ith large packet, $U_{\mathrm{ACK}}$ represents the packet size of each PUBACK packet, and $t_0$ represents the maximum waiting time for the feedback packets. The first term indicates the transmission delay when a PUBLISH packet transmission fails. The second term indicates the transmission delay when a PUBACK packet transmission fails. The third and fourth terms indicate the transmission delay when the PUBLISH packet and the PUBACK packet are successfully transmitted, respectively.
In order to simplify the model, we assume that the data deduplication delay of different small packets is uniformly defined as $t_c$ [37]. Therefore, the data deduplication delay of the jth small packet of the ith large packet is $t_c$. The total delay of the ith large packet in QoS1 level is the sum of the transmission delay and the data deduplication delay over its $J$ small packets.

QoS2 level
QoS2 ensures that messages are delivered exactly once through two interaction processes by means of PUBLISH, PUBREC, PUBREL, and PUBCOMP packets. In the first interaction process, after the gateway sends the PUBLISH packet to the broker server, if a PUBREC packet is not received within a certain time, the PUBLISH packet will be retransmitted until the PUBREC packet is successfully received. If a duplicate PUBLISH packet is received at the broker server, it will be deleted immediately. In the second interaction process, when receiving the PUBREC packet, the gateway responds to the broker server with a PUBREL packet and waits for the feedback PUBCOMP packet. Similarly, if the PUBCOMP packet is not received within a certain time, the PUBREL packet will be retransmitted until the PUBCOMP packet is successfully received. Therefore, the QoS2 level ensures that each packet is successfully received without duplication.
Packet-loss ratio model. Since QoS2 also adopts retransmission to ensure successful data transmission, the packet-loss ratio of the ith large packet is $Q_i^{2} = 0, \forall i \in \mathcal{I}$.
Delay model. There are four processes, that is, PUBLISH packet transmission, PUBREC packet feedback, PUBREL packet transmission, and PUBCOMP packet feedback in QoS2 level. When the above processes are successful, the transmission of a small packet is completed. The PUBREL packet will be transmitted only after the PUBREC packet is successfully fed back.
In the first interaction process, we define $a_{i,j,n}^{2} \in \{0, 1\}$ and $b_{i,j,n}^{2} \in \{0, 1\}$ as the transmission result variables of the PUBLISH packet and the PUBREC packet, respectively. In the second interaction process, we define $a_{i,j,n}^{2,\mathrm{REL}} \in \{0, 1\}$ and $b_{i,j,n}^{2,\mathrm{COMP}} \in \{0, 1\}$ as the transmission result variables of the PUBREL packet and the PUBCOMP packet, respectively. The judgment of these variables is similar to that of QoS1 and will not be repeated here.
Therefore, the transmission delay of the jth small packet of the ith large packet in the first interaction process in QoS2 level is given by a four-term expression, where $U_{\mathrm{REC}}$ represents the packet size of each PUBREC packet and $N_{i,j,1}$ represents the total number of transmissions for delivering the jth small packet of the ith large packet in the first interaction process. The first term indicates the transmission delay when a PUBLISH packet transmission fails. The second term indicates the transmission delay when a PUBREC packet transmission fails. The third and fourth terms indicate the transmission delay when the PUBLISH packet and the PUBREC packet are successfully transmitted, respectively. The transmission delay of the jth small packet of the ith large packet in the second interaction process in QoS2 level is given by formula (12), where $U_{\mathrm{REL}}$ and $U_{\mathrm{COMP}}$ represent the packet size of each PUBREL packet and PUBCOMP packet, respectively, and $N_{i,j,2}$ represents the total number of transmissions for delivering the jth small packet of the ith large packet in the second interaction process. Since the packet transmission mechanisms of the two interaction processes are similar and only the transmitted packets are different, the explanation of formula (12) is omitted here.
Since there is no data deduplication process in QoS2 level, the total delay of the ith large packet is the sum of the transmission delays of the two interaction processes over its $J$ small packets.
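The per-level models of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulator: the Shannon-type rate $B \log_2(1 + \mathrm{SNR})$, the rule that a failed attempt costs the forward airtime plus the waiting time $t_0$, and all function and parameter names (`rate`, `exchange`, `send_large_packet`, `draw_snr`) are hypothetical. The caller supplies `draw_snr`, which must return a positive linear signal-to-noise ratio for each transmission.

```python
import math


def rate(bandwidth_hz, snr):
    """Assumed Shannon-type rate in bit/s for a linear signal-to-noise ratio."""
    return bandwidth_hz * math.log2(1.0 + snr)


def exchange(draw_snr, fwd_bits, ack_bits, bandwidth_hz, snr_th, t_wait,
             max_tries=1000):
    """Repeat a packet/acknowledgement exchange until both directions succeed.

    Returns the accumulated delay in seconds. A failed try is charged the
    forward airtime plus the waiting time t_wait (a simplifying assumption).
    """
    delay = 0.0
    for _ in range(max_tries):
        fwd_snr, ack_snr = draw_snr(), draw_snr()
        t_fwd = fwd_bits / rate(bandwidth_hz, fwd_snr)
        if fwd_snr < snr_th or ack_snr < snr_th:
            delay += t_fwd + t_wait  # lost packet or lost acknowledgement
            continue
        return delay + t_fwd + ack_bits / rate(bandwidth_hz, ack_snr)
    raise RuntimeError("channel never exceeded the threshold")


def send_large_packet(level, n_small, draw_snr, size_bits, bandwidth_hz, snr_th,
                      ack_bits=32, t_wait=0.005, t_dedup=0.010):
    """Return (packet_loss_ratio, total_delay_s) for one large packet."""
    lost, delay = 0, 0.0
    for _ in range(n_small):
        if level == 0:
            # QoS0: a single PUBLISH attempt, lost if the SNR is below threshold.
            snr = draw_snr()
            lost += snr < snr_th
            delay += size_bits / rate(bandwidth_hz, snr)
        elif level == 1:
            # QoS1: PUBLISH/PUBACK with retransmission, plus deduplication delay.
            delay += exchange(draw_snr, size_bits, ack_bits, bandwidth_hz,
                              snr_th, t_wait) + t_dedup
        elif level == 2:
            # QoS2: PUBLISH/PUBREC followed by PUBREL/PUBCOMP, no deduplication.
            delay += exchange(draw_snr, size_bits, ack_bits, bandwidth_hz,
                              snr_th, t_wait)
            delay += exchange(draw_snr, ack_bits, ack_bits, bandwidth_hz,
                              snr_th, t_wait)
        else:
            raise ValueError("level must be 0, 1, or 2")
    return lost / n_small, delay
```

The default values of `ack_bits`, `t_wait`, and `t_dedup` follow the 0.032 kbits, 5 ms, and 10 ms settings reported in the simulation section; the small-packet size and bandwidth have no defaults because they are not stated in the text.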

Problem formulation
To solve the differentiated QoS guarantee problem in EIoT, the optimization objective is defined to minimize the weighted sum of packet-loss ratio and delay under the QoS level selection constraint. The optimization problem P1 minimizes this weighted sum over the selection variables $x_i^m$, subject to
$$\begin{aligned} \mathrm{s.t.}\quad & C_1: x_i^m \in \{0, 1\}, \ \forall i \in \mathcal{I}, \ \forall m \in \{0, 1, 2\}, \\ & C_2: \sum_{m=0}^{2} x_i^m = 1, \ \forall i \in \mathcal{I}, \end{aligned}$$
where $V$ represents the weight of delay in the optimization objective. $C_1$ and $C_2$ guarantee that only one QoS level can be selected during the transmission process of each large packet.
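For a single large packet, the objective and the one-level-only constraint can be illustrated with the short sketch below. The function names and the numbers in the usage note are hypothetical; the sketch assumes the packet-loss ratio and delay of every level are known in advance, which is exactly the information the gateway lacks in practice and the reason a learning approach is needed.

```python
def weighted_cost(loss_ratio, delay, delay_weight):
    """Weighted sum of packet-loss ratio and delay, with delay_weight playing the role of V."""
    return loss_ratio + delay_weight * delay


def best_level(stats, delay_weight):
    """Pick the QoS level with the smallest weighted cost.

    stats maps each QoS level (0, 1, 2) to a (loss_ratio, delay) pair.
    """
    return min(stats, key=lambda m: weighted_cost(*stats[m], delay_weight))


def selection_vector(level, levels=(0, 1, 2)):
    """One-hot selection variables: binary entries that sum to one."""
    return [1 if m == level else 0 for m in levels]
```

With the illustrative values `{0: (0.3, 0.02), 1: (0.0, 0.05), 2: (0.0, 0.08)}`, a delay weight of 1 favors QoS1, while a delay weight of 20 shifts the choice to QoS0, which mirrors the role of $V$ in trading reliability against delay.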

Delay-reliability-aware MQTT QoS level selection in EIoT
Problem transformation
MAB is an efficient reinforcement learning tool to cope with sequential decision problems under incomplete information [38]. It describes a sequence of exploration-exploitation decision-making processes [39,40]. The MAB model is mainly composed of decision makers, arms, and rewards [41]. In each round, the decision maker selects an arm, and the selected arm will generate a reward [42]. The decision maker aims to maximize its reward by exploiting the empirically optimal arm or exploring non-optimal arms. In this paper, we transform P1 into a MAB problem. The decision maker, arm, and reward are modeled as follows:
Decision Maker: Gateways are defined as the decision makers.
Arm: The three QoS levels of MQTT protocol are abstracted as arms, that is, $m \in \{0, 1, 2\}$.
Reward: The reward of selecting the mth QoS level is defined as the reciprocal of the weighted sum of packet-loss ratio and delay, as given in equation (15).

The proposed DR-MQLS algorithm
DR-MQLS estimates the reward based on historical observations and considers estimation uncertainty through the confidence bound based on UCB [43]. Therefore, the gateway estimates its preference [44] toward the mth QoS level as the sum of two terms, as given in equation (16). Here, the first term $u_{i-1}^m$ represents the empirical performance of the mth QoS level, which prompts the gateway to select the QoS level with the best cumulative performance up to the ith large packet transmission. $k_i^m$ represents the number of times that the mth QoS level has been selected when the ith large packet is transmitted. $v$ is the weight of exploration. The second term represents the confidence bound, which ensures that the gateway can explore QoS levels with fewer selections in order to improve the accuracy of estimation.
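As an illustration only, a standard UCB1-style index matches the two-term description of the preference: an empirical-performance term plus a confidence bound scaled by the weight of exploration. The form below is an assumption, not a transcription of equation (16); the symbol $\hat{r}_i^m$ is a hypothetical name for the preference, and $k_{i-1}^m$ denotes the number of selections of level $m$ before the ith large packet.

```latex
\hat{r}_i^m = u_{i-1}^m + v \sqrt{\frac{\ln i}{k_{i-1}^m}},
\qquad
m^{*} = \arg\max_{m \in \{0,1,2\}} \hat{r}_i^m
```

Under this form, a level that has been selected only a few times keeps a large second term, so it is revisited occasionally even when its empirical performance is currently lower.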
Then, the gateway selects the QoS level with the maximum estimation value, which is denoted as $m^*$ and given in equation (17).

Algorithm 1. DR-MQLS.
Phase 1: Initialization
Initialize all the indicator variables as zero.
for i = 1 : 3 do
    Select three QoS levels sequentially and obtain the initial rewards.
end for
Phase 2: Estimation and QoS selection
for i = 4 : I do
    Calculate the preference of the gateway toward the mth QoS level as equation (16).
    Select the optimal QoS level $m^*$ as equation (17).
    Phase 3: Learning
    for j = 1 : J do
        Observe the packet-loss result and transmission delay of each small packet.
    end for
    Calculate the packet-loss ratio and delay of each large packet.
    Calculate the reward as equation (15).
    Update $u_i^m$ and $k_i^m$ as equations (18) and (19).
end for
The implementation procedure of the proposed algorithm is summarized in Algorithm 1, which is divided into three phases, as follows:
1. Initialization: Initialize all the indicator variables as zero. Then, select the three QoS levels sequentially for the first three large packet transmissions to obtain the initial rewards.
2. Estimation and QoS Selection: The gateway calculates its preference toward the mth QoS level as equation (16) and selects the optimal QoS level $m^*$ as equation (17).
3. Learning: The gateway observes the packet-loss result and transmission delay of each small packet. Then the packet-loss ratio and delay of each large packet are calculated. Finally, the gateway calculates the reward as equation (15) and updates $u_i^m$ and $k_i^m$ as equations (18) and (19).
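The three phases can be sketched as a small Python class. This is a hedged illustration rather than the authors' implementation: the class name `DrMqlsSketch`, the UCB1-style index used in `select`, and the running-mean update used in `update` are assumptions standing in for equations (16), (18), and (19). The reward is the reciprocal of the weighted sum of packet-loss ratio and delay, as described in the text, and assumes a strictly positive delay.

```python
import math


class DrMqlsSketch:
    """Hypothetical UCB-style QoS level selector for one gateway."""

    def __init__(self, n_levels=3, explore_weight=2.0, delay_weight=1.0):
        self.u = [0.0] * n_levels  # empirical mean reward of each QoS level
        self.k = [0] * n_levels    # number of times each level was selected
        self.v = explore_weight    # weight of exploration
        self.V = delay_weight      # weight of delay in the objective
        self.i = 0                 # index of the current large packet

    def select(self):
        """Phases 1 and 2: try each level once, then maximize the index."""
        self.i += 1
        for m, count in enumerate(self.k):
            if count == 0:
                return m
        scores = [
            self.u[m] + self.v * math.sqrt(math.log(self.i) / self.k[m])
            for m in range(len(self.k))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, m, loss_ratio, delay):
        """Phase 3: turn the observed loss ratio and delay into a reward."""
        reward = 1.0 / (loss_ratio + self.V * delay)
        self.k[m] += 1
        self.u[m] += (reward - self.u[m]) / self.k[m]  # running mean
        return reward
```

A gateway loop would call `select()` before each large packet, transmit with the returned QoS level, and pass the measured packet-loss ratio and total delay to `update()`. Only locally observed quantities enter the update, which reflects the claim that the method needs no global state information.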

Complexity analysis
The

Simulation results
In this section, we validate the performance of DR-MQLS through simulations. The single and fixed QoS level selection strategies, that is, only selecting a specific QoS level for data transmission, for example, QoS0, QoS1, and QoS2, are used for comparison. We assume that there are a total of 800 large packets to be transmitted. The channel gain is randomly distributed within [3, 5] dB in the first 200 large packet transmissions, and randomly distributed within [4, 11] dB in the next 600 large packet transmissions. Since the PUBACK, PUBREC, PUBREL, and PUBCOMP packets only contain 2 bytes of fixed and variable headers, the packet sizes $U_{\mathrm{ACK}}$, $U_{\mathrm{REC}}$, $U_{\mathrm{REL}}$, and $U_{\mathrm{COMP}}$ are set as 0.032 kbits [19]. The setting of other simulation parameters is summarized in Table 1 [45,46].

Table 1. Simulation parameters.
The data deduplication delay $t_c$: 10 ms
The maximum waiting time $t_0$: 5 ms
The weight of exploration $v$: 2

Figure 3 shows the weighted sum of packet-loss ratio and delay versus the number of large packet transmissions. The simulation result shows that after 200 large packet transmissions, all the curves show a downward trend, and the weighted sum of QoS0 decreases the fastest. The reason is that the packet-loss ratio of QoS0 decreases due to the channel gain improvement after 200 large packet transmissions, while QoS1 and QoS2 are less affected by the channel gain because of the retransmission mechanism. DR-MQLS outperforms the single and fixed QoS level selection strategies of QoS0, QoS1, and QoS2 in weighted sum of packet-loss ratio and delay by 16.06%, 24.46%, and 44.86%, respectively. The reason is that DR-MQLS can dynamically select the optimal QoS level through trading off exploration and exploitation based on the empirical performance.
Table 2 shows the delay versus the number of large packet transmissions. The simulation result demonstrates that the delay of DR-MQLS is slightly higher than that of QoS0. The reason is that there is no retransmission mechanism and deduplication process in QoS0 level. QoS0 performs best in terms of delay, but sacrifices the packet-loss ratio as shown in Figure 3. When $i = 800$, compared with QoS1 and QoS2, the delay of DR-MQLS is decreased by 29.79% and 47.62%, respectively.
Figure 4 shows the optimal QoS level selection probability versus the number of large packet transmissions. The optimal QoS level selection probability of DR-MQLS converges to 60.10% when the number of large packet transmissions reaches 200. After 200 QoS level selections, the optimal QoS level selection probability first decreases and then reconverges to 87.77%. The reason is that DR-MQLS needs to relearn the QoS level selection strategy due to the significant change of channel state. After 200 large packet transmissions, compared with random QoS level selection, DR-MQLS has a significant advantage in the optimal QoS level selection probability due to its ability to interact with and learn from the dynamic environment.
Figure 5 shows the weighted sum of packet-loss ratio and delay versus $G_{\mathrm{th}}$. The simulation result shows that the weighted sum increases with $G_{\mathrm{th}}$. The reason is that as the threshold increases, the packet-loss ratio of QoS0 gradually increases. In QoS1 and QoS2, to ensure the reliability of data transmission, the number of packet retransmissions also increases, leading to increased delay. DR-MQLS performs the best because it can explore the potential optimal QoS level when the rewards of different QoS levels change with $G_{\mathrm{th}}$.
Figure 6 shows the impact of $V$. The simulation result shows that as $V$ increases, the delay shows a downward trend, while the packet-loss ratio shows the opposite trend. The reason is that as $V$ increases, DR-MQLS lays more emphasis on delay minimization rather than packet-loss ratio reduction. DR-MQLS can dynamically balance the trade-off between packet-loss ratio and delay by adjusting the value of $V$, so as to satisfy the differentiated QoS requirements of EIoT services. The simulation results can provide a reference for the setting of the weight $V$ in practical applications.
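The two-stage channel setting of the simulation and the relative-improvement figures can be sketched as follows. The uniform distribution inside each dB range is an assumption (the text only says "randomly distributed"), and the helper names are hypothetical.

```python
import random


def channel_gain_db(i, rng):
    """Draw a channel gain in dB for a transmission within the i-th large packet.

    Uses the [3, 5] dB range for the first 200 large packets and the
    [4, 11] dB range afterwards; uniform sampling is an assumption.
    """
    low, high = (3.0, 5.0) if i <= 200 else (4.0, 11.0)
    return rng.uniform(low, high)


def db_to_linear(value_db):
    """Convert a power ratio from decibels to linear scale."""
    return 10.0 ** (value_db / 10.0)


def improvement_percent(baseline, proposed):
    """Relative reduction of a cost metric with respect to a baseline."""
    return 100.0 * (baseline - proposed) / baseline
```

Calling `channel_gain_db(i, random.Random(seed))` for `i` from 1 to 800 reproduces the abrupt change in channel conditions after the 200th large packet, which is what forces the learner to relearn its QoS level selection strategy.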

Conclusion
In this paper, aiming at the QoS guarantee problem for EIoT based on MQTT protocol, we proposed a UCB-based delay-reliability-aware MQTT QoS level selection algorithm named DR-MQLS to minimize the weighted sum of packet-loss ratio and delay under incomplete information. Compared with the single and fixed QoS level selection strategies, that is, QoS0, QoS1, and QoS2, DR-MQLS can reduce the weighted sum of packet-loss ratio and delay by 16.06%, 24.46%, and 44.86%, respectively. In addition, the optimal QoS level selection probability can converge to 87.77%. In future work, the joint optimization of bandwidth allocation and QoS level selection will be considered to facilitate low transmission delay, reliable data transmission, and high throughput in EIoT.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was financially supported by the Science and Technology Project of State Grid Corporation of China under grant number 52094021N010 (5400-202199534A-0-5-ZN).