A consistency-guaranteed approach for Internet of Things software refactoring

The software architecture of the Internet of Things defines the component model and interconnection topology of Internet of Things systems. Refactoring is the systematic practice of improving a software structure without altering its external behavior. When Internet of Things software is refactored, its correctness must be verified to ensure its security. To this end, this article proposes a novel approach for detecting refactoring correctness. Control flow analysis and data flow analysis are used to detect code changes before and after refactoring, and synchronization dependency analysis is used to detect changes in synchronization dependency. Three detection algorithms are designed to detect refactoring correctness. Four real-world benchmark applications are used to evaluate our approach. The experimental results show that our proposed approach can ensure the correctness of Internet of Things software refactoring.


Introduction
In recent years, the wide adoption of the Internet of Things (IoT) systems and immature IoT technologies pose multiple challenges to the development of IoT software. 1,2 Despite the multitude of IoT software architectures proposed in previous studies, the optimal IoT software architecture has not been found on a global scale, which means that the IoT technology still needs to be optimized. 3 IoT products have provided much convenience to people's lives. Juniper Research predicts that nearly 38 billion devices will be connected to the Internet by 2020. 4 With the increase in IoT applications, the type and quantity of IoT terminal devices increase as well. Therefore, the intelligence and correctness of IoT terminals draw wider attention than before. 5,6 However, because the functions and structures of the IoT terminals are different, some terminal devices will not be able to meet the needs of users. 7,8 Some developers refactor the architecture of IoT software to improve reusability and maintainability. However, the existing refactoring methods may incur a variety of concurrency bugs and lead to changes in behaviors. These problems can also cause the security of IoT software to be compromised. 9,10 In order to avoid the problem of post-refactoring behavior inconsistency, it is necessary to study the consistency detection approaches.
We propose a novel approach to detecting security problems introduced by refactoring. Built on the WALA software analysis framework, the approach uses control flow analysis, synchronization dependency analysis, and data flow analysis to check a refactoring, and it provides detection algorithms for three kinds of problems that are common in software development. In the experiment, we refactor the benchmark programs using the Eclipse refactoring tool and use the proposed detection approach to assess the refactored programs. The experimental results show that the proposed approach can effectively detect these security problems.

Related work
This section reviews previous studies on IoT software-based methods and refactoring consistency-based methods.

IoT software-based method
IoT security involves several abstraction layers and a number of dimensions. 11 Most security attacks happen at the software level because these attacks are currently the most popular and can affect a large number of devices and processes simultaneously. Most attacks are semantic attacks in data processing. 12 Rebuilding the IoT software is very likely to trigger security threats to the IoT. Therefore, IoT security detection is of great importance. 13 Xu et al. 14 proposed a trajectory privacy-protection scheme based on a trusted anonymous server. Zhang et al. 15 introduced some background knowledge of information security and ongoing challenges to IoT security. Conti et al. 16 introduced existing major security and forensics challenges in the IoT domain and briefly analyzed some papers targeting the identified challenges. Xu 17 proposed a method to address the security issues and key technologies in the IoT. He elaborated the basic concepts and principles of the IoT and combined its relevant characteristics with the main international research results to analyze its security issues and key technologies.
IoT has become a popular term around the globe. 18,19 Although IoT systems have brought convenience to users, they also cause huge security risks. 20,21 The risks of IoT software are immeasurable. Problems that occur in IoT software refactoring may lead to changes in user requirements or security vulnerabilities in the software. Therefore, it is important to detect the refactoring of IoT software. 22,23 Security problems related to IoT systems are drawing more and more attention from security experts and government departments. 24,25 Both the business community and relevant governmental departments have put forward necessary security assessment requirements for information systems and IoT systems.

Refactoring consistency-based method
Many previous studies focused on the consistency of software refactoring. Changes in the behavior may cause security problems in the software. Therefore, some researchers proposed refactoring tools and methods. If the time spent using the refactoring tools and fixing the bugs is less than the time doing it manually, the tool is useful. 26 Schafer et al. 27 illustrated several types of behavior changes that may cause inconsistency by current refactoring engines and proposed techniques to make the concurrent programs behavior-preserving. They introduced synchronization dependencies that modeled the ordering constraints imposed by the Java memory model and proved that their techniques yielded a strong behavior-preservation guarantee.
Maruyama et al. 28 presented an approach that tames behavior preservation by introducing the concept of a frame. In order to accommodate individual problems in refactoring, a frame was used to represent the boundary of a stakeholder's concern about the refactored codes. This frame-based refactoring approach preserved the observable behavior within a particular frame and helped programmers distinguish the behavioral changes.
Zhang et al. 29 presented an automated refactoring method among locks at the byte code level. With the promising features of StampedLock, Zhang et al. 30 presented an automated refactoring framework to convert a synchronized lock to a StampedLock. Although many methods are proposed to address software refactoring issues, 31,32 there is still no static analysis method to validate the synchronization dependency of synchronized methods and blocks and to detect the consistency of the refactoring behavior. To this end, we use static analysis methods to create an automated detection tool that can detect the security problems of IoT software.

Motivation
Refactoring is an effective way to improve software efficiency. In this section, we use an example to illustrate the problem of software security, as shown in Figure 1. In Figure 1(a), method v1() first acquires the monitor object B.class and then calls A.m(), which in turn acquires the A.class lock. Similarly, method v2() first acquires the monitor object A.class and then calls A.n(), which acquires the monitor object lock A.class.
In Figure 1(b), we apply the Move method refactoring to move method n() from class A to class B. Moving the synchronized method A.n() to class B leads the method to acquire the monitor object B.class.
Method v2() first attempts to acquire A.class and then B.class. Method v1() acquires the monitor object B.class and then A.class. Hence, refactoring may end in a deadlock.
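The lock ordering described above can be sketched in Java. This is a hypothetical reconstruction of the shape of Figure 1(b), not the authors' listing: the outer class name MoveMethodSketch and the trace parameter are assumptions added only so that the acquisition order can be observed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the situation in Figure 1(b); names other than
// A, B, m, n, v1, and v2 are assumptions made for illustration.
class MoveMethodSketch {
    static class A {
        // A static synchronized method locks the class object A.class.
        static synchronized void m(List<String> trace) {
            trace.add("A.class");
        }
    }

    static class B {
        // After Move Method, n() lives in B and therefore locks B.class.
        static synchronized void n(List<String> trace) {
            trace.add("B.class");
        }
    }

    // v1 takes B.class first and then A.class (inside A.m).
    static List<String> v1() {
        List<String> trace = new ArrayList<>();
        synchronized (B.class) {
            trace.add("B.class");
            A.m(trace);
        }
        return trace;
    }

    // v2 takes A.class first and then B.class (inside B.n): the opposite order.
    static List<String> v2() {
        List<String> trace = new ArrayList<>();
        synchronized (A.class) {
            trace.add("A.class");
            B.n(trace);
        }
        return trace;
    }

    public static void main(String[] args) {
        // Both methods run one after another on a single thread,
        // so this demonstration itself cannot deadlock.
        System.out.println("v1 lock order: " + v1());
        System.out.println("v2 lock order: " + v2());
    }
}
```

The sketch only records the two acquisition orders. If v1() and v2() ran on two different threads at the same time, each thread could hold its first lock while waiting for the other's, which is the deadlock described above.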
To address concurrency problems in IoT software, we designed three detection algorithms based on static analysis: deadlock detection algorithm, object reusable detection algorithm, and shared static field detection algorithm.

Approach overview
In this section, we introduce our approach to detecting code changes before and after refactoring. The framework of our approach is shown in Figure 2:
1. The input of the refactoring program. We refactor the open-source programs using the Eclipse refactoring tools. The refactored code is obtained by refactoring a particular method, function, or variable.
2. Static analysis. First, we analyze and compare the control flows before and after refactoring to find the structure that leads to inconsistent behavior. Then, we conduct synchronization dependency analysis to detect synchronized methods or blocks and detect the structures before and after refactoring in which the synchronization dependency changes. Finally, we analyze and compare the data flows before and after refactoring to find the structure that leads to inconsistent behaviors.
3. Detection algorithm. We design three algorithms to detect inconsistent behaviors and software security, including a deadlock detection algorithm, an object reusable detection algorithm, and a static shared field detection algorithm.
4. Generating detection results.

Control flow analysis
Control flow analysis generates a directed control flow graph. The node set $D = \{d_1, d_2, \ldots, d_n\}$ represents the basic code blocks, where $d_1, d_2, \ldots, d_n$ are the nodes. Each node has a set of successor nodes, which can be empty, and $(d_k, d_y)$ represents the directed edge between nodes.
By comparing the nodes before and after refactoring, we find where the software structure changes because of the refactoring. We define $\mathit{cfref}(d_n): \langle \mathit{exClass}, d_n, d_{n+1} \rangle$ as the control flow before refactoring and $\mathit{cfref}'(d'_n): \langle \mathit{exClass}', d'_n, d'_{n+1} \rangle$ as the post-refactoring control flow, where $\mathit{exClass}$ is the class name, $d_n$ is the $n$th node, and $d_{n+1}$ is a successor node of the $n$th node.
We assume a refactoring node changes when comparing $\mathit{cfref}(d_n)$ and $\mathit{cfref}'(d'_n)$ results in an inconsistent structure, which is checked through the following conditions. When the two entries are in the same class (condition 1), we proceed to condition 2. When $d_n$ and $d'_n$ are the same (condition 2), the node information of $d_{n+1}$ before and after refactoring is compared. If the node information of $d_{n+1}$ has changed, that is, $d_{n+1} \neq d'_{n+1}$ (condition 3), the node $d_{n+1}$ in the control flow is considered to have changed. $\mathit{cfdect}(d_n)$ is the intersection of $\mathit{cfref}(d_n)$ and $\mathit{cfref}'(d'_n)$ before and after refactoring. If the intersection is empty, the control flow information has changed in $d_{n+1}$, and we store the node $d_{n+1}$ in $\mathit{cfdect}(d_n)$ (condition 4). If $\mathit{cfdect}(d_n)$ is not empty (condition 5), nodes have changed before and after refactoring.
For example, we conduct control flow analysis for Figure 1. The code in line 14 is the same before and after refactoring, but in line 15, $A.n() \neq B.n()$; that is, the node corresponding to line 15 has changed.
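The comparison of control flow entries can be illustrated with a minimal Java sketch. It is not the authors' WALA-based implementation: the class CfRef, the method cfdect, the positional pairing of entries, and the enclosing class name "C" are all assumptions, and nodes are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: compares <exClass, d_n, d_{n+1}> entries before and after refactoring.
class ControlFlowDiffSketch {
    // One control flow entry; all three parts are plain strings in this sketch.
    static final class CfRef {
        final String exClass;
        final String node;
        final String successor;

        CfRef(String exClass, String node, String successor) {
            this.exClass = exClass;
            this.node = node;
            this.successor = successor;
        }
    }

    // Collects the pre-refactoring successors whose post-refactoring counterpart differs.
    // Assumption: entries at the same index describe the same program point.
    static List<String> cfdect(List<CfRef> before, List<CfRef> after) {
        List<String> changed = new ArrayList<>();
        int n = Math.min(before.size(), after.size());
        for (int i = 0; i < n; i++) {
            CfRef b = before.get(i);
            CfRef a = after.get(i);
            boolean sameClass = b.exClass.equals(a.exClass);             // condition 1
            boolean sameNode = b.node.equals(a.node);                    // condition 2
            boolean successorChanged = !b.successor.equals(a.successor); // condition 3
            if (sameClass && sameNode && successorChanged) {
                changed.add(b.successor);                                // condition 4
            }
        }
        return changed; // a non-empty result corresponds to condition 5
    }

    // Mirrors the Figure 1 example: line 14 is unchanged, line 15 calls A.n() vs B.n().
    // The enclosing class name "C" is hypothetical.
    static List<String> example() {
        List<CfRef> before = Arrays.asList(new CfRef("C", "line 14", "line 15: A.n()"));
        List<CfRef> after = Arrays.asList(new CfRef("C", "line 14", "line 15: B.n()"));
        return cfdect(before, after);
    }

    public static void main(String[] args) {
        System.out.println("changed nodes: " + example());
    }
}
```

In this sketch, an empty result means that no successor differs at any unchanged node.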

Synchronization dependency analysis
Synchronization dependency analysis examines the methods that contain synchronized blocks or that are themselves synchronized. Synchronization dependencies occur in the following situations:
1. There is a nested relationship between synchronized blocks;
2. There is a calling relationship between the synchronized methods;
3. Synchronized methods contain synchronized blocks;
4. Synchronized methods are called in the synchronized blocks.
A monitor-enter is an instruction in the synchronized block that acquires a lock, and a monitor-exit is an instruction in the synchronized block that releases a lock. If the lock of the monitor is the current class object, it is a static synchronized method. If the lock of the monitor is an instance object of the current class, it is a synchronized method.
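The four situations and the two kinds of monitor can be sketched in plain Java. All class, field, and method names below are hypothetical; the standard method Thread.holdsLock is used only to make visible which monitors are held at each point.

```java
// Illustrative only: one small method per situation, plus the static case.
class SyncDependencySketch {
    private final Object outer = new Object();
    private final Object inner = new Object();

    // Situation 1: nested synchronized blocks.
    boolean nestedBlocks() {
        synchronized (outer) {
            synchronized (inner) {
                return Thread.holdsLock(outer) && Thread.holdsLock(inner);
            }
        }
    }

    // Situation 2: a synchronized method calls another synchronized method.
    synchronized boolean callerMethod() {
        return calleeMethod();
    }

    // An instance synchronized method locks the instance ("this").
    synchronized boolean calleeMethod() {
        return Thread.holdsLock(this);
    }

    // Situation 3: a synchronized method contains a synchronized block.
    synchronized boolean methodWithBlock() {
        synchronized (inner) {
            return Thread.holdsLock(this) && Thread.holdsLock(inner);
        }
    }

    // Situation 4: a synchronized method is called inside a synchronized block.
    boolean blockCallingMethod() {
        synchronized (outer) {
            return Thread.holdsLock(outer) && calleeMethod();
        }
    }

    // A static synchronized method locks the class object instead of an instance.
    static synchronized boolean staticMethod() {
        return Thread.holdsLock(SyncDependencySketch.class);
    }

    public static void main(String[] args) {
        SyncDependencySketch s = new SyncDependencySketch();
        System.out.println(s.nestedBlocks() && s.callerMethod()
                && s.methodWithBlock() && s.blockCallingMethod() && staticMethod());
    }
}
```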
The synchronization dependence edge is defined as follows. Synchronization dependence edge analysis is based on the control flow graph analysis. The nodes of the control flow graph include an entry node and an exit node for each monitor:
1. A control flow graph node $b$ has an acquire dependence on node $a$ if $a$ corresponds to an acquire action and there is a path from $a$ to $b$ in the control flow graph. In this case, we consider there is an acquire edge between $a$ and $b$, denoted as $a \rightarrow_A b$.
2. A control flow graph node $a$ has a release dependence on node $b$ if $b$ corresponds to a release action and there is a path from $a$ to $b$ in the control flow graph. In this case, we consider there is a release edge between $a$ and $b$, denoted as $a \rightarrow_R b$.
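The two edge definitions reduce to a reachability check on the control flow graph. The following minimal Java sketch makes that concrete under stated assumptions: nodes are integers, the graph is an adjacency map, and the names Kind, reachable, hasAcquireEdge, and hasReleaseEdge are hypothetical rather than taken from the authors' tool.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: acquire and release dependence as path queries on a small CFG.
class SyncEdgeSketch {
    enum Kind { ACQUIRE, RELEASE, OTHER }

    // True if a path with at least one edge leads from 'from' to 'to'.
    static boolean reachable(Map<Integer, List<Integer>> cfg, int from, int to) {
        Deque<Integer> work = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        work.push(from);
        while (!work.isEmpty()) {
            int cur = work.pop();
            for (int next : cfg.getOrDefault(cur, Collections.<Integer>emptyList())) {
                if (next == to) {
                    return true;
                }
                if (seen.add(next)) {
                    work.push(next);
                }
            }
        }
        return false;
    }

    // b has an acquire dependence on a: a is an acquire action and a path a -> b exists.
    static boolean hasAcquireEdge(Map<Integer, List<Integer>> cfg, Map<Integer, Kind> kinds, int a, int b) {
        return kinds.get(a) == Kind.ACQUIRE && reachable(cfg, a, b);
    }

    // a has a release dependence on b: b is a release action and a path a -> b exists.
    static boolean hasReleaseEdge(Map<Integer, List<Integer>> cfg, Map<Integer, Kind> kinds, int a, int b) {
        return kinds.get(b) == Kind.RELEASE && reachable(cfg, a, b);
    }

    // Hypothetical CFG of one synchronized block: 0 = monitor-enter, 1 = body,
    // 2 = monitor-exit, 3 = code after the block.
    static Map<Integer, List<Integer>> exampleCfg() {
        Map<Integer, List<Integer>> cfg = new HashMap<>();
        cfg.put(0, Arrays.asList(1));
        cfg.put(1, Arrays.asList(2));
        cfg.put(2, Arrays.asList(3));
        return cfg;
    }

    static Map<Integer, Kind> exampleKinds() {
        Map<Integer, Kind> kinds = new HashMap<>();
        kinds.put(0, Kind.ACQUIRE);
        kinds.put(1, Kind.OTHER);
        kinds.put(2, Kind.RELEASE);
        kinds.put(3, Kind.OTHER);
        return kinds;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> cfg = exampleCfg();
        Map<Integer, Kind> kinds = exampleKinds();
        System.out.println("acquire edge 0 -> 1: " + hasAcquireEdge(cfg, kinds, 0, 1));
        System.out.println("release edge 1 -> 2: " + hasReleaseEdge(cfg, kinds, 1, 2));
        System.out.println("release edge 3 -> 2: " + hasReleaseEdge(cfg, kinds, 3, 2));
    }
}
```

In this example the body node depends on the monitor-enter through an acquire edge and on the monitor-exit through a release edge, while the code after the block has no release dependence on the exit because no path leads back to it.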
Synchronization dependency between synchronized methods and synchronized blocks is also defined as follows. Here g(m) denotes a method m that contains synchronized blocks, and f(m) denotes a synchronized method m. A synchronization dependency exists in each of the following four cases:
1. If g(m1) happens before g(m2), g(m2) synchronization depends on g(m1);
2. If f(m1) happens before f(m2), f(m2) synchronization depends on f(m1);
3. If g(m2) happens before f(m1), f(m1) synchronization depends on g(m2);
4. If g(m1) happens before f(m2), f(m2) synchronization depends on g(m1).
Table 1 describes the synchronization dependency relationships of Figure 1. In Figure 1, we first access the synchronized block in the method v1() and then access the synchronized method m() in the static class A. Hence, synchronized method m() has a synchronization dependency relationship with the synchronized block; that is, synchronization of the method m() is dependent on the synchronized block in method v1(). Similarly, synchronization of the synchronized method A.n() is dependent on the synchronized block in method v2(). After refactoring, however, n() has been moved to class B, so the method whose synchronization depends on the synchronized block in v2() is now B.n(), which acquires B.class instead of A.class. Since the synchronization dependency relationship has changed, the behavior has changed.

Data flow analysis
Data flow analysis is based on control flow analysis. It analyzes the direction in which data flows along the execution paths of a program. The purpose of data flow analysis is to detect changes in the data flow. The set of nodes is $D = \{d_1, \ldots, d_k\}$, where $d_i$ represents the $i$th node. The entry node is the start of a data flow graph and the exit node is the end. The data flow of each node is recorded both before and after refactoring. We consider that a refactoring node changes when the following conditions are satisfied. When a node $d_i$ remains the same before and after refactoring (condition 1), the $i$th node is examined. If the output data flow of $d_i$ is different (condition 2), the node $d_i$ of the data flow is identified as having changed. $\mathit{dfdect}(d_i)$ represents the intersection of the data flow of each node before and after refactoring (condition 3). If the intersection is empty, the node has changed, and the node information $d_i$ is stored in $\mathit{dfdect}(d_i)$. If the final $\mathit{dfdect}(d_i)$ is not empty (condition 4), there are nodes that have changed before and after refactoring.
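A minimal Java sketch of this comparison follows. It rests on assumptions that the text does not confirm: the output data flow of a node is modelled as a set of variable names, nodes are matched by a string label, and the names dfdect and example are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: reports nodes whose output data flow differs after refactoring.
class DataFlowDiffSketch {
    static Set<String> dfdect(Map<String, Set<String>> before, Map<String, Set<String>> after) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : before.entrySet()) {
            Set<String> post = after.get(entry.getKey());    // condition 1: same node on both sides
            if (post != null && !post.equals(entry.getValue())) {
                changed.add(entry.getKey());                  // condition 2: output data flow differs
            }
        }
        return changed;                                       // non-empty: condition 4
    }

    // Hypothetical data: node d2 no longer outputs the variable y after refactoring.
    static Set<String> example() {
        Map<String, Set<String>> before = new HashMap<>();
        before.put("d1", new HashSet<>(Arrays.asList("x")));
        before.put("d2", new HashSet<>(Arrays.asList("x", "y")));
        Map<String, Set<String>> after = new HashMap<>();
        after.put("d1", new HashSet<>(Arrays.asList("x")));
        after.put("d2", new HashSet<>(Arrays.asList("x")));
        return dfdect(before, after);
    }

    public static void main(String[] args) {
        System.out.println("changed nodes: " + example());
    }
}
```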

The algorithm
In this section, using three examples, we design three detection algorithms to accurately detect security problems.

Deadlock detection
We describe the situation of deadlock threads. Thread A requests acquiring lock L2 while holding lock L1, and thread B requests acquiring lock L1 while holding lock L2. The example program is shown in Figure 3.
In Algorithm 1, the main idea is, first, to acquire the monitor object of the synchronized block and then acquire the pointed address of the monitor object. Finally, if the pointed address of the monitor object in the two different synchronized blocks is the same, we detect a deadlock.
Method doPerformAnalysis is the step to perform the algorithm. javaProject is a Java project that needs to be detected, and basicAnalysisData contains multiple variables for analysis.
Method getSynchronizedClassTypeNames accesses the monitor instruction instructionInfo and acquires instances that meet the conditions. Method populateSynchronizedBlocksForNode calls the method ComparedVariable. When a monitor instruction is accessed, the method getAccessedField is called. Method getAccessedField is the core part of the algorithm: it acquires pointedInstances, the instances that the instruction points to. This step assigns the pointed address of one instance to i and the pointed address of the next instance to j, and the two are used for the final judgment.
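The thread scenario above (thread A holding L1 and requesting L2, thread B holding L2 and requesting L1) can be checked with a simplified lock-order inversion test. The sketch below is a swapped-in illustration, not Algorithm 1 and not WALA code: it assumes that the nested acquisitions have already been extracted as (outer, inner) pairs of lock objects, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: reports pairs of nested acquisitions that take the same two locks
// in opposite orders. Lock identity is compared by reference.
class LockOrderInversionSketch {
    static final class NestedAcquire {
        final String where;
        final Object outer;
        final Object inner;

        NestedAcquire(String where, Object outer, Object inner) {
            this.where = where;
            this.outer = outer;
            this.inner = inner;
        }
    }

    static List<String> findInversions(List<NestedAcquire> acquires) {
        List<String> reports = new ArrayList<>();
        for (int i = 0; i < acquires.size(); i++) {
            for (int j = i + 1; j < acquires.size(); j++) {
                NestedAcquire p = acquires.get(i);
                NestedAcquire q = acquires.get(j);
                boolean distinctLocks = p.outer != p.inner;
                boolean oppositeOrder = p.outer == q.inner && p.inner == q.outer;
                if (distinctLocks && oppositeOrder) {
                    reports.add(p.where + " <-> " + q.where);
                }
            }
        }
        return reports;
    }

    // Thread A holds l1 and requests l2; thread B holds l2 and requests l1.
    static List<String> example() {
        Object l1 = new Object();
        Object l2 = new Object();
        return findInversions(Arrays.asList(
                new NestedAcquire("threadA", l1, l2),
                new NestedAcquire("threadB", l2, l1)));
    }

    public static void main(String[] args) {
        System.out.println("possible deadlocks: " + example());
    }
}
```

A report from this check marks a potential deadlock only. Whether it can occur at run time depends on whether the two code paths can execute concurrently.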

Object reusability detection
The object reuse problem is very likely to occur in synchronized methods or blocks when the lock objects are Boolean, Integer, or String objects. For example, a Boolean object has only two values: true and false. If we use a Boolean object as the monitor object, the object may point to the same address and cause problems. In Figure 4, the lock monitor object is a Boolean object in a synchronized block. Because the two constants, Boolean.FALSE and false, refer to the same memory location, they are the same monitor object, which makes access to the resources mutually exclusive.
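The sharing of a Boolean monitor can be observed with a few lines of Java. The class and field names below are hypothetical and do not reproduce Figure 4. The sketch relies on the fact that boxing the literal false yields the same object as Boolean.FALSE.

```java
// Illustrative only: two unrelated classes that appear to use separate locks
// but in fact synchronize on the same Boolean object.
class BooleanLockSketch {
    static class TaskOne {
        final Boolean lock = Boolean.FALSE;
    }

    static class TaskTwo {
        final Boolean lock = false; // autoboxed to the same object as Boolean.FALSE
    }

    // True if holding TaskOne's lock also means holding TaskTwo's lock.
    static boolean sharesMonitor() {
        TaskOne one = new TaskOne();
        TaskTwo two = new TaskTwo();
        synchronized (one.lock) {
            return Thread.holdsLock(two.lock);
        }
    }

    public static void main(String[] args) {
        System.out.println("same monitor: " + sharesMonitor());
    }
}
```

Recent Java compilers may warn about synchronizing on a value-based class such as Boolean, which is the same hazard seen from the compiler's side.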
Algorithm 2 is the object reuse detection algorithm we designed. By detecting the type of a monitor object, we can determine whether the monitor object is a Boolean, an Integer, a String, or other types. If the type is a reusable type, we output the detection result.
Method doPerformAnalysis is the step to execute the algorithm. If the program method is detected to be not empty, we will call the method populateBugInstances to detect the monitor object and assign it to the instance bugInstances to acquire the final reused object.
Method populateBugInstances determines whether the ''acquire'' instruction locks on a reused object type. The acquired instruction must be a monitor instruction. We assign a value to reusableLockObjectTypes by calling the method getReusableLockObjectTypes. If reusableLockObjectTypes is consistent with the object reuse type, we return bugInstances.
Method getReusableLockObjectTypes analyzes the instruction to acquire the lock object type. We acquire the pointed address of the monitor instruction monitorInstruction and assign it to the instanceKey. We use the method createReusableChecker to determine whether the instanceKey is a reusable object.

Algorithm 2. Object reusability detection algorithm
Input: javaProject, basicAnalysisData
Output: bugInstances

BugInstances doPerformAnalysis(IJavaProject javaProject, BasicAnalysisData basicAnalysisData)
    while acquire all node do
        if node ≠ null then
            node ← populateBugInstances(cgNode, bugInstances)
        end
    end
    return bugInstances

BugInstances populateBugInstances(CGNode cgNode, BugInstances bugInstances)
    acquire instruction
    if instruction belong to isMonitorEnter then
        monitorEnterInstruction ← (SSAMonitorInstruction) instruction
        reusableLockObjectTypes ← getReusableLockObjectTypes(cgNode, monitorEnterInstruction)
        if reusableLockObjectTypes ≠ null then
            acquire Instruction and bugInstances
        end
    end

InstancesTypes getReusableLockObjectTypes(CGNode cgNode, SSAMonitorInstruction monitorInstruction)
    acquire lockPointedInstances
    for instanceKey in lockPointedInstances do
        instanceKeyReusableChecker ← createReusableChecker(instanceKey)
        add instances that match the reused object to instancesTypes
    end
    return instancesTypes

Boolean createReusableChecker(InstanceKey instanceKey)
    if acquire instanceKey type is Boolean then return true
    else if acquire instanceKey type is String then return true
    else if acquire instanceKey type is Integer then return true
    else if acquire instanceKey type is Long then return true
    else return false

Method createReusableChecker is the core part of the algorithm to detect the type of instanceKey. If its type is a reusable type such as Boolean type, Integer type, or String type, it is detected that the program has object reuse problems.
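The type test at the heart of createReusableChecker can be written as a small Java sketch. It works on ordinary runtime objects rather than on WALA InstanceKey values, so it illustrates the check without reproducing the authors' code. The names isReusableLockType and isReusableLock are hypothetical.

```java
// Illustrative only: flags lock objects whose type is commonly cached or interned.
class ReusableLockTypeSketch {
    // The four types listed in Algorithm 2.
    static boolean isReusableLockType(Class<?> type) {
        return type == Boolean.class
                || type == String.class
                || type == Integer.class
                || type == Long.class;
    }

    static boolean isReusableLock(Object lock) {
        return lock != null && isReusableLockType(lock.getClass());
    }

    public static void main(String[] args) {
        System.out.println(isReusableLock(Boolean.FALSE)); // a Boolean lock is flagged
        System.out.println(isReusableLock("name"));        // a String lock is flagged
        System.out.println(isReusableLock(new Object()));  // a dedicated lock object is not
    }
}
```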

Static shared field detection
For software programs, shared resources are subject to conflicts due to simultaneous access by multiple threads. As shown in Figure 5, two instances of the monitor object are created when two runnable tasks start. In this situation, each task locks its own instance separately.
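A minimal Java sketch of this situation follows. It is a hypothetical illustration rather than the code of Figure 5: the class Task, the field counter, and the helper sameMonitor are assumed names.

```java
// Illustrative only: a static field shared by two tasks that each lock a different monitor.
class StaticSharedFieldSketch {
    static int counter = 0; // shared by every Task instance

    static class Task implements Runnable {
        // Locks "this", so two Task instances use two different monitors.
        @Override
        public synchronized void run() {
            counter++;
        }
    }

    // True only if holding a's monitor also means holding b's monitor.
    static boolean sameMonitor(Task a, Task b) {
        synchronized (a) {
            return Thread.holdsLock(b);
        }
    }

    public static void main(String[] args) {
        Task first = new Task();
        Task second = new Task();
        System.out.println("same monitor: " + sameMonitor(first, second));
    }
}
```

Because the two monitors differ, the synchronized keyword on run() does not stop two threads from updating counter at the same time.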
Algorithm 3 is the static shared field detection algorithm. The algorithm acquires all static shared fields and checks whether the field has been modified in the program. If it is modified, it acquires the pointed instance of the field and outputs the detection result.
Method doPerformAnalysis is the step to perform the algorithm. We call the method getAllStaticFields to acquire the static fields and store the detected fields in staticFields. Method populateAllInstancesPointedByStaticFields acquires the instances that the static fields point to and stores all of them in pointedInstances.
Method populateModifyingStaticInstancesMap acquires the instances that modify static fields. If the instance of the modifying instruction modificationInstruction belongs to pointedInstances, we acquire the instance bugInstances of the instruction modifyInstruction.
Method getModifyingStaticFieldsInstructions is the core part of the algorithm. It determines the static field that needs to be modified by calling the method canModifyStaticField. If the field instruction instruction is static, the detection is successful.
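The first step, collecting the static fields that could be modified, can be approximated with Java reflection. This is a swapped-in technique for illustration only: the authors analyze instructions under WALA, whereas the sketch below inspects a loaded class. The names mutableStaticFields and Sample are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: lists static fields that are not final, i.e. candidates for
// shared state that some instruction could modify.
class StaticFieldScanSketch {
    // Hypothetical class under analysis.
    static class Sample {
        static int shared;
        static final int CONSTANT = 1;
        int perInstance;
    }

    static List<String> mutableStaticFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            // Skip compiler-generated fields, instance fields, and constants.
            if (!field.isSynthetic() && Modifier.isStatic(mods) && !Modifier.isFinal(mods)) {
                names.add(field.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("candidates: " + mutableStaticFields(Sample.class));
    }
}
```

Reflection only identifies candidate fields. Deciding whether a particular instruction actually writes to one of them still requires instruction-level analysis of the kind the algorithm describes.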

Benchmarks
We select four benchmarks to evaluate our refactoring tool. Quark is an open-source tool for developing applications for networked devices based on IoT sensing data.
JGroups is an open-source group communication tool. Apache Mina is a network communication application framework that mainly provides a programming model for event-driven and asynchronous operations based on the IoT TCP/IP and UDP/IP protocol stacks. In addition, the Apache Mina-core is a core network application framework and HSQLDB is a small database. Table 2 shows the benchmarks and their respective attributes. The second column represents the total number of classes in the program; the ''Method'' column represents the number of methods in the benchmark; ''Sync'' represents the number of methods that may involve synchronization; and ''No sync'' represents the number of methods that do not involve synchronization.
In summary, the results show that our analysis can find synchronized methods in real-world programs and analyze their synchronization dependencies. All experiments were conducted on a 16-core 2.60 GHz Intel Xeon E5-2650 workstation with 128 GB RAM. The workstation ran the Windows 7 operating system with Eclipse 4.5.1 and JDK 1.8.0 installed.

Experimental results and analysis
Experimental results. The Eclipse refactoring tool was used to refactor the benchmarks. We evaluated all the benchmarks in full, except for Mina, for which only the core package Mina-core was analyzed.
We refactored the software in each benchmark. By executing three detection algorithms, we detected the existing problem in each of the benchmarks. For example, we found deadlock problem in Quark, Mina-core, and HSQLDB. We detected the object reuse problem and static shared field problem in JGroups.
By using the three algorithms, we detected the three problems (deadlock, object reuse, and static shared field). The experimental results are given in Table 3. We assessed the number of inconsistencies and the detection time in all benchmarks. A detected inconsistency indicates that the problem can occur in the refactored program. The detection times show that our tool completes the analysis in a short time.
Case study. The importance of IoT software is highlighted in the ''Introduction'' section. IoT software is subject to security problems. In many cases, refactoring does not preserve program behaviors in the presence of concurrency. The new behavior will cause problems that did not exist before refactoring, such as security problems and deadlock. Figure 6 shows the benchmark Mina-core, in which the synchronized blocks are assigned to the parent class AbstractAcceptor. The original program is identified to have no problems. After refactoring, thread A acquires the bindLock lock and then acquires the boundAddresses lock, whereas thread B acquires the boundAddresses lock and then acquires the bindLock lock.
By using Algorithm 1 to acquire the pointed address of a monitor object of a synchronized block, we found that the pointed addresses i (boundAddresses) and j (bindLock) were the same. We determined that a deadlock occurred after the refactoring and caused security problems of IoT software.

Conclusion
This article presents a detection approach which uses control flow analysis, synchronization dependency analysis, data flow analysis, and three detection algorithms to ensure consistency and security of IoT software. Static analysis identifies the structures that change, and the three detection algorithms are used to detect software security problems. The three detection algorithms address three problems: deadlock, object reuse, and static shared field. In the experiment, we evaluated our approach on four benchmarks, that is, Quark, JGroups, Mina-core, and HSQLDB. Experimental results show that our approach is efficient in detecting existing problems. One possible area of future work would be to explore more complex refactoring detection beyond the field of IoT software. For instance, some advanced refactorings incur new problems and lead to more challenges in software development. The approach proposed herein is not enough to solve all of the problems, but the concepts and techniques developed in this study are expected to serve as a basis for addressing new challenges.