Dynamic and semantic-aware access-control model for privacy preservation in multiple data center environments

With the rapid development of intelligent perception and other data acquisition technologies in the Internet of things, large-scale scientific workflows have been widely used in geographically distributed multiple data centers to realize high performance in business model construction and computational processing. However, insider threats pose very significant privacy and security risks to systems. Traditional access-control models can no longer satisfy the reasonable authorization of resources in these new cross-domain environments. Therefore, a dynamic and semantic-aware access-control model is proposed for privacy preservation in multiple data center environments, which implements a semantic dynamic authorization strategy based on an anomaly assessment of users’ behavior sequences. The experimental results demonstrate that this dynamic and semantic-aware access-control model is highly dynamic and flexible and can improve the security of the application system.


Introduction
With the development and application of data acquisition equipment and technology in the Internet of things, the joint use of multiple data centers is now regarded as essential for many online services. 1 Simultaneously, data center privacy has also become a focus of attention. For example, Europe's General Data Protection Regulation (GDPR) requires data center users to focus on privacy. 2 Due to the dynamic and heterogeneous nature of multiple data centers, privacy and security are regarded as the most difficult challenges; the misuse of legitimate access to data is a severe information security concern for both organizations and individuals. For example, after the 9/11 attacks, the US government allowed a greater sharing of information among government, public security, military, and other departments as a defense procedure against future terrorist strikes. However, this plan led to a massive leak of sensitive data by low-ranked personnel. 3 From a security engineering viewpoint, this problem is partially due to abuse of privileges that have been granted by authentication or authorization services, both of which may be regarded as functions of access control.
Access control constitutes a fundamental aspect of security and privacy protection. Typically, access control is used to prevent unauthorized users from gaining access to system resources, to prevent authorized users from accessing resources in an unauthorized manner, and to allow authorized users to access resources in an authorized manner. Due to the complex and dynamic nature of the multi-center big data application field, some of these factors cannot be addressed or phrased in terms of traditional access control. In Crampton and Huth, 4 a new access-control architecture was formulated, the realization of which might form part of an overall strategy for addressing the insider problem. In this architecture, trustworthiness and risk-assessment methodologies were combined and extended in traditional role-based access control. Here, a risk-assessment methodology is used to make decisions based on the context of the resource, and the requestor's trustworthiness is defined as the probability of the requestor not abusing the authorization, which could be derived via statistical analysis of the requestor's behavior.
Moreover, cross-center data processing applications are typically implemented as workflows, 5 which renders the access-control-based privacy protection more complicated. A workflow can be represented by a task graph G = (V, E) that consists of a set of tasks that are represented by vertices V. In mass workflow environment systems, tasks are executed sequentially according to the execution order. Therefore, it is impossible to identify unusual entities from a running system that deviate from its normal pattern of behaviors. The use of user access sequences for dynamic access control provides a new approach. 6 However, the current sequence anomaly analysis is still used in the application of system security detection, and little research has been conducted on the application of user access control. In-depth research on abnormal user behavior in access requests for resources or services is lacking. Therefore, a dynamic and semantic-aware access-control (DSAAC) model that is based on sequence anomaly evaluation is proposed by considering the characteristics of the workflow in a multiple data center environment. Through the sequence anomaly evaluation method and by introducing semantic constraints, users and resources in the access-control process can be restricted, and simultaneously, user resource associations can be defined; hence, this process is suitable for dynamic access control in a massive-resource environment.
The rest of this article is organized as follows. Section "Related work" presents the concept of access control and different access-control models. Section "Our DSAAC model" presents the scheme and specific algorithm steps of the dynamic access-control model. Section "Experiments and analysis" presents the experiments and their analysis. Finally, section "Conclusion" concludes the article.

Related work
From privacy and security perspectives, access control is one of the most fundamental security methods. Access control allows authorized users to obtain physical or logical access to various resources and ensures the confidentiality and integrity of system resources. 7,8 Earlier, most information systems were small in scale and number of users, simple in business, and corresponding access-control systems mainly adopted the method based on access-control list. Along with the increasing complexity and scale of systems, in order to improve the flexibility of authorization management, role-based access-control (RBAC) model was proposed in combination with the management model of user authorization and role authorization.
Based on the proposed concepts of tasks and workflows, the task-based access-control (TBAC) model has begun to receive research attention, which focuses on the process of user execution and authorizes each requested resource according to the execution status of the current task. [9][10][11] Many scholars have expanded TBAC according to their needs and have proposed such models as the task-role-based access-control (TRBAC) model [12][13][14] and the extended TRBAC model. 15,16 These models are expanded versions of the TBAC model and have realized satisfactory application results in various scenarios. However, extended access-control models of this type still use static policy rules, do not effectively use historical execution information, and cannot dynamically adjust the allocation of permissions according to the historical execution of tasks and other issues. Due to the relative complexity of TBAC in formulating static authorization policies, less research has been conducted on TBAC.
In a massive-resource environment, dynamic authorization is an effective technique for providing dynamic resource authorization by combining user behavior, context, task status, and other attribute information. Dynamic authorization approaches are characterized by the use of not only the policies but also environment features that are estimated in real time to determine access decisions. Among them, the risk-assessment method is an effective dynamic solution for assessing the uncertainty of a user's behaviors in a complex environment to control the insecurity from a requestor. In Zheng and Cai 17 and Diep et al., 18 a model for risk assessment has been proposed, which provides a useful reference for the consideration of context in risk assessment. An access-control model that is based on user behavior, context vulnerabilities, and resource properties is proposed in Bouchami et al., 19 but it focuses only on defining the risk level between collaborative environments. Lakshmi et al. proposed a model for the identification of insider attackers by adjusting the session based on risk assessment. 20,21 Fall et al. 22 also proposed a risk-adaptive authorization mechanism to satisfy the dynamicity in the cloud environment.
In addition, several preliminary studies combined risk and access control in using trust to assign users to roles. In Shaikh et al., 23 a risk-trust authorization mechanism is proposed for assessing user credentials to adjust the access rights dynamically. In Ni et al., 24 risk-based access-control systems that are based on fuzzy inferences are proposed. This study showed that fuzzy inference is a satisfactory approach for implementing a risk-based access-control system.
The current research on dynamic access control mainly focuses on the relationship between the user behavior and the context. In the access-control model that is based on user behavior and context, effective evaluation of user historical data is lacking, and it focuses on the analysis of the current states of users while paying less attention to the attributes of resources.
In workflow systems, a service is a task that is executed over a period and can be formalized into a task sequence. In security research in web and computing environments, anomaly detection in a user behavior sequence has become a hot topic, and remarkable results have been obtained. Sendi et al. 6 use a hidden Markov model to analyze user commands, to establish a normal behavior archive for the user's command sequence, and to compare the current sequence with normal behavior sequences to determine the similarity, which is used to distinguish legal users from intruders. Sequence mining is often used to mine hidden user behaviors to realize better personalized recommendation and resource allocation. Sequence mining is also used to detect abnormal behaviors of users. As discussed in Xie and Yu, 25 a hidden Markov model is used to describe the browsing behaviors of web users and to detect abnormal behaviors. Zhou et al. 26 proposed a method for user behavior anomaly detection that is based on data stream sequence mining. By studying the sequence relationships between subsequences, a user behavior anomaly was discovered, and the common problems of low delay and low accuracy in sequence anomaly detection algorithms were overcome. Sequence anomaly detection can provide low-latency and high-efficiency detection if the amount of data is large. However, in access-control technology, few studies that are based on sequence anomaly detection have been conducted. The user's behavior is a chronologically ordered set of observation records. Introducing a method that is based on sequence anomaly detection into access control can realize dynamic and flexible authorization in a multiple data center environment.
Thus, we study how to introduce the method of sequence anomaly detection into the access-control process and how to improve the available sequence pattern mining and detection algorithms so that they can provide dynamic authorization management in the TBAC scenario and protect privacy.

Our DSAAC model
In this DSAAC model, users must pass the identity authentication of the system first. Then, the request in the workflow process is formally modeled as a behavior sequence, and the user's permission for each step of the task execution is decided via the method of sequence anomaly detection. The scheme is illustrated in Figure 1. First, characteristic attributes and historical behavior requests are extracted from the request object. The characteristic attributes include subject attributes, object attributes, and other related attributes of the request. We model the historical behavior request using a serializing model. Then, we can authorize user access to the system automatically if the current behavior sequence has a high likelihood of being a normal sequence or warn the administrator if the current behavior sequence has a high likelihood of being an abnormal sequence (risk assessment). Finally, administrators can review the decision results, add the positive sample to the sample library for semantic sequence pattern mining, update the sequence pattern library, and provide normal behavior sequence patterns for the sequence anomaly detection module. Next, we will introduce behavior sequence modeling, sequence pattern mining, and sequence anomaly detection in detail.

Behavior sequence modeling
The modeling of a behavior sequence is the representation of the user's request in a standard behavior sequence format. We represent a request as a behavior sequence via formalization, pruning, and merging (Figure 2). Then, we can identify and use this standard behavior sequence in the pattern mining and anomaly detection process.
Basic unit definition. The most basic unit in the behavior sequence is the request. Requests are also associated with subjects, resources, and tasks, among others. We divide each request into a triple $R_i = \{S, O, P\}$, where request $R_i$ denotes the user's request, which includes all attributes that are related to the request; subject $S$ denotes the originator of the request and is an active entity that initiates access to resources, which is typically a user or a program that is executed by a user; object $O$ denotes the resources to be protected, which include various hardware and software resources; and operation $P$ denotes the operations of the subject on the object, such as reading and writing.
Standardizing the sequence is equivalent to standardizing each request $R_i$ to $\mathrm{sig}(R_i)$. As expressed in formula (1), $\mathrm{sig}(R_i) = \{f(S), d(O), g(P)\}$, where $f(S)$ is the standardized function that is defined on the subject $S$, $g(P)$ is the standardized function that is defined on the operation $P$, and $d(O)$ is the standardized function that is defined on the object resource $O$; $f(S)$, $g(P)$, and $d(O)$ are formulated by the relevant authority. A task is composed of a standard $R_i$ and a semantic $c$, and a set of tasks includes the formal description of the request and the semantic set. A single task is defined as $t_i = \langle \mathrm{sig}(R_i), c_i \rangle$, and the behavior sequence is $Fu = \{t_1, t_2, \ldots, t_n\}$, where behavior sequence $Fu$ is composed of a series of tasks $t$ and $n$ represents the length of the behavior sequence.
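The basic units defined above can be pictured as plain data structures. The following is a minimal sketch, assuming Python dataclasses; the class names, the `standardize` helper, and the trim-and-lower-case rule that stands in for f(S), d(O), and g(P) are all hypothetical, since the article leaves the standardizing functions to the relevant authority.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Request:
    """A standardized request sig(R_i): subject S, object O, operation P."""
    subject: str
    obj: str
    operation: str


@dataclass(frozen=True)
class Task:
    """A task t: a standardized request plus its semantic environment c."""
    request: Request
    semantics: Tuple[Tuple[str, str], ...] = ()


def standardize(subject: str, obj: str, operation: str) -> Request:
    # Hypothetical stand-ins for f(S), d(O), g(P): trim and lower-case.
    return Request(subject.strip().lower(), obj.strip().lower(),
                   operation.strip().lower())


def make_task(request: Request, semantics: Dict[str, str]) -> Task:
    # Sort the semantic items so that equal environments compare equal.
    return Task(request, tuple(sorted(semantics.items())))


# A behavior sequence Fu is an ordered tuple of tasks; n is its length.
fu = (
    make_task(standardize(" Alice", "Case/42", "READ"), {"PERSON_JOB_FAMILY": "1"}),
    make_task(standardize("alice", "case/42", "write"), {"PERSON_JOB_FAMILY": "1"}),
)
n = len(fu)
```

Frozen dataclasses are hashable, so tasks built this way could be used directly as items in a pattern library or as dictionary keys.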
We establish a sequence pattern library according to the behavior sequence. The sequence pattern library is a database, denoted MFu, that contains many normal sequence patterns.
Semantic-related definition. Due to the diversity needs among the types of businesses in various systems, the modeling of basic units alone cannot satisfy the expressions regarding the tasks, behaviors, and resources. For example, in a public security system, the staff must specify a file number to query a case. The file number is distributed according to the level and grouping of the staff, and it is a private attribute of the staff; it has no relation with the forward step. This file number is semantic information. Therefore, we introduce semantic-related definitions. When matching sequence patterns, semantic constraints are used to check the legality of current behaviors with resource operations. We also use semantic information to guide risk assessment when mining sequence patterns. The following describes the establishment of a semantic-related model.
Semantic recognition. The main objective of semantic recognition is to identify semantics. In special tasks, we must attach semantic information to the tasks. We typically use a URL to represent a task; thus, specified semantic information can be appended through URL parameters. In a user request log, a request record is of the following format:
GET http://market.scau.edu.cn/goods.php?id=1407246132s7jn1j8b&iaction=view&ast=0f
A request record can be sliced to obtain recognizable information according to the preset regular rule "/id=([0-9a-z]+)&iaction=(\w+)&ast=(\w+)/", and we can obtain the triple {S, O, P} as ["id:1407246132s7jn1j8b", "action:view", "ast:0f"].
Semantic variables. The semantic variable is defined as a quad $(X, S(X), U, G)$, where $X$ represents the variable name; $S(X)$ is a set of items of $X$, where each item is a fuzzy variable that is represented by $s$ and each item is in the range of the field $U$ of the basic item $u$; and $G$ is the syntax rule for generating the item $s$ of $X$, as shown in formula (2), where $s$ denotes the meaning of the item, $d$ denotes the formalized representation of the item after blurring, and the value of $d$ is in the range $u$ ($u \in U$). These grammar rules can be used, for example, to model, divide classes, define attributes, and define sub-attributes.
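The slicing step in the semantic recognition paragraph can be reproduced with a regular expression. This is a minimal sketch in Python that reuses the rule quoted in the text with the extraction spaces removed; the function name `slice_request` is hypothetical, and the sample record is the example request that appears in the article.

```python
import re
from typing import Optional, Tuple

# Rule quoted in the text, with the stray spaces from extraction removed.
RULE = re.compile(r"id=([0-9a-z]+)&iaction=(\w+)&ast=(\w+)")


def slice_request(record: str) -> Optional[Tuple[str, ...]]:
    """Return the (id, action, ast) values of a record, or None if no match."""
    match = RULE.search(record)
    return match.groups() if match else None


record = ("GET http://market.scau.edu.cn/goods.php"
          "?id=1407246132s7jn1j8b&iaction=view&ast=0f")
parsed = slice_request(record)
# parsed == ("1407246132s7jn1j8b", "view", "0f")
```

A record that does not carry all three parameters yields `None`, which a caller could treat as a request without recognizable semantic information.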
Semantic rules. A semantic rule is a regular sentence. A sentence is composed of elements of an alphabet $\Sigma$, namely, $x \in \Sigma^{*}$, where $x$ is called a sentence on $\Sigma$. A sentence contains ⟨noun phrase⟩, ⟨verb phrase⟩, ⟨noun phrase⟩, and [⟨noun phrase⟩, ⟨value phrase⟩] elements. Formally, a semantic rule is a process of dividing a noun phrase and a verb phrase in a sentence and expressing their relationship using symbols.
For example, consider sentence ''User A accesses resource B with semantic constraint person_job_family 1.'' The process is as follows.
Use the relevant semantic rules to check the semantics of the statements that are processed above, as expressed in formula (3), in which $X_b$ represents resource b's semantic variable and can be expressed as $X_b = \{s_1, \ldots, s_i, \ldots, s_n\}$, where an item that represents the resource text is generated via relevant grammar rules and can be used to describe semantic information such as the type of resource b, the domain in which it is located, the level, the operation restriction, or the associated resource. The semantic representation of a user can describe the user's group or the user's domain, for example. The semantics of the access operation can describe the sensitivity of the operation, among other properties.
Semantic constraints. Semantic constraints are related constraints from the perspective of resources, such as inclusive relationships, functional relationships, mutually exclusive relationships, self-reflexive relationships, and self-checking constraints. These constraints are combinatorial conditions that are based on common logical relationships, such as (A and B), (A or B), and (not A). For example, semantic value A is PERSON_JOB_FAMILY and equals 1; B is PERSON_LOCATION and equals 2. The combined constraint is satisfied only when A and B hold simultaneously. Corresponding semantic constraints are defined on each step of the task to constrain the operation of resources. A standard semantic constraint is of the following form: ⟨PERSON_JOB_FAMILY==1 and PERSON_LOCATION==2⟩.
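One way to picture such combinatorial conditions is as small composable predicates over a request's semantic attributes. The sketch below is only an illustration under that assumption; the helper names `equals`, `all_of`, `any_of`, and `negate` are hypothetical and are not part of the DSAAC model itself.

```python
from typing import Callable, Dict

Context = Dict[str, int]
Constraint = Callable[[Context], bool]


def equals(name: str, value: int) -> Constraint:
    # A missing attribute never satisfies the constraint.
    return lambda ctx: ctx.get(name) == value


def all_of(*parts: Constraint) -> Constraint:
    # (A and B)
    return lambda ctx: all(part(ctx) for part in parts)


def any_of(*parts: Constraint) -> Constraint:
    # (A or B)
    return lambda ctx: any(part(ctx) for part in parts)


def negate(part: Constraint) -> Constraint:
    # (not A)
    return lambda ctx: not part(ctx)


# The standard form from the text: PERSON_JOB_FAMILY==1 and PERSON_LOCATION==2.
constraint = all_of(equals("PERSON_JOB_FAMILY", 1), equals("PERSON_LOCATION", 2))

allowed = constraint({"PERSON_JOB_FAMILY": 1, "PERSON_LOCATION": 2})
denied = constraint({"PERSON_JOB_FAMILY": 1, "PERSON_LOCATION": 3})
```

Attaching one such predicate to each step of a task would let the same check run both when patterns are extended during mining and when a current behavior is validated.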
Sequence modeling process. Based on the above definition, the process of modeling a user behavior sequence will be described below. Each step of the user's request involves the corresponding subject and object and semantic information. Behavior sequence modeling is the extraction of information for anomaly detection from the user's request. The steps are illustrated in Figure 3.
In the figure, a request $R_i$ specifies the requested object resource and related information regarding the request and is expressed as $\mathrm{sig}(R_i)$. We can obtain the standardized semantic environment set $c_i$ by consulting relevant standards. The current task and semantic environment information are regarded as a task $t$. We collect the historical tasks of this request using a data search engine and construct the task sequence (also named the behavior sequence). The behavior sequence can be expressed as $Fu = \{t_1, t_2, \ldots, t_n\}$.
Figure 3. Sequence modeling process.
Session information maintenance. Data collection is a prerequisite step for sequence modeling. In this step, we will complete session information maintenance to obtain the forward task. We collect basic subject information and object information that are related to the current request $R_i$ first, and we collect historical task data that are related to the current request. There are three main sources of historical data for users: server data, customer data, and intermediate data (proxy server data and packet detection).
Semantic environment extraction. Process the basic information of the request to obtain semantic environment information. For example, a standard network request log format is presented in Table 1. Through semantic constraints, useful subject environment semantic information and semantic annotation of object resources can be extracted from basic information, such as the IP address, date, file type, resource domain, and user agent.
Sequence construction. Sequence construction is the process of merging a single request into a request sequence. In the process of sequence construction, it is necessary to conduct data preprocessing, clean up dirty/noisy data, extract and merge data from various sources, and convert the data into a suitable format. We identify tasks by semantics, and we filter out repeated tasks by pruning and setting time windows. Finally, we merge the tasks to construct the task sequence Fu.
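The pruning and merging step can be illustrated with a short routine. This is a minimal sketch, assuming that each log event is reduced to a timestamp and a task identifier and that a task repeated within a fixed time window counts as a duplicate; the function name `build_sequence` and the 5-second default window are hypothetical choices, not values from the article.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, task identifier)


def build_sequence(events: List[Event], window: float = 5.0) -> List[str]:
    """Order events by time and drop a task repeated within `window` seconds."""
    sequence: List[str] = []
    last_seen: Dict[str, float] = {}
    for timestamp, task in sorted(events):
        previous = last_seen.get(task)
        # The window slides: a pruned repeat still refreshes the last-seen time.
        last_seen[task] = timestamp
        if previous is not None and timestamp - previous <= window:
            continue  # repeated task inside the time window: prune it
        sequence.append(task)
    return sequence


events = [(0.0, "t0"), (1.0, "t1"), (2.0, "t1"), (20.0, "t1"), (3.0, "t2")]
fu = build_sequence(events)
# fu == ["t0", "t1", "t2", "t1"]
```

The second `t1` is pruned because it follows the first within the window, while the `t1` at 20 seconds is kept as a new occurrence.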

Sequential pattern mining
A sequential pattern mining algorithm is used to realize risk assessment. The advantage of using sequential pattern mining is that it can mine more instructive behavior patterns without requiring security administrators to formulate many complicated policy rules. Moreover, the pattern library will not store behavior sequential patterns that have the same meaning, which can reduce the burden of database storage. We utilize a closed sequence mining algorithm that is based on behavior.
Core strategy in algorithm design. The core strategy of the closed pattern mining algorithm is to expand the S-pattern and the I-pattern of the newly added sequence, recursively mine the closed sequence pattern with the expanded sequence, extract the frequent set and remove the closed pattern in the closed pattern mining, and, finally, obtain the frequent sequence.
Due to the particularity of TBAC scenarios, the mining of behavior sequence patterns differs substantially from the traditional mining of sequence patterns. In mining behavior sequence patterns in TBAC scenarios, the following three requirements must be satisfied:
1. The closure of patterns. All sequences in the pattern database MFu must satisfy the closed sequence pattern, namely, there are no patterns that have the same meaning. In the behavior-based closed sequence mining algorithm, the closed sequence pattern must be filtered, and simultaneously, the patterns of the same task must be merged.
2. The first fixed principle. In the process of user execution and access, the first operation is always fixed. Hence, only the suffix is extended in the pattern extension.
3. The particularity of behavioral items. In the process of mining, we define tasks as sequential items. The tasks include requester information, requested actions, and requested resources. Information of this type is difficult to handle in traditional sequential pattern mining.
To overcome the problem that the result set is too large and contains many sequences that have the same meaning, the behavior sequence pattern mining algorithm adopts the closed sequence mining algorithm: only closed sequence patterns are mined, and identical patterns are filtered out. We use the first fixed frequent sequence for expansion, which can increase the mining efficiency. We also use semantic constraints to extend the pattern. A behavior sequence is written as $T = \{t_i, t_{i+1}, \ldots, t_j\}$, $1 \le i \le m$.
Definition 3. Behavior sequence set $D$ (the transaction set) is expressed as $D = \{T_1, T_2, \ldots, T_n\}$.
Definition 4. Behavior sequence pattern set $A$ is a set of tasks, and task set $T$ contains pattern set $A$, $A \subseteq T$. A k-pattern indicates that $A$ is of length $k$, namely, it contains $k$ tasks (items). A pattern $I_1$ is closed if $I_1$ is a frequent sequence in the pattern database MFu and there is no sequence $I_2$ such that $I_1$ is the parent sequence of $I_2$ and $I_1$ and $I_2$ have the same support degree.
Definition 5. The degree of support for pattern $A$ is based on $A.\mathrm{count}$, the number of transactions in the transaction set that contain the pattern, and $|D|$, the total number of transactions in the transaction set: $\mathrm{sup}(A) = A.\mathrm{count} / |D|$.
Definition 6. Frequent patterns and frequent items. Given a minimum support min_sup, if pattern $A$ satisfies $A.\mathrm{count} \ge \mathrm{min\_sup}$, we call $A$ a frequent pattern. A single item in task set $I$ is called a frequent item if it occurs more frequently than min_sup in $D$'s transactions.
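The support and closure definitions can be checked on a toy transaction set. The sketch below assumes that a transaction "contains" a pattern when the pattern occurs in it as an ordered subsequence, and it treats a closed pattern as one with no longer frequent pattern that contains it at the same count; both readings, and all function names, are assumptions made for illustration.

```python
from typing import Iterable, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` as an ordered subsequence."""
    it = iter(sequence)
    return all(task in it for task in pattern)


def count(pattern: Pattern, database: Sequence[Pattern]) -> int:
    """A.count: the number of transactions that contain the pattern."""
    return sum(1 for seq in database if contains(seq, pattern))


def support(pattern: Pattern, database: Sequence[Pattern]) -> float:
    """sup(A) = A.count / |D|."""
    return count(pattern, database) / len(database)


def is_closed(pattern: Pattern, frequent: Iterable[Pattern],
              database: Sequence[Pattern]) -> bool:
    """No longer frequent pattern contains `pattern` with the same count."""
    own = count(pattern, database)
    return not any(
        len(other) > len(pattern)
        and contains(other, pattern)
        and count(other, database) == own
        for other in frequent
    )


D = [("t0", "t1", "t2"), ("t0", "t2"), ("t0", "t1", "t2", "t5"), ("t0", "t3")]
frequent = [("t0",), ("t0", "t1"), ("t0", "t2"), ("t0", "t1", "t2")]

sup_t0_t2 = support(("t0", "t2"), D)                   # 3 of 4 transactions
closed_t0_t1 = is_closed(("t0", "t1"), frequent, D)    # absorbed by t0-t1-t2
closed_t0_t2 = is_closed(("t0", "t2"), frequent, D)    # no equal-count superpattern
```

Here `("t0", "t1")` is not closed because `("t0", "t1", "t2")` appears in exactly the same two transactions, so storing the shorter pattern would add nothing to the library.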
Implementation of the sequential pattern mining algorithm. The flow of the sequential pattern mining algorithm is illustrated in Figure 4.
To overcome the particularity of behavior items, semantic constraints are used for guidance in the mining process. Semantic constraints are defined on each task and are constraints on resources. In this algorithm, we separate S-extension and I-extension and check the validity of the extended sequence pattern; the steps are as follows:
Step 1. First, we look for frequent items with length k (starting from 1). We focus on the tasks that are related to the request query $i = \{S, O, P\}$ in the process of sequence mining, while other information that is specified by the request is used when the pattern is expanded.
Step 2. For each 1-frequent sequence $b$, establish a suffix map database MFu: set $a$ as a sequence pattern in the sequence database S, and map sequence $b'$ of $b$ with $a$ as a prefix, namely, $b' = a \cup b$.
Step 3. Call the schema extension to the schema database:
Step 3.1. First, check the ending condition to determine if there is a backtracking pattern, namely, if the currently expanded pattern is a subset of the existing pattern or the values of the two patterns are the same. If the backtracking condition is satisfied, end the pattern extension, return to step 1, and search for 2-frequent sequences.
Step 3.2. S-pattern expansion, such as ⟨(a), (b), (c)⟩ to ⟨(a), (b), (c), (d)⟩. When expanding, a semantic check is used to determine whether two items are the same task. For example, for a task $t_1$ {get, S1} and another task $t_2$ {get, S2}, since S1 and S2 differ, the traditional pattern expansion will regard these tasks as different tasks. However, $t_1$ and $t_2$ can be determined to be the same task via semantic checking; hence, only $t_1$ items must be expanded.
Step 3.3. I-pattern expansion under semantic constraints, such as ⟨(a), (b), (c)⟩ to ⟨(a), (b), (c, d)⟩. Unlike the traditional sequence, the subitem in the behavior sequence item corresponds to the description of the request resource and the description of the request attribute; hence, we must check the semantic constraints. For example, for behaviors $t_1$ and $t_1'$, assuming that all operations are the same but the resource fields and types of the operations differ, we can filter this unreasonable pattern extension based on semantic constraint checking.
Step 3.4. Add the extended schema to the schema database.
Step 4. Continue mining frequent sequences that are of the next length, namely, k + 1.
Simultaneously, to reduce the storage space, we use a pseudo projection database in our algorithm. Projected MFu(P) is P's map database, and MFu is a pointer instead of a physical copy.
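The steps above can be approximated by a much simpler routine that keeps three of their ideas: the first task is fixed, patterns grow only by suffix extension, and projections are stored as pointers rather than copies. This is a simplified sketch and not the DSAAC algorithm itself: it omits I-extension and the backtracking check, uses an absolute count threshold, and stands in for semantic checking with a hypothetical `same_task` function that maps each task to a canonical name.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def mine_first_fixed(
    database: Sequence[Sequence[str]],
    min_count: int,
    same_task: Callable[[str], str] = lambda task: task,
) -> Dict[Pattern, int]:
    """Return {pattern: count} for frequent patterns grown from the first task."""
    # Map each task to a canonical form so semantically equal tasks merge.
    data = [tuple(same_task(t) for t in seq) for seq in database if seq]
    frequent: Dict[Pattern, int] = {}

    def grow(pattern: Pattern, projected: List[Tuple[int, int]]) -> None:
        # `projected` stores (sequence index, next position) pointers, a
        # pseudo projection rather than physical copies of the suffixes.
        frequent[pattern] = len(projected)
        counts: Counter = Counter()
        for idx, pos in projected:
            counts.update(set(data[idx][pos:]))  # count once per sequence
        for task, cnt in sorted(counts.items()):
            if cnt >= min_count:
                nxt = [
                    (idx, data[idx].index(task, pos) + 1)
                    for idx, pos in projected
                    if task in data[idx][pos:]
                ]
                grow(pattern + (task,), nxt)

    firsts = Counter(seq[0] for seq in data)
    for task, cnt in sorted(firsts.items()):
        if cnt >= min_count:
            start = [(i, 1) for i, seq in enumerate(data) if seq[0] == task]
            grow((task,), start)
    return frequent


def is_subsequence(short: Pattern, long: Pattern) -> bool:
    it = iter(long)
    return all(task in it for task in short)


def closed_only(frequent: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep patterns that have no longer super-pattern with an equal count."""
    return {
        p: c
        for p, c in frequent.items()
        if not any(
            len(q) > len(p) and c == frequent[q] and is_subsequence(p, q)
            for q in frequent
        )
    }


D = [
    ("t0", "t1", "t2", "t5"),
    ("t0", "t1", "t2"),
    ("t0", "t2", "t3"),
    ("t0", "T1", "t2"),  # "T1" stands for a semantically identical task
]
patterns = closed_only(mine_first_fixed(D, min_count=2, same_task=str.lower))
# patterns == {("t0", "t1", "t2"): 3, ("t0", "t2"): 4}
```

In this toy run, merging `T1` with `t1` raises the count of `t0-t1-t2` to 3, and the closure filter removes `t0` and `t0-t1` because longer patterns cover them at the same count.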

Sequence anomaly detection
The normal sequence patterns that are obtained via sequence pattern mining can be used as a pattern library in sequence anomaly detection. Sequence anomaly detection includes pattern matching of sequences and semantic checking of current behaviors. Pattern matching of the behavior sequence is the comparison of the normal behavior sequence of the user with the current behavior sequence pattern to judge whether the current user request is abnormal. Define $\mathrm{sup}_{MFu}(Fu)$ as the support degree of the normal behavior sequence to the current behavior sequence, which has a range of [0, 1]; the larger the $\mathrm{sup}_{MFu}(Fu)$ value, the greater the coincidence between the current behavior pattern sequence and the normal behavior pattern. Define each sequence in the behavior pattern library as a directed tree $T(V_p, E_p)$ in which each node $v_i \in V_p$ represents a request log $q_i$, which is formally expressed as $\langle \mathrm{sig}(q_i), c_i \rangle$, where $\mathrm{sig}(q_i)$ denotes request $q_i$ and $c_i$ denotes semantic information that is related to this request. The edges $e_{ij} \in E_p$ represent the execution sequence of $q_i$, $q_j$, namely, $q_j$ is executed after $q_i$.
The detection process of several request sequences in the pattern library and table is illustrated in Figure 5. The patterns in the pattern library are stored as trees. Figure 5(a) shows the stored partial normal sequence patterns in the pattern library. Each task sequence starts at $t_0$. Figure 5(b) shows two normal traffic flow requests, namely, $t_0$-$t_1$-$t_2$-$t_5$ and $t_0$-$t_2$-$t_3$, and Figure 5(c) shows two abnormal traffic requests, namely, $t_0$-$t_2$-$t_5$ and $t_3$-$t_6$. Sequence pattern matching uses the sequence similarity and support for calculation.
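Storing patterns as trees and matching a current sequence against them can be sketched with a nested-dictionary prefix tree. The two library patterns below are assumptions chosen so that the four example sequences from the text behave as described; Figure 5(a) itself is not reproduced here. The sketch covers only exact and prefix matches and does not compute the graded similarity or support degree.

```python
from typing import Dict, Iterable, Sequence

Tree = Dict[str, dict]


def build_tree(patterns: Iterable[Sequence[str]]) -> Tree:
    """Store patterns as a nested-dict prefix tree rooted at the first task."""
    root: Tree = {}
    for pattern in patterns:
        node = root
        for task in pattern:
            node = node.setdefault(task, {})
    return root


def matches_prefix(tree: Tree, sequence: Sequence[str]) -> bool:
    """True if `sequence` follows a path that starts at the root of the tree."""
    node = tree
    for task in sequence:
        if task not in node:
            return False
        node = node[task]
    return True


# Hypothetical library: every pattern starts at t0, as stated in the text.
library = build_tree([
    ("t0", "t1", "t2", "t5"),
    ("t0", "t2", "t3", "t4"),
])

normal_a = matches_prefix(library, ("t0", "t1", "t2", "t5"))  # equal length
normal_b = matches_prefix(library, ("t0", "t2", "t3"))        # prefix of a pattern
abnormal_a = matches_prefix(library, ("t0", "t2", "t5"))      # leaves the tree at t5
abnormal_b = matches_prefix(library, ("t3", "t6"))            # does not start at t0
```

A sequence that leaves the tree partway, such as $t_0$-$t_2$-$t_5$, would then be passed on to the similarity and support calculation rather than rejected outright.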
The support degree of the normal sequence to the current behavior sequence, namely, sup Mfu (Fu), is calculated as the similarity degree between the normal sequence and the current sequence, and the similarity degree depends on the length.
Sequence patterns of equal length, as shown in task sequence $t_0$-$t_1$-$t_2$-$t_5$ in Figure 5(b), have a similarity of 1. If the length is not equal but the prefix of the normal sequence in the pattern library exactly matches the current sequence, the similarity is also 1; for example, the task sequence $t_0$-$t_2$-$t_3$ in Figure 5(b) also has a similarity of 1. The similarity in these two cases is defined as expressed in formula (4).
If the lengths differ and the prefixes of the current sequence pattern are normal behavior sequences in the matching pattern library, for example, $t_0$-$t_2$-$t_5$, we define the similarity as in formula (5).
We then select a set of similar behavior patterns for calculating the support degree of the normal behavior pattern with respect to the current behavior, as in formula (6), wherein $\mathrm{distance}(Fu_i[i] - Fu[i])$ is the relevant semantic attribute distance on each behavior node, such as the level of resources or the time distance.
By combining the length similarity and the semantic attribute distance, we can obtain the support degree for the current pattern, as expressed in formula (7), where $m$ is the number of similar behavior patterns.
Finally, we can grant authorization or not according to the support degree of the normal sequence in the pattern library with respect to the current sequence and the threshold value.

Experiments and analysis
In our experiments, the training and test data sets are obtained from the Amazon access sample data set (https://archive.ics.uci.edu/ml/datasets/), which consists of the visit records for the Amazon website for a week in October 2016. The access is divided into two parts: one part contains the access records of all users, and the other part is the sequence data set after sampling and processing. There are 9,022,000 sequences, the users of these data are identified by serial numbers, and each user ID has a corresponding attribute tag. There are 38,000 normal authorized access sequence records for anonymous users.
The experimental environment is a host with a Core i5 processor, 8 GB of memory, 256 GB of storage, and the Windows 10 operating system.

Performance comparison of mining algorithms
The sequence pattern mining model is trained on legitimate request sequences to obtain the training behavior pattern set. Since the selected training set size and the minimum support (minsup) threshold affect the accuracy of the training data set, we adjust the minimum support during the experiment to yield the optimal result. We select 20,000 legitimate requests as the training set, and we compare our DSAAC algorithm with two available pattern mining algorithms (the pattern extension-based mining algorithm prefix and the closure-based mining algorithm clospan). We use various values of minsup (5%, 10%, 15%, 20%, 25%, and 30%) to mine frequent sequence patterns, and we compare the three algorithms in terms of efficiency. As presented in Table 2, the sizes of the training sets that are generated by the three algorithms differ among the support levels. The training set that the prefix algorithm produces when mining frequent patterns is very large because the algorithm applies no restrictions and therefore retains patterns that have the same meaning. In contrast, the training sets of clospan and DSAAC are much smaller because both algorithms take closure patterns into account, which substantially reduces the sizes of the training sets. Under these settings, we compare the running times of the three algorithms, as presented in Figure 6.
The running time of DSAAC is substantially shorter than those of clospan and prefix because DSAAC takes the behavior sequence into account and identifies sequences that belong to the same task. By introducing semantic rules for guidance, the algorithm can identify closure sequences quickly; thus, it reduces the time that is required for pattern expansion and mining of each sequence.
The running time is related to the complexity of the algorithm. Since prefix conducts repeated mining on each sequence in the mining process, its time complexity is high, whereas clospan only mines frequent sequences under closures; hence, compared with prefix, its running time is reduced by nearly a factor of 10. Our pattern mining algorithm, which is based on the user access request sequence, considers the characteristics of the user sequence and reduces the generation of repeated semantic sequences; hence, it is more efficient than the prefix and clospan algorithms in the pattern mining of user access request sequences. In the following experiment, we evaluate the correctness of this dynamic authorization scheme.
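The roles of the minimum support threshold and of closure patterns in the comparison above can be illustrated with a minimal sketch. It is not the DSAAC, prefix, or clospan implementation: it is a brute-force miner written only for illustration, and the function names, the toy request names, and the choice of subsequence matching with gaps are all assumptions. It shows how filtering for closed patterns shrinks the pattern set without changing the support information.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Pattern, seq: Pattern) -> bool:
    """Return True if `pattern` occurs in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def support_count(pattern: Pattern, db: List[Pattern]) -> int:
    """Number of sequences in `db` that contain `pattern`."""
    return sum(1 for seq in db if is_subsequence(pattern, seq))


def frequent_patterns(db: List[Pattern], minsup: float) -> Dict[Pattern, int]:
    """Brute-force level-wise mining: grow each frequent pattern by one item."""
    if not db or not 0 < minsup <= 1:
        raise ValueError("need a non-empty database and 0 < minsup <= 1")
    items = sorted({item for seq in db for item in seq})
    frequent: Dict[Pattern, int] = {}
    frontier: List[Pattern] = [()]
    while frontier:
        next_frontier: List[Pattern] = []
        for prefix in frontier:
            for item in items:
                candidate = prefix + (item,)
                count = support_count(candidate, db)
                if count / len(db) >= minsup:
                    frequent[candidate] = count
                    next_frontier.append(candidate)
        frontier = next_frontier
    return frequent


def closed_patterns(frequent: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep patterns that have no longer super-pattern with the same support."""
    return {
        p: c
        for p, c in frequent.items()
        if not any(
            len(q) > len(p) and cq == c and is_subsequence(p, q)
            for q, cq in frequent.items()
        )
    }


# Hypothetical toy database of legitimate request sequences.
db = [
    ("login", "query", "download"),
    ("login", "query", "download"),
    ("login", "query"),
    ("login", "upload"),
]
frequent = frequent_patterns(db, minsup=0.5)
closed = closed_patterns(frequent)
print(len(frequent), "frequent patterns,", len(closed), "closed patterns")
```

On this toy database, the unrestricted miner keeps every frequent subsequence, whereas the closed set keeps only those that are not subsumed by a longer pattern with the same support, which mirrors the size difference reported for Table 2.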
We design sequence rules that are based on the existing sequences and the patterns that are generated by the DSAAC algorithm, and we express these rules as the access-control policy library in the standard language of the TBAC model to evaluate the traditional policy-based access-control model. The abnormal sequences in the experiment are constructed by inserting items, deleting items, and modifying item information.
The semantic information in the experiment is based on the rule base that is provided by the data platform. Its main function is to identify tasks. Since no relevant professional knowledge is available for reference, this experiment does not formulate semantic constraints. In practical applications, security managers can formulate suitable semantic constraints that are based on their professional knowledge. With a complete semantic library, the actual application performance of this model should be better than that in this experiment. The experimental results are presented and analyzed below.
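The three ways of constructing abnormal sequences mentioned above can be sketched as follows. This is an illustrative assumption about how such test data might be generated, not the procedure used in the experiments; the function name `make_abnormal`, the `vocabulary` parameter, and the sample request names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def make_abnormal(
    seq: Sequence[str],
    vocabulary: Sequence[str],
    rng: random.Random,
    op: Optional[str] = None,
) -> List[str]:
    """Return a copy of `seq` perturbed by one insert, delete, or modify."""
    result = list(seq)
    op = op or rng.choice(["insert", "delete", "modify"])
    if op == "insert":
        # Insert a random item at a random position (including the end).
        result.insert(rng.randrange(len(result) + 1), rng.choice(vocabulary))
    elif op == "delete":
        if not result:
            raise ValueError("cannot delete from an empty sequence")
        del result[rng.randrange(len(result))]
    elif op == "modify":
        if not result:
            raise ValueError("cannot modify an empty sequence")
        pos = rng.randrange(len(result))
        # Replace the item with a different one so the sequence really changes.
        alternatives = [v for v in vocabulary if v != result[pos]]
        if not alternatives:
            raise ValueError("vocabulary has no alternative item")
        result[pos] = rng.choice(alternatives)
    else:
        raise ValueError(f"unknown operation: {op}")
    return result


rng = random.Random(0)  # fixed seed so the generated test set is reproducible
vocabulary = ["login", "query", "download", "upload", "logout"]
legitimate = ["login", "query", "download", "logout"]
for operation in ("insert", "delete", "modify"):
    print(operation, make_abnormal(legitimate, vocabulary, rng, operation))
```

A perturbed sequence can by chance coincide with another legitimate sequence, so in practice the generated sequences would need to be checked against the legitimate set before being labeled abnormal.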
Performance test. We choose the response time as the reference index in the performance test of access control. This experiment compares the three algorithms in terms of the average response time, namely, the time between sending a request and obtaining the authorization result.
According to Figure 7, the anomaly detection times obtained with the pattern databases that are generated by the clospan algorithm and the DSAAC algorithm are similar and much shorter than that obtained with the pattern database that is generated by the prefix algorithm. This is because prefix generates a large pattern database; thus, its response time is longer than those of the other two algorithms.
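The average response time defined above can be measured with a sketch such as the following. The `authorize` callable stands in for whatever authorization check is under test and is a hypothetical placeholder, as is the toy pattern set; only the timing logic is the point of the example.

```python
import time
from typing import Callable, Iterable, Tuple

Request = Tuple[str, ...]


def average_response_time(
    authorize: Callable[[Request], bool],
    requests: Iterable[Request],
) -> float:
    """Mean wall-clock time in seconds from sending a request to its result."""
    total = 0.0
    count = 0
    for request in requests:
        start = time.perf_counter()
        authorize(request)  # the authorization decision itself is ignored here
        total += time.perf_counter() - start
        count += 1
    if count == 0:
        raise ValueError("no requests to measure")
    return total / count


# Hypothetical stand-in for a pattern-database lookup.
known_patterns = {("login", "query"), ("login", "query", "download")}


def toy_authorize(request: Request) -> bool:
    return request in known_patterns


sample_requests = [("login", "query"), ("login", "upload")] * 1000
print(f"average response time: {average_response_time(toy_authorize, sample_requests):.2e} s")
```

Because a larger pattern database generally makes each lookup slower, running the same measurement against databases of different sizes is one way to reproduce the kind of comparison described in the text.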
Accuracy analysis. We use the statistical false-positive rate and the accuracy rate to evaluate our model.
False-positive rate test. The false-positive rate refers to the proportion of legitimate requests in the test data that are regarded as abnormal behaviors. First, we compare the false-positive rates of the training sets that are generated using the three algorithms. The experimental results are presented in Figure 8.
The DSAAC algorithm achieves the lowest false-positive rate. Using a suitable training set, the false-positive rate of the DSAAC algorithm is approximately 5%. Then, we compare the algorithm before and after the introduction of the semantic constraint (DSAAC with the semantic constraint) with the policy-based access-control model. The experimental results are presented in Figure 9.
As seen in Figure 9, after using semantic constraints, the false-positive rate of this algorithm decreases from 5%-7% to 2%-3%, which is close to that of the policy-based access-control model. Because the static policy rules of our model are generated according to legal sequences, the false-positive rate is low.
Accuracy test. The accuracy rate refers to the proportion of abnormal requests that are regarded as abnormal in the test results relative to the total number of abnormal requests. We compare the accuracies of the training sets that were generated by the three algorithms, and the experimental results are presented in Figure 10.
According to the experimental results in Figure 10, the accuracy rate of the training set that was produced by DSAAC is higher than those of clospan and prefix. Thus, the training set that was generated by this algorithm is relatively accurate and is suitable for the identification of user behaviors. Next, we compare the accuracy of this algorithm after the introduction of the semantic constraint (DSAAC with semantic constraint) with that of the policy-based access-control model. The experimental results are presented in Figure 11.
According to the experimental data in Figure 11, when the DSAAC algorithm is used in combination with semantics, the accuracy rate reaches approximately 93%, while the accuracy rate of the traditional policy-based access-control model is lower. Based on the experimental results, we conclude that our model achieves a high accuracy and a low false-positive rate for dynamic permission control under mass services.
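The two metrics used in the accuracy analysis can be computed as in the sketch below. It assumes that the false-positive rate is taken over the legitimate requests (legitimate requests flagged as abnormal divided by all legitimate requests) and that the accuracy rate is the share of abnormal requests that are flagged; the function name and the sample labels are hypothetical and are not the experimental data.

```python
from typing import Sequence, Tuple


def evaluate(
    is_abnormal: Sequence[bool],
    flagged: Sequence[bool],
) -> Tuple[float, float]:
    """Return (false_positive_rate, accuracy_rate) for a labeled test set.

    is_abnormal[i] is the ground truth for request i;
    flagged[i] is True if the model regarded request i as abnormal.
    """
    if len(is_abnormal) != len(flagged):
        raise ValueError("label and prediction lists must have equal length")
    normal_total = sum(1 for truth in is_abnormal if not truth)
    abnormal_total = sum(1 for truth in is_abnormal if truth)
    if normal_total == 0 or abnormal_total == 0:
        raise ValueError("test set needs both normal and abnormal requests")
    false_positives = sum(
        1 for truth, pred in zip(is_abnormal, flagged) if pred and not truth
    )
    true_positives = sum(
        1 for truth, pred in zip(is_abnormal, flagged) if pred and truth
    )
    return false_positives / normal_total, true_positives / abnormal_total


# Hypothetical labels: four legitimate requests followed by four abnormal ones.
truth = [False, False, False, False, True, True, True, True]
model_output = [False, True, False, False, True, True, True, False]
fpr, acc = evaluate(truth, model_output)
print(f"false-positive rate = {fpr:.2f}, accuracy rate = {acc:.2f}")
```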

Conclusion
In this article, we studied the privacy protection issues of joint analysis across multiple data centers from the perspective of access control. Traditional service-based authorization models use static rules, which render the authorization process inflexible and unable to support the authorization requirements of multiple data center scenarios. According to the characteristics of scientific computing workflows across data centers, we incorporated semantic verification and sequence anomaly detection into the access-control process. The resulting model provides dynamic authorization management in the TBAC model and protects data privacy and security in multiple data center environments.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.