Discovering Human Presence Activities with Smartphones Using Nonintrusive Wi-Fi Sniffer Sensors: The Big Data Prospective

With the explosive growth and widespread use of Wi-Fi-enabled smartphones, people are used to accessing the internet through the Wi-Fi network interfaces of their smartphones. Smartphones periodically transmit Wi-Fi messages, even when not connected to a network. In this paper, we describe the Mo-Fi system, which monitors and aggregates large numbers of continuous Wi-Fi message transmissions from nearby smartphones in the area of interest using nonintrusive Wi-Fi sniffer sensors. We propose an optimized Wi-Fi channel detection and selection method that automatically switches to the best channels for aggregating Wi-Fi messages based on channel data transmission weights, and a human presence activity classification method based on the features of human dwell duration sequences in order to evaluate the user engagement index. By deploying the system in a real-world office environment, we found that the Wi-Fi message aggregation performance of the CAOCA and CACFA algorithms is over 3.8 times higher than that of the FCA algorithm on the worst channel and about 76% of that of the FCA algorithm on the best channel, and the human presence detection rate reached 87.4%.


Introduction
Big data has been opening a new perspective on data computation, storage, analysis, and mining in recent years [1][2][3]. With the explosive growth and widespread use of Wi-Fi-enabled smartphones, people are used to accessing the internet (for example, watching videos on YouTube and chatting on Facebook) through the Wi-Fi network interfaces of their smartphones in areas where Wi-Fi hotspots are deployed, in order to save network traffic costs. Meanwhile, about 40% to 70% of people always leave the Wi-Fi network interface of their smartphones turned on instead of turning it off to save energy. Smartphones with Wi-Fi enabled periodically transmit Wi-Fi probe messages, even when not connected to a Wi-Fi network [4]. When smartphones detect and connect to Wi-Fi hotspots in the area of interest, the background programs and services of their operating systems can generate large numbers of data transmissions, for example, the Apple iOS push notification service and the Android message notification service. As the Wi-Fi network interface of a smartphone has a unique MAC address, it is possible to identify the smartphone's owner and detect his or her presence in the area of interest.
By deploying Wi-Fi sniffer sensors in the area of interest, it is possible to capture Wi-Fi message transmissions without disturbing the normal daily use of smartphones, and to analyze how long people stay and even the coarse-grained location traces of the smartphones from the big data perspective. Wi-Fi sniffing can therefore be widely used in public places, for example, shopping malls and office buildings, to analyze the dwelling of smartphones and their owners, extract the features of human presence activities, and classify varied user groups. In a shopping mall, for instance, the user or customer engagement index [5] is a significant input to a shop owner's business decisions; it covers, for example, customer traffic, returning customer ratio, dwell duration, visit frequency, and so forth. In this paper, we design and implement the Mo-Fi system and investigate the feasibility of nonintrusive Wi-Fi sniffing for smartphones and human presence activity patterns by deploying it in a real-world office environment.
International Journal of Distributed Sensor Networks
There are two key challenges in discovering human presence activity patterns using Wi-Fi message sniffing on smartphones without disturbing the normal daily use of smartphones. The first challenge is that smartphones with Wi-Fi enabled work on 14 channels, and the area of interest may have no Wi-Fi access point deployed or multiple access points working on different channels. It is difficult to select the most active working channel in order to aggregate as many Wi-Fi probe or data messages as possible in a real-world deployment. The second challenge is how to extract the features of human presence activities from the original timestamped Wi-Fi message sequence.
The primary contributions of this paper are as follows.
(i) A non-intrusive Wi-Fi sniffing approach to aggregate Wi-Fi packets from smartphones.
(ii) An optimized channel selection algorithm that switches the Wi-Fi sniffer sensor to the busy channels.
(iii) An activity classification algorithm to discover human presence behaviors in the area of interest.
(iv) Evaluation of human presence activity patterns in the office environment with the real-world deployment.
With the real-world deployment of the Mo-Fi system in the office environment, we found that smartphones with Wi-Fi enabled generate numerous and continuous Wi-Fi probe message transmissions, even when they are not connected to a nearby wireless network or their screens are turned off. During one month of data aggregation with one deployed Wi-Fi sniffer sensor, we collected over 1,380,330 Wi-Fi messages in total from 12,496 Wi-Fi-enabled mobile devices identified via the IEEE OUI Registry [6], an average of 46,011 messages per day. Among those 12,496 monitored devices, 1,437 are recognized as visitors whose dwell duration is over 10 minutes. From the results of our classification methods, we identified four types of human presence activity patterns: outside, walkbys, bounced, and engaged.
The remainder of the paper is structured as follows. Section 2 starts with the research backgrounds and requirements for the Mo-Fi system. We describe the overall operation of Mo-Fi system in Section 3. In Section 4, our optimized Wi-Fi sniffing channel selection algorithm is described to meet the first challenge, followed by human presence activity classification method in Section 5. Then Section 6 gives the Mo-Fi system deployment in our experimental office environment and performance evaluations. Finally, related works and some concluding remarks are made.

Motivation
From the retailer's perspective, human presence activity can be a virtual gold mine [7]. In a shopping mall, for instance, shop owners are eager for information about customers' behaviors after they enter the store, for example, which shelf or area shoppers spend a long time at and the monthly returning customer visit ratio. By tracking this information, the shop owner can predict the shoppers' purchase intentions or needs and adjust business decisions, for example, by offering more attractive discounts. Researchers have investigated several approaches to offline analytics for retailers based on vision recognition [7,8].
In [2], the experimental results showed that smartphones with Wi-Fi enabled transmit Wi-Fi probe messages periodically, even when not associated with a network. In [9], the authors set up a Wi-Fi-enabled laptop as a traffic sniffer monitoring the three strongest APs in a cafe and used the tcpdump command to monitor the devices connecting to the APs. Figure 1 shows the CDF of people's dwell times at a university cafe. They found that more than 30% of the devices dwelled for less than 10 minutes (e.g., the user had a coffee/food to go), and more than 20% of the devices stayed at least two hours (e.g., web browsing).
From the real-world use scenarios (e.g., shopping mall, cafe, and outdoor show), we can elicit the following requirements that support discovering human presence activity with smartphones using non-intrusive Wi-Fi sniffer sensors.
... interface, the behaviors of the owners' stay could be distinguished and extracted from the Wi-Fi message sequence from Wi-Fi sniffer sensors.
(v) 5: human presence activity classification: to evaluate the user engagement index, varied human presence activity patterns should be classified from the features of the users' dwelling behaviors, for example, walkbys and engaged customers.
(vi) 6: privacy protection against insecure exposed Wi-Fi sniffing: when the user enables the Wi-Fi network interface of his or her smartphone, user privacy should be guaranteed so that the user can choose not to reveal his or her identity during passive Wi-Fi sniffing.
To address the abovementioned requirements, we propose and implement the Mo-Fi system, which monitors and aggregates large numbers of ubiquitous Wi-Fi messages from smartphones using non-intrusive Wi-Fi sniffer sensors and discovers human presence activities, as the unique MAC address can be used as an identity. The Wi-Fi sniffer sensors detect the most active working channels, capture each device's MAC address, apply one-way hashing to it, and uplink the hashed MAC IDs with the extracted timestamps to the back-end big data analysis. The back-end functions provide large-scale storage, process the data, extract the stays of smartphones, classify the human presence activities, and display the statistical results on a dashboard in the web interface.
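The one-way hashing of MAC addresses can be illustrated with a minimal sketch. The paper does not name the hash function, so SHA-256, the optional salt, and the names `hash_mac` and `make_record` are all assumptions made for illustration only.

```python
import hashlib


def hash_mac(mac: str, salt: bytes = b"") -> str:
    """Return a one-way hashed ID for a MAC address.

    SHA-256 and the salt are assumptions; the paper only states that
    a one-way hash is applied before the data is uplinked.
    """
    # Normalize so that "AA-BB-..." and "aa:bb:..." map to the same ID.
    normalized = mac.strip().lower().replace("-", ":")
    return hashlib.sha256(salt + normalized.encode("ascii")).hexdigest()


def make_record(mac: str, timestamp: float, salt: bytes = b"") -> dict:
    """Build the (hypothetical) record a sensor would uplink."""
    return {"mac_id": hash_mac(mac, salt), "timestamp": timestamp}


record = make_record("AA:BB:CC:DD:EE:FF", 1.5)
print(record["mac_id"][:16], record["timestamp"])
```

Because the same MAC address always maps to the same ID, dwell durations can still be computed on the back end without storing the raw address. A deployment-specific salt makes a precomputed lookup table of hashed addresses harder to reuse across deployments.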

System Overview
This section presents an overview of the Mo-Fi system before we discuss the Wi-Fi channel detection and selection and human presence activity classification issues.

System Architecture. As shown in Figure 2, the architecture of the Mo-Fi system is divided into front-end Wi-Fi sniffer devices and back-end data analysis services and web interfaces.
The front-end Wi-Fi sniffer sensor is an ARM-based embedded device that detects Wi-Fi channels, aggregates Wi-Fi messages from nearby smartphones, and uplinks them to the back-end cloud. The Channel Detection and Selection module detects and selects the more active sniffing channels according to the user's configuration. The Data Filtering and Compression module filters out redundant, duplicated, or non-mobile-device data packets. The Data Uploading module uploads the sniffed messages to the back-end cloud RESTful API. The Log Management module keeps the runtime status of all modules and stores it in a local database. The System Status Monitoring module monitors the CPU, memory, and disk usage of the Wi-Fi sniffer sensor for device diagnosis.
The back-end data analysis service runs on cloud servers, provides a RESTful API to push/pop filtered and compressed Wi-Fi messages, and stores data in the database. The Data Input module receives data from front-end devices and stores them in a NoSQL-based database. The Data Processing module loads raw data, analyzes them into human presence activities using the Analyzing and Mining module, and then stores the analyzed statistical results. The Data Output module provides an Open API for data fetching by third-party modules. The web interface presents human presence activities and device status with charts, bars, and tables and can display analysis data in real time.
As shown in Figure 3, the data processing procedure contains three phases: data fetching, data storage, and data analyzing. The data fetching phase is designed to aggregate as many data messages from smart devices as possible and builds on the Wi-Fi channel detection and selection algorithms; the data storage phase is designed to provide better persistence for big data Wi-Fi analysis and includes data filtering and data compression for eliminating redundant messages; the data analyzing phase is designed to extract the useful information from Wi-Fi messages (e.g., RSSI, SRC MAC address, DEST MAC address, PROBE/DATA message type, and the timestamp), analyze the dwell duration of the human behaviors, and finally identify and classify the human presence activities.

Optimized Channel Detection and Selection
In this section, we discuss the Wi-Fi channel detection and selection methods to enhance the efficiency of Wi-Fi message sniffing and aggregation.

Channel Detection and Selection Algorithms. The purpose of the channel detection and selection method is to monitor and aggregate as many Wi-Fi messages from smartphones as possible, including probe messages and data messages. Smartphones periodically transmit Wi-Fi probe messages on all 14 channels and, when associated with a Wi-Fi AP, send Wi-Fi data messages on the fixed working channel of the connected Wi-Fi network. Therefore, it is essential to detect the working channels to monitor in the area of interest. The traditional sniffing method uses Fixed Channel Allocation (FCA), which picks one fixed channel at random. The front-end Wi-Fi sniffer sensors of the Mo-Fi system are designed not only to scan the working channels but also to switch to them in order to capture message packets more efficiently. Besides the FCA method, three other channel detection and selection methods are provided in the Mo-Fi system: Average Time Channel Allocation (ATCA), Channel Activeness Based Channel Fair Allocation (CACFA), and Channel Activeness Based Optimized Channel Allocation (CAOCA).
(iii) Channel Activeness Based Optimized Channel Allocation (CAOCA). The CAOCA method is derived from the CACFA method: it discards channels with extremely low activeness, switches to channels with high activeness, and allocates as many time slices to them as possible. Instead of sorting the list in ascending order, the CAOCA algorithm shown in Algorithm 2 sorts the channels' weights, measured during the first 5 minutes (the detection phase), in descending order. It then separates the remaining 55 minutes into time slices at 15-second intervals (default) and allocates the time slices to the high-quality channels at the head of the list according to each channel's percentage of the total number of packets. When over 95% of the time slices have been allocated, the algorithm stops. Eventually, one or a few high-quality channels out of the 14 channels are selected.
Table 1 gives an example of the message counting results of each Wi-Fi channel. Normally, CH. 6 is the default working channel configured on most Wi-Fi APs, and it has the highest activeness.
The detection and selection procedure is divided into a 5-minute detection phase and a 55-minute selection and monitoring phase. In total, 720 5-second time slices are divided and allocated to the channels. The packet counting result shows that the CAOCA algorithm is theoretically a little better than the CACFA algorithm, and these two algorithms are much better than the ATCA algorithm. The FCA algorithm could have the best performance if the best high-quality working channel happens to be chosen, while an inactive channel gives worse performance. The expected value of FCA is more or less equal to that of ATCA.
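The CAOCA allocation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 2: the function name `caoca_allocate`, its parameters, the rounding rule, and the packet counts in the usage example are all hypothetical, and the slice length is passed in as a parameter.

```python
def caoca_allocate(packet_counts, selection_seconds, slice_seconds, coverage=0.95):
    """Allocate time slices to channels in descending order of activeness.

    packet_counts maps channel number -> packets counted in the detection
    phase. Allocation stops once more than `coverage` of the slices are
    assigned, so the least active channels are discarded.
    """
    total_slices = selection_seconds // slice_seconds
    total_packets = sum(packet_counts.values())
    if total_packets == 0:
        return {}
    # Most active channels first.
    ranked = sorted(packet_counts.items(), key=lambda kv: kv[1], reverse=True)
    allocation = {}
    allocated = 0
    for channel, count in ranked:
        if allocated > coverage * total_slices:
            break  # over 95% of the slices are already assigned
        # Share of slices proportional to the channel's share of packets.
        share = min(round(total_slices * count / total_packets),
                    total_slices - allocated)
        if share == 0:
            break  # the remaining channels are even less active
        allocation[channel] = share
        allocated += share
    return allocation


# Hypothetical detection-phase counts for four channels.
counts = {1: 50, 6: 800, 11: 140, 13: 10}
print(caoca_allocate(counts, selection_seconds=55 * 60, slice_seconds=5))
```

In this example the least active channel (CH. 13) receives no slices because more than 95% of the slices are already assigned before it is reached.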

Redundant Message Elimination and Compression.
Smartphones with Wi-Fi enabled can produce large amounts of Wi-Fi message transmissions within seconds [2]. According to our experimental records, one smartphone sends 3 to 5 probe requests on average and receives the same number. Table 1 shows an example of the time slice allocation of the FCA, ATCA, CACFA, and CAOCA algorithms for probe responses within 1 hour. The total number of uncompressed and unfiltered packets can exceed 10 million per day, which is a heavy burden on data analysis and data mining, and we found that the messages contain large amounts of duplicated data. The front-end sensors of the Mo-Fi system therefore perform redundant message elimination and compression on sniffed packets. The algorithm, shown as Algorithm 3, has two steps: elimination and compression. It maintains a MAC library of all smartphone brands and a dictionary of the MAC addresses of existing devices' Wi-Fi modules, together with packet type, within a certain time threshold (the default threshold is 1 second, which means that once a Wi-Fi message is recorded, the algorithm discards other identical messages within the threshold window). When a Wi-Fi message is sniffed, the algorithm filters out packets of nonmobile smart devices based on the IEEE OUI Registry.
Algorithm 3: The data filtering and compression algorithms.
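The two steps, elimination by OUI and compression by a per-device time threshold, can be sketched as below. This is an illustrative reading of the prose description rather than a reconstruction of Algorithm 3; the name `make_filter`, the OUI set, and the message fields are assumptions.

```python
def make_filter(mobile_ouis, threshold=1.0):
    """Return a function that decides whether a sniffed message is kept.

    mobile_ouis: set of OUI prefixes ("AA:BB:CC") of smartphone vendors,
    assumed to be derived from the IEEE OUI Registry.
    threshold: window in seconds for discarding identical messages.
    """
    last_recorded = {}  # (mac, packet_type) -> timestamp of last kept message

    def accept(mac, packet_type, timestamp):
        normalized = mac.upper().replace("-", ":")
        # Elimination: drop packets from non-mobile devices.
        if normalized[:8] not in mobile_ouis:
            return False
        # Compression: drop repeats of the same message within the window.
        key = (normalized, packet_type)
        previous = last_recorded.get(key)
        if previous is not None and timestamp - previous < threshold:
            return False
        last_recorded[key] = timestamp
        return True

    return accept


accept = make_filter({"AA:BB:CC"})
print(accept("aa:bb:cc:00:00:01", "probe_request", 0.0))  # kept
print(accept("aa:bb:cc:00:00:01", "probe_request", 0.5))  # duplicate, dropped
```

Keying the dictionary on both MAC address and packet type means that a probe request and a data message from the same device within one second are both kept, which matches the description of the dictionary above.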

Human Presence Activity Classification
In this section, we discuss the human presence activity recognition and classification method from the collection of the aggregated Wi-Fi messages to the behaviors of human presence activities.

Feature Extraction on Wi-Fi Message Sequence. While Wi-Fi sniffer sensors detect and capture message transmissions of all Wi-Fi packet types, the Mo-Fi system keeps only three types: Wi-Fi probe requests, probe responses, and data messages. In the experiment, we found that smartphones transmit Wi-Fi probe request or response messages periodically and send or receive data messages when associated with Wi-Fi APs. By processing the timestamps of the Wi-Fi message sequence of each target identified by its unique MAC address, the system calculates the dwell duration of each unique MAC ID, as well as its start time and end time. We design a TS2VD algorithm to convert the discrete Wi-Fi message sequence into the sequential dwell durations of unique devices or users. As the probe request period is fixed, we define a time difference value (TDV), greater than the probe request period, to distinguish between two separate visits. Any two consecutive message packets from the same MAC address whose time interval is smaller than TDV belong to the same dwell duration, as Algorithm 4 shows in detail. The Mo-Fi system regards the dwell duration of unique devices or users as the feature of the visitors' stay in the monitored area of interest.
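The conversion from a timestamp sequence to dwell durations can be sketched as follows. This is a minimal sketch of the TDV rule stated above, not the paper's Algorithm 4 verbatim; the function name `ts2vd`, the tuple layout, and the TDV value in the example are assumptions.

```python
def ts2vd(timestamps, tdv):
    """Group one device's sorted timestamps into visits.

    Consecutive messages closer than `tdv` seconds belong to the same
    visit. Returns a list of (start, end, duration) tuples.
    """
    visits = []
    start = end = None
    for ts in timestamps:
        if start is None:
            start = end = ts          # first message opens a visit
        elif ts - end < tdv:
            end = ts                  # same visit, extend its end time
        else:
            visits.append((start, end, end - start))
            start = end = ts          # gap reached TDV: a new visit begins
    if start is not None:
        visits.append((start, end, end - start))
    return visits


# Timestamps in seconds for one hashed MAC ID; tdv=120 is illustrative.
print(ts2vd([0, 30, 60, 1000, 1020], tdv=120))
```

A single isolated message yields a visit of duration zero, which is one reasonable convention for devices that are seen only once.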

Human Presence Activity Classification on Dwell Duration.
Varied lengths of dwell duration indicate varied human presence activity patterns. In order to classify the patterns, the Mo-Fi system uses k-means clustering methods [10] to classify the human groups based on the time threshold features of dwell duration (DD), for example, capture time value (CTV), inside time value (ITV), and engaged time value (ETV). As a result, the users are divided into four groups, namely, the outside, walkbys, bounced, and engaged patterns, as shown in Figure 4. In the case of the experiment in the laboratory, the average total traffic is 1,676 per day, of which the outside pattern accounts for 68%, the walkbys pattern 8.4%, the bounced pattern 17.4%, and the engaged pattern 6.2%.
Algorithm 4 (TS2VD). Input: the packet list of a Wi-Fi device, ordered by timestamp, and TDV, the time difference value. Output: the dwell duration list. The algorithm initializes a visit object V with its start time, end time, and duration, pushes V to the dwell duration list when a visit ends, and then resets V.
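As an illustration of clustering on dwell duration, the sketch below runs a plain one-dimensional k-means with four clusters. It is a sketch under assumptions, not the system's implementation: the quantile-based initialization, the function names, the sample durations, and in particular the mapping of clusters to pattern names in ascending order of dwell duration are all hypothetical.

```python
def kmeans_1d(values, k=4, iterations=50):
    """Cluster scalar dwell durations with Lloyd's algorithm.

    Returns the k centroids in ascending order.
    """
    data = sorted(values)
    if len(data) < k:
        raise ValueError("need at least k values")
    # Deterministic start: evenly spaced quantiles of the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def label_dwell(duration, centroids,
                names=("outside", "walkbys", "bounced", "engaged")):
    """Assign the pattern name of the nearest centroid (assumed ordering)."""
    nearest = min(range(len(centroids)),
                  key=lambda i: abs(duration - centroids[i]))
    return names[nearest]


# Hypothetical dwell durations in seconds.
durations = [5, 8, 10, 60, 70, 80, 600, 650, 700, 7200, 7500, 8000]
centroids = kmeans_1d(durations)
print([label_dwell(d, centroids) for d in (9, 75, 500, 7000)])
```

Real dwell durations span several orders of magnitude, so clustering on the logarithm of the duration may separate short visits better; that choice is left open here.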

Deployment and Performance Evaluation
Our experiment evaluation focuses on the following.
(i) The portability of front-end packet sniffing program, including channel detection and selection algorithm and data filtering and compression algorithm.
(ii) The performance of the channel detection and selection algorithm. The experiment demonstrates how the channel detection and selection algorithm affects Wi-Fi packet sniffing.
(iii) The performance of redundant message elimination and compression. The experiment would demonstrate the comparison between using and not using filtering and compression algorithm.
(iv) The comparison between analyzed human presence and the real circumstance in the real-world deployment in the office environment.

Prototype Implementation and Real-World Deployment
(1) Prototype Implementation of Mo-Fi System. On the one hand, we design and implement the front-end functions of the Mo-Fi system on a Raspberry Pi [11] ARM-based embedded device with a Ralink Wi-Fi USB card, as shown in Figure 5(a). The channel detection and selection module and the data filtering and compression module mentioned above are implemented in Python and run on the Raspbian operating system on the Raspberry Pi device. When the sniffing procedure starts, the Wi-Fi messages are sniffed and aggregated into a local database, as Figure 5(b) shows. On the other hand, we design and implement the functions of human presence activity recognition and classification in Python on the back-end server. The Mo-Fi system also provides a web portal where the human presence activities are drawn on a dashboard. In addition, users can monitor device status in real time and manage the Wi-Fi sniffer sensors, as shown in Figure 6.

Performance Results of Channel Detection and Selection Algorithms. To evaluate the performance of the Wi-Fi channel detection and selection algorithms, we use 5 Wi-Fi sniffer sensors to run the different algorithms at the same place and the same time in the testing office environment as follows.
(i) Fixed Channel Allocation algorithms on the best high-quality active channel, for example, CH. 6.
(ii) Fixed Channel Allocation algorithms on the worst channel.
(iii) Average Time Channel Allocation algorithms.
(iv) Channel Activeness Based Channel Fair Allocation algorithms.
(v) Channel Activeness Based Optimized Channel Allocation algorithms.
As shown in Figure 8, the different lines show the number of packets sniffed using the different channel detection and selection algorithms. The sniffer sensor with the Fixed Channel Allocation algorithm on the best high-quality active channel, for example, CH. 6, received the largest number of Wi-Fi packets. The result shows that about 80% of routers were working on CH. 6, the channel with the highest activeness. On the other hand, the sniffer sensor with the Fixed Channel Allocation algorithm on the worst channel performs worst, at only 4.2% of the best one. This shows that the sniffer sensor should detect and select the best high-quality channel to improve the performance of Wi-Fi sniffing. In practice, as we know neither the deployment of Wi-Fi APs in the area of interest nor the most active channel, a sniffer sensor on a randomly chosen fixed channel has only a 1 in 14 chance of hitting the best channel. For the Average Time Channel Allocation algorithm, the performance reached 18% of the best one. For the Channel Activeness Based Channel Fair Allocation and Channel Activeness Based Optimized Channel Allocation algorithms, the performance improved further, reaching 78.6% and 76.1%, respectively. Contrary to our expectation, the experiment shows no distinct difference between the Channel Activeness Based Channel Fair Allocation and Channel Activeness Based Optimized Channel Allocation algorithms. We suppose two possible causes, neither confirmed yet: one is the performance discrepancy between the sniffer sensors' hardware in channel detection, and the other is an improper configuration value of the channel hopping interval (5 seconds by default).
To summarize, the channel detection and selection algorithms help the Wi-Fi sniffer sensors aggregate Wi-Fi messages, with differing performance. With the CACFA or CAOCA methods, the performance is greatly improved over the normal FCA method.

Performance Results of Redundant Message Elimination and Compression.
To evaluate the performance of redundant message elimination and compression, we run the program on three elimination and compression policies on one Wi-Fi sniffer sensor as follows.
(i) Algorithm that filters and compresses data.
(ii) Algorithm that filters but does not compress data.
(iii) Algorithm that does not filter or compress data.
As shown in Figure 9, the difference between the bars indicates the performance of the data filtering and compression algorithms under the abovementioned policies. The filtering algorithm removes about 90% of the packets, which come from non-smartphone devices, and the compression algorithm removes about 70% of the filtered data as redundant. The large contrast between the raw data and the filtered and compressed data clearly shows the contribution of data filtering and compression.
To summarize the experiment, the filtering rate (the share of raw packets kept after filtering) is approximately 9.29% and the compression rate (the share of filtered packets kept after compression) is approximately 27.57%. Together they keep about 0.0929 × 0.2757 ≈ 2.56% of the raw data, so the data filtering and compression algorithms save about 97.44% of storage space and provide better performance for further data analysis and mining.

Performance Results of Human Presence Activity. As shown in Figure 10, we calculated the users' traffic in the office environment for 14 days. The average engaged traffic is 55, which is close to the number of working employees, and the capture traffic value of 760 is close to the institute's total number of employees. Allowing for people who carry no smartphone or carry more than one device (e.g., an iPhone and an iPad), we conclude that the engaged traffic approximates the number of people with long-time presence and the capture traffic approximates the sum of walkbys and inside people. By calculation, the engagement rate, a significant index of customer engagement, is stable at 5.1% on working days, which is in accordance with the features of an office environment. We also found that the engaged traffic on weekends (e.g., the points for day 4 and day 11 in Figure 10) dropped to less than half of the engaged traffic on working days, which is in accordance with people's weekly work habits.
The experimental results are in line with the real-world environment, and the resulting patterns of human presence activity show that the approach in this paper is feasible and acceptable.

Related Works
Activity recognition has become an active research area in recent years due to the pervasiveness of sensor-assisted phones [12][13][14][15][16]. Researchers have studied human activity recognition with varied approaches. In [17,18], a system for sensing complex social systems, with data collected from one hundred mobile phones over six months, was designed to use standard Bluetooth-enabled mobile telephones to measure information access and use in different contexts, recognize social patterns in daily user activity, infer relationships, identify socially significant locations, and model organizational rhythms. In [19], a novel method was proposed to assess daily living patterns using a smartphone equipped with microphones and inertial sensors, and a feature-space combination approach for fusing information from sensors sampled at different rates was proposed to identify various high-level activities. In [20], a general technique was proposed to exploit this "multidimensional" contextual variable for human mobility prediction and to extract different mobility patterns with multiple models under a probabilistic framework. To summarize, the above approaches focus on complex activity recognition techniques that rely on rich sensor-assisted functions of the smartphones and assume that large numbers of smartphones have installed elaborate apps or software to aggregate multilevel contextual information. In practice, this assumption is somewhat idealistic for daily human use.
In [2], the experimental results on Wi-Fi message transmission of smartphones give a new way to recognize human activities. By deploying Wi-Fi monitoring equipment in an area of interest, it is possible to detect these transmissions, providing a coarse-grained location trace for each phone that passes through the area without modifying the phones. Every Wi-Fi transmission contains a unique device identifier (MAC address). In [9], a system was proposed that predicts the length of stay (dwell time) at Wi-Fi hotspots, with and without the aid of client sensor data, using machine learning algorithms at hotspot APs. Building on the abovementioned contributions, we investigate the Wi-Fi channel detection and selection issue to improve the efficiency of Wi-Fi packet sniffing and the human presence activity classification issue to identify presence patterns.

Conclusion
This paper proposes Mo-Fi, a non-intrusive solution for discovering human presence activity by aggregating large amounts of Wi-Fi messages from smartphones using Wi-Fi sniffer sensors, without installing any apps on the smartphones and without disturbing their normal daily use. We proposed an optimized Wi-Fi channel detection and selection approach to sniff as many Wi-Fi packets as possible and a k-means-based activity classification on dwell duration sequences to identify the patterns of human presence. By deploying Mo-Fi in a real-world office environment for one month, we found that the Wi-Fi message aggregation performance of the CAOCA and CACFA algorithms is over 3.8 times higher than that of the FCA algorithm on the worst channel and about 76% of that of the FCA algorithm on the best channel, and the human presence detection rate reached 87.4%. We identified four types of human presence patterns in the office environment, namely, the outside, walkbys, bounced, and engaged patterns. Finally, we conclude that our non-intrusive approach is acceptable for discovering human presence under restricted and practical deployment requirements.