Use of Technology in the Study of Team-Interaction and Performance

Direct observation of groups is labor-intensive. As a result, current research on small groups often relies on retrospective ratings. Recent developments in sensor-technology have eased data gathering, leading to a renewed interest in direct observation of groups. Sensor technology has potential, but also limitations; research has been technology- and data-driven with less recognition of the large body, and long history, of research and theory building. We review the literature on technology in small group research, argue for more interdisciplinary research and propose combining sensor technology with methods of interaction analysis, and the theories that underlie them, developed prior to 1980.

computer engineers (Keyton, 2016; Lehmann-Willenbrock et al., 2017). There is an increased interest in group processes and teamwork in several disciplines outside traditional psychology and sociology. For example, in educational science we see a growing interest in teamwork pedagogy (Riebe et al., 2016), leading to increased assessment challenges (Forsell et al., 2020) and increased attention to observational methodology (Chiriac & Einarsson, 2018). In management and communication science, globalization creates a need for more knowledge on communication in intercultural teams (Oetzel et al., 2012), virtual cooperation (Ebrahim et al., 2009), and influence in group decision-making (Pavitt, 2014). These disciplines bring with them different and complementary perspectives on both theory and methodology. Dynamic conditions in groups have, for several decades, mostly been studied without directly taking temporal change into account, using instead, for example, questionnaires and cross-sectional designs (Santoro et al., 2015). The dynamic nature of groups could have been better understood through direct observation, but that requires trained observers and time-consuming data processing afterward. The rapid development, in the last decade, of electronic sensors and data processing enables human action and interaction to be measured and analyzed cheaply, efficiently, and thoroughly (Kozlowski et al., 2016). If this technological development is to replace direct observation, or in other ways improve research on groups, it cannot take place only through the efforts of either social scientists or engineers (Chaffin et al., 2017). A professional collaboration between the disciplines is required (Lehmann-Willenbrock et al., 2017). Direct observation of interaction in teams was the primary method of group research until the early eighties. The shift away from observation also induced a shift away from studying changes in group dynamics in shorter sessions.
The major reason for this shift is probably that the methodology is demanding, not only when performing the observation and analyzing the collected data, but also in training the observer. The increased interest in studying interaction, combined with the increased capabilities of sensor-technology and algorithms, may bring the research of small groups "back home" to the primary interest in the field before the eighties. In this paper we will focus on the study of group interaction and the use of sensor technology. We will also highlight classical observation methodology and discuss both promises and limitations of their combined use.

Purpose of this Review
There is a growing body of research on the use of wearables and sensor technology in studying small groups. This line of technology-based research is often based on the technology at hand, and in many ways decoupled from traditional small group research (Lehmann-Willenbrock et al., 2017). As such, new theoretical contributions are often developed from empirically identified clusters of interaction data without regard to the vast knowledge generated over decades of small group research (Chaffin et al., 2017).
In parallel, there is an emerging, and critical, turning point in the literature on small group research in that understanding intragroup interaction has given way to trying to explain team performance based on individual attributes such as personality, or by bench-marking teams. This raises the need for methods of interaction analysis which, in many respects, is a coming home to the origins of small group research from the 1950s to the 1970s. The purpose of this paper is to review the current literature on wearables and sensor technology in studying interaction in small groups, to suggest how this technology can improve small group research, and to point to its present limitations.

The Traditional Way: Direct Observation of Group Interaction
Small group research came to the fore after the second world war, based on a deep interest in finding out how some teams successfully completed their missions while others did not, even though they consisted of the same type of people, with the same training, the same equipment, and under the same conditions. The assumption was that something happens among people when interacting in teams that determines team success or failure. This "something" became a central part of research and gained significant ground both in the US and Europe.
Robert Freed Bales, often identified as the father of the small group research tradition, and his colleagues at Harvard were able to build laboratories with the newest observational technology at the time. From this epicenter of research activity emerged methods and technology to measure interaction processes in small groups that are still in use (Bales, 1951;Bales & Cohen, 1979;Leary, 1957;Parsons, 1953). The most popular of these instruments was Bales's Interaction Process Analysis (IPA) (Bales, 1951). The 12-category IPA observation scale used a mechanical device with a moving paper tape for recording group interaction, his "Interaction Recorder" (Bales & Gerbrands, 1948). This technology constituted a technical apparatus for the sequential analysis of social interaction. It made it possible for the observer to map interaction patterns over time during a group interaction. In the late 1970s Bales developed the SYMLOG system (Bales & Cohen, 1979) which was both an advanced category system for direct observation and a questionnaire for peer-ratings of group members' behavior. SYMLOG became so popular that most of the papers in the International Journal of Small Group Research, later merged with Small Group Behavior to form Small Group Research, were based on research using SYMLOG.
Although the work of Bales and the small group research tradition was well known, and the SYMLOG method was used at universities all over Europe up to 1985, the European tradition was still synonymous with the Tavistock tradition (Heinskou & Visholm, 2004) based on the thinking of Bion (1961). That was true in particular for practitioners. That European qualitative and psychodynamic approach over-shadowed the quantitative and behavior-oriented approaches typical of the American small group research tradition. At the same time, as the use of SYMLOG diminished in Europe, several studies identified clear connections between the two lines of thinking (Lion & Gruenfeld, 1993;Orlik, 1987;Sjøvold, 1995). This is not surprising since there are several common denominators between the two traditions, which to a great extent is due to the influence of Kurt Lewin as a bridge builder between European and American research. For our discussion in this paper, we find Lewin's (1951) concept of field theory important.
The concept of a social field, with actors influencing each other based on what they bring in terms of personal and contextual characteristics, has inspired many thinkers from Moreno (1953), with his sociograms (Moreno & Jennings, 1938), to modern social network analysis (Freeman, 2004). When creating his SYMLOG analysis Bales was highly inspired by sociograms and Lewin's idea of displaying a group's social field by vectors and topology (Lewin, 1934) and by graph theory (Flament, 1963). Bales (1985) himself presented SYMLOG as the new field theory in social psychology. Further analyses of group dynamics were developed based on this framework which at the time appeared to be very promising (Polley, 1985, 1987). For many years it seems that the concept of field theory had died, which may have been caused by the shift in attention toward personality and Input-Process-Output (I-P-O) in the research on small groups. However, there is now renewed attention to the importance of group members' actions as influenced by contextual factors. If sensor technology, combined with powerful computers, can bring back attention to the study of interactions in groups, we argue that field theory thinking will follow with it.

Alternative Approaches to Direct Observation
During the individual-oriented 1980s, often labeled the YAP-period (Young Aspiring Professionals), person-oriented inventories and tools dominated practical team training and also influenced research. The idea was that building a team was more or less like building a structure with Lego bricks, by putting persons with complementary personalities together in a team (Belbin, 1988;Frohman, 1978;Quinn, 1988). Replacing laborious group observation with quick self-reports of one's group role was probably in line with the demand for time efficiency of the 1980s. While there was a return to the person-oriented approaches in the early 2000s, when personality research was at its peak in psychology (Greenwood & Suddaby, 2006;Neuman et al., 1999;Peeters et al., 2006;Salas et al., 2005;Sheldon et al., 1997), the 1990s and the start of the new century were dominated by the search for the "ideal team." In many ways, Katzenbach and Smith's (1993) paper in the Harvard Business Review was a turning point. Practitioners and researchers turned from person-oriented approaches to seeing the complete team as a unit of study. This was a time when benchmarking was very popular in organizational consulting and, in the same way, practitioners searched for the ideal team so that they could extract its secrets (Boyton & Fisher, 2005). McGrath's (1964) I-P-O model became popular, although its use placed less emphasis on the process part, which McGrath saw as the most important aspect of his theory. Researchers were looking for characteristics of the most effective team (Hackman, 1983;Hackman & Morris, 1975;Wageman & Hackman, 2010), factors sustaining teams over time (Hackman, 1992), the effect of group cohesion (Mullen & Copper, 1994), and the effect of different group tasks (Hackman, 2002). Even the search for team personality via the "big-five" in personality theory was suggested (Salas et al., 2005). Most of the research was based on self-ratings and questionnaires. 
Even the use of Bales's SYMLOG system changed, toward a more normative emphasis on the "most effective team member" and the "most effective team" as universally preferred positions and configurations in the behavior space.

A Renewed Interest in Direct Observation
After the turn of the century, criticism of the typical misuse of the I-P-O model during the 1990s was increasing (Braun & Kuljanin, 2015;Cronin et al., 2011;Kozlowski, 2015;Kozlowski et al., 2016). The critics pointed to the weakness of self-reporting (Kozlowski, 2015;Kozlowski & Chao, 2012), but also to the very idea that it is possible to find causality between inputs and outputs without studying what's happening in the process black box (Glouberman & Zimmerman, 2002;Mathieu et al., 2008).
Around 2010 there was a new turning point, this time back to the focus that was more typical of the earlier decades of small group research. The MIT book X-teams (Ancona & Bressman, 2007) became very popular among practitioners. The message was that the old approach to understanding and building teams, popular in the 1980s and 1990s, proved to be less successful, and that teams able to adapt in response to other groups, and to shifts in their environment, were those that succeeded. Edmondson's (1996, 2012) work supported this assumption and developed this branch of thinking based on studies of teams in several different disciplines and contexts. The notion that groups that are able to change their interaction to match their context are the most effective was well received by practitioners and supported by other research (Arrow et al., 2000; Danielsen, 2015; Healey et al., 2015; Heldal & Antonsen, 2014; Kozlowski et al., 2009; Schippers et al., 2014; Sjøvold, 2007; Stålsett et al., 2016).
The search for the best mix of Lego bricks was now replaced by an interest in finding out how communication can be improved for better information sharing and improved understanding of the situation and mission regardless of group members' personalities. Some studies even claimed that quality of communication was more important than any individual characteristics, including intelligence, in team performance (Pentland, 2012).
The most interesting thing about this shift is that the new approaches and topics of interest were exactly what concerned researchers in the early years of small group research (Bales, 1951; Bion, 1961; Parsons, 1953). The primary method of research at the time was direct observation, based on systematic categories of behavior and interaction patterns. When opening the process black box for study, direct observation is the only method that can be used. Catching micro changes in a group's dynamic over even a limited period of time is, however, very demanding and a well-trained observer is mandatory. That may be one of the reasons that questionnaires and survey methodology were so dominant during the 1980s and 1990s. This is changing. Today, wearables, sensor technology and, not least, the power of computers hold the promise of automatic observation and analysis of group interaction. In Pentland's (2012) study of the importance of communication, cited above, he used a wearable sensor platform called a sociometer to reach his conclusions. There is an increasing use of electronic sensors or wearables in empirical studies of team interactions (Alshamsi et al., 2015, 2016; Chaffin et al., 2017; Gundogdu et al., 2017; Kim et al., 2012; Orbach et al., 2015; Wageman et al., 2012; Watanabe et al., 2014). Although there is also an increasing number of different technologies to support team research (Carter et al., 2015; Chaffin et al., 2017; Kozlowski, 2015; Kozlowski et al., 2016; Luciano et al., 2018; Santoro et al., 2015; Tonidandel et al., 2018), a vast number of these are still purely technology driven and do not sufficiently connect to existing theories and knowledge in the small group field (Chaffin et al., 2017).

New Technology: The Future of Interaction Analysis
We argue that, since both direct observation of groups and the use of sensor technology are increasing in popularity, it is important to bridge new achievements in the field of small group research with the vast, and almost forgotten, legacy of the field. We think that sensor technology, together with software built on potent theory-based algorithms, will not only make the hard work of manual observation a thing of the past, but also speed our understanding of interaction in, and performance of, small groups. But the present technology has severe limitations. Sensors can measure communication patterns and, to a certain extent, the quality of communication. However, important measures, such as group context and mental models, are by no means within reach of sensor technology. We will still need to complement sensor technology with traditional methods, such as interviews and questionnaires, to understand interaction context and to anchor observations in theory. The scope of this review is to investigate the present status, and future promise, of the use of wearables and sensor technology in the study of small group interaction, its limitations, and how the use of novel technology can benefit from well-documented classical observation methodology.

Data Collection and Method
Step One: Identification of Literature Through Search
The systematic review of literature on the use of new forms of technology on teams and team research was conducted in the period September 2020 to November 2020 and followed a modified PRISMA search process (Moher et al., 2009). We started out with a broad search combining "team" with "technology," "machine," "sensors," and "wearables" using Web of Science and ProQuest. We defined team (social science) literature as published articles within scientific journals related to research fields in management, economics, psychology, and groups (Figure 1).
What quickly became clear was that, within the social sciences, there were few articles that targeted the use of sensors and wearables in research on teams. The articles identified were often related to general articles on potential future use of technology (Moher et al., 2009).

Step Two: Identification of Literature From Scientific Journals
We systematically went through 10 years of issues of three journals related to team research in the second phase, to see if we could identify particular articles addressing the subject. From the journal Small Group Research, we identified no articles based on empirical evidence discussing the subject. We found several wishes for embracing interdisciplinary cooperation in the study of small groups (e.g., Kettner-Polley, 2016;Salas, 2013) and maybe more importantly, a special issue (Lehmann-Willenbrock et al., 2017). The special issue discusses the possibilities and problems related to cooperation between computer scientists and social scientists for advancing small group research.
In the Journal of Organizational Behavior, we found no articles on the subject except for a special issue (Wageman et al., 2012). From this issue we identified one article (Kim et al., 2012) that lays out some important issues in using sensors in organizational research. The Journal of Applied Psychology published a special issue on the occasion of the journal's 100-year anniversary; it had some interesting aspects but no articles on the subject.

Step Three: Screening and Selection of Relevant Literature
Backward (ABI/Inform, ProQuest) and forward (Web of Science) citation searches from the articles identified in step two were used to identify scientific work related to teams and technology. We then ended up with a list of 136 articles, which was reduced to 21 by removing duplicates, non-sensor-based articles, and book chapters. When reading carefully through the identified literature, three main topics emerged:
• Calls for interdisciplinary research,
• Methods for embracing new types of technology in team research, and
• Empirical articles in social science journals using electronic sensors or wearables.

Call for Interdisciplinary Research
At the most general level were reflections on embracing interdisciplinary perspectives and cooperation (e.g., Kettner-Polley, 2016; Salas, 2013) to advance theory, or even more importantly, to solve real social problems. Kettner-Polley also points out that difficulties of cooperation within the field of small group research have long existed even among researchers affiliated with the same academic department. These papers describe collaboration across theoretical disciplines and not specifically collaboration in the development and use of hardware and software technologies. Bringing big data into social science demands a basic understanding of what big data (or smart data) is, how it is generated, and how to use it in a purposeful way. For example, George et al. (2014) identify five types of big data; probably the most accessible for the purpose of group research is self-quantification data, "types of data that are revealed by the individual through quantifying personal actions and behaviors" (George et al., 2014, p. 322). Using sensors to investigate group dynamics will definitely generate vast amounts of data while helping to ensure data integrity and quality.
A special issue of SGR (2017) describes an effort to bring social scientists (groupies) and computer scientists (geeks) together in a joint effort to describe the two academic fields' takes on the subject of analyzing dynamic group interactions. Twelve groupies and 13 geeks gathered for a workshop to carve out the possibilities and difficulties of group researchers and computer researchers working together. Even though the two fields traditionally are seen as soft and hard science, they share a common scientific method and empirical knowledge base. They differ in that social science deals with human factors while computer science deals with technological factors.

Methods for Embracing New Types of Technology in Team Research
The articles listed in this section focus on how new types of sensors might give access to a richness of knowledge on dynamic relationships within groups. The potential is described as considerable, but issues of data collection, data storage, data processing, and data presentation must be addressed, and these are unfamiliar to most group researchers. For a summary of this section see Table 1.
Common to the articles was the focus on weaknesses in the dominant methods for mapping dynamic conditions in groups. Traditionally, research on dynamic conditions using observations of groups has been labor-intensive (Lehmann-Willenbrock et al., 2017) and thus expensive. In addition, the amount of data (for example with coded observations) increases dramatically when several groups are compared or when the number of dynamic conditions is increased (Hoey et al., 2018).
Thus, cross-sectional questionnaires have been a preferred alternative, when it comes to research on dynamic conditions in groups (Santoro et al., 2015). Such a tool shows a snapshot of a group and makes it difficult to map change and the development of dynamic conditions in the group over time (Kozlowski, 2015). New types of sensors, handling of large amounts of data, data processing, and developments in user interface have, during the last decade, been predicted to revolutionize research on small groups, especially when it comes to dynamic conditions (Carter et al., 2015). But it turns out to be challenging to find fruitful forms of collaboration that connect established group research and engineering disciplines. As a result, some technological groupings have developed sensors and data-related methods based on alternative or incomplete team theories (Chaffin et al., 2017). Since new types of sensors can collect extremely large quantities of data, these big data will bring to the table new types of issues. For example, sampling error will probably be a weak indicator for validity since the sample size will greatly increase (Tonidandel et al., 2018).
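The point about sampling error can be illustrated with a minimal sketch. It assumes the textbook standard error of a mean, the standard deviation divided by the square root of the sample size; the function name and the sample sizes are hypothetical and are not taken from Tonidandel et al. (2018).

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean for standard deviation sd and size n."""
    return sd / math.sqrt(n)

# Illustrative sizes only: a questionnaire sample versus streams of sensor readings.
for n in (50, 5_000, 500_000):
    print(n, round(standard_error(1.0, n), 4))
```

As the number of readings grows, the standard error approaches zero, so a small sampling error by itself says little about whether the sensor measures the intended construct.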
Several of the articles criticize the I-P-O model (e.g., Carter et al., 2015; Kozlowski, 2015) because the P (process) is often treated as a black box, as the I (input) and O (output) are used as independent and dependent variables. In general, algorithms for analyzing the data collected from sensors will be a new type of black box, requiring new types of validity to be taken into account (Tonidandel et al., 2018). A similar phenomenon is bias; researchers must always take bias into account, but sensors and data processing remove some human biases (e.g., those of the observer and of the person who completes the questionnaires). At the same time, new forms of bias are introduced, such as the biases implemented by the computer scientist who develops the algorithms used for analysis, the data that an algorithm is developed from, and how the sensors are designed (e.g., Tonidandel et al., 2018). The existing methods, explicitly or implicitly, take contextual considerations into the research. When interviews or questionnaires are designed, and when results are analyzed, the researcher will consider the context (Johns, 2006; McCracken, 1988). But when machines are collecting and analyzing data (Luciano et al., 2018; Wageman et al., 2012), the contextual considerations may be lost or weakened. This is an example of how research based on new types of technologies may bring a richer picture and shed light on constructs in novel ways, but still be dependent on traditional research methods; they will complement each other (Tonidandel et al., 2018).
These articles show a whole range of methodological assessments that need to be elucidated when designing group-related research projects involving new technology. In order to be able to compare the results collected and analyzed with new technologies to established theoretical constructs, and to ensure that topics such as context are taken into account, team research will be dependent on using traditional methods as well as the new technologies.
However, all the authors cited above agree that the opportunity to understand more about dynamic conditions in groups has never been greater than right now, but that the complexity of conducting research projects will be considerably greater, and the dependence on interdisciplinary teams higher, in the future. To ensure that methods using new technology are based on established and validated group theories, interdisciplinary research could be covered in special issues of established journals (e.g., SGR) or even by dedicated journals.

Empirical Articles in Social Science Journals Using Electronic Sensors or Wearables
The articles listed in this part all use wearable sensors to test different team-related constructs. They mostly use co-location measures from the sensors combined with more traditional validated surveys and questionnaires to investigate dynamics in groups. In the presented material there are also some other commonalities. Most of the research presented in Table 2 spans longer time periods, lasting from 20 to 60 days. All the research presented in this section makes use of infrared sensors to capture face-to-face interaction, either as a focus of the research or as a part of the research. From these articles there seem to be at least six levels of issues to be discussed before and during a research project that is going to use electronic sensors for looking into group dynamics. These issues are as follows:

Issue 1. Group Dynamics to be the Subject of the Research Project
The first issue is: What type of group dynamics is suitable for successful measurement by electronic sensors? Not all aspects of group dynamics can be captured with an electronic sensor, and this must be carefully considered before choosing an effective technology and method. Pentland (2012) documented inter-team and intra-team communication patterns to predict team productivity. Others (e.g., Alshamsi et al., 2015; Orbach et al., 2015) use sensors to track informal interaction patterns and networks and compare the results to email correspondence and questionnaires on personal states. An important issue to consider is: how can the measurement be validated? A comprehensive validation of constructs is difficult for several reasons (Chaffin et al., 2017). There are several sources of error, and these sources of error do not carry equal weight when it comes to validation of constructs. For example, even if all wearable sensors (WS) come from the same vendor, sensitivity will vary among multiple wearable sensors. This can be avoided by choosing a different construct or sensor type, or by detecting and correcting variation using methods that Kayhan et al. (2018) have made available for researchers on their webpage badgevalidation.com. Luciano et al. (2018) describe a method for aligning constructs to the actual measurements, which provides a good starting point for the next step in the process: choosing the sensor to be used for data collection.

Issue 2. The Sensors Used for Capturing Data for the Research Project
There were four types of sensors used in the research presented in this section. These are: (1) Infrared sensor capturing direction within a specified sector, typically 30° in front of sensor, (2) Bluetooth sensor capturing distance to other sensors, (3) speech sensor, capturing speech time, volume, and conversation, and (4) accelerometers, capturing acceleration in a three-dimensional space. Several of the research projects highlight the importance of choosing the accurate detection level (RSSI level).
Technologically there is a distinction between type of sensor (e.g., Infrared and Bluetooth) and platform (e.g., Sociometers), meaning that a platform can contain several types of sensors. When Pentland (2012) presents his measurements of communication patterns, he bases this on a collection of tone of voice (microphone), body language (accelerometer), and which team members have direct contact (infrared sensor). Even though Alshamsi et al. (2016) and Gundogdu et al. (2017) use the same type of sensor platform as Pentland, sociometers, they only use data from the IR sensor. The IR data were used for documenting social interactions and compared with survey data on personal states. Validation and testing must be carried out for each sensor type and fitted to the construct to be measured and the environment in which the measurement will be carried out. Chaffin et al. (2017) point to two types of variability to consider before starting to capture data. Within-sensor variability is inaccuracy at each individual sensor component. This type of variability will have less impact as the period of data collection increases. Between-sensor variability refers to the fact that average detection level will vary between sensor components, even if they are in the same context or environment.
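The two kinds of variability can be made concrete with a minimal sketch. It assumes that several badges are pretested side by side in the same room and that each returns a series of signal-strength readings; the badge names, the numbers, and the function are hypothetical, and this is not the procedure used by Chaffin et al. (2017).

```python
from statistics import mean, pvariance

# Hypothetical signal-strength readings from three badges placed side by side
# in the same room; the numbers are invented for illustration.
readings = {
    "badge_a": [-60.0, -62.0, -61.0, -59.0],
    "badge_b": [-66.0, -64.0, -65.0, -67.0],
    "badge_c": [-61.0, -63.0, -62.0, -60.0],
}

def variability(readings):
    """Return (within, between) variance for sensors sharing one environment."""
    sensor_means = [mean(vals) for vals in readings.values()]
    # Within-sensor: average scatter of each sensor around its own mean.
    within = mean(pvariance(vals) for vals in readings.values())
    # Between-sensor: scatter of the per-sensor means around the grand mean.
    between = pvariance(sensor_means)
    return within, between

within, between = variability(readings)
print(f"within-sensor variance:  {within:.2f}")
print(f"between-sensor variance: {between:.2f}")
```

In this invented data one badge reads systematically lower than the others, so the between-sensor component dominates. Collecting for longer averages out the within-sensor scatter but does not remove such a systematic offset.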

Issue 3. Communication and Storage of the Data Material
Choosing the right method for the transfer and storage of data is an important decision but seems to be highly dependent on at least two things. First, what type of sensor platform is available for the research project and how easy is it to access and understand its technological solutions? Second, how can the collected data be stored on the device, transferred from the sensor to a storage device (e.g., wireless communication or docking), and in which form can it be stored permanently? In one study (Gundogdu et al., 2017), which used what is probably a common approach, participants put the sensors on when entering the workplace and took them off when leaving each day, during the 6-week duration of the study. The study's staff then downloaded the data and carried out necessary maintenance, preparing sensors for the next day. Watanabe et al. (2014) collected infrared data from different sensors: wearable badges and stationary beacons placed on desks and in break rooms, combining these data sources to find the total number of interactions in the organization. Kim et al. (2012) collected data from one organization that was spread over several locations and later synchronized them on a central server after downloading data from the sensors. Sensors and wearables produce large amounts of data, so-called machine-generated data (Heggernes, 2018), or big data. Big data is characterized by volume (the size that has to be stored), velocity (data added constantly), and variety (multiple data sources) (Tonidandel et al., 2018). According to Tonidandel et al. (2017), this requires a mind shift, not only when it comes to the data itself, but with regard to the total analytic approach, to what they describe as an "algorithmic culture" for processing the data.

Issue 4. Processing the Data
Sensors can capture dynamic aspects of groups that, for example, the I-P-O model's black box perspective does not. But the introduction of algorithms, sometimes developed by the vendor of a specific sensor, may create a new black box in the research. A minor change in an algorithm may produce a different result and, for research purposes, researchers may have to consider developing their own algorithm if the results from the software that accompanies the sensor do not seem valid (Kayhan et al., 2018). According to Chaffin et al. (2017), data will be available at several levels: a raw level, as in data processed by the sensors' firmware; a basic level (e.g., co-location); and a higher level (e.g., mirroring of body language). This is in line with Kayhan et al. (2018), who, in addition to the basic data gathered, refer to derived metrics as data that have been processed by the Sociometric Solutions software. The algorithm's processing of data to a higher level may become a new type of black box, depending on issues such as proprietary software developed by the vendor or the researcher's basic understanding of algorithms. As part of the preparation for data collection, both the software for downloading and analyzing data and the firmware installed on the sensors must be tested (Chaffin et al., 2017). Hoey et al. (2018) use artificial intelligence and machine learning to analyze group interaction, classifying interaction based on IPA (Bales, 1951) to understand dynamic behavioral patterns in groups. Machine learning can also be used for validation of results, for example by correcting asynchronous clocks in sensors (Kayhan et al., 2018). For many constructs based on derived metrics (e.g., mirroring of body language or interruptions of speech), synchronized time between sensors is extremely important when analyzing fine-grained interaction data.
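Why synchronized clocks matter can be shown with a minimal sketch. It assumes that every badge logged one shared reference event (for example, a hand clap at the start of a session) and that the clocks differ only by a constant offset. This is a deliberately simple stand-in, not the machine-learning correction attributed to Kayhan et al. (2018); all names and timestamps are hypothetical.

```python
# Hypothetical example: each badge logs the time (in seconds on its own clock)
# at which it registered one shared reference event, such as a hand clap.
sync_event = {"badge_a": 100.0, "badge_b": 102.5, "badge_c": 99.2}

# Hypothetical speech-onset timestamps recorded on each badge's own clock.
speech_onsets = {
    "badge_a": [110.0, 130.0],
    "badge_b": [112.9, 140.5],
    "badge_c": [120.2],
}

def align_clocks(events, sync_event, reference="badge_a"):
    """Shift every badge's timestamps onto the reference badge's clock."""
    ref_time = sync_event[reference]
    aligned = {}
    for badge, stamps in events.items():
        offset = sync_event[badge] - ref_time  # constant offset, no drift model
        aligned[badge] = [round(t - offset, 3) for t in stamps]
    return aligned

aligned = align_clocks(speech_onsets, sync_event)
print(aligned)
```

Before alignment, the first onset on badge_b appears 2.9 seconds after the first onset on badge_a; after alignment the gap is 0.4 seconds, which could change whether the event is coded as an interruption. Real clocks also drift over time, which this constant-offset sketch ignores.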

Issue 5. User Interface
How is the result presented? Is it easily understandable for the research participants and later for non-participants, such as reviewers or other researchers? These questions are important for developing a common understanding of the results from the research and for replicating the study. Finding the most appropriate interface to present the result is important but also demanding. Since the flow of data could be continuous (if the data source is connected), the result, for example based on speaking time in a team meeting, can even be presented live to researchers and team members (Kim et al., 2012, see Figure 7, Appendix B). Different types of network diagrams are used to show face-to-face activity in organizations (e.g., Alshamsi et al., 2016, p. 7; Watanabe et al., 2014, p. 3) and seem to be a good way to provide an overview of large data sets. Kayhan et al. (2018, Appendix 3, Figure 6) plot the temporal change in volume for 10 sensors during three meetings, which is one way to visualize differences between sensors over time. And precisely that, change in dynamic constructs over time, is one promising aspect that new technology can help to investigate (e.g., Kozlowski, 2015; Luciano et al., 2018; Santoro et al., 2015). It then becomes important to be able to visualize and interpret the result.
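As a simple illustration of presenting speaking time in a team meeting, the sketch below aggregates speech segments into each participant's share of total speaking time and renders a plain-text bar chart. The segment format, the participant names, and the function names are hypothetical, and the sketch is not the interface used by Kim et al. (2012).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (participant_id, start_s, end_s): a hypothetical export format
SpeechSegment = Tuple[str, float, float]


def speaking_shares(segments: Iterable[SpeechSegment]) -> Dict[str, float]:
    """Return each participant's fraction of the total speaking time."""
    totals: Dict[str, float] = defaultdict(float)
    for person, start, end in segments:
        totals[person] += max(0.0, end - start)  # ignore malformed segments
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {person: 0.0 for person in totals}
    return {person: t / grand_total for person, t in totals.items()}


def text_bar_chart(shares: Dict[str, float], width: int = 20) -> str:
    """Render the shares as text bars, largest share first."""
    lines = []
    for person, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(share * width)
        lines.append(f"{person:<6}{bar} {share:.0%}")
    return "\n".join(lines)


# Toy meeting of 100 seconds with three participants.
meeting = [
    ("Ann", 0.0, 30.0),
    ("Bo", 30.0, 40.0),
    ("Ann", 40.0, 60.0),
    ("Cy", 60.0, 70.0),
    ("Bo", 70.0, 100.0),
]
shares = speaking_shares(meeting)
print(text_bar_chart(shares))
```

Because the function only needs the segments collected so far, the same aggregation could in principle be rerun on a continuous data stream to give live feedback.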

Issue 6. Iterative Process Fitting the Total Research Project Between Levels and Disciplines
The first five issues show that many different competencies are needed to achieve a good and valid research project using wearable sensors. This is not a one-size-fits-all approach, but a carefully tailored process that takes several questions into account. For example, will the construct fit the sensor? What type of technology can be utilized? What type of algorithm will fit the desired outcome? These all depend on different disciplines and competency domains. This shows that successfully bringing new types of sensors and technology into research projects is not as easy as implementing a sensor out of the box and using the results as prescribed by the developer of the sensor or algorithm (Chaffin et al., 2017; Kayhan et al., 2018). In the same way as Edmondson and McManus (2007) describe the process of methodological fit as an iterative cyclic learning journey, Luciano et al. (2018) describe the process of aligning constructs to measurements as an iterative fitting process. For example, as Chaffin et al. (2017) show, pretesting (and possibly adjusting) sensors in an environment similar to the actual premises must be done as a part of the preparation after choosing sensor type. Minor changes in the algorithms can alter the results obtained from the same raw data, so it is important to document firmware version, software version, and other technological choices made during the research project. This fitting process is compiled as a visual presentation in Figure 2, which is in line with Whetten's (1989) how in theory building.
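One lightweight way to document such technological choices is to store a small, machine-readable record alongside each data set. The following sketch shows one possible shape for such a record; the field names, the placeholder values, and the `ProcessingRecord` class are assumptions made for illustration and do not correspond to any existing standard or vendor format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ProcessingRecord:
    """Hypothetical record of the technical choices behind one data set."""

    sensor_model: str
    firmware_version: str
    software_version: str
    algorithm_settings: Dict[str, float] = field(default_factory=dict)
    notes: str = ""

    def to_json(self) -> str:
        """Serialize with sorted keys so records are easy to compare."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)


# All values below are placeholders, not real product or version names.
record = ProcessingRecord(
    sensor_model="example-badge",
    firmware_version="0.0.0",
    software_version="0.0.0",
    algorithm_settings={"speech_threshold_db": 55.0},
    notes="Pretested in a meeting room similar to the study site.",
)
print(record.to_json())
```

Keeping such a record under version control next to the raw data makes it possible to trace a changed result back to a changed firmware version, software version, or algorithm setting.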

Discussion
In this review we argue that the use of sensor technology may enhance research on interaction in groups which, in many ways, has been neglected during recent decades. The present state of sensor technology and its associated algorithms clearly has potential, but also many pitfalls. We suggest that looking back to observational methods developed in the small group tradition before the eighties could help fulfill the potential of the technology.

The Challenge of Measuring Dynamic Phenomena With Static Instruments
The direct observation of group members' interaction was the main method of studying groups until around the 1980s, when the often demanding and time-consuming observation was replaced by surveys and questionnaires. That meant moving to more static tools to measure interaction in groups that change dynamically over time. The field of small group research lost an important perspective in that shift, which may return through the use of sensor technology.
The I-P-O model has traditionally been criticized for treating processes as a kind of black box, but new technology has created some possibilities to open up the box (a point which we will return to later). The challenge in using external measurements is the underlying intention behind the movement, be it a movement of the head or an utterance caught by a sensor. Methodologically speaking this is an old challenge with observational studies and/or ethnographic studies, that you may study or observe what happens, but you may not observe the underlying meaning or intention. In fact, for the observer, attributing intention may be a major source of error (Antonsen, 2009). A sensor may, for instance, show outputs of a movement or an utterance, but the meaning of the utterance is interpreted only in accordance with another output from another team member. In other words, what goes on within each person's head is not revealed explicitly but is left for interpretation by an external observer.
The problem is not only how to use such devices to measure (and study) team interaction, but also how to use them as tools for development. Already, tools such as Kanban boards are used by DevOps and agile teams for visualization of work progress. Interaction measures could be included to visualize team development. This would be a move further ahead from the I-P-O model, into what Ilgen et al. (2005) proposed as IMOI. They argued that a more adequate model than I-P-O may be IMOI (input-mediator-output-input). Ilgen et al. propose "adding an extra 'I' to include cyclical causal feedback, eliminating the hyphen to highlight the non-linearity between factors and finally substituting M for P to reflect the broader range of variables" (p. 520). M here stands for mediation, intended to emphasize the flexibility between input and output. One may imagine this for productive purposes as well as training purposes. The latter could resemble measures that are currently in use in, for instance, soccer, where players are monitored with GPS sensors. Movements are analyzed and fed back as suggestions for improvement. In a similar manner, as a tool for group reflection, sensor measurements could serve as an input or mediator for development. On the other hand, there are also measurements, such as sociograms, that may serve as valuable inputs for development without any deeper reflection on or understanding of underlying intentions or meanings.
There is also a real danger that measurements capture only snapshots of what is going on, as most of the reviewed articles argue. Where measurements (e.g., retrospective questionnaires), methods (e.g., cross-sectional designs), and models (e.g., I-P-O) reinforce this snapshot view, using sensors to collect continuous streams of data will provide a more complete understanding of the constructs investigated (Carter et al., 2015).

Can Technology Help?
Even though there are available computer-based systems for automatic identification of facial expressions that can interpret emotional reactions to a certain degree (e.g., Noldus, 2021), these are not applicable for studying group interaction. The complexity of interactions in a group of, say, five persons is so great that we do not have sufficient computing power available to handle it in practice. Less complex data, such as transpiration or heart rates, can be easily gathered by relatively cheap and available wearables (Kozlowski, 2015). However, the problem when studying group interaction is not the lack of data, but the huge amount of data you get from even a limited set of parameters during a team interaction. For all practical purposes, computer processing power effectively limits the number of variables that can be processed in a reasonable time span. So we have to choose between quality and efficiency.
Sensors that are available today typically measure acceleration (body movement), speech (changes in pitch), IR (direction, who you are looking at) and Bluetooth (changes in proximity). Even limiting the measurement to these four parameters, the amount of data will easily be too large for a reasonable processing time when adding some degree of interpretation to the algorithms. The conclusion is that, given the technology available today, our observations using sensor technology will be limited to measuring only overt behavior.
On the other hand, processing time is less of a problem when it comes to research. Tools for handling big data and the potential of machine learning hold great promise for identifying unknown patterns in group dynamics (Kayhan et al., 2018). This is especially true if sensor data can be moved to the cloud, since there are already flexible cloud-based services that can efficiently store and process large amounts of data (Schwab, 2017). Developing and using IoT (Internet of Things)-based sensors can thus help to make this type of research more efficient and usable in the near future. If we look even further ahead in time, something like edge computing (e.g., increased computing power in IoT devices and 5G) will contribute to more data being processed on site and, in turn, give faster insights for both researchers and practitioners using these types of devices.
To fulfill this promise we argue that we need to draw on models and theories that are developed for the study of interaction between group members rather than traits of individuals or traditional I-P-O (Carter et al., 2015;Kozlowski et al., 2016). This would entail that, rather than the I-P-O model of measurements, an approach using an IMOI model is more realistic.
As discussed above, technologists have a tendency to intuitively create interaction variables from statistical clusters of data gathered by the sensors used. We argue that a more fruitful path is to go the opposite way, using models developed from the vast knowledge of small group research (such as IPA and SYMLOG), and identifying which of the sensor parameters correspond to the categories of the model.
One of the most-used systems for systematic observation of behavior in groups is Bales's (1951) IPA (Interaction Process Analysis). Bales even used a recording machine to plot group dynamics. The recording was dependent on a human observer for both interpreting and plotting the data, as are the computer devices available today. The promise of sensor technology is that both plotting and interpretation can be automated. There has been some interest in using Bales's IPA category system for indirect observation of group dynamics (without a human observer) (Hoey et al., 2018). However, such studies were based on email correspondence or other textual information, and the activity measures were performed over 20 to 60 days. The reason for this is twofold. First, the 12 categories of the IPA system require human interpretation, which is very difficult to obtain with sensor data alone (see Table 3). Second, the timestamp of sensors, as Kayhan et al. (2018) found when testing the sociometers, is still too inaccurate to measure group interaction in situ. On the other hand, according to Kayhan et al., the problem can be corrected using machine learning or algorithms tailored to each specific project.
The challenge of using sensors is to measure group interaction over a short period of time, say, a 1- or 2-hour face-to-face meeting, typical of what IPA was intended to measure. Provided that the clock accuracy of sensors improves sufficiently for interaction measures to be trusted (that is, a unit of communication sent from person A to person B is actually measured as such), we argue that a more elaborate system for analysis, such as Bales's SYMLOG system, will be more appropriate. The categories of the SYMLOG observation system (Table 4) describe more distinct and overt behavior than those of IPA, and so may be measured more reliably when using sensor technology. For instance, SYMLOG item 2 (Outgoing, open, sociable) could be measured with a combination of high values of body movement (accelerometer), proximity (Bluetooth), and measured time of speech.
[Table 3: IPA categories. Source: Bales (1951, p. 20).]
[Table 4: SYMLOG categories, including the item "Passive, introverted, inhibited." Source: Bales and Cohen (1979, p. 245).]
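To make the idea of combining sensor parameters into one SYMLOG-style indicator concrete, the sketch below standardizes three measures within a group and averages them per member. The equal weighting, the use of z-scores, the function names, and the toy values are all assumptions made for illustration; this is not a validated operationalization of SYMLOG item 2.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize values within the group (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # no variation: everyone scores 0
    return [(v - mu) / sigma for v in values]


def sociability_index(
    members: Sequence[str],
    movement: Sequence[float],
    proximity: Sequence[float],
    speech_time: Sequence[float],
) -> Dict[str, float]:
    """Average the three standardized measures for each member.

    Equal weights are an assumption; a real study would need to validate
    both the choice of measures and their weighting.
    """
    per_measure = [z_scores(movement), z_scores(proximity), z_scores(speech_time)]
    # zip(*per_measure) regroups the scores so each tuple belongs to one member.
    return {m: mean(scores) for m, scores in zip(members, zip(*per_measure))}


# Toy values in arbitrary units for a three-person group.
members = ["A", "B", "C"]
index = sociability_index(
    members,
    movement=[1.0, 2.0, 3.0],
    proximity=[10.0, 20.0, 30.0],
    speech_time=[100.0, 200.0, 300.0],
)
print(index)
```

Standardizing within the group means the index expresses relative rather than absolute behavior, so scores from different groups or meetings are not directly comparable without further calibration.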
It is reasonable to expect that it will be difficult, or even impossible, to identify a one-to-one relation between categories from existing theoretical models and combinations of sensor parameters, but starting with an existing model will give a head start compared to starting the other way round. Even though it will probably be difficult to measure all categories of an existing model today, it is fairly easy to create simple sociograms or frequency distributions using sensors, provided that the clock synchronization is sufficiently accurate (Omenaas, 2018).
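A simple sociogram of this kind can be derived by counting directed interaction events between pairs of group members. The sketch below assumes that the sensor pipeline has already produced events of the form (timestamp, sender, receiver); that event format, the function names, and the toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[float, str, str]  # (timestamp_s, sender, receiver)


def sociogram_edges(events: Iterable[Event]) -> Counter:
    """Count directed interactions; the counts are the sociogram's edge weights."""
    edges: Counter = Counter()
    for _, sender, receiver in events:
        if sender != receiver:  # ignore self-directed events
            edges[(sender, receiver)] += 1
    return edges


def out_degree(edges: Counter) -> Counter:
    """Frequency distribution of initiated interactions per member."""
    totals: Counter = Counter()
    for (sender, _), n in edges.items():
        totals[sender] += n
    return totals


# Toy events from a three-person meeting.
events = [
    (1.0, "A", "B"),
    (2.5, "B", "A"),
    (4.0, "A", "B"),
    (6.0, "A", "C"),
    (7.5, "C", "A"),
]
edges = sociogram_edges(events)
for (sender, receiver), n in sorted(edges.items()):
    print(f"{sender} -> {receiver}: {n}")
```

The resulting edge counts can be passed to any graph-drawing tool; the direction of each edge is only as trustworthy as the clock synchronization that determined who addressed whom.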

Mindset, Bias, and Ethics
When it comes to effective use of big data in research, Tonidandel et al. (2018) believe that a change in the mindset of social scientists is required to be able to use it effectively. They argue that the established culture (data modeling, which is based on the response variable being generated from a given stochastic model) is not effective when it comes to big data. Instead, a transition to an algorithmic culture, which focuses on finding a function that describes the response process, is required. Since social scientists aren't trained in using the necessary software and corresponding statistical techniques, there will be a need for more research carried out through multidisciplinary teams and mixed methods (Tonidandel et al., 2018).
To date, the source of data for observing interaction in groups is human (participants in groups or observers) and this will, to some degree, change to device (wearable and computer) as source, which will address at least two issues in group research. First is the black box concept that often follows the I-P-O model, in which the process component is treated as an unknown intermediary. But the new technology will also be a new form of black box, in which those who develop the technical parts and the algorithms will be very important. This will entail a form of unknown component in the research project but, at the same time, it addresses the second issue, human bias. Bias exists in all types of research in social science. Conducting research on groups, the researcher's bias will affect the design, execution, and interpretation of research projects. The group members' biases will affect the assessment of their own and others' behavior and interaction in the group and, perhaps even more so, their assessment of other groups. With the introduction of new technology, new forms of bias will arise based, among other things, on manufacturers of wearables and developers of algorithms that will be related to both their own experiences and what is generated, for example, by the data material (e.g., related to machine learning, training of algorithms).
The introduction of new technology in group research will depend on combining it with traditional types of instruments such as questionnaires, so that many types of bias will continue to exist. But, by having several simultaneous methods, the impact of bias will be reduced. For example, hindsight bias and confirmation bias could affect more traditional research designs and should be reduced by using "neutral" technical devices. But there will be new forms of bias, for example that which is represented by those who develop the algorithms for analysis. Selection bias can occur when an algorithm is developed and tested on a particular type of group (e.g., technical management teams) and then applied to other types of groups (e.g., medical emergency teams). Interaction bias occurs when the algorithm is created on a narrow basis, and latent bias occurs when the algorithm identifies something incorrectly based on historical data or stereotypes.
It should be mentioned that the gathering of information we depict in this paper may also have a darker side. As Zuboff (1988) mentioned in her prescient book In the Age of the Smart Machine, these information gathering devices could be used for surveillance and control. It is possible to imagine management tools that employ data from such interactions to monitor employees. Further, a deterministic view of technology adopts a rather dark perspective on the role of technology (Law, 1987), suggesting that technology not only has negative influences on social relationships but that it may take control on its own. A constructivist perspective, on the other hand, would regard technology as an enabler and facilitator of social relationships. We are, in this article, firmly on the latter side, but it is worthwhile to mention that history has shown quite a few examples of technological developments not used for human good. This could include not only military applications, but also implementation as a management tool for recording the misconduct of employees. Fordism and Taylorism, for instance, were both firmly linked to technological developments, and had some very dark consequences for organizational employees.

Methodology and the Measurement of Dynamic Phenomena
Traditionally, small group researchers have had to choose between qualitative (e.g., interviews) and quantitative (e.g., questionnaires) methods to investigate a research question and are expected to thoroughly understand their chosen method without having to rely on outside experts. This will not be possible using new technologies such as shown here; there has to be some degree of interdisciplinary cooperation to successfully carry out such a project. Several of the articles refer to early work on group interaction and link this work to their own research using new types of technologies. They emphasize a consistent technology as an important tool to gain new understanding using old classifications in combination. But at the same time, some research provides insights into the weaknesses inherent in the sensor technology and the data processing tools that follow. For example, synchronization of time between sensor units is an extremely important issue when considering processed data such as interruption of speech or mirroring of body movement in group meetings. It will also be important to use other methods to complement the results collected from the sensors.
Although this type of research will be exploratory and use relatively new methods and technology, it may already be based on some established approaches. Methodological fit (Edmondson & McManus, 2007) and measurement fit (Luciano et al., 2018) provide starting points and may be important for the iterative process of validating research projects and results. But they will probably not provide a clear and unambiguous way to design a research project with this type of technology.
As we have discussed, important aspects that influence group interaction, such as context and group members' mental models, are not possible to measure with sensor technology. We are able to measure the micro units of communication, but not the meaning they have to the individual. Mead (1934) talks of gestures as social acts, where the gesture of one party is both a response to, and input to, another party. It is through such an interaction that behavioral meaning is established. One may argue that Mead and Bales have something in common through the focus on expression of emotions. Mead draws on Wundt (1874) in arguing that gestures express emotions. A gesture in itself may not be intended to communicate; it may be a bodily disposition (e.g., a shiver when you have a fever), but it is observed, interpreted, and responded to by another. Although Bales's SYMLOG rating scale is, to a certain degree, intended to measure meaning (Osgood et al., 1957), we need other instruments, such as questionnaires or interviews, to get the full picture. This is also true when attempting to map group members' mental models and to understand group context. Group dynamics will be different in different contexts. The stable and structured context in a car workshop or a hospital's operating theater may be well suited for sensor measuring, while the more complex dynamics in a research group will be difficult to measure.

Interdisciplinary Collaboration
Equations, definite values, and a belief in absolute and measurable truths are some of the fundamentals in many technical and engineering disciplines. The positivistic paradigm (e.g., Guba & Lincoln, 2005) is not quite aligned with the post-positivist paradigm to which most social scientists consciously or unconsciously adhere. This may be one reason for the low rate of relevant articles published in social science journals and also one reason that engineers find Bales's work appealing. Bales's more positivistic approach, including the use of categorization, computers, and the development of SYMLOG shows an interest in a holistic method suitable for future digitalization of human behavior.
One serious problem in interdisciplinary research is the fact that it is difficult to get such research published. Journals are highly specialized and easily reject papers that do not fit their profile or reviewers' knowledge base. Lehmann-Willenbrock et al. (2017) discuss the topic related to social science and computer science: "Publications in one set of disciplinary venues have little influence on the other discipline," and "Scholars may collaborate but maintain a strong presence in their disciplinary area." They also provide some suggestions for advancing mutually beneficial research. Anderson (2008) argues for scientific discoveries in the future, that are not based on previous research, but instead on data itself, and concludes with "What can science learn from Google?" A few years later, in 2012, Google tried to find the most effective team based on data from over 180 active teams (Duhigg, 2016). They were looking for perfect combinations of personalities among others, but 2 years later they gave up. The data couldn't really explain how the most effective teams functioned. But when looking to established team theories, they found that concepts, such as psychological safety, could better explain successful teams at Google. This practical example illustrates the importance of combining established theories with new forms of technology to get better insight.

Future Research
Seen from the perspective of social science, we need to look into new approaches to research on teams, such as combining inductive and deductive research (Tonidandel et al., 2016), more descriptive research (Kozlowski, 2015), and a focus on using big data approaches as a complement, not a substitute for the existing methods. This will demand greater use of mixed methods, and maybe more importantly, an acceptance of new forms of research to be published in established journals.
Another fruitful approach will be to develop more reliable technologies. As Kayhan et al. (2018) and Chaffin et al. (2017) find, having a reliable internal clock in sensors is important for within-sensor variation but even more important as a foundation when looking into dynamic constructs combining data from several sensors. Defining constructs that cannot be measured with this type of technology is also an issue that needs to be considered. One example is the construct of shared mental models, which probably cannot be measured using sociometers. We need valid research to resolve these issues.

Conclusion
We argue that the use of sensor technology to map interaction in small groups, combined with the systematized understanding of the phenomenon embedded in theories and models, such as Bales's (1950, 1985) work, may start a new era in small group research. The burden of manual observation can be reduced through automation, and the potential of big data analysis and machine learning may reveal patterns of group dynamics of which we are not now aware. Perhaps nearly forgotten perspectives, such as Lewin's field theory, will be revisited. However, the current technology has some severe limitations, especially when it comes to practical applications.
What will the result be for practitioners in the long term, getting access to sensors that can collect data at the individual level, processing these at the group level, and thus predicting the group's progress toward a desired goal? Will this be an automated process by which each member of a group receives clear feedback on their own behavior and how it can be improved for the benefit of the group? Already, suppliers of wearables collect data and provide feedback and concrete improvement suggestions related to exercise and sleep patterns, among other health factors. Apple Watch is approved for heart rate monitoring and thus predicts, and provides feedback on, heart problems. But this is at the individual level; the complexity increases exponentially with the transition to measuring interaction at the group level. Much more research is necessary before such feedback and predictors can be implemented in organizational teams. In addition, issues such as context and a group's level of purpose will entail a need for interpretation of results performed by competent and trained personnel. The use of sensors in groups and organizations will also result in many ethical issues that must be elucidated and resolved before this can be an established way to develop groups in real organizations. On the other hand, many technical professions find that dealing with relationships in an organizational context can be uncomfortable. In those cases, the use of sensors may be seen as an objective component in measuring interaction in groups, so that individuals will not have to subjectively evaluate others or be evaluated by close colleagues. This may make it easier for the assessments to be accepted in those groups and give more credibility to the results, because it will be difficult to argue against data collected with modern electronic wearables and processed with AI. 
So, an automated feedback process may be applicable, at least in situations where teams operate in noncomplex environments and their work tasks are standardized.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.