An investigation into various visualization tools for complex biological networks

Network biology has become crucial to understanding the complex structural characteristics of biological systems. Consequently, advanced visualization approaches are needed to support the investigation of such structures, and several network visualization tools have been developed to help researchers analyze intricate biological networks. While these tools support a range of analytical and interactive features, it is sometimes unclear to a data analyst or visualization designer which features are of most relevance to biologists. Thus, this study investigates and identifies essential factors for the visualization of complex biological networks using a mixed-methodology approach. Based on the findings, essential factors were categorized as either generic or heuristic, where the former concern different analytical and interactive functionalities, such as an efficient layout, advanced search capabilities, plugin availability, graph analysis and user-friendliness, while the latter concern usability, such as information coding, flexibility, orientation and help. 1 Furthermore, the findings indicate that 12 of the 15 generic factors identified were moderately important, while all 10 heuristic factors identified herein were moderately important.


1 School of Computing, Newcastle University, Newcastle upon Tyne, UK

Introduction
Network biology has become a vital research concept in revealing the structural features of biological systems, 2 but the study and modeling of advanced biological processes necessitate the creation of highly integrated networks that can handle heterogeneous and complex data. 3 As a result, numerous biologists and bioinformaticians routinely study and elucidate biological networks using interactive graphs, enabling the mapping and classification of signaling pathways, as well as the prediction of the functions of unidentified proteins. 4 Meanwhile, the ubiquity of big data across many fields, including biology, as well as advancements in computation have led to the emergence of many complex networks, the primary goals of which include modeling and understanding real, complex systems. 5,6 Special properties, such as being small-world 7 and scale-free, 8 are key indicators of complex networks. 9 A network is small-world when its average path length scales logarithmically with the number of nodes and its clustering coefficient is higher than that of a random network of the same size. 10 A scale-free network, conversely, has a degree distribution that follows a power law, a functional form that changes only by a multiplicative factor when the independent variable is rescaled. 11
Visualization is an essential concept used to understand and analyze data. 12 Several visualization methods and tools have been introduced in the literature, but due to the magnitude and intricacy of biological datasets, it is difficult to obtain useful information from interaction networks. 13,14 Furthermore, Cromar et al. 15 mentioned that due to this difficulty, knowledge of large molecular assemblies and physiologically active fragments has not been well captured in the research. Therefore, the literature has introduced a wide range of methods and techniques that can be used to develop, represent and evaluate biological networks. 16,17 Further, in response to Pavlopoulos et al.'s 12 assertion that advanced methods are required for the visualization of biological networks given their complexity, several network visualization tools have been introduced to assist researchers in studying complex biological networks, including Cytoscape, 18 Gephi, 19 Medusa, 20 Ondex, 21 Osprey, 22 Pajek 23 and ProViz. 24 Given the large number of network visualization tools made available to biologists, as well as the consequent challenge of reviewing and choosing the right one, a hypothesis was generated to determine which qualities are most critical to the efficient and effective visualization of complex networks. This list of quality factors can facilitate the analysis and comparison of complex network visualizations and enable investigators to grasp each tool's critical components. Thus, this study focuses on identifying appropriate factors that can assist designers and users of existing tools for the visualization of complex biological networks in improving these tools and in selecting the most suitable ones for different purposes.
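The two quantities behind the small-world property mentioned in this section, the clustering coefficient and the average path length, can be computed directly from an adjacency structure. The following is a minimal sketch in plain Python, assuming an undirected, connected graph stored as a dictionary of neighbor sets; the function names and the ring-lattice example are illustrative and are not taken from any of the reviewed tools.

```python
from collections import deque


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        nbr_list = list(nbrs)
        # Count the links that exist among the neighbors of this node.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in adj[nbr_list[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all ordered node pairs (connected graph)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def ring_lattice(n, k):
    """Illustrative graph: each node linked to its k nearest neighbors per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for offset in range(1, k + 1):
            j = (i + offset) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


lattice = ring_lattice(20, 2)
print(average_clustering(lattice), average_path_length(lattice))
```

A regular lattice such as this one has high clustering but comparatively long paths. To judge whether a network is small-world, both values would be compared against those of a random graph with the same number of nodes and edges.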

Biological networks
Complex network theory spans many disciplines, from computer science to the biological and molecular sciences, and within these disciplines, many biological networks exist, including protein-protein interaction (PPI) networks, 25 gene-regulatory networks (GRNs), 26 signal transduction or metabolic networks 27 and biomedical networks. 28 Concerning PPI networks, in computational biology and bioinformatics, such techniques as affinity purification, 29 pulldown assays, 30 yeast two-hybrid (Y2H), 31 mass spectrometry, microarrays 32 and phage display 33 are used to identify protein functions from their relationships and interactions with other biomolecules. 12 Concerning GRNs, control over gene expression in cells is exercised by the regulatory network, and the study of gene regulatory networks on a large scale is now feasible with the help of data collection, analysis and visualization tools. 34 Signal transduction networks, in turn, use multi-edged directed graphs to visualize and represent interactions among various bioentities (proteins, chemicals or macromolecules). 35,36 In addition, signal transmission can be studied either from the outside toward the inside of the cell or within the cell. 12 Metabolic and biochemical networks are powerful tools for investigating and studying metabolism patterns in various organisms. Similar to bacteria in humans, modifications to the biomedical reaction network can be done via modern techniques for sequence operation. 37 Further, biological networks can be described using computer-readable formats, such as Systems Biology Markup Language (SBML), 38 Proteomics Standards Initiative Molecular Interaction (PSI-MI), 39 Chemical Markup Language, 40 Cell Markup Language and the Resource Description Framework. 12
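The multi-edged directed graphs used for signal transduction networks can be illustrated with a small data structure. The sketch below is an assumption-laden example in plain Python: the class name, its methods and the entity and interaction labels are all hypothetical and do not come from any specific tool or dataset.

```python
from collections import defaultdict


class DirectedMultigraph:
    """Directed graph that allows several labeled edges between a node pair."""

    def __init__(self):
        # source -> target -> list of interaction labels (one per parallel edge)
        self._out = defaultdict(lambda: defaultdict(list))

    def add_edge(self, source, target, interaction):
        self._out[source][target].append(interaction)

    def interactions(self, source, target):
        """All interaction labels on edges going from source to target."""
        if source not in self._out or target not in self._out[source]:
            return []
        return list(self._out[source][target])

    def successors(self, node):
        """Nodes directly reachable from the given node, in sorted order."""
        return sorted(self._out[node]) if node in self._out else []


# Hypothetical entities and interactions, for illustration only.
graph = DirectedMultigraph()
graph.add_edge("receptor_R", "kinase_A", "activates")
graph.add_edge("kinase_A", "factor_B", "binds")
graph.add_edge("kinase_A", "factor_B", "phosphorylates")

print(graph.interactions("kinase_A", "factor_B"))
print(graph.interactions("factor_B", "kinase_A"))
```

Because edges are directed, the second lookup returns an empty list: the interactions recorded from the kinase to the factor say nothing about the reverse direction.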

Data visualization and visualization tools
The emergence of big data and its associated challenges has recently piqued the interest of researchers across many sectors, including healthcare, academia, information technology (IT) and government. 41-43 Other manual operations are now being digitalized, 44 producing another set of data requiring suitable visualization tools. 45 Thus, analyzing, interpreting and presenting the results in a meaningful way also pose serious challenges. 46 One of big data's greatest challenges is visualization, as the best tool for structured data may be ill-suited to unstructured data. 47 For instance, visualizing complex biological data requires advanced techniques to identify patterns in the data structure, which then aid in making decisions that fit the data content.
Data visualization can be defined as a method of unveiling data content by presenting it graphically to convey a message. According to Munzner, 48 visualization systems provide visual representations of datasets designed to help people carry out their work more effectively. While visualization has been used for centuries to communicate data, the associated challenges and opportunities have greatly changed with the emergence of big data. Conventional data visualization methods are becoming inefficient and obsolete, considering the rate at which data are generated. 43 Big data have five main characteristics, known as the "5Vs": huge volume, high velocity, high variety, low veracity and high value. 46 The main problem relates not to the processing of huge amounts of data but to the diversity among the data. 46 For instance, biological data present numerous complexities, and network-based approaches are needed to visualize such data. Examining the content of a genome goes beyond what can be visualized via bar charts and histograms: first, the data structure must be investigated, and then suitable visualization tools with relevant features must be developed. Hence, choosing the right tools and features is crucial to obtaining accurate information from the data. 49
Major network analysis factors include structure and dynamics, which are the by-products of network science. 50 Concerning the network structure, the importance of a node is measured by nodal centrality, 51 and network communities can be used to detect groups of similar nodes. 52 Whether nodal centrality or network communities are used, both approaches endeavor to identify essential nodes in complex networks. 53 Further, the breadth of data visualization has increased significantly with the development of digital technology and internet advancements, primarily through such visuals as graphs and graphic diagrams. 54
Thus, visualization has become a great tool for information analysis and sharing, 55 and it is essential to the scientific process because, no matter the significance of the research, if policymakers, other experts or the public cannot grasp the science presented to them, society will not profit from its results. 56 Visualization uses images to represent data to give viewers a clear understanding, 57 and it enables the comprehension of highly complex biological data, 58 making it possible for administrators to understand large amounts of data about the state of a network quickly and at a glance. In addition, it offers decision-makers the power to visualize analytics to help them comprehend complex ideas and patterns. 59
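As an illustration of how nodal centrality can rank the importance of nodes, the sketch below computes degree centrality, one of the simplest centrality measures, in plain Python. It assumes an undirected graph stored as a dictionary of neighbor sets; the star-shaped example network is hypothetical, and other measures (closeness, betweenness) would require additional code.

```python
def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


# Hypothetical star-shaped network: one hub linked to four peripheral nodes.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
    "d": {"hub"},
}

scores = degree_centrality(star)
# Rank nodes from most to least central, breaking ties by name.
ranked = sorted(scores, key=lambda node: (-scores[node], node))
print(ranked[0], scores[ranked[0]])
```

A visualization tool could map such scores to node size or color so that the most central nodes stand out.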

Network visualization
Network visualization focuses primarily on interpreting, interacting with, identifying and exploring the patterns within a dataset, 60 for which several tools have been developed. The investigated tools are classified into two major groups: 2D and 3D visualization tools. By investigating them, it is possible to understand their algorithms and the data structures they utilize, as well as to explore their application domains and learn about their capabilities and features. These features are reviewed in this section.
2D network visualization tools. Among the 2D visualization tools analyzed are Cytoscape.js, Osprey, Medusa, ProViz, Pajek, ONDEX, Gephi and Tulip; overviews and key feature analyses of the selected tools are provided in Tables 1 and 2, respectively. Su et al. 61 explored the use of Cytoscape 3 for biological network data, the main advantage of which is that it offers users an interactive and versatile visualization interface with which they can easily navigate available features to explore network data. 61 Their work highlighted other features added to Cytoscape 3, which only advanced users can access. As rendering interactive graphs in a web browser is among the most frequent use cases of Cytoscape.js as a visualization software component, it can be implemented in this capacity easily; it can also be run headlessly, which is helpful for graph operations on a server, such as in Node.js. 61 Oeltzschner et al. 62 analyzed Osprey as an open-source processing approach toward reconstructing and estimating magnetic resonance spectroscopy (MRS) data, and they used it to load a series of MRS data formats and carry out phased-array coil combinations, as well as to determine the frequency and phase corrections of transients. Osprey also co-registers the MRS voxel to an anatomical image, and it was found to load, process, model and quantify MRS data successfully using different conventional and spectral editing methods.
Meersche et al. 63 explored Medusa's ability to predict protein flexibility in sequences, having derived protein homologous sequences and amino acid physicochemical features from evolutionary trends to serve as inputs for a convolutional neural network. This was possible using the Medusa tool due to its flexibility, as its output was found to allow users to identify highly deformable protein regions and the general dynamics of protein properties, though it is important to note that the tool is currently inactive. Jehl et al. 24 discussed ProViz, a web-based visualization tool, to investigate the functional and evolutionary features of protein sequences. With the goal of streamlining the study of proteins' operational and developmental characteristics, ProViz, a potent browser-based tool, was created to assist biologists in developing concepts and designing experiments. Resources outlining the modular architecture of proteins, sequence variations, post-translational modifications, structures and experimental characterizations of functional areas are used to derive feature information automatically. The data are presented via a user-friendly, interactive visualization medium, made available via a straightforward protein search tool, enabling people with modest bioinformatic expertise to obtain appropriate information quickly for their research. User-defined data can also be added to visualizations via manual customization or by using a representational state transfer (REST) application programming interface (API).
[Table residue: Data import, data processing, layout algorithms, visual representation, interaction and exploration, analysis algorithms and others. With regard to the graph size (number of nodes and edges) that might be managed and visualized, Tulip's basic framework and low-level data structures are optimized to meet challenging objectives. Node and edge creation, node and edge removal, node and edge editing, selection and grouping, drag-and-drop, copy and paste and undo and redo.]
The Windows program Pajek analyzes and visualizes huge networks with thousands, sometimes even millions, of vertices, 64 and its primary objectives are to offer users a powerful visualization tool and to provide various effective algorithms for investigating huge networks. The tool was primarily built on the experience gained while developing Graph and X-graph, libraries of graph data structures and algorithms, and its network analysis and visualization operations cover transformations, numbering, partitions, maximum flow, random networks, hierarchical components, decompositions, citation weights, k-neighbors, the critical path method (CPM), paths between two vertices, vectors and counts in NET. 64 Concerning heterogeneous biological networks, Taubert et al. 21 visualized and explored Ondex Web, an updated version of the Ondex data integration platform that includes new network visualization and research characteristics. Users can easily explore and alter the appearance of heterogeneous biological networks thanks to such novel capabilities as context-sensitive menus and annotation tools. Further, the open-source, Java-based Ondex Web can easily be embedded as an applet into websites, and data can be uploaded onto Ondex Web in a variety of network formats, including Pajek, XGMML, OXL and NWB.
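Because Pajek's network format is mentioned here as an exchange format, a short example may help show what such a file looks like. The sketch below writes a small undirected network in the basic layout of a Pajek NET file (a `*Vertices` section with 1-based identifiers and quoted labels, followed by an `*Edges` section with optional weights). The function name and the protein labels are hypothetical, and the sketch covers only these two sections, not the full format.

```python
def to_pajek_net(labels, edges):
    """Serialize an undirected network as Pajek NET text.

    labels: list of vertex labels
    edges:  iterable of (label_a, label_b, weight) tuples
    """
    # Pajek numbers vertices from 1.
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{i} "{label}"' for label, i in index.items()]
    lines.append("*Edges")  # "*Arcs" would be used for directed links
    for a, b, weight in edges:
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"


# Hypothetical three-protein interaction network.
text = to_pajek_net(
    ["P1", "P2", "P3"],
    [("P1", "P2", 1), ("P2", "P3", 2)],
)
print(text)
```

A file written this way could then be loaded by tools that accept Pajek input, subject to each tool's own import rules.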
Jayamohan and Chatterjee 65 analyzed Multiviz, a scalable Gephi plugin for visualizing complex multilayer networks. They found that it offers different settings for transforming extant multilayer networks, which shows that the plugin can visualize multilayered data in complex real-life situations.
The TULIP framework was created to foster extensibility and reusability; 66 generally, it encourages the implementation of new technologies and scientific collaborations, and it gives users the option to build and browse cluster trees or graph hierarchies (nested subgraphs) rapidly. These methods have served as a key visual framework for the research team, as they frequently supply data analysts with the necessary answers.
Allegri et al. 67 designed and developed a new network-based visualization tool called CompositeView, an open-source application developed in Python. It mainly improves the visualization and extraction of complex interactive networks, increasing the chances of obtaining actionable insights. The authors found that although CompositeView was developed to visualize network data using ranking properties, it functions better on non-network datasets.
3D network visualization tools. Some of the 3D visualization tools analyzed include Arena3Dweb, CellNetVis, Graphia and OmicsNet, overviews and key feature analyses of which are provided in Tables 3 and 4, respectively. The first web program to enable the visualization of multi-layered graphs in 3D space was Arena3Dweb, which is entirely dynamic and independent. 68 Users of Arena3Dweb can combine numerous networks with their intra- and inter-layer connections into a single view, and a wide variety of inter- and intra-layer layouts and network indicators is available for node scalability. It is also easy for beginners to use in a web browser. Moreover, it was created using R, Shiny and JavaScript, and it supports weighted and unweighted undirected graphs.
CellNetVis creates an adaptive network structure where nodes are organized into flexible cellular components using an iterative force-directed process, 1 where a correctly documented network in the XGMML format serves as the tool's input. It provides some capabilities that are crucial to modern biological network analysis and that are not offered by other tools, including simultaneously being web-based, supporting enormous networks and automatically displaying nodes within their cellular components.
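To make the idea of an iterative force-directed process concrete, the following is a generic spring-embedder sketch in plain Python. It is not CellNetVis's algorithm: the force formulas follow the common textbook scheme in which all node pairs repel and linked pairs attract, and the function name, parameters and default values are assumptions chosen for illustration.

```python
import math


def force_directed_layout(nodes, edges, start, iterations=200, ideal=1.0, step=0.05):
    """Generic spring-embedder: all pairs repel, linked pairs attract.

    nodes: list of node names; edges: list of (a, b) pairs;
    start: dict mapping each node to its initial (x, y) position.
    """
    pos = {n: list(start[n]) for n in nodes}
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes (magnitude ideal^2 / distance).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = ideal * ideal / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push
        # Attraction along each edge (magnitude distance^2 / ideal).
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / ideal
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Move every node a small step along its net force.
        for n in nodes:
            pos[n][0] += step * disp[n][0]
            pos[n][1] += step * disp[n][1]
    return {n: tuple(p) for n, p in pos.items()}


layout = force_directed_layout(
    ["a", "b", "c"],
    [("a", "b"), ("b", "c")],
    {"a": (0.0, 0.0), "b": (2.0, 0.0), "c": (2.0, 2.0)},
)
print(layout)
```

With these forces, two linked nodes settle at the `ideal` distance, where attraction and repulsion cancel. Constraining nodes to regions such as cellular components, as CellNetVis does, would need additional forces or bounds that this sketch omits.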
The open-source platform Graphia was developed for the graph-based analysis of the massive volumes of quantitative and qualitative data currently being produced from research on cells, genes, proteins and metabolites. 69 Computing the correlation matrix of any tabular matrix, whether of discrete or continuous values, is at the heart of Graphia's capabilities, and the program is built to render quickly the often enormous resulting graphs in 2D or 3D space. Another web-based application, OmicsNet, was designed to simplify the creation, visualization and analysis of multi-omics networks for the exploration of intricate correlations between lists of relevant 'omics traits. 70 Some highlights include a novel 3D module layout and improved network visual analytics, with 11 2D graph layout possibilities. It also includes steps to enhance study reproducibility by introducing the companion OmicsNetR package, linking R command history and creating persistent links for the exchange of interactive network views.
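The correlation-based approach described for Graphia can be illustrated in a few lines: compute pairwise correlations between the rows of a table and keep an edge wherever the correlation exceeds a threshold. The sketch below is a simplified stand-in written in plain Python, not Graphia's implementation; the function names, the Pearson measure, the threshold value and the expression profiles are all assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # fails for constant rows, where sx or sy is zero


def correlation_graph(profiles, threshold=0.9):
    """Edges (a, b, r) for every row pair whose correlation is at least threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical measurement profiles across four samples.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.5],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
}

for a, b, r in correlation_graph(profiles):
    print(a, b, round(r, 3))
```

Raising the threshold yields a sparser graph. For large tables the all-pairs loop becomes the bottleneck, which is one reason dedicated tools optimize this step.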
Comparison of 2D and 3D network visualization tools. From the in-depth analyses presented in Tables 2 and 4, it is clear that 3D visualization tools provide an enhanced user experience, offering interactive features and making it easy to explore a 3D space. Conversely, 2D network visualization tools provide several layout algorithms that can be used to produce a wide range of visualizations. Therefore, depending on the requirements, the analyst must select the proper tool to maximize the value of their output.

Factors for evaluating visualization tools
The factors that can be adopted in the evaluation and selection of visualization tools for different purposes are classified as generic or heuristic. This section describes these factors, including how they were derived and their use among the reviewed network visualization tools.

Generic factors
Fifteen generic factors were derived from the literature (see Table 5). The phrase "Factors in evaluating visualization tools" was used as a keyword to search for scientific publications in such online databases as Google Scholar, ResearchGate, JSTOR, IEEE and ScienceDirect, among others. The publications discovered were sorted using the keyword "generic factors," and the different factors in each publication were identified and ranked based on the number of publications in which they appeared. Each of the 15 factors appeared in more than three publications; hence, they were included in the study.
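The ranking step described above, counting the number of publications in which each factor appears and retaining those above a cut-off, can be sketched as follows. This is an illustrative reconstruction in plain Python, not the procedure actually used in the study; the function name, the publication identifiers and the factor lists are hypothetical.

```python
from collections import Counter


def rank_factors(publications, min_count):
    """Rank factors by publication count, keeping those above min_count.

    publications: dict mapping a publication id to the factors it mentions.
    """
    counts = Counter()
    for factors in publications.values():
        counts.update(set(factors))  # count each factor once per publication
    kept = [(factor, n) for factor, n in counts.items() if n > min_count]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical extraction results from five publications.
publications = {
    "paper_1": ["layout", "plugins", "scalability"],
    "paper_2": ["layout", "plugins"],
    "paper_3": ["layout", "scalability", "plugins"],
    "paper_4": ["layout", "filtering"],
    "paper_5": ["plugins", "layout", "layout"],
}

print(rank_factors(publications, min_count=3))
```

With `min_count=3`, only factors appearing in more than three publications are retained, mirroring the inclusion rule stated in the text.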
A brief discussion of the generic factors is included below.
Filtering Tools (GF_1): According to Heberle et al., 1 filtering is one of the options used to improve network layouts. Rusch et al. 71 also used a filtering tool to eliminate redundant probe sets in the genetic locus. Filtering tools are also used to enable compatibility between different file formats and visualization systems and to enable easy network manipulation. Baitaluk et al. 72 evaluated the filtering tools of Cytoscape and VisANT, indicating that while Cytoscape has flexible filters on different node and edge attributes, VisANT has several available "select" filters. Furthermore, Kohl et al. 73 investigated the filtering features of Cytoscape, ProViz and VANTED, indicating that the tools probe session-relevant features of the networks. In addition, Yeung et al. 74 stated that in Cytoscape, numerical attributes were filtered by setting minimum and maximum values, and the nodes and edges within this range were identified.
Plugins (GF_2): The availability of plugins in network visualization and analysis tools is a major factor, as plugins are an important means through which advanced users can extend and customize applications. Millán 75 stated that Cytoscape provided access to PPI repositories and to the BINGO plugin for the GO enrichment analysis of the resulting network. Koh et al. 76 mentioned that Cytoscape offers visualization features that may be combined to display complex information. Moreover, Cline et al. 77 investigated Cytoscape, Osprey, VisANT, GenMAPP, BioLayout Express3D, PATIKA, CellDesigner, PIANA and ProViz, and they found that additional functionality in such areas as download services and data integration was provided. Finally, Gerasch et al. 78 stated that, compared to other systems, the network visualization model of BINA is based on hierarchical graphs and focuses on interactive and comprehensive visualizations for signaling high-quality networks.
Visual Styles (GF_3): Due to their increased perceptibility and ability to highlight patterns in complex information, visual styles are essential for data visualization. 79 Moreover, they play a key role in ensuring effectiveness, as they help avoid confusion. 80 Visual styles are linked to themes: each theme file contains a section defining which visual style will be used while the theme is active. A visual style comprises numerous components that come together to form a seamless whole that is greater than the sum of its parts.
Advanced Search (GF_4): The advanced search feature is important in a visualization tool, as it allows the scope of a search query to be narrowed to exclude unrelated information, so that users can easily find the specific content they want. In Google, for instance, this feature limits the results of complex searches. 81 It is a straightforward application that enables a recursive search for files utilizing a basic filter, and it allows a file filter to batch-move files from one folder to another. Baitaluk et al. 72 stated that Cytoscape enables node name searches on the graph, while no search option is available in VisANT. In addition, after investigating Ondex Web and Cytoscape Web, Taubert et al. 21 stated that an advanced search helped the user obtain information relevant to the network inputs.
Free/Open Source (GF_5): Including this feature in a visualization tool enables its users to overcome limitations, as well as to examine and validate source code independently and to rely on the user and volunteer community. No permission is required to access, study or use such tools; moreover, they are available free of charge. An open-source application is one whose source code is distributed together with, or made available alongside, the executable binaries that constitute the program. 82 Faysal and Arifuzzaman 83 stated that SocNetV, Cytoscape, Gephi, TULIP and Pajek are all free and open-source software, though Pajek has commercial and noncommercial versions.
Graph Analysis (GF_6): This feature is critical, as it reduces the time and effort required to organize data while combining more data points or sources, rendering the data easier to work with. The data it analyzes can also be modeled, stored and retrieved. In many contexts, identifying, visualizing and analyzing links between items is referred to as graph analysis, a process that Kohler et al. 84 identified as supported by Ondex, which, unlike other graph-based systems, automatically maps and links data from various heterogeneous sources.
Feedback to Users (GF_7): This is an important feature of a visualization tool, as it helps measure the tool's ease of use. Users may be asked to share their opinions on the system under assessment as part of the usability evaluation process, and considerations for this may relate to the system's appropriateness to the usage context, expectations of usability issues and design recommendations. 85 Faysal and Arifuzzaman 83 stated that only Gephi and SocNetV were identified as having good reporting strategies, while TULIP and Cytoscape offer limited options for users to save the resultant graphs from operations in specific formats.
Efficient Layout (GF_8): An efficient layout is among the most important features of visualization tools. 86 A layout algorithm helps expedite the production of various diagram types by organizing diagram elements automatically. 87 The algorithm determines the placement of diagram shapes and connectors based on predetermined principles and arranges them so that even the most complicated diagrams are understandable and instructive. 88 Yeung et al. 74 investigated Cytoscape and stated that applying a layout to the network produced a more vivid visual representation of the data and rendered the network structure more interpretable. Moreover, Pavlopoulos et al. 89 stated that most of the tools have several sophisticated layout algorithms, although TULIP is highly recommended.
Scalability (GF_9): The capacity of a system to adapt its performance and costs to changes in application and system processing demands is known as scalability, 90 and the capacity to scale allows a visualization tool to grow without compromising its effectiveness or quality for users. Pavlopoulos et al. 89 stated that while TULIP is preferred for medium-scale networks, it is not as scalable as Gephi. Further, Cytoscape does not scale well for analysis, while Pajek outperforms other tools as the most scalable for network visualizations. Furthermore, Faysal and Arifuzzaman 83 identified Gephi and Cytoscape as effective tools for scaling to vast networks due to the tools' ability to read and visualize all of the networks presented in the table.
Different File Formats (GF_10): It is important that visualization tools support different file formats, as these determine how data are utilized. File formats are standard methods for encoding data for storage within a computer file; they specify how bits are used to encode data in a digital storage medium, and both proprietary and open file formats are available. Cline et al. 77 stated that creating an image file of network data through Cytoscape is comparatively easy, while Pavlopoulos et al. 89 stated that Cytoscape is the most suitable tool, as it accepts more input file formats than Gephi and TULIP, and that Pajek is the least suitable, as it is not flexible in its input file format.
Text Mining (GF_11): The ability to analyze vast amounts of information quickly makes text mining an essential visualization process, 91 where important connections between elements that could not have otherwise been discovered can be revealed by mining. Text mining is a method for obtaining insightful and underlying features from textual data sources to discover knowledge. 92 The data are in the form of images, videos, audio and texts. Kohl et al. 73 stated that Cytoscape, VANTED and ProViz provide flexible and advanced text-mining capabilities.
User Input and Customization (GF_12): User input comprises whatever data a computer receives for processing. 93 In contrast, user customization entails adapting a basic product design concept to a single customer's wants. 94 User input is critical for visualization tools because it is essential to the identification and comprehension of the functional and technical requirements a product must meet. This information also guides less obvious but often equally significant qualities, such as fulfilment, acceptance or esthetics. Customization allows users to choose what they want to see or to set preferences for how information is arranged or presented. Because it gives users power over their interactions, it can improve the user experience. Suderman and Hallett 95 stated that while most tools support improved functionality in graphical user interfaces (GUIs), their functionality is often insufficient for specific tasks.
Runtime Performance (GF_13): This concerns how well a page functions when active. Because browsers utilize different JavaScript engines, runtime performance figures may vary from browser to browser. 96 Thus, runtime performance is critical to consider in a visualization tool.
User-friendliness (GF_14): This concerns the ability of a system to offer an environment in which its users may complete tasks safely, effectively and cost-effectively using straightforward, logical, dependable and effortless features, among its many shared qualities. Generally, user-friendliness is desired in terms of the qualities considered stimulating and engaging to users, 97 and it is critical to visualization tools, as it ensures they will be well-designed and easy to use. A visualization tool can increase its use by being more user-friendly, as users will keep it in mind and use it again the next time they need it if they believe the tool was created with them in mind. Pavlopoulos et al. 89 stated that TULIP is the strongest tool in terms of user-friendliness, particularly compared to Gephi and Cytoscape, which are good and medium, respectively, while Pajek is weak.
Strengths (GF_15): Data visualization is critical for firms to identify data trends, relationships and structures quickly, processes that would ordinarily be time-consuming. 98 Moreover, analysts can perceive concepts and new patterns thanks to the graphical depiction of datasets. 98 In addition, given the proliferation of data, with quintillions of bytes generated daily, a proper understanding of such data is difficult without data visualization. 44 Pavlopoulos et al. 89 stated that Pajek's strength is in its variety of layout algorithms, while PIVOT, Medusa and ProViz are best suited for PPI visualization. PATIKA enables the efficient visualization of transitions, BioLayout Express3D offers various approaches to microarray data analysis and Osprey's filtering capabilities make it a powerful tool for network manipulation. Meanwhile, Ondex's strength is in combining heterogeneous data types into one network, and Cytoscape's strength is in visualizing molecular networks.
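The range-based filtering described under Filtering Tools (GF_1), in which minimum and maximum values are set for a numerical attribute and the nodes and edges within that range are identified, can be sketched in a few lines. This is a generic illustration in plain Python rather than Cytoscape's filter implementation; the function names, attribute values and node identifiers are hypothetical.

```python
def filter_by_range(values, minimum, maximum):
    """Return the keys whose numerical attribute lies within [minimum, maximum]."""
    return {key for key, value in values.items() if minimum <= value <= maximum}


def induced_edges(edges, kept_nodes):
    """Keep only the edges whose two endpoints both passed the node filter."""
    return [(a, b) for a, b in edges if a in kept_nodes and b in kept_nodes]


# Hypothetical node attribute (for example, a score) and edge list.
node_scores = {"n1": 0.2, "n2": 0.7, "n3": 0.9, "n4": 1.5}
edges = [("n1", "n2"), ("n2", "n3"), ("n3", "n4")]

kept = filter_by_range(node_scores, minimum=0.5, maximum=1.0)
print(sorted(kept), induced_edges(edges, kept))
```

The same range test can be applied to edge attributes by passing a dictionary keyed by edge instead of by node.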

Heuristic factors
A brief discussion of the 10 heuristic factors is provided here (see Table 6). The phrase "Factors in evaluating visualization tools" was the keyword used to search for scientific publications in such online databases as Google Scholar, ResearchGate, JSTOR, IEEE and ScienceDirect. Several journals were identified, and they were narrowed down using "heuristics factors" as keywords. The different factors in each journal were identified and ranked based on the number of journals in which they appeared. Each of the 10 factors appeared in more than two journals, so they were included in the study.
Information Coding (HF_1): This requires changing information, such as a gesture, image, sound, word or letter, into a different form, sometimes abbreviated, for transmission across a communication channel or storage on a medium. Forsell and Johansson 99 stated that realistic techniques and added symbols improve information perception, while Vaataja et al. 100 agreed that they are useful for information visualization through the mapping of data objects into visual elements, such as graphics, symbols and visual cues. Lastly, Williams et al. 101 stated that dataset mapping is utilized for visual elements.
Flexibility (HF_2): This entails the capacity to adapt to user requirements, 102 as flexibility is important to visualization tools because it enables the easy design of controlled visualization stylings. 103 Vaataja et al. 100 stated that flexibility in network visualization refers to easy access to and available means for users to customize the interfaces of visualization tools to understand the processes, working strategies and task requirements.
Orientation and Help (HF_3): Orientation summarizes information related to the tool, and it is important because it provides accurate and concise information concerning visualization tools. The help feature aids users in obtaining support from the tool's developers, and it is vital to a visualization tool because it enhances the user experience. Williams et al. 101 stated that orientation and help are functions that provide support for users to control the level of detail, action and representation of additional information.
Minimal Actions (HF_4): This involves testing actions, where the action accepts a file as an input and outputs an identical file with the prefix ''bak'' before the final suffix, which is the bare minimum it can perform. Vaataja et al. 100 stated that minimal actions refer to the extent of workloads based on the number of actions required to complete a task.
Prompting (HF_5): This alludes to the text or symbols used to indicate a system's preparedness to execute the instructions that follow. A textual depiction of the user's location could alternatively be prompted. This feature is vital to a visualization tool because it helps elicit an action. Forsell and Johansson 99 stated that prompting refers to using a guide or prompts to support users in taking specific actions and providing alternatives within the system, done through data entry or by serving as a guide in the performance of other tasks.
Consistency (HF_6): This feature of distributed systems guarantees that all nodes or replicas have a similar data view at any given time, regardless of which user has modified the data. 104 This feature is important to visualization because it gives the viewer a sense of organization, which aids in a better understanding of visualized data. Vaataja et al. 100 stated that consistency enables the system to become more predictable and improves learning and generalizations, including errors in using the system.
Spatial Organization (HF_7): This feature emphasizes identifying and categorizing the geographic space in which human activities take place and in which spatial structures are created, 93 and it is crucial to incorporate into visualization tools because it helps interpret what is visualized. Forsell and Johansson 99 stated that spatial organization refers to the orientation available to users in the information space, the efficiency in space usage, the distribution of layout elements, legibility and precision and the alteration of visual elements.
Recognition rather than recall (HF_8): This implies that users can identify the data, object or event as being familiar from memory. 105 Conversely, recall means locating relevant knowledge in the memory. 106 The distinction is relevant to visualization tools because making actions, objects and options visible reduces users' memory load. Meanwhile, Vaataja et al. 100 stated that recognition rather than recall is used to reduce users' memory load by making actions, instructions and objects more visible and easier to recognize or retrieve.
Removing the extraneous (HF_9): This involves presenting the largest amount of data possible using only a small amount of ink by determining whether additional information will serve as a distraction or will limit the visualization process. 99
Dataset reduction (HF_10): This minimizes large datasets to hasten the testing process, 107 which is crucial for visualization tools because it reduces storage costs and improves performance and storage capability. Forsell and Johansson 99 stated that dataset reduction would help users redirect their focus to areas of greater interest or relevance and toward an understanding of the available datasets.

Methodology
This study gathered information through a mixed methodology, using a combination of quantitative and qualitative research (see Figure 1) to study the subject matter, 108 allowing the researcher to avoid the constraints associated with employing a single approach and enhancing the knowledge gained in relation to the stated research problems. 109 Further, it can leverage the benefits and offset the limitations of both strategies and is especially helpful when dealing with complicated, multidimensional challenges. 110

Data collection
Qualitative research involves gathering descriptive opinions and experiences using various methods, including interviews, observations, focus groups and case studies, 111 the first of which (interviews) was utilized in this study. Conversely, quantitative research produces numerical data or data that can be converted into useful statistics to measure a specific concept using questionnaires, surveys, etc., 112 and a survey was adopted in this study. The interviews consisted of open-ended questions, whereas Likert-scale questions were utilized in the survey, 5 created and distributed via the Newcastle University Online Surveys tool to gather responses. The survey and interviews were carried out independently, and the interview participants were not required to complete the survey before participation. The survey captured the participants' demographic information and asked them to rate five general statements related to each factor identified in the literature. In contrast, the interview questions were more dynamic and related to the factors, including participants' opinions about them, particularly their importance when adopting a specific network visualization tool.
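As a rough illustration of how Likert-scale survey data of this kind could be summarized, the sketch below averages each respondent's ratings of the five statements for one factor and then reports the mean and sample standard deviation across respondents. The aggregation rule (averaging the five statements per respondent), the assumed 1-5 scale and the ratings themselves are assumptions made for illustration; the study does not specify how its summary statistics were derived.

```python
from statistics import mean, stdev

# Hypothetical responses: each row holds one respondent's ratings (1-5) of
# the five statements for a single factor. The numbers are invented.
responses = [
    [4, 4, 3, 4, 5],
    [3, 4, 4, 3, 3],
    [4, 3, 4, 4, 4],
    [3, 3, 4, 4, 3],
]


def summarize_factor(rows):
    """Return (mean, sample standard deviation) of per-respondent scores.

    Assumption: a respondent's score for a factor is the average of their
    five statement ratings on a 1-5 Likert scale.
    """
    for row in rows:
        if len(row) != 5 or any(r not in range(1, 6) for r in row):
            raise ValueError("each respondent needs five ratings between 1 and 5")
    per_respondent = [mean(row) for row in rows]
    return mean(per_respondent), stdev(per_respondent)


factor_mean, factor_sd = summarize_factor(responses)
print(f"mean = {factor_mean:.2f}, standard deviation = {factor_sd:.2f}")
```

A low standard deviation in such a summary would indicate that respondents rated the factor similarly, which is how the standard deviations are interpreted in the findings.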

Data analysis
The qualitative data were analyzed using thematic analysis, an approach that focuses on the discovery, description, rationalization of, as well as interconnections between themes. 113 Steps utilized for the thematic analysis were derived from Dawadi, 114 as follows:
Step 1 (Becoming acquainted with the information): The first step (data familiarization) begins with the investigator's desire to get to know their data, as the name suggests. The types (and quantity) of themes that could be discovered through the data are determined during this step.
Step 2 (Producing the initial codes): Investigators can begin making notes on prospective data items of interest, on inquiries, on linkages between data items and on other initial ideas after the familiarization process of Step 1.
Step 3 (Searching for themes): The main goal of this phase is to identify trends and connections present among the complete data collected.
Step 4 (Reviewing themes): All themes are deliberately gathered to improve the initial grouping of topics and to display them in a more organized manner.
Step 5 (Defining and naming themes): After the thematic map is improved, the investigator proceeds to Step 5, where each theme is characterized and narratively described by explaining its significance to the larger study.
Step 6 (Producing the report/manuscript): A write-up of the final review and a summary of the results is the last step.

Participants' demographic information
Interview responses were gathered from five participants and survey responses from 98. Although the number of participants is relatively low, the participants are broadly balanced in terms of age, job position, domain, experience with visualization tools, etc.
Age. It is important to note that no interview participant belonged to the 18 to 24-year age group. However, to overcome this drawback, 34% of survey participants were aged 18-24 years. The statistics in Table 7 show that most survey and interview participants were within the age range of 25-34 years (at 37% and 40%, respectively), while participants aged 45-54 years comprised the fewest survey participants (7%) and the majority of interview participants.
Job position. The various job positions held by the participants are outlined in Table 8. The specialist category includes computer specialists, customer service agents, data analysts, economists, writers and engineers, constituting 7% of participants, according to the table, while 1% were biomedical specialists from Newcastle University. Meanwhile, lecturers comprised most of the survey participants, while postdocs comprised most of the interview participants.
Experience. For the survey, participant selection focused on individuals with varying years of experience, but for the interviews, selection focused on individuals with relatively more experience. Table 9 shows the experience statistics of the participants, where those with 1-3 years' experience comprised most survey participants (51%) and those with 6-10 years comprised the majority of interview participants.
Knowledge and experience with visualization tools. There are additional visualization tools, apart from those specified in Table 10: MATLAB, Spike, QuPath, PowerBI, Tableau, NetworkX, etc. However, from the survey, Pajek tops the list, followed by Cytoscape, even though during the interviews, Cytoscape was more widely discussed than Pajek. The table below shows the top-to-bottom order in which the respondents rated the visualization tools.

Findings
The factors identified from the survey and interviews with the participants were divided into two major categories: generic and heuristic.

Generic factors
The survey respondents' ratings of the importance of the generic factors are provided in Figure 2. The overall score indicates how important each factor is for network visualization tools in general, according to all participants. Figure 2 shows that the respondents consider filtering tools to be the most important factor, at 73%, followed by user input and customization, at 58%. Graph analysis and the benefit of the tool to its users follow closely, at 57%, and user-friendliness comes next, at 56%. Scalability scored 55%, while an efficient layout, plugin availability and runtime performance each scored 54%, followed closely by visual style, text mining and user feedback, at 53%. Different file formats scored 51%, advanced search scored 48% and open source (free) had the lowest percentage (44%). Thus, of the 15 generic factors, the majority (12) are considered of moderate importance by the survey participants, as their scores range between 50% and 60%. The factors ''Advanced Search'' and ''Open Source (free)'' were considered less important (below average) than the other factors, whereas the factor ''filter tools'' was considered key (73%) in assessing and selecting a visualization tool for complex biological networks.
From Table 11, the mean values of the generic factors range between 3.41 and 3.88, indicating that the ratings received from participants are mostly moderately positive. Moreover, the standard deviation ranges from 0.52 to 0.65, which indicates that the variation among the ratings was relatively low. Therefore, it is reasonable to assume that the participants' opinions were broadly similar.
The interview responses substantiated the survey responses, indicating that open-source or free licensing is not compulsory or preferred, provided that the visualization tool is suitable for analyzing a complex biological network. The interviewees also indicated that analysts are ready to pay any price for the license for a visualization tool to analyze complex biological networks. Moreover, the interview participants stated that the basic search feature is sufficient and that there is no need to perform an advanced search. Filter tools are also considered essential for visualizing complex biological networks, as they assist the analyst in extracting the relevant segment from a large volume of data.
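The banding of the Figure 2 percentages reported above can be checked with a short script. The scores are those quoted in the text, and the 50% and 60% cut-offs come from the stated ''moderate importance'' range; treating both bounds as inclusive, and the band labels themselves, are assumptions made for this sketch.

```python
# Percentages as quoted in the text for the generic factors (Figure 2).
generic_scores = {
    "filtering tools": 73,
    "user input and customization": 58,
    "graph analysis": 57,
    "benefit of the tool to its users": 57,
    "user-friendliness": 56,
    "scalability": 55,
    "efficient layout": 54,
    "plugin availability": 54,
    "runtime performance": 54,
    "visual style": 53,
    "text mining": 53,
    "user feedback": 53,
    "different file formats": 51,
    "advanced search": 48,
    "open source (free)": 44,
}


def band(score, low=50, high=60):
    """Map a percentage to an importance band.

    Assumption: the 50-60% 'moderate' range is inclusive at both ends.
    """
    if score < low:
        return "less important"
    if score <= high:
        return "moderate"
    return "key"


# Group the factors by band, preserving the order in which they are listed.
bands = {}
for factor, score in generic_scores.items():
    bands.setdefault(band(score), []).append(factor)

for name, factors in bands.items():
    print(f"{name} ({len(factors)}): {', '.join(factors)}")
```

Under these assumptions, the grouping reproduces the counts given in the text: 12 moderate factors, two less important factors and one key factor.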

Heuristic factors
The importance of the heuristic factors identified by the respondents is outlined in Figure 3.
The survey responses indicate that all 10 heuristic factors are of moderate importance, as the scores range between 50% and 60%. The interviewees specified that all heuristic factors should be considered when developing a good graph visualization tool for a complex biological network. Accordingly, Figure 3 shows that the heuristic factors with the highest percentage (59%) are information coding, flexibility and consistency, closely followed by prompting and recognition rather than recall, at 57%. Orientation and help follow, at 54%, as well as minimal actions. Spatial organization and dataset reduction also scored 54%, while the heuristic factor with the lowest percentage is removing the extraneous, at 53%. Consequently, it is safe to assume that even though all heuristic factors are of moderate importance, information coding, flexibility and consistency are considered the most crucial.
From Table 12, the mean values of the heuristic factors range between 3.49 and 3.61, indicating that the ratings received from the participants are mostly moderately positive. Moreover, the standard deviation ranges from 0.55 to 0.68, which indicates that the variation among the ratings was relatively low, although slightly higher than for the generic factors. Therefore, it is reasonable to assume that the participants' opinions were broadly similar.
Remarkably, the survey and interview participants were conversant with the graph visualization tools and the generic and heuristic factors. For example, the interview participants mentioned that Cytoscape is one of the most widely used tools, and several Cytoscape-related issues were discussed, among which was the tool's filter feature, which the survey responses also confirmed was the most important among the generic features of the visualization tools. Moreover, all participants confirmed that it is difficult to identify a solution that adequately handles large graphs.

Conclusion
This research studied essential factors for evaluating and selecting a visualization tool for complex biological networks. It employed a mixed research approach to gather responses from participants having a wide range of backgrounds and experience using graph-based visualization tools. In total, 98 participants responded to the survey questions, and five were interviewed to obtain detailed responses.
Importantly, the responses received from the survey and interviews corresponded with each other. From the interviews, it is clear that the users prefer 3D to 2D visualization and that some network visualization tools, such as Cytoscape.js, Gephi and others, which are primarily known for 2D visualization support, offer 3D visualization via plugins. In addition, the interviews provided detailed opinions about and justifications for the factors. This study divided the 25 factors identified into two major categories: generic and heuristic. The 15 generic factors are efficient layout, advanced search, plugin availability, graph analysis, user-friendliness, runtime performance, visual style, text mining, different file formats, filtering tools, benefits of the tool to its users, user feedback, user input and customization, scalability and open source (free). The 10 heuristic factors are information coding, flexibility, orientation and help, minimal actions, prompting, consistency, spatial organization, recognition rather than recall, removing the extraneous and dataset reduction. The findings indicate that all generic factors except advanced search, open source (free) and filtering tools are moderately important. Furthermore, the advanced search and open source (free) factors are less important than the others, whereas filtering tools are a key consideration for network visualization tools.
The findings indicate that all heuristic factors were essential, and the interview respondents added that they should be considered when developing a visualization tool for a complex biological network to increase the tool's user-friendliness. Future studies should assess the different factors of visualization tools to rate their applicability to complex biological networks.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental material
Supplemental material for this article is available online.