Keywords are missing: Insights from the publication keywords, abstracts and titles of an environment and human health research group

Inequalities within academia – and the research outputs of academics – are a widely acknowledged problem. These inequalities result in the reproduction of knowledge gaps within academic praxis. The current study presents a case study from an environment and human health research group, looking at the extent to which the research outputs mirror the wider knowledge gaps in the field. We use systematic review search methods to obtain publications for an environment and health research group since 2010. We use a combination of EndNote and VOSviewer to analyse the frequency of keywords and concepts in the titles, abstracts and keywords of these publications. We retrieved a total of 950 publications published between 2010 and 2022. We find significant gaps with respect to key concepts appearing in the titles, abstracts and keywords of publications. We find that terms such as ‘colonisation’ and ‘racism’ are not mentioned at all. We reflect on the production process of academic research with respect to reproducing blind spots within environment and human health research. We discuss our results in the context of calls to make academic research more inclusive.


Introduction
Inequalities within the university systems that create and reproduce knowledge have long been recognised (Gwayi-Chore et al., 2021). Extant inequalities with respect to research funding awards (Ginther et al., 2011; NIHR, 2022; UKRI, 2022) and senior academic positions (e.g. professorships; Rollock, 2019) represent an academic 'disease'. Indeed, a recent UK call for research into the effects of COVID-19 on Black, Asian and Minority Ethnic (BAME) communities resulted in no Black principal investigators being awarded research funding (Inge, 2020). We outline evidence of one symptom of this disease: the creation and reinforcement of knowledge gaps within academic praxis. In particular, we present a case study focussing on the academic publications from a research group focused on the intersections of environment and human health.
Health and environmental disparities are preventable differences in health outcomes between different sociodemographic groups or characteristics. Health disparities arise for a variety of reasons, from systemic factors such as access to healthcare facilities to inter-personal factors including unconscious bias (Jeffries et al., 2019). Health disparities exist with respect to a wide variety of sociodemographic characteristics including income (Coveney et al., 2016), race (Zavala et al., 2021), gender identities (Scheim et al., 2022) and sexual identities (Fredriksen-Goldsen et al., 2013). In light of this, intersectional approaches to understanding health disparities are crucial (Homan et al., 2021).
There are a wide variety of documented health disparities. An example of a racial health disparity concerns maternal mortality, with evidence showing that Black women are more than four times as likely to die in childbirth compared to white women in the UK (Limb, 2021). Such disparities have been noted for more than 20 years, and better cardiovascular monitoring during pregnancy and childbirth could reduce this disparity and prevent unnecessary deaths (MacDorman et al., 2021). In addition, the COVID-19 pandemic has exacerbated existing racial health disparities and created new ones (i.e. COVID-19 infections and deaths) (Lopez et al., 2021).
There is also evidence that transgender people face discrimination in accessing basic healthcare services (James et al., 2019) and that healthcare providers lack basic knowledge around gender-inclusive language and concepts (Carabez et al., 2015). This often results in transgender people avoiding necessary medical procedures and drives health disparities between transgender and cisgender people (Padilha et al., 2022). Further, gay people (men especially) also encounter attitudinal barriers and discrimination in accessing healthcare, notably with respect to HIV treatment (Griffith and Jackman, 2022) but also in other healthcare settings (Pratt-Chapman, 2021).
In a similar fashion, environmental research has identified a number of inequalities. A working paper from the United Nations Department of Economic and Social Affairs highlights the potential for climate change to exacerbate these existing inequalities (Islam and Winkel, 2017). There is a close link between environmental policy and public health outcomes, through various mechanisms including, but not limited to, air and water quality, food environments and waste handling, as well as overall environmental degradation as seen in urban environments (Johnson and Lichtveld, 2017). The notion of environmental justice (and injustice) has gained increasing prominence over the years as our understanding of environmental inequalities has improved (Bullard, 1993; Lester et al., 2019).
There is a lack of diversity in environmental research, despite the frequent acknowledgement of the need for internationally collaborative solutions. For example, Gallegos-Riofrío et al. (2022) highlight that the production of knowledge with respect to the health and wellbeing benefits of natural environments is largely based on unrepresentative (i.e. Western/White) samples. The majority of studies identified used either predominantly White participants or otherwise did not specify the ethnicity of participants. Tandon (2021) further highlights the lack of diversity within climate research, showing for example that over 80% of the most cited papers on the topic are lead-authored from North America or Europe. This has real implications for the production of knowledge regarding the impacts of climate change. Indeed, Callaghan et al. (2021) show that studies in high-income countries are twice as likely to show an attributable effect of climate change compared to studies conducted in low-income countries. The desire for more diverse research study populations is in line with calls for more environment and human health research using non-WEIRD (Western, Educated, Industrialized, Rich and Democratic) research samples (Apicella et al., 2020; Henrich et al., 2010).
In this paper, we present an analysis of 950 publications by an environment and human health research group over the past decade. We analyse the keywords, abstracts and titles of these papers to present co-occurrence maps showing the most commonly occurring (and co-occurring) words within the publications. Further, we search for specific terms relating to important sociodemographic characteristics as well as intersectionality and decolonisation. The discussion reflects on the value of this process and provides some baseline recommendations for shining a light on the knowledge gaps identified within these publications.

Background
Based in a UK university, the research and teaching group focuses on the nexus of environment and human health, with strong interdisciplinary research interests in areas such as Antimicrobial Resistance (AMR) in the environment, health and wellbeing from natural environments, food security, and climate change and health.
The research group employs approximately 100 people. Whilst the demographics of the research group membership have inevitably changed over time, a recent snapshot showed that the research group is an overwhelmingly White space. Further, 60% of respondents to an anonymous survey indicated that they were female and 20% reported having a disability. There had been some initial activities around intersectionality, racism and colonisation in the research and teaching of the research group, but no concerted effort to explore these important issues.
The project ran from December 2021 through May 2022. Activities included: compilation of resources on racism and colonisation, a research group survey by external consultants, in-person interviews and a range of webinars by expert Project Advisory Group members. This paper focuses on a review and exploration of the peer-reviewed publications of the research group since 2010.

Aims and objectives
Within the wider context of this project on equity in the environmental and human health sciences, the primary aim of this work was to give an overview of the publications by people working in the research group (including PhD researchers) since 2010. In particular, we were interested in a number of aspects relating to these research outputs, and as such our objectives were based on investigating:
• The geographic locations where research has taken place
• The extent to which publications included international collaborators as co-authors
• The extent to which this group's research has considered marginalised groups and colonial contexts within research outputs
We were interested in these aspects as a way to gain insight into the kind of knowledge gaps that may exist within the research outputs of the research group. In this sense, we use the research group as a case study that is emblematic of the knowledge gaps within the wider disciplines of environment and human health research.

Process
This activity was a recurring agenda item for feeding back regular updates to the wider team, and there were periodic specific meetings at various stages of the process (i.e. developing the search strategies, planning how to use the list of publications, planning and conducting the keyword analysis, reviewing findings and conclusions). These meetings were open to the entire project team and conducted on the same basis and principles as all project meetings and activities. They also frequently involved practical work being done during meetings, for example collaboratively building search strings within EndNote.

Literature searching
A search was run on Scopus for the affiliation name with several terms to cover variations. This returned 802 records which were exported into EndNote. For the first 100 records returned in Scopus, each named author from the research group was selected. A search was then carried out for each author using their unique Scopus identification number. Publications that did not include the research group in the author's affiliation were only added to EndNote if published during or within a year after their contract with the research group, and if their affiliation was broadly 'with the University'. After the first 100 records, the remaining names of researchers were combined with the affiliation, and author IDs were then searched separately in combination with the University affiliation and in combination with the author names; these searches yielded no additional records. An institution search was also run in MEDLINE and CAB Abstracts, then de-duplicated against the EndNote library (the full search strategy is available in an Appendix).
The searches were run in early February 2022. A total of 962 records attributable to research group authors were found and saved to EndNote, of which 897 came from Scopus, 46 from MEDLINE and 5 from CAB Abstracts. A further 12 duplicated records were manually removed, leaving a total of 950 records for analysis. Revised publications (e.g. updated book chapters) are still included where a new edition is indicated. We did not conduct any searching related to grey literature.
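As a rough illustration of the de-duplication step (not the procedure used here, which relied on EndNote and manual checking), duplicates across databases can be flagged by comparing a normalised title together with the publication year. The records and field names below are invented for the example; keying on the year is an assumption that keeps a later edition of the same title, in the spirit of the rule on revised publications.

```python
import re


def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each (normalised title, year) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (normalise_title(record["title"]), record["year"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Invented example records (hypothetical titles and sources).
records = [
    {"title": "Blue space and wellbeing", "year": 2019, "source": "Scopus"},
    {"title": "Blue Space and Wellbeing.", "year": 2019, "source": "MEDLINE"},
    {"title": "Blue space and wellbeing", "year": 2021, "source": "Scopus"},
]
unique_records = deduplicate(records)
```

Matching on title and year alone will miss duplicates whose titles differ between databases, so a manual check of the kind described above would still be needed.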

Co-occurrence maps
VOSviewer is a piece of software that can be used to create bibliometric network visualisations in the form of co-occurrence maps (Eck and Waltman, 2009). This involves mapping the most commonly occurring (and co-occurring) words in the titles, abstracts or keywords (TAK) of a set of academic papers. Words that co-occur more frequently are placed more closely together within the map, creating thematic clusters of co-occurrences. Further, maps can be colour coded according to the year of publication; this can therefore give insights into how certain topics have gained or fallen in prominence within a literature over time. VOSviewer uses typical academic bibliography files and supports a wide range of input files including Scopus, Web of Science and EndNote.
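The counting that underlies a co-occurrence map can be sketched in a few lines. The example below is a minimal illustration with invented keyword lists; it only tallies how often keywords occur and co-occur within the same paper, and does not reproduce VOSviewer's own normalisation, layout or clustering.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count keyword occurrences and pairwise co-occurrences per paper."""
    occurrences = Counter()
    pairs = Counter()
    for keywords in keyword_lists:
        # Each keyword counts at most once per paper; sorting gives a stable pair order.
        unique = sorted({keyword.lower() for keyword in keywords})
        occurrences.update(unique)
        pairs.update(combinations(unique, 2))
    return occurrences, pairs


# Invented keyword lists for three hypothetical papers.
papers = [
    ["Climate change", "Mental health", "Human"],
    ["Human", "Mental health"],
    ["Climate change", "Human"],
]
occurrences, pairs = cooccurrence_counts(papers)
```

In a map built from such counts, the occurrence totals would drive the size of each circle and the pair totals would drive proximity and line thickness.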
A thesaurus file is used by VOSviewer to combine similar or functionally identical keywords. This was used to merge similar terms such as 'men' and 'male(s)' or 'women' and 'female(s)'. This step is essential because, without it, the two most commonly occurring keywords in this case are 'human' and 'humans'. The thesaurus file is available on request.
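The effect of such a thesaurus can be sketched as a simple mapping applied before counting. The mapping below is a hypothetical stand-in based only on the examples given above; it is not the thesaurus file used in this study, and it does not use VOSviewer's own file format.

```python
from collections import Counter

# Hypothetical mapping from variant keywords to a single preferred label.
THESAURUS = {
    "humans": "human",
    "men": "male",
    "males": "male",
    "women": "female",
    "females": "female",
}


def merge_keywords(keywords, thesaurus=THESAURUS):
    """Map each keyword to its preferred label and drop repeats within a paper."""
    merged = {thesaurus.get(k.lower(), k.lower()) for k in keywords}
    return sorted(merged)


# Invented keyword lists for three hypothetical papers.
raw = [["Human", "Men"], ["Humans", "Male", "Women"], ["humans"]]
counts = Counter(k for paper in raw for k in merge_keywords(paper))
```

Without the mapping, 'human' and 'humans' would be counted as separate keywords and would occupy two places in the map.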

Frequency analysis
We used EndNote to search for the frequency of specific words occurring within the TAK of the research group papers. The searches were designed to investigate whether the published research had identified demographic variables (e.g. gender, disability and LGBTQ+) as well as other keywords of interest (e.g. inequalities, racism and colonisation).
To gauge the locations in which the group's research had taken place, we searched for specific country names in the TAK of the papers to see how frequently they were mentioned. Specific country locations occurring within the author address (which includes the author's institutional affiliation) were also searched to see the extent of international collaborations within the research group's publications. All United Nations member states were searched. While this approach neglects disputed territories (among others), it gives good coverage of the locations in which the group's research has taken place. Search strings accounted for common synonyms that researchers might use (e.g. 'ageing' and 'aging').
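The logic of these searches can be sketched outside EndNote as follows. This is an illustrative approximation only: the term groups, field names and records are assumptions made for the example (only the 'ageing'/'aging' pair is taken from the text), and the count is the number of publications mentioning any synonym in a group, not the number of individual mentions.

```python
import re

# Assumed synonym groups; the actual search strings are given in the Appendix.
TERM_GROUPS = {
    "ageing": ["ageing", "aging"],
    "colonisation": ["colonisation", "colonization"],
}


def compile_group(terms):
    """Build one case-insensitive, whole-word pattern for a group of synonyms."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def count_publications(records, term_groups,
                       fields=("title", "abstract", "keywords")):
    """Count records whose chosen fields mention any term in each group."""
    patterns = {name: compile_group(terms) for name, terms in term_groups.items()}
    counts = {name: 0 for name in term_groups}
    for record in records:
        text = " ".join(record.get(field, "") for field in fields)
        for name, pattern in patterns.items():
            if pattern.search(text):
                counts[name] += 1
    return counts


# Invented records for illustration.
records = [
    {"title": "Healthy ageing and green space", "abstract": "", "keywords": "aging"},
    {"title": "Aging in coastal towns", "abstract": "", "keywords": ""},
    {"title": "Air quality and asthma", "abstract": "", "keywords": "pollution"},
]
counts = count_publications(records, TERM_GROUPS)
```

A whole-word match of this kind cannot tell in which sense a term is used, so hits still need to be read in context.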
It should be noted that some journals automatically (semantically) add indexed keywords to papers in addition to the keywords provided by the authors. We did not distinguish between indexed and author-added keywords; this matters only in very specific edge cases and does not substantively affect our results. We discuss one such 'edge case' in the limitations section below.

Summary statistics
We begin by summarising the journals that appear most frequently, and the number of publications published per year. Table 1 contains a list of all the journals containing at least 10 of the research group's 950 publications. As can be seen, members of the research group frequently publish in high impact factor health and/or environment journals.
Figure 1 shows the number of publications per year across the research group from 2011 through 2021. The number of publications per year has broadly increased over time, markedly so from 2018 onwards, when the number of publications exceeded 100 for the first time and has increased year on year since then. As such, the majority of the research group publications are in the latter half of the sample: 630 of the 950 publications were published in 2016 or later.

Co-occurrence maps
We begin by presenting a keyword co-occurrence map showing the most commonly occurring keywords across the 950 publications. A map showing all keywords that appear at least 50 times (a total of 37 keywords) can be seen in Figure 2 (an interactive version is available online at https://app.vosviewer.com/?json=https://drive.google.com/uc?id=1m1mgDTJWpfOmJcITV5DHwIPWcBcPvW5u). Larger circles represent more frequently occurring keywords. Keywords that appear more frequently together are closer to each other within the map, creating clusters of co-occurrence which are colour coded to represent thematic clusters based on keyword co-occurrence frequency. Lines showing the 200 most commonly co-occurring keyword combinations can also be seen, with thicker lines representing more frequent co-occurrence.
A number of thematic clusters can be seen in Figure 2. For example, there is a cluster of keywords associated with non-human based studies that tend to be controlled studies with a focus on genetics and antimicrobial resistance (AMR) in the environment. Further, there is a cluster around major clinical studies that co-occur with cohort-based keywords around gender (male and female) and age (age; child, adolescent and young adult). The other major cluster contains human public health studies, using systematic review and qualitative research methods, and focussing on diverse issues such as physical activity, mental health, the environment and climate change.
Figure 3 presents a map showing the 325 most commonly occurring words across the titles and abstracts (an interactive version is available at https://app.vosviewer.com/?json=https://drive.google.com/uc?id=1U5DSbqyU2NLpRXQjfosymQ0YS7dcFPQM). In this instance, colour coding indicates the average year of publication in which the word was used, with darker (bluer) words being used, on average, further back in time compared to more recent brighter (yellow) words.
Figure 3 highlights a shift over time. Words that appeared more frequently further back in time (i.e. 2016 or earlier, on average) relate to more clinical work, with words such as treatment, patient, disease and infection. By contrast, words that appeared more recently (i.e. 2017 or later, on average) indicate a shift away from clinical approaches to more subjective and experiential approaches, incorporating words such as 'experience' and 'context', as well as a sharper focus on environmental aspects such as 'pollution' and 'conservation'.

Research and Co-author locations
We turn now to analyse the locations in which (a) the research has taken place and (b) the collaborators/co-authors were based. Some papers did, of course, mention more than one country in their TAK or author address section, due to comparative studies and multi-national collaborations, respectively (the latter being far more common). Of note, Switzerland is home to many publishers and appears with a copyright marker at the end of a significant number of unrelated abstracts. As such, for Switzerland, only the titles and keywords of publications were searched, and abstracts were omitted.
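The Switzerland adjustment amounts to a per-country choice of which fields to search. A minimal sketch, assuming records held as simple dictionaries with invented contents (the searches in this study were run in EndNote), is:

```python
import re

ALL_FIELDS = ("title", "abstract", "keywords")
# Assumed override: skip abstracts where publisher copyright lines name the country.
FIELD_OVERRIDES = {"Switzerland": ("title", "keywords")}


def mentions(record, term, fields):
    """Return True if the term appears as a whole word in any listed field."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return any(pattern.search(record.get(field, "")) for field in fields)


def country_count(records, country):
    """Count records mentioning a country, using any field override for it."""
    fields = FIELD_OVERRIDES.get(country, ALL_FIELDS)
    return sum(mentions(record, country, fields) for record in records)


# Invented records; the second only names Switzerland in a copyright line.
records = [
    {"title": "Bathing water quality in Switzerland",
     "abstract": "We sampled lakes.", "keywords": "water quality"},
    {"title": "Urban green space and health",
     "abstract": "Findings are mixed. (c) 2020 Springer Nature Switzerland AG.",
     "keywords": "green space"},
    {"title": "Heat and health in Spain",
     "abstract": "A study in Spain.", "keywords": "Spain"},
]
swiss = country_count(records, "Switzerland")
spanish = country_count(records, "Spain")
```

The trade-off is that a paper discussing Switzerland only in its abstract would be missed under this rule.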
Figure 4 contains both a treemap and a geographic world map of countries mentioned in the TAK of the publications. Unsurprisingly, the most commonly occurring country in the TAK of the papers is the United Kingdom (UK), where the research group is physically based (as such, the UK is omitted from the geographic map, since mentions of the UK are almost four times those of the next most commonly mentioned country, the United States). Whilst the United States, Australia and China are the next most commonly occurring countries, an overall majority (around two thirds) of all countries mentioned are European countries. The US, Australia and China contribute the majority of mentions for North American, Oceanic and Asian countries, respectively. Comparatively few studies are focused on countries in Africa, Asia and Central or South America.
In terms of co-author locations, Figure 5 presents a treemap and geographic map of countries appearing in the author address/institutional affiliation field, giving insight into the geographic location of non-UK based co-authors of publications (the UK is again excluded due to appearing in all author address fields by default). The United States and Australia are by far the most frequently occurring locations for co-authors on the research group publications, both of which unsurprisingly have strong track records of producing world-leading research in the fields of environment and human health. European-based co-authors also appear frequently. Co-author locations broadly mirror the locations of research.

Key word frequency analysis
We turn now to important specific terms appearing within the TAK of the papers. As can be seen in Table 2, several terms are completely absent from the TAK of research group publications. Gender- and age-related terms both appear frequently; for example, gender-related terms appear in the TAK of approximately a quarter of the publications. This is consistent with both gender- and age-related terms appearing in the keyword and title/abstract co-occurrence maps (though notably this does not apply to the other specific terms, including 'non-binary' and 'transgender').
Other important topics receive significantly less attention. Race- and sexuality-related terms receive very few mentions, indicating that these topics are not often the specific focus of the papers. For example, race is mentioned in the TAK of eight publications, whilst ethnic (or ethnicity) is mentioned in 27 publications. The term racism is not mentioned at all.
Further, whilst the term colonisation does technically appear twice, both uses of the word refer to antibiotic-resistant bacteria, and it therefore does not appear in our intended context at all. As such, both 'intersectionality' and '(de-)colonisation' receive no mentions.

Discussion
We present an overview of nearly a thousand recent academic publications from an environment and human health research group. We highlight several gaps within the titles, abstracts and keywords of these publications that speak to knowledge gaps within the wider environment and human health literature(s). This ranges from the absence of important topics (e.g. colonialism and intersectionality) to the absence of specific forms of marginalisation (e.g. racism or sexism). Further, several important sociodemographic characteristics, most notably race, receive a lack of attention that is not proportional to recognised health and environmental disparities.
We also find that the research group has conducted research regarding a broad range of countries, although two thirds of the countries mentioned in the research group's publications are either the UK or elsewhere in Europe. Beyond the UK and Europe, the US and Australia were the most commonly occurring research locations. It is therefore notable that most of the countries covered in the group's research outputs are considered WEIRD.
Efforts are being made within the literature to properly consider the generalisability of findings derived from WEIRD sample populations (Muthukrishna et al., 2020). Further, members of the research group have extensive co-author networks, especially throughout Europe, the US and Australia. This is, at least partly, driven by the internationally collaborative nature of environmental research in particular.
Systematic and/or structural change would need to be accompanied by individual and group learning/unlearning as well as awareness of systemic racism and colonialism. Trisos et al. (2021) outline five shifts to improve environmental and ecological research: (1) decolonize your mind; (2) know your histories; (3) decolonize access; (4) decolonize expertise; and (5) practice ethical ecology in inclusive teams.
We found the approach to be a useful one for looking at academic knowledge gaps in a systemic way, which searches of specific branches of literature (i.e. typical systematic reviews) may be unable to properly highlight. Through the dissemination of results back to the members of the research group, the process was also a valuable tool in prompting discussion and reflection on the reasons for the knowledge gaps and the academic value of richer, intersectional research praxis going forward. We hope that the approach taken in this paper will represent a starting point for others wishing to undertake a similar endeavour.

Limitations
Our search for papers associated with the research group is likely to be missing some records which should have been included (including policy reports and grey literature). This typical problem with systematic review searching was compounded by our tight timeline for completing the project. Whilst some papers may have slipped through the net, we are confident that the set of included papers gives a strongly representative overview of the research outputs of the research group.
Since the focus of this paper is on missing words within academic research outputs, it would be remiss not to acknowledge that our own search strategy has limitations. Whilst we searched for all member nations of the UN, this may miss important disputed territories or regional studies. Further, whilst our specific search terms were carefully and purposively constructed, we acknowledge that we may still be subject to biases within our own praxis.
As noted above, some journals automatically add indexed keywords to their papers, and these keywords may not always be appropriate and can create "edge cases" of inconsistency. For example, there is only one paper mentioning Saudi Arabia in the TAK, but this was an indexed keyword added to a paper about Christian pilgrimage to Lourdes, even though the paper itself does not mention Saudi Arabia once. We suspect this problem had a limited influence on the overall results presented in this paper.

Future directions
Future studies wishing to follow a similar approach may wish to go beyond the scope of our paper by considering the positionality of authors with respect to the authorship order, particularly with respect to minority groups, international collaborations and early career researchers. Some practical insights for undertaking such a task can be found in Elsherif et al. (2022).
People working at the research group (and in the wider environment and human health academic community) in the future may wish to conduct a follow-up study in 5 or 10 years to see how the makeup of research being conducted has evolved, especially as we identify changing dynamics within the core research interests of the research group over time.

Conclusions
The results in this paper are presented as a microcosm of the wider environment and human health research literature(s), and highlight several knowledge gaps within the production (and reproduction) of knowledge regarding environment and human health.

Figure 2. Co-occurrence map for the most common (N = 37) words appearing as Keywords of the publications.

Figure 3. Co-occurrence map for the most common (N = 325) words in the Titles and Abstracts of the publications.

Figure 4. Treemap and geographic map for countries mentioned within the Titles, Abstracts and Keywords of the publications.

Figure 5. Treemap and geographic map for countries mentioned in the author's address/institutional affiliation of the publications.

Table 1. A list of journals containing 10 or more publications from the research group.

Table 2. The frequency of specific terms within the Titles, Abstracts and Keywords of the publications.