A chronological and geographical analysis of personal reports of COVID-19 on Twitter from the UK

Objective Given the uncertainty about the trends and extent of the rapidly evolving COVID-19 outbreak, and the lack of extensive testing in the United Kingdom, our understanding of COVID-19 transmission is limited. We proposed to use Twitter to identify personal reports of COVID-19 and to assess whether these data can help us understand and model the transmission and trajectory of COVID-19. Methods We used a natural language processing and machine learning framework. We collected tweets (excluding retweets) from the Twitter Streaming API that indicated that the user or a member of the user's household had been exposed to COVID-19. The tweets were required to be geo-tagged or to have profile location metadata in the UK. Results We identified a high level of agreement between personal reports from Twitter and lab-confirmed cases by geographical region in the UK. Temporal analysis indicated that personal reports from Twitter appear up to 2 weeks before UK government lab-confirmed cases are recorded. Conclusions Analysis of tweets may indicate trends in COVID-19 in the UK and provide signals of geographical locations where resources may need to be targeted or where regional policies may need to be put in place to further limit the spread of COVID-19. It may also help inform policy makers about which lockdown restrictions are most or least effective.


Introduction
In this annotation project, we are interested in classifying tweets into three classes: tweets indicating that the user, or a member of their household, has been exposed to the coronavirus, has contracted COVID-19, or is experiencing its common symptoms ("Probable Case"); tweets in which the user indicates that they were in a situation where they may have been exposed, had possible contact with a confirmed or suspected case, or are exhibiting some possible symptoms ("Possible Case"); and tweets with no such indications ("Other Mention"). Our corpus consists of tweets that mention certain keywords related to coronavirus. These include, <fill in keywords>. Each tweet will be classified as either a Probable Case, Possible Case, or Other Mention, based on the information in the tweet. The purpose of this document is to define the indicators of each class and describe the criteria that the annotators should use to determine whether the tweet is a Probable Case, Possible Case, or Other Mention. The annotated data will be used to train automated classification systems.
The annotation guidelines are an evolving document, and changes and updates will be made over time. All updates will be noted and dated in the Guideline Revision Information section.

Annotation Tool
For these annotations, we will use an Excel spreadsheet. The spreadsheet will contain certain information about each tweet, such as userId, tweetID, date, drug name and the tweet text. The annotator will have a column in which to place the appropriate code (0 = Other Mention, 1 = Probable Case, 2 = Possible Case). Additionally, there is a "Notes" column for the annotator, described in the following section.
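The coding scheme above lends itself to a simple consistency check before annotated data are used for training. The following is a minimal sketch, assuming the spreadsheet has been exported to a list of row dictionaries; the column names `code` and `tweetID` and the helper names `parse_code` and `find_invalid_rows` are hypothetical and not part of the guidelines.

```python
from enum import IntEnum


class Label(IntEnum):
    """Annotation codes as listed in the guidelines."""
    OTHER_MENTION = 0
    PROBABLE_CASE = 1
    POSSIBLE_CASE = 2


def parse_code(cell):
    """Return the Label for one spreadsheet cell, or raise ValueError.

    Accepts ints, floats such as 1.0 and numeric strings such as "2";
    rejects blanks, multiple codes ("1,2") and out-of-range values.
    """
    text = str(cell).strip()
    try:
        value = float(text)
    except ValueError:
        raise ValueError(f"not a numeric code: {cell!r}") from None
    valid_values = {member.value for member in Label}
    if not value.is_integer() or int(value) not in valid_values:
        raise ValueError(f"code must be 0, 1 or 2, got {cell!r}")
    return Label(int(value))


def find_invalid_rows(rows, code_column="code", id_column="tweetID"):
    """List (tweet id, reason) pairs for rows without exactly one valid code.

    The column names are assumptions about how the spreadsheet was exported.
    """
    problems = []
    for row in rows:
        try:
            parse_code(row.get(code_column, ""))
        except ValueError as err:
            problems.append((row.get(id_column), str(err)))
    return problems
```

Running `find_invalid_rows` over an exported sheet returns the rows that an annotator would need to revisit, for example rows left blank or rows where two codes were entered in the same cell.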

General Guidelines
Each tweet should be classified with only one code.
Use of the 'Notes' column is optional; it is there for annotators to record any comments or notes they have about annotating that tweet.
For this study we will consider not only the user (person tweeting) but also discussions about members of their households when determining the correct class annotations. Household members include spouses, children at home, roommates and any other relative (eg, parent, aunt, cousin) if it can be determined that they reside in the same household.
The rest of the guidelines define each class and describe the indicators in the tweets that annotators can use to determine the correct classification.

Probable Cases
Probable cases are those that indicate that the user, or a member of their household:
- has contracted Coronavirus disease and/or expresses that he/she has been tested for or diagnosed with it; or
- self-diagnoses as having Coronavirus disease (COVID-19) and is symptomatic; or
- expresses having been directly exposed to Coronavirus but is asymptomatic.
There are several indicators, or topics of discussion, that the annotator should consider when determining whether the tweet should be classified as a Probable Case, including diagnosis, testing for the virus, experience of symptoms, and direct exposure to someone with confirmed or suspected COVID-19. While some tweets may contain more than one indicator, only one is needed to classify the tweet as a Probable Case.

Diagnosis
The user states that they, or a member of their household, have been diagnosed with, or are recovering from, COVID-19.

Testing
The user discusses getting tested, or wanting to get tested, for themselves or a household member. For these tweets, we will assume that the user, or their household member, is seeking testing due to being exposed or symptomatic, even in the absence of such a situation being mentioned in the tweet. Tweets discussing testing should be classified as Probable regardless of whether the person was able to obtain testing or has received the results of the testing. However, if the user states they were tested and the test was negative, then the tweet should be classified as Other Mention.
In (iii), the user is stating that they have been tested and are awaiting results. In (iv), the user has been refused testing and, though we cannot infer whether they have a legitimate reason for wanting testing, we will mark these cases as positive potential cases.
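The testing rule above can be read as a small decision procedure. The sketch below is only an illustration, assuming an annotator has already judged two things about the tweet; the function name and its boolean parameters are hypothetical and not part of the guidelines.

```python
def classify_testing_mention(discusses_testing, reports_negative_result=False):
    """Apply the testing indicator to one tweet.

    discusses_testing: the user talks about getting tested, or wanting to be
        tested, for themselves or a household member (annotator judgement).
    reports_negative_result: the user states the test came back negative.

    Returns a class name, or None when the testing indicator does not apply
    and other indicators have to decide.
    """
    if not discusses_testing:
        return None
    if reports_negative_result:
        # A stated negative result overrides the default assumption of exposure.
        return "Other Mention"
    # Awaiting results, refused testing, or unable to obtain a test all count.
    return "Probable Case"
```

Note that a refused or pending test falls through to "Probable Case", which mirrors the assumption that the person sought testing because of exposure or symptoms.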

Symptoms
The user describes experiencing symptoms that match those listed as the most common for COVID-19, according to the WHO and the CDC, including fever, coughing and shortness of breath or difficulty breathing; and/or less commonly reported but more distinctive symptoms such as loss of smell (anosmia) or taste (ageusia). Additionally, users who state that they have pneumonia or flu-like symptoms, and/or that they have tested negative for flu or pneumonia, should also be coded as positive.
Mentions of symptoms that are sometimes present, but are not among the most common symptoms of the disease listed above, should be annotated as Possible Cases.
In (v) and (vi), the user is stating signs of infection as well as a negative result for a flu test. In (vii), the user mentions one of the unique symptoms of the disease.

Single main symptom mention
While cough, fever or shortness of breath mentioned on their own can be attributed to other diseases, they are listed among the main symptoms by the CDC and WHO. We will therefore treat a mention of any one of them, even in the absence of mentions of other symptoms, as an indicator of the positive class unless it is ascribed to another cause (eg, choking on something, smoking, asthma, etc.) (See: Other Mention: Symptoms).
xi. Everybody's scared over nothing, I know I'm not getting Coronavirus. I just got a light cough.

xii. A morning of incessant coughing and sending four emails and I'm exhausted. These crap lungs will be the death of me even before Coronavirus hunts me down....
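The single-symptom rule can be approximated with keyword matching, for instance to pre-screen tweets before manual annotation. This is a rough sketch under stated assumptions, not the method used in this project: the keyword patterns and the function name `single_symptom_indicator` are hypothetical, and the sketch does not check who is experiencing the symptom, which an annotator must still judge.

```python
import re

# Main symptoms named in the guidelines (patterns are illustrative only).
MAIN_SYMPTOMS = re.compile(
    r"\b(cough\w*|fever\w*|shortness of breath|difficulty breathing)\b",
    re.IGNORECASE,
)

# Other causes given as examples in the guidelines: choking, smoking, asthma.
OTHER_CAUSES = re.compile(r"\b(chok\w*|smok\w*|asthma\w*)\b", re.IGNORECASE)


def single_symptom_indicator(text):
    """Return True if a main symptom is mentioned and no other cause is named.

    This only mirrors the wording of the rule; it cannot tell whether the
    symptom belongs to the user, a household member, or a stranger.
    """
    has_symptom = MAIN_SYMPTOMS.search(text) is not None
    has_other_cause = OTHER_CAUSES.search(text) is not None
    return has_symptom and not has_other_cause
```

A tweet flagged by this check is a candidate for the positive class only; tweets about other people's symptoms would also be flagged and need to be excluded by the annotator.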

Mentions having flu or pneumonia
Given the similarity of symptoms, if the user mentions that they, or a member of their household, have the flu or pneumonia, but there is no indication that they have been tested for either and/or it is possible they are self-diagnosing, these tweets should be classified as positive.

Direct Contact with Diagnosed/Suspected COVID-19 Patient
The user states that they, or a household member, have been in prolonged contact with a patient who has, or is suspected to have, COVID-19. The annotator can infer contact when a close family member is mentioned, eg, spouse or child, or when the person mentioned is someone with whom recent interaction is likely, such as a co-worker.
My roommates is currently coughing a lot and throwing up, and blowing his nose a lot. This #coronavirus may be more real then I thought time to suit up. https://t.co/MSC7ZAt2CV

Self-Isolating/Self-Quarantine
The person states that they are in isolation or quarantine due to the possibility of having contracted, or having knowingly been exposed to, the virus.
xiv. Gen Z here, almost everyone in my family has been in contact with someone who was diagnosed with COVID-19, so I'm under strict quarantine and I couldn't be happier.

Possible Cases
Possible Cases are those that indicate that the user, or a member of their household:
- has been in a situation or place with a higher probability of exposure to Coronavirus; or
- mentions that someone near them in a confined space was exhibiting possible symptoms of COVID-19; or
- is experiencing symptoms that may be present with the disease but are not listed as the most common symptoms by the WHO and the CDC.
These are cases where the user, or a member of their household, was in a situation with increased risk of exposure or is exhibiting signs of some illness, but there is little confirmatory evidence in the tweet that they were definitely exposed to the virus. As such, the evidence in the tweet may not be as strong as in tweets categorized as Probable Cases. There are several indicators or topics of discussion that should be classified as Possible Cases, including traveling by public transportation; visiting a doctor's office or hospital and/or being in the presence of someone exhibiting signs of sickness; exposure to someone who should be in quarantine, even with no mention of that person exhibiting symptoms; or the user talking about someone with confirmed or suspected COVID-19 when it is not clear that the user has been in recent close contact with that person.

Travel
The user states that they, or a member of their household, are traveling or have recently traveled, such as by airplane, cruise ship or train, including public transportation. For these, there should be evidence that the person actually traveled and is not discussing future plans (see: Other Mention: Travel).
i. Disembarked a flight this a.m Denpasar>Mel (Tul) & spent at least 35 mins inescapably rubbing shoulders with many families arriving from all over #China. Mark my words, the airport is waiting to claim it's first #coronavirus victim if it hasn't already. #publichealth
ii. We haven't got a bloody chance of containing this virus. Half of my packed train carriage is sneezing and sniffing and I haven't seen a single tissue or handkerchief. #CoronaVirus
iii. Customs officer apologized on the way out for the wait. Said

Testing Positive for Flu or Pneumonia
Given the similarity in symptoms and the fact that it is unclear whether a person can simultaneously be infected with both diseases, tweets mentioning that the user, or a member of their household, has tested positive for the flu or pneumonia should be classified as Possible Cases.

Unknown Disease State
The user is in a place where close quarters increase the chance of contracting the virus, and the person they have been in contact with is exhibiting some symptoms, but the cause of the symptoms is unknown or is conjecture on the user's part.

Direct Contact with Someone Who May Have Been Exposed
The user is in contact with a person who has a higher risk of exposure due to their recent activity, but there is no confirmation that the other person was exposed.

Other Mention
Tweets discussing other people exhibiting possible symptoms, where it is not evident that the user is in close contact with those people:

xi. Y'all need to stop coughing without covering your mouths because I don't want the coronavirus
Tweets in which the user exhibits one of the main symptoms but attributes it to a non-health-related cause or to another underlying health condition:

xii. I started coughing because I choked on my water but everyone lookin at me like I got coronavirus
Travel
Any travel that is being planned or has not yet occurred:

xiii. I hope this coronavirus scare doesn't ruin my cruise next month
Self-Isolating/Self-Quarantine
The user discusses being in quarantine or isolation because of general social distancing recommendations or shelter-in-place orders, not because of contact with anyone positive for COVID-19, and shows no symptoms.