High-Content Phenotypic Profiling in Esophageal Adenocarcinoma Identifies Selectively Active Pharmacological Classes of Drugs for Repurposing and Chemical Starting Points for Novel Drug Discovery

Esophageal adenocarcinoma (EAC) is a highly heterogeneous disease, dominated by large-scale genomic rearrangements and copy number alterations. Such characteristics have hampered conventional target-directed drug discovery and personalized medicine strategies, contributing to poor outcomes for patients. We describe the application of a high-content Cell Painting assay to profile the phenotypic response of 19,555 compounds across a panel of six EAC cell lines and two tissue-matched control lines. We built an automated high-content image analysis pipeline to identify compounds that selectively modified the phenotype of EAC cell lines. We further trained a machine-learning model to predict the mechanism of action of EAC selective compounds using phenotypic fingerprints from a library of reference compounds. We identified a number of phenotypic clusters enriched with similar pharmacological classes, including methotrexate and three other antimetabolites that are highly selective for EAC cell lines. We further identify a small number of hits from our diverse chemical library that show potent and selective activity for EAC cell lines and that do not cluster with the reference library of compounds, indicating they may be selectively targeting novel esophageal cancer biology. Overall, our results demonstrate that our EAC phenotypic screening platform can identify existing pharmacologic classes and novel compounds with selective activity for EAC cell phenotypes.


Introduction
Combined, the two major histological subtypes of oesophageal adenocarcinoma (OAC) and oesophageal squamous cell carcinoma (OSCC), represent the sixth leading cause of cancer deaths worldwide with less than one in five patients surviving five years from diagnosis 1 . A shift in epidemiology over the last 50 years has meant the incidence of OAC now vastly exceeds that of OSCC in western countries 2 , accounting for more than 80 % of oesophageal cancers in the United States 3 . Defining the optimal neoadjuvant treatment regime is an area of active investigation 4 as, current treatments all carry a significant risk of systemic toxicity, histological response rates remain poor 5 and only a limited subgroup of patients experience any survival benefit over surgery alone 6,7 .
OAC is a highly heterogeneous disease, dominated by large scale genomic rearrangements and copy number alterations 8 . This has made clinically meaningful subgroups and well validated therapeutic targets difficult to define. Clinical trials with new molecular targeted agents have predominantly been directed towards epidermal growth factor receptor (EGFR) and human epidermal growth factor receptor 2 (HER2) receptors 9-12 but thus far have proven unsuccessful.
A potential explanation is the almost ubiquitous co-amplification of alternative receptor tyrosine kinases (RTKs) and downstream pathways leading to redundancy and drug resistance 8,13,14 . An alternative to target based drug discovery, and increasing in popularity with technological advances, is phenotypic drug discovery, defined as the identification of novel compounds or other types of therapeutic agents with no prior knowledge of the drug target. Recent advances in phenotypic screening include automated high-content profiling 15,16 . This approach involves quantifying a large number of morphological features from cell or small-model organism assays in an unbiased way to identify changes and phenotypes of interest. One benefit to this method is that a target does not need to be predefined but the mechanism-of-action (MoA) of hit compounds can be inferred by reference to known compound sets using multivariate statistics and machine learning approaches. Thus, this may prove a beneficial strategy for complex, heterogeneous diseases were target biology is poorly understood and where modern-target directed drug discovery strategies have made little impact on patient care, as exemplified by OAC.
Taking an unbiased, profiling approach to phenotypic screening, we chose to apply the Cell Painting assay to capture large amounts of information on cellular and subcellular morphology to quantify cellular state across a panel of genetically distinct OAC cell lines. Cell Painting is an assay developed to capture as many biologically relevant morphological features in a single assay so as not to constrain discovery to what we think we already know 17,18 . Therefore, upon chemical perturbation we can detect changes in a subset of profiled features allowing a phenotypic fingerprint to be assigned to a particular perturbation or compound 15,[19][20][21] . These fingerprints can then be used to identify specific phenotypic changes of interest, identify compounds that cause strong alterations in cell morphology suggesting changes in cellular state or stress, or predict MoA by similarity comparison to reference libraries of well annotated compound mechanisms 17,21   Compound source plates were made at 1,000-fold assay concentration and added to the cells with an overall dilution in media of 1:1000 from source to assay plate. Table S1).

Library concentrations (Supplementary
After 48 hours incubation in the presence of the compounds, cells were fixed by the addition of an equal volume of formaldehyde (8 %, 50 µL) (#28908, Thermo Scientific) to the existing media, incubated at room temperature (20 minutes) and washed twice in PBS. Cells were then permeabilised in Triton-X100 (0.1%, 50 uL) and incubated at room temperature (20 minutes) followed by two more washes with PBS.
The staining solution ( Table 1) was prepared in bovine serum albumin solution (1 %). Staining solution was added to each well (25 µL) and incubated in the dark at room temperature (30 minutes), followed by three washes with PBS and no final aspiration. Plates were foil sealed.

Image Acquisition
Plates were imaged on an ImageXpress micro XLS (Molecular Devices, USA) equipped with a robotic plate loader (Scara4, PAA, UK). Four fields of view were captured per well using a 20x objective and five filters ( Table 1).

Image Analysis
CellProfiler 2D image analysis CellProfiler v3.0.0 23 image analysis software was used to segment the cells and extract 733 features per cell per image. First the pipeline identified the nuclei from the DAPI channel and used these as seeds to aid a segmentation algorithm to identify the cell boundaries from the TxRed channel, and finally these two masks were subtracted to give the cytoplasm. These three masks marking the cellular boundaries were then used to measure morphological features including size, shape, texture, and intensity per object across the five image channels.

Image Preprocessing
The cell level data was aggregated to image level by taking the median for each measured feature per image. Low quality images and image artefacts were then identified and removed using image quality metrics extracted by CellProfiler. Images with less than 20 cells in them were also removed from final analysis. For the remaining images, features were normalised on a plate-by-plate basis by dividing each feature by the median DMSO response for that feature. Features with NA values were removed, as were features with zero or near zero variance, using the findCorrelation and nearZero functions in the R package Caret. All remaining features were scaled and centred globally by dividing by the standard deviation of each feature and subtracting the feature mean respectively. The pair-wise correlations were calculated for all remaining features, and highly correlated features (>0.95) were removed. Finally, the image level data was aggregated to the well (compound) level and this was used in the analysis.

Random Forest Classifier
The random forest classifier was implemented using R's Random Forest package with the following specified parameters; ntree = 500, data was stratified by class and the sample size was set to the smallest class size for balance. The images from three concentrations for each compound were pooled and treated as a single class.

Assay development
Since OAC is such a heterogeneous disease, we chose to develop a high-content The published Cell Painting protocol 17,18 was adapted for our cell lines specifically, as follows; the MitoTracker DeepRed was originally added before the cells were fixed, however, morphological changes have been seen in certain cell lines upon the addition of MitoTracker. Therefore we opted to fix the cells first and add all of the Cell Painting reagents together post-fixation to prevent artefactual morphological changes due to cell staining, and to reduce complexity for robotic handling in a highthroughput setting. This also necessitated that we re-optimise the dye concentrations across our cell panel. Here we increased the MitoTracker DeepRed concentration and reduced the concentration of Hoechst, Concanavalin A, and Wheat Germ Agglutinin and switched to a different phalloidin supply ( Table 1).

Machine learning
Standard assay quality control metrics such as Z'Factor are unsuitable for multiparametric assays, particularly cell based phenotypic profiling assays where a desired phenotype is unknown and/or there is a lack of positive controls [24][25][26] . In order to assess assay quality from a compound MoA profiling perspective we used MoA prediction accuracy on a small well-annotated reference library of compounds with well-defined, known MoA (Supplementary Table S2). For this, we trained a random forest classifier using the CellProfiler extracted phenotypic information from the images of cells treated with the reference set of compounds.
Accuracy in the ability to predict MoA was used to assess whether, the OAC and tissue matched control cell lines were amenable to the phenotypic profiling assay, further validate image segmentation was accurate, and ensure that the phenotypic Here we wanted to confirm that they were equal to the rest of the panel and suitable for the assay pipeline. OAC-P4C is a particularly morphologically heterogeneous line so it was also important to ensure that image level data can be used for phenotypic compound profiling in these types of cell lines.
In order to visualise the phenotypic information extracted, we performed two data reduction methods; PCA and T-SNE, on the well level data for the small reference library of compounds and plotted the first two components, coloured by mechanistic class. These results demonstrate that distinct compound classes (e.g. HDAC inhibitors) generally cluster together, however some classes such as statins do not produce strong phenotypes and clustering shows they are close to the DMSO controls, data for FLO-1 and MFD-1 cells are provided as exemplars. (Figure 2). In order to confirm that the classifier was not overfitting we used leave-one-out cross-validation. We implemented leave-one-cell-line-out and trained it on five of the OAC lines, testing on the remaining line. Here, as expected it performed less well overall (Supplementary Fig. S1) Figure 3A, Supplementary Fig. S2).
Based on cell growth and survival (i.e nuclei count), we identified 27 compounds that were selectively active in two or more of our OAC lines. Here, hits were defined as having a z-score of -3 or more in the OAC lines and a difference of at least 2 in one or both of the control cell lines e.g. for a hit with a z-score of -3 in an OAC line, the zscore in the EPC-2 would have to be greater than or equal to 0. This comparison was made between each OAC line and the control lines to define hits and then selected if they were selectively active in at least two OAC lines across the panel ( Figure 3B).

Compounds from the growth and survival analysis cluster into several therapeutic
classes suggesting mechanistic pathways that may be selective for OAC cell growth and survival. Classes include antimetabolites and HDAC inhibitors. These classes were also identified in the morphometric phenotypic analysis (Supplementary Fig.   S2).
We performed hierarchical clustering of cell line responses to the compounds, as determined by the Mahalanobis metric (morphometric phenotypic analysis) and the z-scores (nuclei count) (Figure 3C and D In addition, using bioinformatic approaches we hope that integration of phenotypic data with genetic data across our panel of diverse cell lines may provide insight into the selective activity of phenotypic hits and generate the basis for future genetic biomarker-based clinical trials in OAC.
Overall, our high-content OAC assay has proven effective in the identification and mechanistic characterisation of hit compounds, demonstrating its utility as a novel empirical strategy for the discovery of new therapeutic targets, chemical starting points and repurposing of existing drug classes to re-ignite drug discovery and development in OAC.  See Table 1 for additional details about the stains and channels imaged.