Across the United States, public health laboratories perform whole-genome sequencing on many pathogens, a milestone for public health because it subtypes organisms at a higher resolution than was previously possible. These high-resolution subtypes can be used to determine the relationships between organisms, which can be visualized with phylogenetic trees. Once the phylogenetic relationships of pathogens are known, empirically derived thresholds can be used to identify possible outbreaks, and additional epidemiological data can be added to the visualization. In combination, these data can be used to inform the design of investigations, to confirm the occurrence of an outbreak, and to identify potential transmission routes. If appropriate, interventions such as recalls of contaminated products and public announcements may be issued. Thus, creating and annotating phylogenetic trees is an essential component of this public health workflow. Our goal was to develop an open-source graphical user interface (GUI) for phylogenetic tree visualization and annotation that is usable by people without specialized bioinformatics or data visualization skills. Because R offers some of the gold-standard packages for phylogenetic analysis and visualization (e.g., ape and ggtree), we used the R Shiny framework to develop tinselR (pronounced tinsel-er), which provides GUI access to the tools in ape, ggtree, and other key packages. tinselR’s minimum input is a Newick-formatted phylogenetic tree. Once the tree is loaded, user-selected inputs change the appearance of the displayed tree; for example, a user can quickly change the formatting of the tip labels. By adding a genetic distance matrix, a metadata file, or both, the user can annotate the image, relabel tips, or add a heatmap to the phylogenetic tree.
These modified tree images can be downloaded in various formats (pdf, png, or tiff) for presentations, publications, or other communications with collaborators. Below, we outline how to use the application with one of the example datasets.
When the application is launched, the user can try it out with one of the pre-loaded datasets in the ‘Example Data’ tab (Figure 1). We provide three combined datasets, each consisting of a Newick-formatted tree, a genetic distance matrix, and a metadata file, with 14 to 19 isolates per dataset. The isolates are either Escherichia coli (NCBI BioProject: PRJNA218110) or Salmonella enterica (NCBI BioProject: PRJNA230403). After clicking on the ‘Example Data’ tab (Figure 1), the user selects one of the combined datasets (example data 1, example data 2, or example data 3) from the drop-down menu (Figure 2).
Figure 1: Landing page for tinselR with ‘Example Data’ tab highlighted in blue.
Figure 2: Pre-loaded example dataset 1 is selected and can be viewed once the user presses the ‘Visualize Tree’ button.
Displaying example tree and genetic distance data
Figure 3: ‘Example Data’ tab with available tree visual parameters highlighted within the blue box.
Figure 4: Pressing the ‘Visualize Tree’ button displays the tree on screen. Within the larger blue box, part of a phylogenetic tree is displayed with tips right-aligned. The smaller blue box shows that the ‘align the tips’ option has been selected.
Figure 5: With tips of interest highlighted, pressing the ‘Add Annotations’ button adds annotations to the tree image that indicate the range and median of SNP differences among these tips. The annotated tree is within the larger blue box, while the buttons for adding and removing annotations are in the smaller box on the left of the image.
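The annotation in Figure 5 summarizes SNP differences among the highlighted tips. The arithmetic can be sketched in base R as follows. This is not tinselR’s own code: `snp_summary` is a hypothetical helper, and it assumes the summary is taken over each pair of selected tips once, using a distance matrix whose row and column names are the tip labels. The matrix values are invented for illustration.

```r
# Hypothetical helper (not part of tinselR): summarize pairwise SNP
# differences among a set of selected tips.
# Assumes `snp` is a square matrix with tip labels as row and column names.
snp_summary <- function(snp, tips) {
  stopifnot(length(tips) >= 2)            # need at least one pair
  sub <- snp[tips, tips, drop = FALSE]
  # Use each unordered pair once: upper triangle, diagonal excluded.
  d <- sub[upper.tri(sub)]
  c(min = min(d), max = max(d), median = stats::median(d))
}

# Toy 4-tip matrix with invented values (not from the example datasets).
labs <- c("tipA", "tipB", "tipC", "tipD")
snp <- matrix(c( 0,  2,  5, 40,
                 2,  0,  3, 41,
                 5,  3,  0, 39,
                40, 41, 39,  0),
              nrow = 4, byrow = TRUE, dimnames = list(labs, labs))

snp_summary(snp, c("tipA", "tipB", "tipC"))
# min 2, max 5, median 3
```

Using only the upper triangle avoids counting each pair twice and leaves out the zero self-distances on the diagonal, which would otherwise pull the minimum and median down.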
Included example metadata for tip correction and heatmap
Figure 6: Example metadata 1, in which the third column contains the information used to add a heatmap.
Figure 7: Tree image with annotations and a heatmap of collection source, displayed within the blue box. The option for adding the heatmap is also within the blue box on the left of the image.
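Because the heatmap columns are free-form, they have to be tied to the tree by tip label. The base R sketch below assumes the Figure 6 layout and uses invented values: it keeps every column after the first two and uses the tip labels as row names, the convention that ggtree’s `gheatmap()` relies on to match heatmap rows to tips. Whether tinselR prepares its heatmap data exactly this way is an assumption.

```r
# Hypothetical metadata table mirroring the layout described for Figure 6:
# Tip.labels, Display.labels, then a free-form column (here "Source").
# Values are invented for illustration.
meta <- data.frame(
  Tip.labels     = c("tipA", "tipB", "tipC"),
  Display.labels = c("Isolate 1", "Isolate 2", "Isolate 3"),
  Source         = c("Stool", "Food", "Stool")
)

# Keep only the columns after the first two, indexed by tip label,
# so each heatmap row can be matched to a tree tip by name.
heat <- meta[, -(1:2), drop = FALSE]
rownames(heat) <- meta$Tip.labels
heat
```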
Downloading your image
Once you are happy with the way your tree looks, you can download the image in pdf, png, or tiff format (Figure 8). Be sure to adjust the height and width of the image before downloading. If you want to change anything after the image has downloaded, make your adjustments and press the ‘Download’ button again.
Figure 8: Tree image with annotations and a heatmap of collection source. The image download options are within the blue box at the bottom of the image.
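The same height and width choices apply when saving a plot from an R session. As a base R sketch only (tinselR’s download handler is not shown in this vignette and may work differently), a plot can be written to pdf, png, or tiff by opening the matching graphics device; `save_plot` is a hypothetical helper, and dimensions are given in inches.

```r
# Hypothetical helper (not part of tinselR): open a graphics device for the
# chosen format, run a drawing function, then close the device.
save_plot <- function(draw, file, format = c("pdf", "png", "tiff"),
                      width = 7, height = 7, res = 300) {
  format <- match.arg(format)
  switch(format,
    pdf  = grDevices::pdf(file, width = width, height = height),
    png  = grDevices::png(file, width = width, height = height,
                          units = "in", res = res),
    tiff = grDevices::tiff(file, width = width, height = height,
                           units = "in", res = res)
  )
  on.exit(grDevices::dev.off())   # close the device even if drawing fails
  draw()
  invisible(file)
}

# Write a simple placeholder plot to an 8 x 10 inch pdf in a temp directory.
out <- file.path(tempdir(), "tree.pdf")
save_plot(function() plot(1:10), out, format = "pdf", width = 8, height = 10)
```

For a ggtree plot `p`, pass `function() print(p)` as `draw`, or use `ggplot2::ggsave()`, which accepts `width`, `height`, `units`, and `dpi` directly.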
Figure 9: ‘Data Upload’ tab for uploading the user’s own data. The larger blue box contains the three drop-down menus for the three different files a user can upload. The smaller blue box indicates the ‘Data Upload’ tab.
Second, because problems can occur during upload, tinselR displays file-check messages to help you confirm that the tip labels in all three files are concordant. Matching tip labels are the main mechanism that connects the three files, so if the labels differ among the files, a message describes how they differ. File checking always runs, but the messages are informational: you can ignore them if you do not need them and continue using the application. Note that the concordance check displays a complete message only when all three files (tree, genetic distance, and metadata) have been uploaded. Apart from these two differences, the example data tab and the data upload tab work exactly the same way, which is why we highly encourage users to explore the example data first to become familiar with the application.
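The concordance check can be sketched in base R. This is a hypothetical illustration rather than tinselR’s implementation: `newick_tips` is a deliberately naive parser (it ignores quoted labels and Newick comments), and in practice the `tip.label` element of a parsed tree, for example one read with `ape::read.tree()`, is the reliable source of tip labels. `check_tips` reports, for each file, the labels that appear in some other file but not in that one.

```r
# Naive, hypothetical Newick tip extraction: a tip label is the text after
# "(" or "," up to the next ":", ",", ")" or ";". Internal node labels
# (which follow ")") and branch lengths (which follow ":") are skipped.
newick_tips <- function(newick) {
  hits <- regmatches(newick, gregexpr("[(,][^(),:;]+", newick))[[1]]
  trimws(substring(hits, 2))
}

# Hypothetical check: for each file, list labels present elsewhere but
# missing from that file (relative to the union of all labels).
check_tips <- function(tree_tips, dist_tips, meta_tips) {
  sets <- list(tree = tree_tips, distance = dist_tips, metadata = meta_tips)
  all_tips <- Reduce(union, sets)
  lapply(sets, function(s) setdiff(all_tips, s))
}

tree <- "((tipA:0.1,tipB:0.2)90:0.3,tipC:0.4);"
res <- check_tips(newick_tips(tree),
                  dist_tips = c("tipA", "tipB", "tipC"),
                  meta_tips = c("tipA", "tipB", "tipX"))
res
# tree and distance are each missing "tipX"; metadata is missing "tipC"
```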
tinselR uses the treeio package (treeio::read.newick) to read in a Newick-formatted tree. treeio is associated with the ggtree package, which extends the ggplot2 plotting system to phylogenetic trees. In principle, a tree from any program that writes Newick format can be uploaded. The tip labels in the Newick tree, distance matrix, and metadata files must match before upload. The genetic distance matrix file must contain a square matrix of single nucleotide polymorphism (SNP) differences between the tree tips (Figure 10). The metadata file is a table of additional information used to change or add to what is displayed on the tree; its primary function is to relabel the tips on the tree image. The header of the first column must be Tip.labels, and this column must contain the labels for all tree tips in the uploaded Newick file (Figure 6). Alternative labels can be provided in the second column under the header Display.labels. If desired, users may include additional columns in the metadata file, such as the collection site (Figure 6), and display that information in a heatmap next to the tree. Headers for these additional columns are flexible because tinselR does not look for specific names in them. Accepted formats for the genetic distance and metadata files are CSV, TSV, and TXT, and users can set the file type independently for each input.
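Relabeling amounts to looking up each tree tip in the Tip.labels column and substituting the matching Display.labels value. A minimal base R sketch, assuming the metadata layout described above; `relabel_tips` is hypothetical, and its fallback of keeping the original label for tips missing from the metadata is an illustrative choice, not documented tinselR behavior.

```r
# Hypothetical sketch (not tinselR's code): map tree tip labels to display
# labels using Tip.labels (column 1) and Display.labels (column 2).
# Tips absent from the metadata keep their original label.
relabel_tips <- function(tips, meta) {
  stopifnot(names(meta)[1] == "Tip.labels")
  idx <- match(tips, meta$Tip.labels)
  out <- tips
  found <- !is.na(idx)
  out[found] <- as.character(meta$Display.labels[idx[found]])
  out
}

meta <- data.frame(
  Tip.labels     = c("tipA", "tipB", "tipC"),
  Display.labels = c("Isolate 1", "Isolate 2", "Isolate 3")
)
relabel_tips(c("tipC", "tipA", "tipZ"), meta)
# "Isolate 3" "Isolate 1" "tipZ"
```

For a tree parsed into an ape `phylo` object, the labels live in `tree$tip.label`, so the result could be assigned back there.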
Figure 10: Genetic distance matrix example with tip labels that match the Newick phylogenetic tree.
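The requirements on the distance matrix can be checked before upload. The base R sketch below is hypothetical: `read_snp_matrix` assumes tip labels sit in the first column and are repeated in the header row, as Figure 10’s description suggests, and tinselR’s own validation may differ. The symmetry and zero-diagonal checks are not stated requirements, but they hold for any pairwise SNP distance matrix.

```r
# Hypothetical loader and sanity checks for a SNP distance matrix file.
# Use sep = "\t" for TSV files.
read_snp_matrix <- function(file, sep = ",") {
  df <- utils::read.table(file, header = TRUE, sep = sep, row.names = 1,
                          check.names = FALSE)
  m <- as.matrix(df)
  stopifnot(
    nrow(m) == ncol(m),                    # square
    identical(rownames(m), colnames(m)),   # same labels in the same order
    all(diag(m) == 0),                     # zero distance to itself
    isSymmetric(unname(m))                 # d(i, j) == d(j, i)
  )
  m
}

# Toy file with invented values.
tmp <- tempfile(fileext = ".csv")
writeLines(c(",tipA,tipB,tipC",
             "tipA,0,2,5",
             "tipB,2,0,3",
             "tipC,5,3,0"), tmp)
m <- read_snp_matrix(tmp)
m
```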
Note that we have successfully tested the application only with trees of about 30 tips. We are interested to learn how it performs with larger trees, so please let us know.
Another caveat is that if the user performs the specific sequence of steps below, the tree image will be reset and will lose any clade annotations already drawn on it. However, we expect this sequence of events to be rare.
Order of events -
If you have problems, requests, or thoughts, please file an issue.