Visualizing TreeSpace using NLDR
Once you have inferred a set of trees, either using the CIPRES REST API or a third-party program, CloudForest can be used to perform several different analyses that will allow you to learn more about your trees. In this tutorial, we will walk through the steps necessary to visualize your trees in treespace using Nonlinear Dimensionality Reduction (NLDR) within CloudForest. Download the sample.trees file here to follow along with this tutorial.
For details on installing and running CloudForest, visit the previous tutorial.
1. Upload the sample data into CloudForest
Once you are within CloudForest, expand the Get Data tab in the Tool Panel on the left of your screen, and select Upload File. In the new window, select Choose Local File, then in the file browser navigate to and select the sample.trees file on your computer. Finally, select Start to upload the file into CloudForest. Close the file window, and you should now see the sample data appear in the History panel on the right of your screen.
2. Compute a distance matrix
In the Tool Panel, expand CloudForest and select TreeScaper-Trees. Here, the Input file field should autopopulate with our sample.trees file; if you have multiple input files available, select the desired one. Since we are computing a distance matrix, we can leave the Output Type field as Dist. Below this, the Distance metric field allows us to select the phylogenetic distance metric for our analysis. For this tutorial, we will use the Unweighted Robinson-Foulds distance metric. The Distance matrix format field should be left at its default value, Matrix. The Weighted/Unweighted Tree and Rooted/Unrooted Tree fields allow us to tell CloudForest about the trees we are inputting. For this tutorial, we will leave them at their default settings: Unweighted and Unrooted. The remaining fields can be left at their defaults. Once the settings are to your liking, scroll down and select Execute to run the job. The output files will appear in your History panel, colored yellow while the job runs and green upon completion.
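To give a feel for what the Unweighted Robinson-Foulds metric measures, here is a minimal Python sketch. It is not CloudForest's or TreeScaper's implementation. It assumes trees are written as nested tuples of leaf names (a hypothetical representation chosen for brevity), treats them as unrooted, and counts the non-trivial bipartitions found in one tree but not the other. Conventions differ between tools, and some report half of this count.

```python
def bipartitions(tree):
    """Return the non-trivial bipartitions (splits) of an unrooted tree.

    `tree` is a nested tuple of leaf-name strings, e.g. (("A", "B"), "C").
    This representation is an assumption made for this sketch.
    """
    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    all_leaves = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - clade
        # Keep only splits with at least two leaves on each side.
        if len(clade) >= 2 and len(other) >= 2:
            # Store both sides so the result does not depend on rooting.
            splits.add(frozenset([clade, other]))
        return clade

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Unweighted RF distance: number of splits present in exactly one tree."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


def distance_matrix(trees):
    """Full symmetric matrix of pairwise RF distances."""
    return [[rf_distance(a, b) for b in trees] for a in trees]


# Three toy five-taxon trees.
t1 = (("A", "B"), ("C", "D"), "E")
t2 = (("A", "C"), ("B", "D"), "E")
t3 = (("A", "B"), "C", ("D", "E"))
matrix = distance_matrix([t1, t2, t3])
```

The resulting matrix is symmetric with zeros on the diagonal, which is the general shape of the distance matrix passed on to the NLDR step.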
3. Perform an NLDR
Once the job has completed, return to the Tool Panel, expand CloudForest, and select TreeScaper-NLDR. The Input File field should autopopulate with the distance matrix computed in the previous step; if it does not, select it manually. The Euclidean Dimension input field specifies how many dimensions to project the trees into. For this tutorial, we will project the trees into 3 dimensions, but feel free to try 2 as well to compare the results! We will leave Cost Function as CCA and NLDR Algorithm as STOCHASTIC for this tutorial (details on all available algorithms and cost functions can be found here). Once you are ready, select Execute to run the job.
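Conceptually, NLDR places one point per tree in a low-dimensional space so that distances between points approximate the tree-to-tree distances in the matrix. The sketch below illustrates this idea in plain Python. It is not TreeScaper's algorithm: it swaps in a simple raw-stress cost (the sum of squared differences between target and embedded distances) in place of CCA, and minimizes it with a basic stochastic pairwise update. The function names and parameters (`embed`, `epochs`, `lr`, `seed`) are hypothetical.

```python
import math
import random


def stress(dist, coords):
    """Raw stress: sum of squared gaps between target and embedded distances."""
    n = len(dist)
    return sum(
        (dist[i][j] - math.dist(coords[i], coords[j])) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def embed(dist, dim=3, epochs=200, lr=0.1, seed=0):
    """Embed points in `dim` dimensions by stochastic pairwise updates.

    `dist` is a symmetric matrix (list of lists) of target distances.
    """
    rng = random.Random(seed)
    n = len(dist)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

    for epoch in range(epochs):
        step = lr * (1.0 - epoch / epochs)  # linearly decaying step size
        rng.shuffle(pairs)                  # visit pairs in random order
        for i, j in pairs:
            current = math.dist(coords[i], coords[j])
            if current == 0.0:
                continue  # coincident points give no direction to move in
            # Positive when the points are too close, negative when too far.
            scale = step * (dist[i][j] - current) / current
            for k in range(dim):
                shift = scale * (coords[i][k] - coords[j][k])
                coords[i][k] += shift
                coords[j][k] -= shift
    return coords


# Toy distance matrix for three points (a 3-4-5 triangle).
toy = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
points = embed(toy, dim=3)
print(round(stress(toy, points), 6))
```

Changing `dim` from 3 to 2 in this sketch mirrors the choice offered by the Euclidean Dimension field.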
4. Log in
In order to visualize the results of our NLDR, we must log in to CloudForest by selecting Login or Register at the top of the screen. Here, you may create a Galaxy account by selecting Register here, or (solely for the purposes of the CloudForest beta release) you may log in with the username/password admin/admin.
5. Visualize the Results
After logging in, select Visualize, then Create Visualization at the top of the screen. On the next screen, scroll down the alphabetically ordered list, select CloudForest Visualizations, and select Create Visualization. This will take you to the CloudForest Visualization hub. To view the results of our NLDR, select the NLDR Coordinates file under the NLDR dropdown header. The results will take a moment to load, after which you can scroll down and view the NLDR plot. If you used the sample data set provided, it should look similar to the image below:
Here, by clicking and dragging on the plot to rotate the 3D image, we can see that there appear to be three distinct groups of trees. The sample data used here contains trees generated from three different mitochondrial genes: trees 1-300 come from gene 1, trees 301-600 from gene 2, and trees 601-900 from gene 3. To investigate whether the trees from each gene make up the three distinct groups, we can color the individual points (trees) on the plot based on these indices. To do so, scroll down and select the Subset Plot dropdown. Then, select Enter Tree Indexes. In the input field this option creates, we can subset the trees using our hypothesized groupings like so: [1-300: black];[301-600: red];[601-900: blue]. Different index patterns can follow the same format. After entering the indexes, select Execute. The updated plot will be colored according to the indexing scheme we entered (pictured below).
By coloring the trees based on gene identity, we can see that the three distinct groups of trees do indeed correspond to the three different genes!
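To make the indexing format used above concrete, here is a small Python sketch that turns a string such as [1-300: black];[301-600: red];[601-900: blue] into a mapping from tree index to color. It only illustrates how the format reads. How CloudForest parses the field internally is not documented here, so the function name `parse_subsets` and the assumption that ranges are 1-based and inclusive are illustrative.

```python
import re

# One group looks like "[start-end: color]"; groups are separated by ";".
_GROUP = re.compile(r"\[\s*(\d+)\s*-\s*(\d+)\s*:\s*([^\]]+?)\s*\]")


def parse_subsets(spec):
    """Map each tree index to a color name (ranges assumed inclusive)."""
    colors = {}
    for chunk in spec.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing semicolon
        match = _GROUP.fullmatch(chunk)
        if match is None:
            raise ValueError(f"unrecognized group: {chunk!r}")
        start, end = int(match.group(1)), int(match.group(2))
        if start > end:
            raise ValueError(f"range runs backwards: {chunk!r}")
        for index in range(start, end + 1):
            colors[index] = match.group(3)
    return colors


colors = parse_subsets("[1-300: black];[301-600: red];[601-900: blue]")
```

With the sample data, this assigns one color to each of the 900 trees, grouped by the gene they came from.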
The example provided here details one of the many ways to visualize trees using NLDR. Retry this analysis using your own sets of trees, or with the Euclidean Dimension set to 2, to further explore the uses of NLDR!
CloudForest also supports Community Detection algorithms that mathematically detect communities of trees and dynamically color the trees in an NLDR plot based on their community identity. Be on the lookout for future tutorials detailing this process!