Welcome to phylofactor!
In this tutorial, we’ll download the R package phylofactor
, and then perform, summarize & visualize phylofactorization of a real microbiome dataset. Slowly, we’ll introduce some more advanced functionality before diving head-first into the guts of the package and the full generality of phylofactorization.
First, let’s download the package:
devtools::install_github('reptalex/phylofactor')
If you had any trouble downloading the package, the most common errors are missing dependencies. If this happens to you, try to read the errors, find out which packages were not found or failed to load, and then install them individually. For example, if the “package.name” package was not found or failed to load, try: install.packages('package.name')
The workhorse of the phylofactor package is the function PhyloFactor
. It contains a useful “Help” file with examples of how to use its more advanced functionality, including some not covered in this tutorial.
library(phylofactor)
? PhyloFactor
The package contains a dataset used in our paper, “FTmicrobiome”. FTmicrobiome contains sequence counts of bacteria from fecal and tongue body sites collected from two humans over time (Caporaso et al. 2011 - an awesome paper whose open access made this method possible). From the time series, we randomly sampled 10 timepoints from each of the two people for each of the two body sites, yielding 40 samples in all, 20 from the feces and 20 from the tongue.
Let’s load the data:
data("FTmicrobiome")
This dataset is a list containing the necessary ingredients for PhyloFactor:
OTUTable <- FTmicrobiome$OTUTable #Our OTU table. Rows are OTUs and columns are samples
OTUTable[1:5,1:5]
## feces feces feces feces feces
## 359105 2 0 0 8 5
## 4371046 0 0 0 0 0
## 191468 0 0 0 0 0
## 362389 0 0 0 0 0
## 292255 0 0 0 0 0
body.site <- FTmicrobiome$X #Our independent variable
str(body.site)
## Factor w/ 2 levels "feces","tongue": 1 1 1 1 1 1 1 1 1 1 ...
tree <- FTmicrobiome$tree #Our phylogeny
tree
##
## Phylogenetic tree with 2713 tips and 2711 internal nodes.
##
## Tip labels:
## 558531, 565292, 345753, 551295, 842714, 327887, ...
## Node labels:
## , 0.868, 0.934, 0.982, 0.475, 0.859, ...
##
## Unrooted; includes branch lengths.
The tips ### 4) Taxonomy
taxonomy <- FTmicrobiome$taxonomy #Our taxonomy
taxonomy[1:3,]
## OTU_ID
## 55158 558531
## 25035 565292
## 9257 345753
## taxonomy
## 55158 k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__; s__
## 25035 k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__; s__
## 9257 k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__; s__
The taxonomy is a data frame with two columns. The first column contains the OTU IDs as found in the phylogenetic tree. The second column contains the taxonomy corresponding to each OTU. The only rule for the format of the taxonomy is that the levels must be separated by “;”.
Let’s focus our efforts on common OTUs. The choice of OTU filtering depends on the interests of the researcher - some research questions will want to pay much more attention to rare taxa. Be deliberate about filtering (and treatment of zeros) to best answer your unique question.
In this case, we’re interested in the phylogenetic factors of commonly found taxa, where “common” taxa are defined (subjectively) as those found in over 25% of the samples (>10 samples).
common.otus <- which(rowSums(OTUTable>0)>10)
OTUTable <- OTUTable[common.otus,]
tree <- ape::drop.tip(tree,setdiff(tree$tip.label,rownames(OTUTable))) #trim our tree accordingly
We’re all set to phylofactor.
The default phylofactorization is done through a regression model Data~X
where Data
is the ILR coordinate and X
the independent variable. PhyloFactor
will choose the edge whose regression model maximizes the difference between the null deviance and residual deviance. Intuitively, we’re picking the edge for which we can “explain the most variance” in the data through regression.
We’ll get into more details later. For now, let’s keep it simple and phylofactor these data.
Let’s look at the first three factors by setting nfactors=3
:
PF <- PhyloFactor(OTUTable,tree,body.site,nfactors=3)
class(PF)
## [1] "phylofactor"
The output of PhyloFactor
is a “phylofactor” object. This object contains many features:
names(PF)
## [1] "Data" "X" "tree"
## [4] "glms" "terminated" "groups"
## [7] "factors" "basis" "ExplainedVar"
## [10] "nfactors" "bins" "bin.sizes"
## [13] "Monophyletic.clades"
“Data”: the compositional data matrix used for phylofactorization. This matrix may differ from the input OTU table, as PhyloFactor
trims rows that are not in the tree, replaces zeros with 0.65, and converts these sequence counts to relative abundances. If you have a preferred method of dealing with zeros it’s necessary to do that before inputting the data into PhyloFactor
.
“X”: the independent variable or data frame of independent variables.
“tree”: the unrooted phylogeny whose tips are the rownames of “Data”.
“glms”: For the default regression-phylofactorization, PhyloFactor
outputs the "glm"
objects from the factored edges.
“terminated”: logical, indicating whether or not the iterations were terminated due to stopping criteria.
“groups” : A list of length “nfactors” whose elements are two-element lists. Each two-element list contains the row numbers of the OTUs being split at each factor (“Group1” and “Group2” for each factor).
“factors”: A convenient summary tool, illustrating what happened at each step of the factorization.
“basis”: The ILR balancing elements constructed from phylofactorization.
“ExplainedVar”: The percent of total variance explained by regression on phylofactors (only output for the default choice="var"
)
“nfactors”: the number of factors.
“bins”: the resultant bins at the end of the phylofactorization.
“bin.sizes”: breakdown of bin sizes by the number of bins with that size.
“Monophyletic.clades”: indexes of which bins are monophyletic.
The “factors” feature summarizes the clades split and how significant these splits were:
PF$factors
## Group1 Group2
## Factor 1 "40 member Monophyletic clade" "250 member Monophyletic clade"
## Factor 2 "16 member Monophyletic clade" "234 member Paraphyletic clade"
## Factor 3 "15 member Monophyletic clade" "219 member Paraphyletic clade"
## pvalues
## Factor 1 "0"
## Factor 2 "0"
## Factor 3 "0"
The first factor splits a 40 member monophyletic clade form a 250 member “monophyletic” clade. Both clades will be monophyletic initially by default as both are monophyletic in an unrooted tree. Researchers can use their own assumptions to re-classify the clades and focus their efforts on monophyletic clades in the rooted tree.
We can look at the regression from each factor through the output glms:
PF$glms[[1]]
##
## Call: glm(formula = frmla, data = dataset)
##
## Coefficients:
## (Intercept) Xtongue
## -5.388 12.872
##
## Degrees of Freedom: 39 Total (i.e. Null); 38 Residual
## Null Deviance: 1712
## Residual Deviance: 54.62 AIC: 132
summary(aov(PF$glms[[1]]))
## Df Sum Sq Mean Sq F value Pr(>F)
## X 1 1657.0 1657.0 1153 <2e-16 ***
## Residuals 38 54.6 1.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Once we have a “phylofactor” object, there are a bunch of functions that start with “pf.” These functions produce some convenient output once you have a phylofactor object.
For instance, we can summarize factor 1 using pf.summary
:
smry <- pf.summary(PF,taxonomy,factor=1)
names(smry)
## [1] "group1" "group2" "TaxaSplit" "glm"
## [5] "ilr" "fitted.values" "MeanRatio" "fittedMeanRatio"
The summary grabs features about factor 1 that may be of use for downstream analysis and plotting. This includes:
To be more succinct, we can input smry
into another function: pf.tidy
.
pf.tidy(smry)
## $`group1, Monophyletic`
## [1] k__Bacteria; p__Actinobacteria;
## [2] k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria;
## [3] k__Bacteria; p__Proteobacteria; c__Betaproteobacteria;
## [4] k__Bacteria; p__Proteobacteria; c__Deltaproteobacteria;
## [5] k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria;
## 5 Levels: k__Bacteria; p__Actinobacteria; ...
##
## $`group2, Monophyletic`
## [1] k__Bacteria; p__Bacteroidetes;
## [2] k__Bacteria; p__Cyanobacteria;
## [3] k__Bacteria; p__Firmicutes;
## [4] k__Bacteria; p__Fusobacteria;
## [5] k__Bacteria; p__Proteobacteria; c__Epsilonproteobacteria;
## [6] k__Bacteria; p__Verrucomicrobia;
## 6 Levels: k__Bacteria; p__Bacteroidetes; ...
##
## $`Predicted ratio of group1/group2`
## feces tongue
## 0.3995171 3.5772570
##
## $`Observed Ratio of group1/group2 geometric means`
## feces feces.1 feces.2 feces.3 feces.4 feces.5 feces.6
## 0.3932904 0.3691791 0.4240780 0.6664638 0.4614384 0.3368341 0.3867820
## feces.7 feces.8 feces.9 feces.10 feces.11 feces.12 feces.13
## 0.3736680 0.3115788 0.4286058 0.5774729 0.3913563 0.3565216 0.3678096
## feces.14 feces.15 feces.16 feces.17 feces.18 feces.19 tongue
## 0.3923586 0.3938998 0.3749477 0.3672121 0.3785641 0.3651888 4.0964156
## tongue.1 tongue.2 tongue.3 tongue.4 tongue.5 tongue.6 tongue.7
## 4.5410468 3.8226428 4.8166589 3.7904626 4.3919111 4.9287731 4.0662406
## tongue.8 tongue.9 tongue.10 tongue.11 tongue.12 tongue.13 tongue.14
## 3.9905971 4.5949302 3.2815087 3.6910945 3.0107416 2.7560454 2.8664792
## tongue.15 tongue.16 tongue.17 tongue.18 tongue.19
## 2.5432438 2.1839945 2.7764474 3.1840250 3.9727941
Phylogenetic structure in a dataset can often be visualized with phylogenetic heatmaps. I use Liam Revell’s “phytools” package for that. When imaging OTU tables, I find that the centered log-ratio transform often produces better images:
library(phytools)
clr <- function(Matrix) apply(Matrix,MARGIN=2,FUN=function(x) log(x)-mean(log(x)))
par(mfrow=c(1,1))
phylo.heatmap(tree,clr(PF$Data))
The tip labels and column labels are a bit ugly, but the colormap and alignment of the data with the phylogeny reveal phylogenetic structure in our data as big blocks of color across body sites.
What were the differences between Group1 and Group2 in factor 1 above?
par(mfrow=c(1,2))
plot(body.site,smry$ilr,main='Isometric log-ratio',ylab='ILR balance')
plot(body.site,smry$MeanRatio,main='Ratio of Geometric Means',ylab='Group1/Group2')
td <- pf.tidy(smry)
predicted.ratio <- td$`Predicted ratio of group1/group2`
predicted.ratio
## feces tongue
## 0.3995171 3.5772570
predicted.ratio['tongue']/predicted.ratio['feces']
## tongue
## 8.953952
The ratio of Group1/Group2 changes by a factor of 9 across body sites.
Who are these OTUs being split by this factor? Taxonomies can be obtained using the “TaxaSplit” feature of smry
:
smry$TaxaSplit %>%
lapply(.,FUN=function(x) unique(x$TaxaIDs)) #this simplifies our list to the unique elements
## $`Group 1`
## [1] k__Bacteria; p__Actinobacteria;
## [2] k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria;
## [3] k__Bacteria; p__Proteobacteria; c__Betaproteobacteria;
## [4] k__Bacteria; p__Proteobacteria; c__Deltaproteobacteria;
## [5] k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria;
## 5 Levels: k__Bacteria; p__Actinobacteria; ...
##
## $`Group 2`
## [1] k__Bacteria; p__Bacteroidetes;
## [2] k__Bacteria; p__Cyanobacteria;
## [3] k__Bacteria; p__Firmicutes;
## [4] k__Bacteria; p__Fusobacteria;
## [5] k__Bacteria; p__Proteobacteria; c__Epsilonproteobacteria;
## [6] k__Bacteria; p__Verrucomicrobia;
## 6 Levels: k__Bacteria; p__Bacteroidetes; ...
Group 1 contains Actinobacteria and all classes of Proteobacteria except Epsilonproteobacteria.
We can summarize our first factor as follows: the Actinobacteria & all non-Epsilon-Proteobacteria appear to be tongue specialists. On average, they are 0.4x as abundant as other taxa in the gut, whereas in the tongue they are on average 3.6x as abundant as other taxa.
Each factor corresponds to a chain of one or more edges in the phylogeny. The function getFactoredEdges
helps us find out which edges those are:
factored.edge <- getFactoredEdges(PF$basis[,1],tree)
factored.edges <- getFactoredEdgesPAR(ncores=3,PF=PF) #Parallel; gets all factored edges
With the factored edges, we can map our phylofactors directly to the phylogeny. In the future, this may help with annotation; for now, it helps with visualization. Let’s color the edges for each group, and exaggerate the length of the factored edge:
Group1.otus <- smry$group1$IDs$otuIDs %>% as.list %>% sapply(.,toString)
Group2.otus <- smry$group2$IDs$otuIDs %>% as.list %>% sapply(.,toString)
Group1.edges <- extractEdges(tree,taxa=Group1.otus,type=3)
Group2.edges <- extractEdges(tree,taxa=Group2.otus,type=3)
edge.colors <- rep('black',Nedge(tree))
edge.colors[Group1.edges] <- 'green'
edge.colors[Group2.edges] <- 'blue'
edge.colors[factored.edge] <- 'red'
### Let's also exaggerate the length of our factored edge:
edge.lengths <- tree$edge.length
edge.lengths[factored.edge] <- 0.6
tr <- tree
tr$edge.length <- edge.lengths
par(mfrow=c(1,1))
plot.phylo(tr,type='unrooted',edge.color = edge.colors,show.tip.label = F,edge.width = 2,main='Factor 1',rotate.tree = 20)
legend(-0.16,0.3,legend=c('Group 1','factored edge','Group 2'),fill=c('blue','red','green'))
Phylofactorization is a change of variables that allows for dimensionality reduction and ordination visualization.
pf.ordination(PF)
This ordination-visualization is very easy to do yourself, and doing it yourself can provide more flexibility. To do it yourself, just project the data onto ILR coordinates and choose which coordinates to plot.
ilr.projection <- pf.ILRprojection(PF,nfactors=2)
plot(ilr.projection[1,],ilr.projection[2,],xlab='PF1',ylab='PF2',main='Ordination by ILR Coordinates')
points(ilr.projection[1,body.site=='feces'],ilr.projection[2,body.site=='feces'],pch=16,cex=2,col='brown')
points(ilr.projection[1,body.site=='tongue'],ilr.projection[2,body.site=='tongue'],pch=16,cex=2,col='pink')
legend(-1,2.5,legend=c('Tongue','Poo'),fill=c('pink','brown'),cex=2)
We can bin taxa based on their common factors at a given level of factorization, forming binned phylogenetic units or BPUs.
bin.projection <- pf.BINprojection(PF,factor=3)
The default pf.BINprojection
returns the observed geometric means of the relative abundances of taxa in each bin.
We can also obtain the predicted geometric means of taxa in each bin using the input prediction
bin.projection.pred <- pf.BINprojection(PF,prediction=T)
names(bin.projection.pred)
## [1] "Data" "otus"
The output from pf.Binprojection
contains the “Data” matrix and the “otus” in each bin.
The taxonomic composition of bins can be summarized with help from the function OTUtoTaxa
:
bin.taxa <- bin.projection$otus %>%
lapply(.,FUN=function(otus,tax) OTUtoTaxa(otus,tax,common.name = F),tax=taxonomy)
bin.taxa[2:4]
## $`Bin 2`
## [1] "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__; s__"
## [2] "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__Actinomyces; s__"
## [3] "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__Actinomyces; s__"
## [4] "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__Arcanobacterium; s__"
## [5] "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Corynebacteriaceae; g__Corynebacterium; s__"
## [6] "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Micrococcaceae; g__Rothia; s__aeria"
## [7] "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Micrococcaceae; g__Rothia; s__mucilaginosa"
## [8] "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Micrococcaceae; g__Rothia; s__mucilaginosa"
## [9] "k__Bacteria; p__Actinobacteria; c__Coriobacteriia; o__Coriobacteriales; f__Coriobacteriaceae; g__Atopobium; s__"
## [10] "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__mitochondria; g__; s__"
## [11] "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__Alcaligenaceae; g__Sutterella; s__"
## [12] "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__Burkholderiaceae; g__Lautropia; s__"
## [13] "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__Burkholderiaceae; g__Lautropia; s__"
## [14] "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__Comamonadaceae; g__; s__"
## [15] "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__; s__"
## [16] "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__Eikenella; s__"
## [17] "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__Neisseria; s__"
## [18] "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__Neisseria; s__"
## [19] "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__Neisseria; s__cinerea"
## [20] "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__Neisseria; s__subflava"
## [21] "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__Neisseria; s__subflava"
## [22] "k__Bacteria; p__Proteobacteria; c__Deltaproteobacteria; o__Desulfovibrionales; f__Desulfovibrionaceae; g__Bilophila; s__"
## [23] "k__Bacteria; p__Proteobacteria; c__Deltaproteobacteria; o__Desulfovibrionales; f__Desulfovibrionaceae; g__Desulfovibrio; s__"
## [24] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Cardiobacteriales; f__Cardiobacteriaceae; g__Cardiobacterium; s__"
## [25] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Cardiobacteriales; f__Cardiobacteriaceae; g__Cardiobacterium; s__"
## [26] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__; s__"
## [27] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__Serratia; s__"
## [28] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Actinobacillus; s__porcinus"
## [29] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Aggregatibacter; s__"
## [30] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae"
## [31] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae"
## [32] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae"
## [33] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae"
## [34] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae"
## [35] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae"
## [36] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae"
## [37] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae"
## [38] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae"
## [39] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Pseudomonadaceae; g__Pseudomonas; s__"
## [40] "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Pseudomonadaceae; g__Pseudomonas; s__"
##
## $`Bin 3`
## [1] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Gemellales; f__Gemellaceae; g__Gemella; s__"
## [2] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Carnobacteriaceae; g__; s__"
## [3] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Carnobacteriaceae; g__Carnobacterium; s__"
## [4] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Lactococcus; s__"
## [5] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__"
## [6] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__"
## [7] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__"
## [8] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__"
## [9] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__"
## [10] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__"
## [11] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__"
## [12] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__"
## [13] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__"
## [14] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__"
## [15] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__"
## [16] "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__"
##
## $`Bin 4`
## [1] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__"
## [2] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__"
## [3] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__"
## [4] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__"
## [5] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__"
## [6] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__"
## [7] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__"
## [8] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__"
## [9] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__"
## [10] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__copri"
## [11] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__melaninogenica"
## [12] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__melaninogenica"
## [13] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__melaninogenica"
## [14] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__nanceiensis"
## [15] "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__nanceiensis"
The bins can be stacked to visualize their relative weight or abundances. The default output for pf.BINprojection
will be a data matrix whose rows are the geometric means of the bins. The input rel.abund
allows us to grab the observed and predicted relative abundances as well:
bin.observed.geometricMeans <- pf.BINprojection(PF)$Data
bin.observed.relativeAbundances <- pf.BINprojection(PF,rel.abund=T)$Data
bin.predicted.geometricMeans <- pf.BINprojection(PF,prediction=T)$Data
bin.predicted.relativeAbundances <- pf.BINprojection(PF,prediction=T,rel.abund=T)$Data
library(plotrix)
par(mfrow=c(2,2))
stackpoly(t(bin.observed.geometricMeans),stack=T,main='Observed gMean',xat = c(10,30),xaxlab = c('feces','tongue'))
stackpoly(t(bin.predicted.geometricMeans),stack=T,main='Predicted gMean',xat = c(10,30),xaxlab = c('feces','tongue'))
stackpoly(t(bin.observed.relativeAbundances),stack=T,main='Observed Rel-Abund.',xat = c(10,30),xaxlab = c('feces','tongue'))
stackpoly(t(bin.predicted.relativeAbundances),stack=T,main='Predicted Rel-Abund.',xat = c(10,30),xaxlab = c('feces','tongue'))
The generalized linear models in the default phylofactorization can be used to make predictions. Like the predict
function for regular “glm” objects, pf.predict
combines the glms to predict the relative abundances of OTUs.
prediction <- pf.predict(PF)
dim(prediction)
## [1] 290 40
For user flexibility, pf.predict
returns an entire data matrix.
We can visualize the Data and predictions side-by-side:
prediction <- pf.predict(PF,factors=3)
par(mfrow=c(2,1))
phylo.heatmap(tree,clr(PF$Data))
phylo.heatmap(tree,clr(prediction))
## Warning: closing unused connection 7 (<-LAPTOP-T9KHGO63:11489)
## Warning: closing unused connection 6 (<-LAPTOP-T9KHGO63:11489)
## Warning: closing unused connection 5 (<-LAPTOP-T9KHGO63:11489)