library(tidyverse)
We are importing the metadata which can be found here:
/data/shared/Rfiles/metadata_sub.csv
We are also importing the two files classification_counts.csv and hostread_counts.csv that you created in yesterdays session. If you haven’t created those files, then we provided these files for you here:
/data/shared/Rfiles/classification_counts.csv
/data/shared/Rfiles/hostread_counts.csv
Make sure you copy this file to your current working directory or specify the correct path to the file when importing.
Import the metadata and check the input file using the head()
command
meta<-read_delim(file="metadata_sub.csv", delim="\t")
head(meta)
Import the classification counts and host read counts as follows
classified<-read_delim(file="classification_counts.csv", delim=",")
head(classified)
hostreads<-read_delim(file="hostread_counts.csv", delim=",")
head(hostreads)
We see in the files we still have .report
in the sample names. In order to be able to merge this data with the metadata we need to remove this string and just keep the sample names.
colnames(classified)<-gsub(".report", "", colnames(classified))
colnames(hostreads)<-gsub(".report", "", colnames(hostreads))
head(classified)
head(hostreads)
We want to merge metadata and read counts. To do that we first need to re-arrange our data to make sure that we have the samples as column and not as column names.
classified<- classified %>%
select(-c(taxid,taxlevel)) %>% # this removes 2 columns we don't need
gather(Sample, value,-taxa) %>% # this gathers the data into columns except for column taxa
spread(taxa, value) %>% # this spreads out again the data into the required format
rename("classified"=root) # here we rename the column root into 'classified'
head(classified)
hostreads<- hostreads %>%
select(-c(taxid,taxlevel)) %>%
gather(Sample, value,-taxa) %>%
spread(taxa, value) %>%
rename("host"=root) %>%
rename("nohost"=unclassified)
head(hostreads)
Now we are ready to join with the metadata by their common Sample
column. We use full_join
to do that. Check out the resulting data.
metainfo<-meta %>%
full_join(classified,by="Sample") %>%
full_join(hostreads,by="Sample")
head(metainfo)
First we check how many host reads we had in the beginning
# gather the columns to make them suitable with ggplot
metainfo_gg1 <- metainfo %>%
gather(key="host",value="counts",-c(Sample, group, MouseID, timepoint, mocktreat, classified, unclassified))
head(metainfo_gg1)
Now create a stacked bar plot
ggplot(metainfo_gg1, aes(x=Sample, y=counts)) +
geom_bar(stat="identity",position="stack", aes(fill=host)) +
theme(axis.text.x = element_text(angle=45, hjust=1)) +
facet_grid(. ~ mocktreat, scales="free_x",space = "free_x")
Interesting, Sample22 seems to have had much more host than the others.
Now let’s take a look how many reads of the decontaminated data were actually classified by your kraken2 run.
metainfo_gg2 <- metainfo %>%
gather(key="classified",value="counts",-c(Sample, group, MouseID, timepoint, mocktreat, host, nohost))
head(metainfo_gg2)
ggplot(metainfo_gg2, aes(x=Sample, y=counts)) +
geom_bar(stat="identity",position="stack", aes(fill=classified)) +
theme(axis.text.x = element_text(angle=45, hjust=1)) +
facet_grid(. ~ mocktreat, scales="free_x",space = "free_x")
Let’s now plot host reads, classified fraction and unclassified fraction into the same plot.
metainfo_gg3 <- metainfo %>%
gather(key="reads",value="counts",-c(Sample, group, MouseID, timepoint, mocktreat, nohost))
head(metainfo_gg3)
ggplot(metainfo_gg3, aes(x=Sample, y=counts)) +
geom_bar(stat="identity",position="stack", aes(fill=reads)) +
theme(axis.text.x = element_text(angle=45, hjust=1)) +
facet_grid(. ~ mocktreat, scales="free_x",space = "free_x")
In general we can see that the number of reads classified by kraken2 were pretty similar across the samples.