Nanobe in the shell

Introducing a new tool to convert and combine Kraken reports!

Quick links

Repository: https://github.com/jeanmanguy/spideog
README: https://github.com/jeanmanguy/spideog/blob/main/README.md
Binaries: https://github.com/jeanmanguy/spideog/releases

Earlier this year I started working on a metagenomic project and also started learning Rust. To classify sequencing reads and assign them to taxa, I use Kraken 2¹ combined with Bracken².

Goals

I have some problems with the Kraken report format: I can't easily make plots from it in R. For example, I wanted to use the taxonomy data to draw trees with ggtree³, but that is not straightforward because the taxonomic tree is encoded as indentation and mixed in with the abundance data. The format used by MetaPhlAn⁴ is slightly better but has the same problems.

So, as I was learning Rust, I set myself the goal of writing a simple command-line tool in Rust that parses Kraken reports and transforms them into standard, tidy text formats.

Figure: Kraken report anatomy
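To make the indentation problem concrete, here is a minimal sketch of parsing a single report line. It is not Spideog's actual code. It assumes the standard six-column, tab-separated Kraken 2 report (percentage, clade reads, taxon reads, rank code, NCBI taxonomy ID, name) with two spaces of indentation per tree level; the ReportLine struct, its field names, and the parse_line function are invented for this illustration.

```rust
/// One row of a Kraken 2 report. Field names are this sketch's own.
#[derive(Debug, PartialEq)]
struct ReportLine {
    percentage: f64,  // % of reads in the clade rooted at this taxon
    clade_reads: u64, // reads in the clade rooted at this taxon
    taxon_reads: u64, // reads assigned directly to this taxon
    rank: String,     // rank code, e.g. "G" for genus
    taxid: u64,       // NCBI taxonomy ID
    depth: usize,     // tree depth, recovered from the indentation
    name: String,     // scientific name without the indentation
}

/// Parse one tab-separated line; returns None if the line is malformed.
fn parse_line(line: &str) -> Option<ReportLine> {
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() != 6 {
        return None;
    }
    let raw_name = fields[5];
    // The tree structure lives only in the leading spaces of the name.
    let indent = raw_name.len() - raw_name.trim_start_matches(' ').len();
    Some(ReportLine {
        percentage: fields[0].trim().parse().ok()?,
        clade_reads: fields[1].trim().parse().ok()?,
        taxon_reads: fields[2].trim().parse().ok()?,
        rank: fields[3].trim().to_string(),
        taxid: fields[4].trim().parse().ok()?,
        depth: indent / 2, // assumption: two spaces per level
        name: raw_name.trim().to_string(),
    })
}
```

A parent-child relationship then has to be rebuilt by comparing the depth of each line with the lines before it, which is exactly the part that is awkward to do in a plotting script.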

One of my goals was to combine data from multiple reports to make the analysis of multiple samples easier. It was also important to me not to waste time on deployment and installation procedures: at the same time, I was setting up a Nextflow pipeline to launch jobs on the university's HPC cluster, "SONIC".

Implementation

I developed Spideog to read one or more Kraken reports and write one tree file or one CSV file. I use the simple Newick format for the tree and a tidy format⁵ for the abundance data. Newick trees are easy to read in R (and other analysis languages) with the {ape} package⁶, and tidy data is the standard for the Tidyverse⁷ and the easiest way to format data if you make plots with {ggplot2}⁸. Because both formats are plain text, files from several analyses can be merged.
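For readers who have not met Newick before, the following sketch shows how a taxonomy tree can be serialised to it. It is an illustration under my own assumptions, not the code Spideog uses: the Node struct and both functions are hypothetical, branch lengths are omitted, and spaces in names are replaced with underscores because unquoted Newick labels cannot contain blanks. Names containing parentheses, commas, or colons would need quoting, which this sketch does not handle.

```rust
/// A taxon and its children. Hypothetical type for this sketch.
struct Node {
    name: String,
    children: Vec<Node>,
}

/// Serialise a subtree: children in parentheses, then the node's label.
fn to_newick(node: &Node) -> String {
    // Unquoted Newick labels cannot contain spaces.
    let label = node.name.replace(' ', "_");
    if node.children.is_empty() {
        label
    } else {
        let inner: Vec<String> = node.children.iter().map(to_newick).collect();
        format!("({}){}", inner.join(","), label)
    }
}

/// A complete Newick tree ends with a semicolon.
fn newick_tree(root: &Node) -> String {
    format!("{};", to_newick(root))
}
```

A root with a Bacteria clade holding two species, plus an Archaea leaf, would come out as ((Escherichia_coli,Bacillus_subtilis)Bacteria,Archaea)root; which is the kind of string that read.tree from {ape} accepts.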

Spideog is implemented in Rust. I set up continuous integration to build binaries for Linux, macOS, and Windows. No dependencies are needed: no Docker container, no Conda environment, and no Rust installation on the machine. I added the binary to my Nextflow pipeline, in the bin/ folder. (You could make a container wrapper if you want everything in containers and no binaries in your Git repository.) It works like a charm on the cluster, with no extra hassle.

Example

For development and testing purposes, I manually crafted two Kraken reports with a few differences in read counts and with different species found.

Spideog's combine-trees subcommand takes all the reports you give it (thanks to globbing) and generates a single tree in Newick format that you can read with ape and ggtree. The combine-abundances subcommand produces one CSV file that can be read in R. Then, using fortify on the tree object to get a data frame, you can join the two objects to plot heatmaps and other data visualisations with ggplot2 (or base R plot functions).
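The idea behind combining abundances into a tidy file can be sketched as follows: one row per sample and taxon pair, so that each variable has its own column. This is only an illustration of the layout. The function name and the column names sample, taxon, and reads are assumptions made for this sketch and are not necessarily what Spideog writes, and taxon names containing commas would need CSV quoting, which is left out here.

```rust
use std::collections::BTreeMap;

/// Build a long-format ("tidy") CSV from per-sample read counts.
/// Hypothetical input shape: sample name -> (taxon name -> read count).
fn tidy_csv(samples: &BTreeMap<String, BTreeMap<String, u64>>) -> String {
    // Assumed header; one observation (sample, taxon) per row.
    let mut out = String::from("sample,taxon,reads\n");
    for (sample, counts) in samples {
        for (taxon, reads) in counts {
            out.push_str(&format!("{},{},{}\n", sample, taxon, reads));
        }
    }
    out
}
```

A taxon that is absent from a sample simply has no row for that sample, so reports with different species can be combined without padding. In R, a file like this can be joined to the fortified tree on the taxon name.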
