Earlier this year I set out to learn a new programming language, and I picked Rust.
I wanted to try something new and was curious about what other languages are doing. My current bread and butter is R. It is a great interpreted language for data exploration, statistics and data visualisation, but it is lacking for other applications [1]. I am currently working on genomics and metagenomics projects for my postdoctoral contract, a subfield of bioinformatics that is quite new to me. Most of my work revolves around using and writing pipelines for existing command-line tools using Nextflow. I occasionally write some R scripts for the statistical analyses or to do some things too complex to do with Nextflow.
I miss creating applications and tools for other people to use, hence my idea to learn a new programming language and build tools as a side project during my free time.
Note: this post may be a bit long. I try to cover how I started, what I am doing, some things I like, and other things I like less. This is quite a personal take: I am a beginner, so there are plenty of concepts I haven't mastered yet and features I may have missed. I am also not going to explain what Rust is, teach you to code with Rust, or review the language.
I chose Rust because I wanted to work with a low-level language and write fast software without losing the little sanity I have left thanks to COVID-19, climate change, politics, and other personal stuff. Although I am not using any low-ish level programming language for my work, I did learn programming with C. I first tried C as a teenager, along with some web stuff. Unfortunately, at the time I didn't have any support to learn programming, the English language was a barrier, and I had no idea what to build [2]. Then C was the main programming language we learned during my master's in bioinformatics some years ago.
There is something I love about neatly managing memory, about being able to go fast, and about strongly typed and pedantic programming languages. I was intrigued by the lack of garbage collection and by some computer science concepts I hadn't approached yet. The syntax didn't look too terrifying or too strange [3], and I had heard good things about the tooling.
I also chose Rust because earlier this year I decided to make something completely unrelated to my work, far away from bioinformatics. I started learning Rust by trying to build a video game using Godot as the game engine. It is a simulation / story-driven game set in a medieval fantasy world, about a witch going to a university with mages, alchemists and druids on a faraway island and trying to survive academia [4].
My Rust setup
I usually do all my programming and all my work on Linux, but I started writing Rust code on Windows because of my video game project. I only recently got a good enough gaming machine with Windows (Ryzen 2600, RX590 and 16 GB of RAM). Using this computer instead of my work laptop with Linux also helped me have a better work/life balance. I now have a clear physical separation between work and leisure/hobbies. Do you need a gaming computer to work with Rust? Definitely not. But a Ryzen doesn’t hurt when there is a lot of stuff to compile, and a lot of RAM is always good when you have dozens and dozens of tabs open.
Since I started using 2 screens to work I can’t go back. Like many people, I use one screen for the IDE and another screen for the documentation. Thanks to Firefox’s picture-in-picture mode I can also watch videos at the same time (a lot of D&D streams these days).
One downside of working on Windows is that I can’t use Valgrind. But my code should work just as well on Linux, so one day I will compile my project there and use the debugging tools available on Linux.
Visual Studio Code is a great IDE, and coupled with a nice set of extensions it makes coding with Rust quite easy. I am seriously thinking of using it for some work-related projects instead of Atom or PyCharm (RStudio is not going away though).
VS Code extensions worth trying
|auto-completion, runs clippy on save, refactoring, imports, etc.
|keep your imported crates up to date
|integrates git into the IDE
|shows compiler errors and clippy warnings as big red or yellow bars. No need to read the terminal output or hover on the squiggly lines
Other extensions I also use
|Markdown all in one
|some neat markdown editing features
|to create titles in your code
|colors CSV columns with different colours
|to keep track of all the work left to do
|Test explorer UI with Rust test explorer
|find all your unit tests in one place, run specific tests (not really necessary with rust-analyzer)
Besides the video game project I started with, I also tried my hand at different bioinformatics projects. I worked on protein sequence analysis, peptidomics and evolution during my PhD. These fields are quite underrepresented so far in the Rust ecosystem (meaning they are not covered by bio-rust).
|colourise an amino acid or a peptide sequence with the Clustal colour scheme (more colour schemes to come, could be used in CLI but also by web applications; uses basic macros)
|macros to help define regular expressions aimed at peptide sequences (uses procedural macros)
|Data structures to store a multiple sequence alignment (works with generics, so it could work both with char or with coloured amino acids from
|Parser for UniProt Fasta headers (uses Nom)
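To give an idea of what parsing such a header involves, here is a minimal sketch. It is not the API of the crate listed above: it swaps Nom for plain string splitting from the standard library, and `UniProtHeader` and `parse_header` are hypothetical names chosen for illustration. It assumes the usual UniProt layout `>db|accession|entry_name description` and leaves the `OS=`, `OX=`, `GN=`, `PE=` and `SV=` fields inside the description.

```rust
/// Hypothetical, simplified view of a UniProt FASTA header line.
#[derive(Debug, PartialEq)]
pub struct UniProtHeader {
    /// "sp" (Swiss-Prot) or "tr" (TrEMBL)
    pub database: String,
    pub accession: String,
    pub entry_name: String,
    /// Everything after the entry name, left unparsed in this sketch.
    pub description: String,
}

/// Splits a header such as `>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha ...`.
/// Returns `None` when the line does not start with `>` or lacks the two `|`.
pub fn parse_header(line: &str) -> Option<UniProtHeader> {
    let rest = line.trim_end().strip_prefix('>')?;
    // At most three pieces: database, accession, and the remainder.
    let mut parts = rest.splitn(3, '|');
    let database = parts.next()?;
    let accession = parts.next()?;
    let tail = parts.next()?;
    // The entry name ends at the first space; the description may be empty.
    let (entry_name, description) = tail.split_once(' ').unwrap_or((tail, ""));
    Some(UniProtHeader {
        database: database.to_string(),
        accession: accession.to_string(),
        entry_name: entry_name.to_string(),
        description: description.to_string(),
    })
}
```

Owned `String` fields keep the sketch free of lifetime annotations, at the cost of a few allocations per header.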
A terminal user interface based application to display a protein multiple sequence alignment (MSA). It will colourise amino acids according to the Clustal rules of conservation. It is using the crates I wrote: multi-seq-align, and I may also use uniprot-fasta-header if the alignment was made using UniProt sequences. I will also try to implement some features I developed for ProViz.
Here is a working prototype of it:
A command-line application for peptidomics and proteomics. I am trying to implement in Rust some functionalities I developed for my PhD project Peptigram; methods from existing software (such as EnzymePredictor); and new methods to analyse identified peptides files (e.g. comparison of peptide network aware of precursor proteins, search for chimeric proteins, etc.).
Like Peptigram, this new software will mostly be for sequence analysis with Fasta files or standard XML files from databases like PRIDE or ProteomeXchange.
My goal is to focus on standardised and open file formats as input (Fasta, mzIdentML and other files from PRIDE or ProteomeXchange), and on machine-friendly file formats as output (CSV [5], XML or JSON based files). For this project, I started to write the aa-regex crate. The next step will be to write a crate to support endopeptidase specificity motifs (known, custom or even made from MEROPS data).
Peptigram and PSSMSearch were web applications. I think web apps are very useful and many people used them, but I think that now is the time to focus on tools that can be part of reproducible workflows. Peptigram has serious technical limitations: it is OK for some data visualisations but lacking in some areas, which I believe didn't help its adoption and citation in scientific articles (I also know some people who used it but didn't cite our paper...).
I am not sure if I will make the data visualisations with Rust or make an R package that will use my Rust library. I am not there yet; for now, the aim is to have something that produces clean text files. If I get to work on this project for my day job I may adjust my ambitions, as that would make a nice paper.
Future work for genomics and metagenomics applications
In addition to the work in protein evolution, peptidomics and sequence analysis, I also have some ideas for creating some utilities related to metagenomics and genomics. These fields seem to be quite congested with so much software, and I am quite new to them so I don’t know what other people need. Plus it needs to be reasonably feasible, and Rust is missing some quite essential tools for some of my ideas. Hopefully, I can find some people to discuss that with and be able to make a plan.
Some cool things about Rust
My personal list:
- the helpful compiler and clippy lints
- traits and generics to manage shared behaviour
- error handling with the
- uncertainty management (the equivalent of NA with R) with the
- iterators and loops work very well and remind me a lot of R's purrr package
- multithreading and compiler guarantees about data races make writing functions that can run in parallel very easy
- macros to generate code at compile-time preventing a lot of boilerplate code
- no copying and no additional memory allocations for some tasks
- the whole "zero cost abstraction" and "abstraction without overhead" promises from Rust
- all of that with a syntax not too dissimilar to a high-level language (there are some weird things though)
I may go into further detail for each one of them in another post.
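As a small taste of how a few of these items fit together, here is a minimal sketch. It assumes the two truncated bullets above refer to the standard library's `Result` (error handling) and `Option` (a value that may be missing, a bit like `NA`). The function names `parse_field` and `mean_ignoring_missing` are made up for this illustration, and the behaviour is loosely modelled on `mean(x, na.rm = TRUE)` in R.

```rust
use std::num::ParseFloatError;

/// Parses one text field: "NA" becomes `Ok(None)`, a number becomes
/// `Ok(Some(value))`, and anything else is reported as an error.
pub fn parse_field(field: &str) -> Result<Option<f64>, ParseFloatError> {
    let field = field.trim();
    if field == "NA" {
        Ok(None)
    } else {
        field.parse::<f64>().map(Some)
    }
}

/// Mean of the fields that hold a value, skipping the missing ones.
/// Returns `Ok(None)` when every field is missing (or the slice is empty).
pub fn mean_ignoring_missing(fields: &[&str]) -> Result<Option<f64>, ParseFloatError> {
    let values: Vec<f64> = fields
        .iter()
        .map(|field| parse_field(field))
        // Collecting into a Result stops at the first parsing error.
        .collect::<Result<Vec<Option<f64>>, _>>()?
        .into_iter()
        // `flatten` drops the `None`s and unwraps the `Some`s.
        .flatten()
        .collect();
    if values.is_empty() {
        Ok(None)
    } else {
        Ok(Some(values.iter().sum::<f64>() / values.len() as f64))
    }
}
```

The iterator chain reads a lot like a purrr pipeline, and the compiler refuses to let the caller forget about either the error case or the missing-value case.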
Some cons of Rust
- a quite steep learning curve, definitely not advised for biologists picking up programming (and that's ok)
- lifetimes are confusing and shouldn't be introduced to beginners at the beginning. My life got easier once I decided I wouldn't bother with them until I get much better at Rust.
- documentation is lacking integration examples in text format (no vignettes), and the design of docs.rs is a bit "overwhelming". I understand many concepts of Rust but I am not sure how, when, or where to apply them.
- what is the deal with no_std? Why does every other crate seem to make it a big deal that they have a no_std feature?
Resources to learn Rust
If you are interested in learning how to use Rust, here are some resources that I found useful.
- the official book, you can also order a hardbound copy of it at your local bookstore
- the standard library documentation
- the Rust by Example book
More official documentation on specific subjects can also be found on the official website.
I also found it useful to read the cheatsheets from cheats.rs. There are also the Rust users forum and the Rust community Discord server. There is an official Rust Discord server too, but I think it is more focused on the development of the language itself.
While checking some stuff for this blog post, I discovered the Into Rust videos. The videos are a bit old, so I don't know how much of the code is still valid with the current edition of Rust, but they are still very good.
Other people's code
When the documentation is not enough, my favourite way to find out how to do some stuff and assemble different libraries is to look at what other people wrote on GitHub. Searching for repositories using some crates of interest is always instructive. It is also interesting to look at the implementation of some libraries, especially the standard library.
Learning Rust with clippy
Clippy is a utility that suggests ways to make your code more idiomatic, detects bad practices, and helps you write better code. In addition to the information shown in the output of the command, this website lists all the clippy lints, explaining why the code is bad and how to rewrite it.
The basic lints are good, but sometimes I run clippy in a more pedantic mode by adding #![warn(clippy::pedantic, clippy::nursery, clippy::cargo)] at the top of my lib.rs files to show even more lints and ways to improve my code. Once I have fixed my code I turn these settings off and only run the basic lints on save; it would be too annoying to keep the pedantic lints activated all the time.
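To make the pedantic mode a bit more concrete, here is a minimal sketch. The function `glycine_fraction` is a made-up example, and the lint groups are enabled on that single function with an outer attribute so the snippet stays self-contained; in a real crate the inner attribute quoted above would normally go at the very top of lib.rs. As far as I know, the `usize as f64` casts are the kind of thing the pedantic group reports (`clippy::cast_precision_loss`), while the default lints stay silent about them.

```rust
// Crate-wide equivalent, placed at the top of lib.rs:
// #![warn(clippy::pedantic, clippy::nursery, clippy::cargo)]
// Here the lint groups are only switched on for one function.
#[warn(clippy::pedantic, clippy::nursery)]
pub fn glycine_fraction(sequence: &str) -> f64 {
    let length = sequence.chars().count();
    if length == 0 {
        return 0.0;
    }
    let glycines = sequence.chars().filter(|&residue| residue == 'G').count();
    // Converting `usize` to `f64` can lose precision for huge counts.
    // That is harmless here, but it is the sort of detail pedantic lints point out.
    glycines as f64 / length as f64
}
```

Running `cargo clippy` on a crate containing this function should then print the extra warnings for it; plain `cargo build` ignores the clippy lint names.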
I really like writing Rust code. Rust gives a lot of power to users, even to beginners, in a safe way. I hope I will be able to do some Rust development for my day job. I have some ideas of cool new algorithms to implement as part of Bébhar about peptidomics and chimeric peptides. There is a lot of work to do and many libraries to develop to get there though. I would also like to collaborate with people on some other projects.
I saw the call for blog posts for the 2021 roadmap and I am writing something. Hopefully, someone with my background and use of the language will have some good ideas to improve it. In the meantime I need to learn how to write better Rust code, reduce all of these clone calls, and find a strategy to write more efficient code or code that could be more easily refactored using lifetimes and Cow. A lot to learn, but it seems more manageable with Rust than with C.
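As an illustration of the kind of refactoring I have in mind, here is a minimal sketch of `std::borrow::Cow` avoiding a clone. The function `to_upper_sequence` is hypothetical and not taken from any of the crates mentioned in this post: it returns the input untouched when it is already upper case and only allocates a new `String` when something has to change.

```rust
use std::borrow::Cow;

/// Returns the sequence in upper case.
/// Borrows the input when nothing needs changing, allocates otherwise.
pub fn to_upper_sequence(sequence: &str) -> Cow<'_, str> {
    if sequence.bytes().any(|byte| byte.is_ascii_lowercase()) {
        // At least one lower-case letter: build a new, owned String.
        Cow::Owned(sequence.to_ascii_uppercase())
    } else {
        // Already fine: hand back the original slice, no copy made.
        Cow::Borrowed(sequence)
    }
}
```

The caller can use the result like a `&str` in both cases, so the decision to copy or not stays hidden inside the function.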
Thank you for reading this blog post. Don't hesitate to contact me if you have comments, advice, or if you found a typo.
[1] Yes, different tasks may require different tools. I believe people waging language wars are toxic and have too much free time. They should find new hobbies and educate themselves. Python, R and Julia are all great programming languages.
[2] I already had little motivation to code stuff with no real-life purpose; abstract programming exercises like the Tower of Hanoi or the Fibonacci numbers are quite boring in my opinion.
[3] Then I discovered lifetimes.
[4] Not so far away from work actually...
[5] A wide CSV file that could be easily read and used to make plots with ggplot2, for example.