High Throughput Computing in bioinformatics: workflows, containers and emerging paradigms

2018-12-10T06:40:30Z (GMT) by Peter Van Heusden
Next Generation Sequencing has brought genomic analysis within the reach of a great number of laboratories, while increasing the demand for bioinformatic analysis. Such analyses typically take the form of workflows: chains of analysis steps with data flowing from one step to the next. This kind of analysis is amenable to High Throughput Computing, a form of high performance computing characterised by a focus on overall analysis throughput rather than optimisation of a single application. In recent years workflow languages and container technologies have become a key part of composing efficient, reproducible and re-usable bioinformatic workflows. These technologies, however, pose a challenge for High Performance Computing providers, as they require different characteristics from an execution environment than those provided by traditional HPC clusters. These challenges, and some approaches to solving them, will be discussed.