Contemporary computational genomics is inseparable from the technical hurdles of handling data volumes measured in petabytes.
Nevertheless, the sheer volume of data is just the tip of the iceberg. In clinical applications, the true challenge stems from both the nature of the processed data and the purpose behind processing it. In a clinical context, analyses are useful only if their results meet rigorously defined standards and are delivered on time.
Meeting these demands takes more than access to scalable computing power. The whole process, including variant prioritization, variant description, and reporting, must also be automated and must follow recommendations from bodies such as the Association for Molecular Pathology (AMP) and the American College of Medical Genetics and Genomics (ACMG).
According to the College of American Pathologists (CAP), reproducibility and precise versioning are crucial requirements. Scalability and pipeline version traceability are closely intertwined: a pipeline that is precisely versioned and complete can run in the cloud without a bioinformatician's supervision. Versioning also makes it possible to manage upgrades and documentation, both of which are covered by the CAP NGS Laboratory Requirements.
In recent years, several languages dedicated to defining bioinformatics workflows in a platform-agnostic manner have appeared, such as WDL, CWL, and Nextflow DSL. Thanks to engines like Cromwell and Nextflow, pipelines defined in these languages can now run on cloud computing platforms including AWS, Google Cloud Platform, Alibaba Cloud, and more. At Intelliseq, we use WDL, which allows our pipelines to run not only on generic public clouds but also on genomics-specific clouds such as DNAnexus and DNAstack, or those operated by ixLayer.
One of the crucial ideas that pushed the field of computational genomics forward was containerization. Any tool, together with its dependencies, can now be packaged inside a lightweight, VM-like isolated environment called a Docker container. Tools packaged this way run without installation on any platform that has Docker Engine installed: it is enough to download the Docker image containing all the tools required for a computation and run it in a local or cloud environment. At Intelliseq, we put inside our Docker containers not only the tools but also the data sources required for computation, such as the reference genome or variant annotations. This lets us version our pipelines precisely and thus meet CAP requirements for bioinformatics pipelines.
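To make the versioning idea concrete, here is a minimal Python sketch. It assumes a pipeline release can be described by its own version string, the digests of its container images, and SHA-256 checksums of the bundled data files. `PipelineManifest`, `file_sha256`, and all field names are hypothetical illustrations, not Intelliseq's implementation or a CAP-mandated format.

```python
import hashlib
import json
from dataclasses import dataclass, field


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Checksum a (possibly very large) data file, e.g. a reference genome, in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass(frozen=True)
class PipelineManifest:
    """Hypothetical record of everything that determines a pipeline's output."""

    pipeline_version: str
    # Container images pinned by immutable digest rather than a mutable tag.
    images: dict[str, str] = field(default_factory=dict)
    # SHA-256 checksums of bundled data sources (reference genome, annotations).
    data_checksums: dict[str, str] = field(default_factory=dict)

    def fingerprint(self) -> str:
        """Hash a canonical JSON serialization so equal manifests give equal fingerprints."""
        canonical = json.dumps(
            {
                "pipeline_version": self.pipeline_version,
                "images": self.images,
                "data_checksums": self.data_checksums,
            },
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Recording such a fingerprint next to every result would let an auditor confirm which exact tools and data produced a given report; changing any tool or data file yields a different fingerprint.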
Rather than developing pipelines from scratch each time, it is better to reuse existing ones; anything else is reinventing the wheel. Once a base pipeline is established, it can be customized for a client at a fraction of the development cost.
Establishing a specialized team of bioinformaticians inside a lab to develop genomic pipelines can be compared to establishing a team of electrical engineers to set up a power generator for a lab. It is much more cost-effective to cooperate with companies like Intelliseq that already have an established portfolio of workflows. Those workflows can be easily customized according to laboratory requirements and then integrated with Laboratory Information Systems or run by external providers like ixLayer.
What does the process of workflow development look like? It starts with a specification of requirements. Then the team of Intelliseq scientists proposes an outline of the workflow, including both computational tasks that have already been developed and those that still need to be developed.
Intelliseq has an extensive collection of tasks performing procedures such as quality control, alignment, variant calling, variant annotation, imputation, and polygenic score computation. These tasks are combined into fully functional workflows. The initial proposal also includes pricing and a development timeline. Report generation can be included in the pipeline, or reports can be produced by other vendors from the pipeline's output.
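The idea of combining reusable tasks into a workflow can be sketched in a few lines of Python. This is not WDL and not Intelliseq's code. It is a toy model, assuming each task reads named inputs and produces one named output, so that dependencies can be inferred and tasks run in topological order. `Task`, `run_workflow`, and the lambdas standing in for real tools are all hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable


@dataclass(frozen=True)
class Task:
    """Hypothetical reusable step: reads named inputs, produces one named output."""

    name: str
    inputs: tuple[str, ...]
    output: str
    run: Callable[..., str]


def run_workflow(tasks: list[Task], initial: dict[str, str]) -> dict[str, str]:
    """Run tasks in dependency order; a task depends on the tasks producing its inputs."""
    producers = {task.output: task.name for task in tasks}
    by_name = {task.name: task for task in tasks}
    # Map each task to its predecessors (inputs not produced by a task are initial values).
    graph = {
        task.name: {producers[i] for i in task.inputs if i in producers}
        for task in tasks
    }
    values = dict(initial)
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        values[task.output] = task.run(*(values[i] for i in task.inputs))
    return values


# Toy tasks listed out of order; the sorter still runs them correctly.
tasks = [
    Task("annotate", ("vcf",), "annotated_vcf", lambda vcf: f"annotated({vcf})"),
    Task("qc", ("fastq",), "qc_report", lambda fq: f"qc({fq})"),
    Task("align", ("fastq", "reference"), "bam", lambda fq, ref: f"bam({fq},{ref})"),
    Task("call", ("bam", "reference"), "vcf", lambda bam, ref: f"vcf({bam})"),
]
result = run_workflow(tasks, {"fastq": "sample.fq", "reference": "ref.fa"})
print(result["annotated_vcf"])  # annotated(vcf(bam(sample.fq,ref.fa)))
```

Because tasks declare their inputs and outputs, adding or swapping one (for example, an imputation step) does not require rewriting the others, which mirrors why a library of existing tasks lowers the cost of a new workflow.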