Project Twins
In 2015, geneticist Guy Reeves was trying to configure a free software system called Galaxy to get his bioinformatics projects off the ground. After a day or two of frustration, he asked members of his IT department for help. They installed Docker, a technology for simulating computational environments, which enabled him to use a special, self-contained version of Galaxy, called a container, that came packaged with everything he needed. A slight tweak to the Galaxy settings, and he was done before lunch.
Reeves, at the Max Planck Institute for Evolutionary Biology in Plön, Germany, is one of many scientists adopting containers. As science becomes ever more data intensive, more software is being written to extract knowledge from those data. But few researchers have the time and computational know-how to make full use of it. Containers, packages of software code and the computational environment to run it, can close that gap. They help researchers to use a wider array of software, accelerate experiments and promote reproducibility.
Containers are essentially lightweight, configurable virtual machines (simulated versions of an operating system and its hardware), which allow software developers to share their computational environments. Researchers use them to distribute complicated scientific software systems, thereby allowing others to execute the software under the same conditions that its original developers used. In doing so, containers can remove one source of variability in computational biology. But whereas virtual machines are relatively resource-intensive and inflexible, containers are compact and configurable, says C. Titus Brown, a bioinformatician at the University of California, Davis. Although configuring the underlying containerization software can be tricky, containers can be modified to add or remove tools according to the user's need, a flexibility that has boosted their popularity, he says. "I liked the idea of having something that works out of the box," says Reeves.
Lab-built tools rarely come ready to run. They often take the form of scripts or programming source code, which must be processed and configured. Much of the software requires additional tools and libraries, which the user may not have installed. Even if users can get the software to work, differences in computational environments, such as the installed versions of the tools it depends on, can subtly alter performance, affecting reproducibility. Containers reduce that complexity by packaging the key elements of the computational environment needed to run the desired software, including settings and add-ons, into a lightweight, virtual box. They don't alter the resources required to run it: if a tool needs a lot of memory, then so too will its container. But they make the software much easier to use, and the results easier to reproduce.
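To make the point about differing environments concrete, here is a minimal Python sketch that records the interpreter, operating system and installed versions of a few packages, then reports what differs between two such records. It is an illustration only, not something described in the article: the function names and the package names passed in are hypothetical choices, and a container captures far more than this snapshot does.

```python
import json
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Describe the interpreter, OS and the installed versions of `packages`."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "packages": versions,
    }


def _flatten(snapshot):
    """Turn the nested snapshot into a flat dict so entries can be compared."""
    flat = {key: value for key, value in snapshot.items() if key != "packages"}
    for name, version in snapshot["packages"].items():
        flat["packages." + name] = version
    return flat


def diff_environments(first, second):
    """Return {key: (value_in_first, value_in_second)} for every differing key."""
    a, b = _flatten(first), _flatten(second)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # The package names below are illustrative; substitute the ones a tool needs.
    print(json.dumps(snapshot_environment(["numpy", "pandas"]), indent=2))
```

Saving one snapshot alongside each set of results and comparing them with `diff_environments` gives a quick hint as to whether two sites ran the same analysis under the same conditions.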
Depending on the software used (Docker, Singularity and rkt are popular), containers can run on Windows, Mac OS X, Linux or in the cloud. They can package anything from a single process to a complex environment such as Galaxy. These tools can interact with each other, sharing data or building pipelines, for instance. Because each application resides in its own box, even tools that would ordinarily conflict with each other can run harmoniously.
Docker uses executable packages, called images, which include the tool to be contained as well as the developer's computational environment. To create a Docker image, a developer creates a configuration file with instructions on how to download and build all the required tools inside it. He or she then 'runs' the file to create an executable package. All the user then needs to do is retrieve the package and run it. Other tools can also generate images. The Reprozip program, for example, assembles Docker-compatible packages by watching as software tools run and tracing the input files and software libraries that the tool requires.
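The workflow described above (write a configuration file, build an image from it, then run the image) can be sketched in Python as follows. This is a hedged illustration rather than any developer's actual setup: the base image, the package names, the image tag and the script name `analyze.py` are all assumptions, and the helper functions are hypothetical. The sketch only generates the text of a Dockerfile and the `docker build` and `docker run` command lines; it does not invoke Docker itself.

```python
import json
import shlex


def render_dockerfile(base_image, apt_packages, pip_packages, entrypoint):
    """Return Dockerfile text that installs the listed tools on a base image."""
    lines = ["FROM " + base_image]
    if apt_packages:
        # Install system-level tools, then clear the package index to save space.
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            + " ".join(sorted(apt_packages))
            + " && rm -rf /var/lib/apt/lists/*"
        )
    if pip_packages:
        # Pinning versions (name==version) keeps rebuilt images consistent.
        lines.append("RUN pip install --no-cache-dir " + " ".join(sorted(pip_packages)))
    lines.append("COPY . /app")
    lines.append("WORKDIR /app")
    # Exec-form ENTRYPOINT is written as a JSON array.
    lines.append("ENTRYPOINT " + json.dumps(list(entrypoint)))
    return "\n".join(lines) + "\n"


def build_command(tag, context_dir="."):
    """Command the developer would run to turn the Dockerfile into an image."""
    return ["docker", "build", "-t", tag, context_dir]


def run_command(tag, args=()):
    """Command a user would run to execute the image, removing it on exit."""
    return ["docker", "run", "--rm", tag, *args]


if __name__ == "__main__":
    # All names below are illustrative assumptions, not taken from the article.
    print(render_dockerfile(
        base_image="python:3.11-slim",
        apt_packages=["samtools"],
        pip_packages=["numpy==1.26.4"],
        entrypoint=["python", "analyze.py"],
    ))
    print(shlex.join(build_command("example-lab/tool:0.1")))
    print(shlex.join(run_command("example-lab/tool:0.1", ["--help"])))
```

Writing the output of `render_dockerfile` to a file named `Dockerfile` in the project directory and running the printed commands would, under these assumptions, produce and execute the image.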
Deborah Bard, a computer scientist at the National Energy Research Scientific Computing Center in Berkeley, California, helps researchers to install their software on the lab's supercomputer. She recalls spending three or four days installing a complex software pipeline for telescope simulation and analysis. Using containers cut this time down to hours. "You can spend your time doing science instead of figuring out compiler versions," she says.
For Nicola Mulder, a bioinformatician at the University of Cape Town in South Africa, containers help her to synchronize a cross-border bioinformatics network she runs in Africa, called H3ABioNet. Not all African institutions have access to the same computational resources, she explains, and Internet connectivity can be patchy. Containers allow researchers with limited resources to access the tools that they otherwise might not be able to.
They also allow researchers with sensitive genomic data to collaborate and compare findings without actually sharing the underlying data, Mulder says. And, if researchers at one site obtain different results from their colleagues at another, the standardization the containers provide could eliminate one of the reasons why.
Although computer scientists have multiple options for container platforms, Docker, which is an open-source project launched in 2013, is perhaps the most popular among scientists. It has a large registry of prebuilt containers and an active online community that competitors have yet to match. But many administrators of high-performance computing systems preclude Docker use because it requires high-level administrative access privileges to run. This type of access may allow users to copy or damage anything on the system. An add-on to the fee-based enterprise edition allows users to sidestep that requirement, but it is not available with the free, community edition. Users can, however, turn to a different containerization tool such as Shifter, which doesn't require full privileges (also known as root access) but still supports Docker images.
The requirement for root access is the biggest obstacle to widespread adoption of Docker, Brown explains. Many academics run bioinformatics tools on high-performance computing clusters administered by their home institutions or the government. "Of course, they don't have administrative privileges on most of those systems," he says. Brown spends about US$50,000 annually for cloud computing time on Amazon Web Services, but he says this represents just one-third of his computing work; the rest is carried out on a cluster at Michigan State University, where he lacks root-level access. As a result, Brown creates Docker containers of his tools for distribution, but can rarely use them himself.
Researchers can access Docker images either from the platform's own hosting service, Docker Hub, or from registries of containers such as BioContainers and Dockstore, which allow the sharing of tools vetted by other scientists. Brian O'Connor at the University of California, Santa Cruz, who was the technical lead for the Dockstore registry, recommends that scientists look through container registries to find a tool that works for their project instead of trying to reinvent something that already exists.
But actually getting the underlying Docker software to run properly can be challenging, says Simon Adar, chief executive of Code Ocean in New York, an online service that aims to simplify the process. "It's too technical; it was designed for developers to deploy complex systems." The service, launched in February, creates what Adar calls "compute capsules", which comprise code, data, results and the Docker container itself. Researchers upload their code and data, and then either execute it in a web browser or share it with others, with no installation required. Adar likens the process to sharing a YouTube video. The company even offers a widget that enables users to embed executable code in web pages.
Shakuntala Baichoo, a computer scientist at the University of Mauritius in Moka, learned about containers at a communal programming event, called a hackathon, organized by H3ABioNet. Previously, she spent hours helping collaborators install her tools. In making the tools easier to install, she says, containers not only free up her time, but they might also encourage scientists to test them and provide feedback.
At CERN, the particle-physics laboratory near Geneva, Switzerland, scientists use containers to accelerate the publication process, says physicist Kyle Cranmer at New York University, who works on CERN's ATLAS project, which searches for new elementary particles. When physicists run follow-up studies, they have to dig up code snippets and spend hours redoing old analyses; with containers, they can package ready-to-use data analysis workflows, simplifying and shortening the process.
Cranmer says that although much of the debate around reproducibility has focused on data and code, computing environments themselves also play a big part. "It's really essential," he says. One study of an anatomical analysis tool's performance in different computing environments, for example, found that the choice of operating system produced a small but measurable effect (E. H. B. M. Gronenschild et al. PLoS ONE 7, e38234; 2012).
But containers are only as good as the tools they encapsulate, says Lorena Barba, a mechanical and aerospace engineer at George Washington University, Washington DC. "If researchers start stuffing their bad code into a container and pass it on, we are foredoomed to failure." And, says Brown, without pressure from funding agencies and journals, containers are unlikely to make researchers suddenly embrace computational reproducibility.
Indeed, few researchers are using containers, says Victoria Stodden, a statistician at the University of Illinois at Urbana-Champaign who studies computational reproducibility. In part that's because of a lack of need or awareness, but it is also because they might not have the computer skills needed to get going.
Behind the scenes, however, that could be changing. Companies such as Google and Microsoft already run some software in containers, says Jonas Almeida, a bioinformatician at Stony Brook University, New York. Large-scale bioinformatics projects may not be far behind. The cloud-based version of Galaxy will eventually run inside containers by default, says Enis Afgan, a computer scientist at Johns Hopkins University in Baltimore, Maryland, who works on Galaxy.
In 5 to 10 years, Almeida predicts, scientists will no longer have to worry about downloading and configuring software; tools will simply be containerized. "It's inevitable," he says.
The rest is here:
Software simplified - Nature.com