Opening up the world’s fastest RNA structure prediction algorithm to the scientific community to support the battle against coronavirus

2020-02-01

The 2019 coronavirus, or 2019-nCoV, has evolved to infect humans, causing a range of flu-like symptoms that can develop into more severe conditions. The outbreak has spread rapidly in recent weeks, with 9,692 reported cases and 213 deaths as of Jan. 30.

In response, Baidu is helping shoulder the responsibility to support virus control and prevention while promoting long-term public health education and security. Baidu intends to continue investing in research and technology development that will help combat this and future outbreaks.

Among these research contributions, Baidu is opening up LinearFold, the world’s fastest algorithm for ribonucleic acid (RNA) secondary structure prediction, to the scientific community worldwide. The algorithm was published last year in partnership with Oregon State University and the University of Rochester. It can significantly speed up the prediction of the secondary structure of an RNA sequence compared with traditional RNA folding algorithms.

As an example of LinearFold’s efficiency, applying the tool to secondary structure prediction for the 2019-nCoV RNA sequence allowed our AI scientists to reduce analysis time from 55 minutes to 27 seconds, a roughly 120-fold speed-up.

We believe that access to this tool could benefit the scientific community: predicting how an RNA folds from its sequence can improve our understanding of the coronavirus’s biological functions, which in turn can facilitate RNA virus analysis and vaccine development.

Today, we are announcing the creation of the LinearFold website, which provides access to the world’s fastest RNA structure prediction algorithm. The site will be freely available to industry experts around the globe. If you are a scientific research unit, genetic testing agency, or epidemic prevention center that needs technical support for RNA structure predictions, please contact RNA@baidu.com.

A promising potential to accelerate coronavirus research

2019-nCoV belongs to a family of enveloped coronaviruses, which are single-stranded RNA viruses. Compared to double-stranded DNA viruses, single-stranded RNA viruses (such as HIV, Ebola, influenza, and coronaviruses) mutate much faster, which poses challenges to vaccine development. With time, 2019-nCoV will likely continue to mutate as it circulates between humans, making it increasingly unpredictable and harder to control. Compared to the SARS (severe acute respiratory syndrome) outbreak in 2003, which infected 8,098 people and killed 774 in 17 countries, 2019-nCoV has a longer incubation period, spanning up to two weeks, and is highly contagious.

Existing algorithms for RNA secondary structure prediction have a significant limitation: their runtimes scale cubically with RNA length, meaning that compute time grows eightfold when the RNA length doubles. This limits their applicability to RNA viruses with large genomes such as HIV, Ebola, and in particular the coronavirus family, whose genomes range from 26 to 32 kilobases, the largest of any RNA virus.
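
To make the scaling argument concrete, the short sketch below computes how much runtime grows as a sequence gets longer, for a cubic-time and a linear-time algorithm. It ignores constant factors and lower-order terms, so it shows growth rates only, not actual running times. The function name is illustrative and not part of LinearFold.

```python
def relative_runtime(length_ratio: float, exponent: int) -> float:
    """Factor by which runtime grows when the sequence length
    grows by `length_ratio`, for a runtime proportional to n**exponent."""
    return length_ratio ** exponent


if __name__ == "__main__":
    for ratio in (2, 4, 10):
        cubic = relative_runtime(ratio, 3)   # traditional cubic-time folding
        linear = relative_runtime(ratio, 1)  # linear-time folding
        print(f"{ratio}x longer sequence: cubic {cubic:.0f}x, linear {linear:.0f}x")
```

Doubling the length gives a factor of 8 under cubic scaling and a factor of 2 under linear scaling, which matches the eightfold jump described above.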

With Baidu’s LinearFold, it takes less than half a minute to analyze the structural information of the virus. This efficiency is potentially beneficial for a deeper understanding of this virus and its vaccine development.

How LinearFold works

LinearFold is the first RNA folding algorithm to achieve linear runtime. 

To create LinearFold, our researchers borrowed techniques from computational linguistics, specifically incremental parsing algorithms, so that the algorithm scans the RNA sequence left to right (5′ to 3′) rather than building the structure bottom-up. They also applied beam search, a popular heuristic, to prune the search space.
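
The idea of a left-to-right scan with beam pruning can be illustrated with a toy sketch. This is not LinearFold itself: it simply maximizes the number of base pairs instead of using a real scoring model, and it makes no attempt to reproduce the published implementation. The function name, the allowed pairs, the minimum hairpin loop length of 3, and the default beam size are all assumptions made for illustration.

```python
from typing import List, Tuple

# Watson-Crick and G-U wobble pairs (a common simplification).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin

# A state is (pairs formed so far, stack of open positions, dot-bracket prefix).
State = Tuple[int, Tuple[int, ...], str]


def fold_left_to_right(seq: str, beam_size: int = 100) -> Tuple[str, int]:
    """Toy left-to-right folding with beam search; returns (structure, pairs)."""
    beam: List[State] = [(0, (), "")]
    for j, base in enumerate(seq):
        candidates: List[State] = []
        for score, stack, prefix in beam:
            # Action 1: leave base j unpaired.
            candidates.append((score, stack, prefix + "."))
            # Action 2: open a bracket at j, to be closed later.
            candidates.append((score, stack + (j,), prefix + "("))
            # Action 3: close the most recently opened bracket, if allowed.
            if stack:
                i = stack[-1]
                if (seq[i], base) in PAIRS and j - i > MIN_LOOP:
                    candidates.append((score + 1, stack[:-1], prefix + ")"))
        # Beam pruning: keep only the highest-scoring states at this position.
        candidates.sort(key=lambda state: state[0], reverse=True)
        beam = candidates[:beam_size]

    score, stack, prefix = beam[0]
    chars = list(prefix)
    for i in stack:
        chars[i] = "."  # brackets that were never closed become unpaired
    return "".join(chars), score
```

For the short hairpin GGGAAACCC, fold_left_to_right("GGGAAACCC") returns ("(((...)))", 3). Because only a fixed number of states survive at each position, the number of states examined grows linearly with the sequence length, at the cost of possibly missing the best structure when the beam is too small.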

As it was being developed, researchers became increasingly excited by LinearFold’s ability to achieve more accurate predictions on longer-sequence families, as well as improved accuracy for long-range base pairs. It exceeded its original performance expectations and earned a place at ISMB 2019 (Intelligent Systems for Molecular Biology), the top academic conference in bioinformatics, as well as in the journal Bioinformatics, the leading journal in the field.

As a result, the LinearFold website pushes the boundaries of RNA secondary structure prediction in both speed and scalability, handling RNA sequences of up to 100 kilobases.

Our battle against the coronavirus

Baidu has also launched several additional resources, including an index for official updates, a curated news aggregator focused on the virus, a free service channel for consulting doctors online, a dedicated page to curb the spread of misinformation, and more. Baidu Maps has also released a live map for tracking the spread of the virus and an emergency map of fever hospitals.

We urge everyone to stay safe and to rely on accurate updates to avoid misinformation.