DeepBioSeq: Deep Learning for Next Generation Sequencing Data

Deep learning technologies are making an impact, particularly in image analysis and object detection. Applications to Next Generation Sequencing data are, however, still at an early stage ...

In DeepBioSeq, we develop state-of-the-art convolutional neural networks, specifically deep residual networks (https://github.com/broadinstitute/keras-resnet), to analyze sequencing data, in particular RNA-Seq data. Commonly used genomic alignment steps can introduce bias and inaccuracies, so a key feature of DeepBioSeq is that it requires neither the usual sequencing data preprocessing nor genomic alignment. Instead, DeepBioSeq works directly on raw transcriptomics sequencing data (.fastq files) to investigate and classify the active processes within biological phenomena of interest. To account for data quality, we explicitly use the quality scores of the raw sequence reads when training the deep learning network. The algorithm is also applicable to single-cell sequencing, ChIP sequencing and genomic sequencing data sets.
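
As an illustration of how raw reads and their quality scores could be turned into numeric input for a convolutional network, the sketch below parses FASTQ records and encodes each read as a one-hot matrix weighted by its Phred quality. This is only one plausible encoding and is not taken from the DeepBioSeq implementation: the function names (`parse_fastq`, `phred_scores`, `encode_read`), the Phred+33 offset, and the quality cap of 40 are all assumptions made for the example.

```python
import io

BASES = "ACGT"  # channel order of the encoding (an assumption of this sketch)


def parse_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual


def phred_scores(qual, offset=33):
    """Convert a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def encode_read(seq, qual, max_q=40.0):
    """Return a len(seq) x 4 matrix: one-hot base scaled by quality in [0, 1].

    Bases outside A/C/G/T (e.g. N) become all-zero rows.
    """
    rows = []
    for base, q in zip(seq.upper(), phred_scores(qual)):
        weight = min(q, max_q) / max_q
        rows.append([weight if base == b else 0.0 for b in BASES])
    return rows


# Hypothetical single-record FASTQ input used only for demonstration.
example = io.StringIO("@read1\nACGTN\n+\nII5!I\n")
records = list(parse_fastq(example))
encoded = encode_read(records[0][1], records[0][2])
```

With this encoding, a low-quality base contributes a weaker signal than a high-quality one, so the network sees sequence and quality together in a single input matrix. Other designs, such as a separate quality channel, would serve the same purpose.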