Bioinformatics Support

The Genomics Core of the NARF provides primary base calling, QC, and initial deconvolution (demultiplexing) of sequence data into FASTQ format at no additional cost. We also provide bioinformatics services and support on a fee-for-service basis. For each project, we will discuss the details of the research, define the deliverables, and estimate the project cost and expected turnaround time. Bioinformatics analyses of high-throughput data using standard analysis pipelines are outlined below.
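Demultiplexing assigns each read to its sample by matching the read's index (barcode) sequence against the sample sheet. The Python sketch below only illustrates that idea and is not the Core's production pipeline; the function names, the example index sequences, and the default tolerance of one mismatch are hypothetical choices. Reads that match no sample, or match two samples equally well, go to an "Undetermined" bin.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, sample_sheet: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose index is within max_mismatches, or None.

    None means the read is undetermined: either no index is close enough,
    or two samples are equally close (an ambiguous assignment).
    """
    hits: List[Tuple[int, str]] = []
    for sample, index in sample_sheet.items():
        if len(index) == len(index_read):
            distance = hamming(index, index_read)
            if distance <= max_mismatches:
                hits.append((distance, sample))
    if not hits:
        return None
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # tie between samples: do not guess
    return hits[0][1]


def demultiplex(reads: List[Tuple[str, str]], sample_sheet: Dict[str, str],
                max_mismatches: int = 1) -> Dict[str, List[str]]:
    """Sort (read_id, index_sequence) pairs into per-sample bins."""
    bins: Dict[str, List[str]] = {sample: [] for sample in sample_sheet}
    bins["Undetermined"] = []
    for read_id, index_read in reads:
        sample = assign_sample(index_read, sample_sheet, max_mismatches)
        bins[sample or "Undetermined"].append(read_id)
    return bins
```

For example, with the hypothetical indexes ACGTACGT (sampleA) and TGCATGCA (sampleB), a read whose index is ACGTACGA is assigned to sampleA (one mismatch), while GGGGGGGG ends up in Undetermined.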

Quality scores of Illumina reads

Next-generation sequencing generates gigabases to terabases of data in the form of millions to billions of short raw reads, typically 50-300 bases long. The bioinformatics team of the Genomics Core has established pipelines that run after each sequencing run to perform initial base calling and demultiplexing, and to assess read quality with the FastQC software. The pre-alignment quality assessments provided by FastQC include:

  • Per-base sequence quality
  • Per-base sequence content
  • Per-base GC content
  • Search for overrepresented sequences (adapters, primers, etc.)
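To make these metrics concrete, here is a minimal Python sketch of how such statistics can be computed from FASTQ records. It is not FastQC's implementation: it assumes Phred+33 quality encoding (standard for current Illumina output), and the helper names are hypothetical. The 0.1% cut-off for overrepresented sequences mirrors the level at which FastQC raises a warning.

```python
import gzip
from collections import Counter
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from 4-line FASTQ text (header, sequence, '+', quality)."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq, _plus, qual = next(it), next(it), next(it)
        yield header, seq, qual


def per_base_stats(records: Iterable[Record], offset: int = 33):
    """Mean quality, base composition and GC fraction at each read position."""
    qual_sums: List[int] = []
    n_reads: List[int] = []
    content: List[Counter] = []
    for _header, seq, qual in records:
        for i, (base, q) in enumerate(zip(seq, qual)):
            if i == len(qual_sums):  # first read long enough to reach position i
                qual_sums.append(0)
                n_reads.append(0)
                content.append(Counter())
            qual_sums[i] += ord(q) - offset  # decode Phred+33 character
            n_reads[i] += 1
            content[i][base] += 1
    mean_quality = [s / n for s, n in zip(qual_sums, n_reads)]
    gc_fraction = [(c["G"] + c["C"]) / n for c, n in zip(content, n_reads)]
    return mean_quality, content, gc_fraction


def overrepresented_sequences(records: Iterable[Record],
                              min_fraction: float = 0.001) -> Dict[str, float]:
    """Sequences making up more than min_fraction of all reads."""
    counts = Counter(seq for _h, seq, _q in records)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items() if n / total > min_fraction}
```

For a delivered file this would be called as `per_base_stats(read_fastq(open_fastq("sample_R1.fastq.gz")))` (file name hypothetical); because the records are streamed, each metric needs its own pass over the file.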

Output files are provided as compressed FASTQ files. The sequence files, together with the QC metrics and graphics, are available for secure download from an FTP server.

RNA-seq analysis overview - from reads to differentially expressed genes

Reads from an RNA-seq experiment are generally processed with the RNA-seq aligner STAR. The pipeline involves:

  • mapping high-quality reads to the reference genome with STAR
  • quantifying gene expression with featureCounts
  • identifying differentially expressed genes with limma+voom and/or DESeq2
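Both DESeq2 and limma+voom start from the raw count table produced by featureCounts and correct for differences in sequencing depth between samples. The sketch below re-implements two of those normalization ideas in plain Python for illustration only; it is not code from either package. `size_factors` follows the median-of-ratios approach of DESeq2's default size-factor estimation, and `log_cpm` applies the log2 counts-per-million transform with the 0.5 and 1 offsets described for voom (voom can also use normalized library sizes, which this sketch ignores). Gene names in the example are made up.

```python
import math
import statistics
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factors, one per sample.

    counts maps gene -> raw counts in a fixed sample order. Genes with a zero
    in any sample are skipped because their geometric mean is zero; at least
    one gene must be non-zero in every sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each factor is the median ratio of a sample's counts to the gene-wise
    # geometric means (taken on the log scale, then exponentiated).
    return [math.exp(statistics.median(r)) for r in log_ratios]


def normalized_counts(counts: Dict[str, List[int]],
                      factors: List[float]) -> Dict[str, List[float]]:
    """Divide each sample's counts by its size factor."""
    return {gene: [c / f for c, f in zip(row, factors)]
            for gene, row in counts.items()}


def log_cpm(sample_counts: List[int]) -> List[float]:
    """voom-style log2 counts per million for one sample's counts."""
    lib_size = sum(sample_counts)
    return [math.log2((c + 0.5) / (lib_size + 1) * 1e6) for c in sample_counts]
```

If the second sample was simply sequenced twice as deeply, its size factor comes out twice as large as the first sample's, and the normalized counts of the two samples agree.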

The Bioinformatics team also maintains a variety of open-source software, including the Tuxedo suite tools (Bowtie, TopHat, Cufflinks/Cuffdiff, and CummeRbund), which can be used for RNA-seq alignment, assembly, transcript abundance estimation, and differential expression analysis.
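Cufflinks reports transcript abundance as FPKM (fragments per kilobase of transcript per million mapped fragments). The textbook formula is sketched below in Python; Cufflinks itself assigns fragments with a statistical model and uses effective transcript lengths, so treat this as an illustration of the unit rather than of the tool. The sketch assumes every mapped fragment is represented in the input dictionary.

```python
from typing import Dict


def fpkm(fragments: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments).

    The factor 1e9 combines per-kilobase (1e3) and per-million (1e6) scaling.
    Assumes every mapped fragment is counted in `fragments`.
    """
    total = sum(fragments.values())
    return {t: n * 1e9 / (lengths_bp[t] * total) for t, n in fragments.items()}
```

For example, 500 fragments on a 1,000 bp transcript, out of 1,000 mapped fragments in total, give 500 x 10^9 / (1,000 x 1,000) = 500,000 FPKM; the same 500 fragments on a 2,000 bp transcript give half that value.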

Circular representation of the Sneathia SN35 genome

The preferred platform for small- to medium-sized genome assembly projects is the Illumina MiSeq instrument, for both cost and efficiency. The MiSeq provides 2 x 300 base paired-end reads, which are preferred for efficient assembly. The reads undergo thorough quality filtering as outlined above, trimming to eliminate sequences of clearly inappropriate length, a low-complexity purge, and poly-A/T clipping before being assembled with SPAdes or the proprietary Newbler assembler. The high-quality reads are then mapped back to the assembly to calculate coverage metrics.
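The pre-assembly clean-up steps can be pictured with the simplified Python sketch below, applied in the order described above. The thresholds (50-300 bases, a minimum Shannon entropy of 1.5 bits as the low-complexity criterion, poly-A/T runs of at least 10 bases) and the final length re-check are hypothetical choices for illustration, not the Core's actual settings, and the quality-filtering step is omitted.

```python
import math
from collections import Counter
from typing import Optional


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the base composition; maximum 2.0 for DNA."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def clip_poly_at(seq: str, min_run: int = 10) -> str:
    """Clip a 3' poly-A tail and a 5' poly-T head of at least min_run bases."""
    tail_clipped = seq.rstrip("A")
    if len(seq) - len(tail_clipped) >= min_run:
        seq = tail_clipped
    head_clipped = seq.lstrip("T")
    if len(seq) - len(head_clipped) >= min_run:
        seq = head_clipped
    return seq


def preprocess_read(seq: str, min_len: int = 50, max_len: int = 300,
                    min_entropy: float = 1.5, min_run: int = 10) -> Optional[str]:
    """Return the cleaned read, or None if it should be discarded."""
    # 1. Length trimming: drop reads of implausible length (hypothetical bounds).
    if not (min_len <= len(seq) <= max_len):
        return None
    # 2. Low-complexity purge (entropy criterion is an assumption).
    if shannon_entropy(seq) < min_entropy:
        return None
    # 3. Poly-A/T clipping.
    seq = clip_poly_at(seq, min_run)
    # Assumption: re-check the minimum length after clipping.
    return seq if len(seq) >= min_len else None
```

A dinucleotide repeat such as ACACAC... has a base-composition entropy of 1.0 bit and is purged, whereas a read with balanced composition (close to the 2.0-bit maximum) passes and only loses its poly-A tail.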

Read visualization using Integrative Genomics Viewer (IGV)

Whole-genome, targeted, and exome sequencing data are processed with optimized in-house pipelines for accurate variant calling using GATK (Genome Analysis Toolkit). The pipeline involves:

  • read alignment
  • duplicate removal
  • indel realignment
  • base recalibration
  • SNP/INDEL calling
  • variant recalibration
  • filtering
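Duplicate removal targets reads that are likely PCR copies of the same original fragment. The Python sketch below shows the basic idea for single-end alignments: reads sharing chromosome, unclipped 5' position, and strand are grouped, and only the read with the highest summed base quality is kept. This is a simplified illustration in the spirit of tools such as Picard MarkDuplicates, not their exact algorithm; paired-end data would also use the mate's position, and the `AlignedRead` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AlignedRead:
    """Hypothetical minimal record for one single-end alignment."""
    name: str
    chrom: str
    pos5: int          # unclipped 5' position on the reference
    reverse: bool      # True if aligned to the reverse strand
    quals: List[int]   # per-base Phred quality scores


def find_duplicates(reads: List[AlignedRead]) -> List[str]:
    """Return names of reads flagged as duplicates.

    Reads with the same chromosome, 5' position and strand are assumed to
    derive from the same fragment; the highest-quality read in each group
    is kept and the others are flagged.
    """
    groups: Dict[Tuple[str, int, bool], List[AlignedRead]] = {}
    for read in reads:
        groups.setdefault((read.chrom, read.pos5, read.reverse), []).append(read)
    duplicates: List[str] = []
    for group in groups.values():
        best = max(group, key=lambda r: sum(r.quals))
        duplicates.extend(r.name for r in group if r is not best)
    return duplicates
```

Keeping the highest-quality copy, rather than an arbitrary one, preserves the most reliable evidence for the downstream base recalibration and variant calling steps.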

Variants can further be annotated using ANNOVAR or similar software.
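At its simplest, gene-based annotation asks which annotated genes overlap a variant's position. The Python sketch below does only that, with a linear scan over a hypothetical gene list; ANNOVAR's gene-based annotation goes further (for example, classifying variants as exonic, intronic, or intergenic and predicting amino-acid changes) using its own annotation databases, none of which is modeled here.

```python
from typing import List, NamedTuple, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int


def annotate_variant(chrom: str, pos: int, genes: List[Gene]) -> List[str]:
    """Names of genes overlapping the variant position, or ['intergenic']."""
    hits = [g.name for g in genes
            if g.chrom == chrom and g.start <= pos <= g.end]
    return hits or ["intergenic"]


def annotate_variants(variants: List[Variant],
                      genes: List[Gene]) -> List[Tuple[Variant, List[str]]]:
    """Pair each variant with the genes it falls in."""
    return [(v, annotate_variant(v[0], v[1], genes)) for v in variants]
```

Overlapping genes are all reported, so a variant in a region shared by two genes receives both names.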

MicroRNAs are short sequences (mature miRNAs are typically about 22 nt long), so custom trimming is required before any downstream analysis. Our pipeline includes:

  • custom trimming to clip Illumina adapter sequences from the 3' end
  • miRNA quantification using miRBase release 22
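The two steps above can be sketched in Python as follows. The adapter sequence is an assumption (TGGAATTCTCGGGTGCCAAGG, the TruSeq Small RNA 3' adapter; the actual adapter depends on the library kit), as are the 6-base minimum partial-adapter overlap, the 18-30 nt length window, and the exact-match counting; many tools instead align reads to miRBase sequences and tolerate isomiR variation. miRBase distributes mature sequences in the RNA alphabet, so U is converted to T before matching. The miRNA name in the example is made up.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumed 3' adapter (TruSeq Small RNA); replace with the kit's actual adapter.
ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"


def clip_3p_adapter(seq: str, adapter: str = ADAPTER_3P,
                    min_overlap: int = 6) -> Optional[str]:
    """Return the insert upstream of the 3' adapter, or None if none is found.

    The full adapter may occur anywhere in the read; otherwise a partial adapter
    (at least min_overlap bases of its 5' end) must run off the read's 3' end.
    """
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return None  # no adapter: insert length unknown, so the read is discarded


def quantify_mirnas(reads: Iterable[str], mature: Dict[str, str],
                    min_len: int = 18, max_len: int = 30) -> Counter:
    """Count reads whose trimmed insert exactly matches a mature miRNA sequence."""
    # miRBase uses U; sequencing reads use T. Identical sequences shared by
    # several miRNAs collapse to a single name in this simplified lookup.
    lookup = {s.upper().replace("U", "T"): name for name, s in mature.items()}
    counts: Counter = Counter()
    for seq in reads:
        insert = clip_3p_adapter(seq)
        if insert is None or not (min_len <= len(insert) <= max_len):
            continue  # no adapter, adapter dimer, or implausible length
        name = lookup.get(insert)
        if name is not None:
            counts[name] += 1
    return counts
```

Reads consisting only of adapter (adapter dimers) trim to an empty insert and are dropped by the length window, as are reads in which no adapter is found.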