Sequence data are also available from NCBI through dbEST and the Trace Archive.

 

TSA

GFWJ01000001–GFWJ01400489

SRA

SRX3061857

 

The RNA-Seq library was prepared with Illumina's TruSeq Stranded mRNA Sample Prep Kit with one modification: fragmentation was done at 94 °C for 1 minute. Reads are 250 nt long. Read 1 aligns to the antisense strand and Read 2 aligns to the sense strand. Sequencing was performed on an Illumina HiSeq 2500. Library adapters have been trimmed from the 3' end of the reads.
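The read layout described above (Read 1 antisense, Read 2 sense) is the one that Trinity and similar tools label "RF". As an illustration only, not part of the original analysis, the Python sketch below shows how the transcript strand of an aligned read could be inferred from its SAM FLAG under that assumption. The function and constant names are hypothetical; the flag bits (0x10, 0x40, 0x80) are the standard SAM definitions.

```python
# Illustrative sketch: infer the transcript strand of an aligned read for a
# stranded library in which Read 1 is antisense and Read 2 is sense ("RF").
# Function and constant names are hypothetical; bit values follow the SAM spec.

REVERSE = 0x10  # read aligned to the reverse strand of the reference
FIRST = 0x40    # read is the first mate (Read 1)
SECOND = 0x80   # read is the second mate (Read 2)


def transcript_strand(read_number: int, read_is_reverse: bool) -> str:
    """Return '+' or '-' for the strand of the transcript a read came from."""
    if read_number not in (1, 2):
        raise ValueError("read_number must be 1 or 2")
    read_strand = "-" if read_is_reverse else "+"
    if read_number == 2:
        # Read 2 has the same orientation as the transcript (sense).
        return read_strand
    # Read 1 is the reverse complement of the transcript (antisense).
    return "+" if read_strand == "-" else "-"


def strand_from_sam_flag(flag: int) -> str:
    """Apply transcript_strand() to a SAM FLAG value."""
    read_is_reverse = bool(flag & REVERSE)
    if flag & FIRST:
        return transcript_strand(1, read_is_reverse)
    if flag & SECOND:
        return transcript_strand(2, read_is_reverse)
    raise ValueError("FLAG does not mark the read as Read 1 or Read 2")


if __name__ == "__main__":
    # 83 = paired, proper pair, reverse, Read 1; 163 = paired, proper pair,
    # mate reverse, Read 2. Both mates point to a '+'-strand transcript.
    print(strand_from_sam_flag(83), strand_from_sam_flag(163))
```

In a properly paired fragment from this library, the two mates always agree on the transcript strand even though they align to opposite reference strands.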

 

Illumina Raw Data

Henry.201682.tgz

Initial Read Quality

C_atra_RNA1_ATCACG_L002_R1_001_fastqc.html
C_atra_RNA1_ATCACG_L002_R2_001_fastqc.html

 

Paired-end, stranded 250 bp reads were assembled with Trinity v2.2.0 using default settings, including the default parameters for read trimming (Trimmomatic, as implemented in Trinity) and read normalization (as implemented in Trinity). Two assemblies were generated: one with the default k-mer size of 25 and one with a k-mer size of 31.
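To make the k-mer sizes above concrete, here is a minimal Python sketch of k-mer decomposition, the step on which Trinity's graph-based assembly relies. It is illustrative only and the function names are hypothetical. A read of length L yields L - k + 1 overlapping k-mers, so each 250 nt read contributes 226 k-mers at k = 25 and 220 at k = 31.

```python
# Illustrative sketch of k-mer decomposition. Function names are hypothetical.
from __future__ import annotations


def kmers(seq: str, k: int) -> list[str]:
    """Return every overlapping substring of length k, in order."""
    if k <= 0:
        raise ValueError("k must be positive")
    # range() is empty when the sequence is shorter than k.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmers_per_read(read_length: int, k: int) -> int:
    """Number of k-mers one read contributes: L - k + 1 (0 if shorter than k)."""
    return max(read_length - k + 1, 0)


if __name__ == "__main__":
    for k in (25, 31):
        print(k, kmers_per_read(250, k))
```

A larger k makes each k-mer more likely to be unique in the transcriptome, which helps separate similar sequences, at the cost of needing more coverage before overlapping k-mers from lowly expressed transcripts are observed.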

 

Preliminary Assemblies

Trinity kmer25 Assembly
Trinity kmer31 Assembly

Reports

Trinity kmer25 Stats
Trinity kmer31 Stats

 

Both assemblies were filtered with kallisto to remove all transcripts with an abundance below 1 TPM (transcripts per million). The filtered k31 assembly was further cleaned with the MCSC pipeline, a decontamination method that uses hierarchical clustering and taxon identification based on sequence similarity to remove contaminants without prior knowledge of which contaminants are present. The pipeline was used as described in Lafond-Lapalme et al., 2016, Bioinformatics.
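The TPM filter described above can be sketched in Python as follows, assuming kallisto's standard abundance.tsv output (tab-separated columns target_id, length, eff_length, est_counts, tpm) and assuming that each FASTA record is matched by the first word of its header. This is a hypothetical illustration, not the script used to produce the files listed here; transcripts at or above 1 TPM are kept and all others are dropped.

```python
# Illustrative sketch: keep only transcripts with TPM >= 1 in kallisto's
# abundance.tsv and write the matching FASTA records. Function names and
# the example IDs are hypothetical.
import csv
import io
from typing import Iterator, Optional, Set, TextIO, Tuple


def expressed_ids(abundance: TextIO, min_tpm: float = 1.0) -> Set[str]:
    """Return target_ids with tpm >= min_tpm from a kallisto abundance.tsv."""
    reader = csv.DictReader(abundance, delimiter="\t")
    return {row["target_id"] for row in reader if float(row["tpm"]) >= min_tpm}


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; header excludes the leading '>'."""
    header: Optional[str] = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(fasta: TextIO, keep: Set[str], out: TextIO) -> int:
    """Write records whose first header word is in keep; return the count."""
    written = 0
    for header, seq in read_fasta(fasta):
        words = header.split()
        if words and words[0] in keep:
            out.write(f">{header}\n{seq}\n")
            written += 1
    return written


if __name__ == "__main__":
    abundance = io.StringIO(
        "target_id\tlength\teff_length\test_counts\ttpm\n"
        "tx1\t500\t330.5\t120\t25.3\n"
        "tx2\t300\t131.2\t1\t0.4\n"
    )
    fasta = io.StringIO(">tx1 len=500\nACGT\nACGT\n>tx2 len=300\nGGGG\n")
    out = io.StringIO()
    n = filter_fasta(fasta, expressed_ids(abundance), out)
    print(n, out.getvalue(), sep="\n")
```

Here only tx1 passes the threshold, so one record is written with its sequence joined onto a single line.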

 

Filtered Assemblies

Trinity.k25.shorthead.kallisto.minTPM1.fa
Trinity.k31.shorthead.kallisto.minTPM1.fa
Trinity.k25.shorthead.kallisto.minTPM1decont.fasta

 

The Trinotate annotation report incorporates the results of blastx searches of the assembled contigs and blastp searches of the longest open reading frames predicted by TransDecoder (v2.0.1, https://transdecoder.github.io/), both run against the UniProt Swiss-Prot database (release 30-Nov-2016) with BLAST+ (v2.6.0). Only the top hits to the UniProt database were retained. The report also includes protein domains predicted by HMMER (v3.0) and the top hits from the Diamond blastx search (v0.8.5) against the UniRef90 database (release 2016-11) that is run as part of the MCSC decontamination pipeline. Trinotate adds Gene Ontology and KEGG pathway information to the output based on the Swiss-Prot hits.
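One way to retain only the top hit per query, sketched here in Python as a hypothetical illustration, is to scan BLAST+ tabular output (-outfmt 6, default 12 columns) and keep, for each query, the hit with the highest bit score, breaking ties by the lower e-value. This ranking rule and the function names are assumptions; they are not necessarily how the hits in this report were selected.

```python
# Illustrative sketch: keep the best hit per query from BLAST+ tabular
# output (-outfmt 6, default 12 columns). "Best" here means highest bit
# score, ties broken by lower e-value; this rule is an assumption.
import csv
import io
from typing import Dict, NamedTuple, TextIO


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def top_hits(blast_tab: TextIO) -> Dict[str, Hit]:
    """Map each qseqid to its single best hit.

    Column order assumed (outfmt 6 default): qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    best: Dict[str, Hit] = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = Hit(row[0], row[1], float(row[2]), float(row[10]), float(row[11]))
        current = best.get(hit.query)
        # Higher bit score wins; on a tie, the smaller e-value wins.
        if current is None or (hit.bitscore, -hit.evalue) > (
            current.bitscore,
            -current.evalue,
        ):
            best[hit.query] = hit
    return best


if __name__ == "__main__":
    table = io.StringIO(
        "q1\tsubjA\t45.0\t200\t100\t3\t1\t600\t1\t200\t1e-50\t200\n"
        "q1\tsubjB\t60.0\t210\t80\t2\t1\t630\t1\t210\t1e-60\t250\n"
        "q2\tsubjC\t80.0\t150\t30\t0\t1\t450\t1\t150\t2e-70\t300\n"
    )
    for query, hit in sorted(top_hits(table).items()):
        print(query, hit.subject, hit.evalue)
```

Ranking by bit score rather than relying on row order keeps the result correct even if the table has been re-sorted or concatenated from several searches.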

 

Trinotate Annotation Report

Catra.k31.TPM1filt.trinotate_annotation_report.xls
CatraTrinitt.k31.TPM1.MCSCdecon.Trinotate.xls

This research is funded by NSF Grant IOS 1558061 to J.J.H.