PocilloporaBase: Pocillopora Transcriptomics Database
 

Adult coral RNA extraction, sequencing and assembly

Adult Pocillopora colonies were collected from three geographically isolated populations on Oahu, Hawaii. Coconut Island, on the northeastern corner of Oahu, is considered to be "recovering" from significant human impacts [1]. Sand Island, off the southern coast of Oahu just west of Honolulu, abuts a heavily industrialized area that houses the chief sewage treatment plant for all of metropolitan Honolulu [2]. Compared with these two sites, Waimanalo, on the eastern side of Oahu, is relatively unimpacted by human activity. Three to four individual colonies were collected from each site. Each colony was fragmented into nubbins upon arrival, and the nubbins were kept in an open-system outdoor seawater table for two weeks. They were then subjected to a range of biologically relevant stressors administered in a controlled laboratory setting: desiccation (four hours out of water), hypo-saline shock (two hours in fresh water), heat shock (50°C for one hour), and peroxide exposure (two hours in seawater supplemented with 10% peroxide). Total RNA from stressed and control nubbins was extracted using TRIzol [3]. To produce the reference transcriptome described here, aliquots of all the individual RNA samples were pooled prior to sequencing. The pooled RNA sample was then shipped to Beckman Coulter for library preparation and sequencing using 454 sequencing technology [4,5].

References:

  1. Hunter CL, Evans CW: Coral-Reefs in Kaneohe Bay, Hawaii - 2 Centuries of Western Influence and 2 Decades of Data. B Mar Sci 1995, 57(2):501-515.
  2. Grigg RW: Coral reefs in an urban embayment in Hawaii: A complex case history controlled by natural and anthropogenic stress. Coral Reefs 1995, 14(4):253-266.
  3. Rio DC, Ares M, Jr., Hannon GJ, Nilsen TW: Purification of RNA using TRIzol (TRI reagent). Cold Spring Harb Protoc 2010, 2010(6):pdb.prot5439.
  4. Rothberg JM, Leamon JH: The development and impact of 454 sequencing. Nat Biotechnol 2008, 26(10):1117-1124.
  5. Mardis ER: Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 2008, 9:387-402.

Bioinformatics

Assembly, identification of contigs, and Pfam domain searches

After sequencing, short reads (<40 nucleotides) and low-quality reads that did not overlap with other sequencing reads were discarded, and the remaining 955,910 sequencing reads were assembled using MIRA3 [1]. Singletons were included in the subsequent contig analyses. Contig sequences were then compared by BLAST against the adaptor sequences used for both library preparation and sequencing to ensure that none of the contig sequences were contaminated with adaptors. Adaptor sequences were trimmed, and the assembled contigs were used to sequentially query the NCBI non-redundant protein database using BLASTX with an E-value cut-off of 0.001. The top five gene hits were assigned to each contig. All five of the top hits usually agreed on gene ontology and taxonomy, but where they disagreed, we associated multiple GO terms and multiple possible taxonomic affinities with the contig. To identify conserved protein domains, translations of all six reading frames were searched against the Pfam protein domain database [2]. Hits were retained only if their E-value was lower than 0.001.
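The hit-selection step can be illustrated with a short Python sketch. It is not the pipeline used to build the database. It assumes the BLASTX results are available in BLAST's standard 12-column tabular format, which the text does not state, and the function name `top_hits`, the ranking rule (lowest E-value first, ties broken by higher bit score) and the inclusive treatment of the cut-off are illustrative choices.

```python
import csv
import io

EVALUE_CUTOFF = 1e-3  # E-value cut-off given in the text
MAX_HITS = 5          # "top five" hits per contig


def top_hits(handle, evalue_cutoff=EVALUE_CUTOFF, max_hits=MAX_HITS):
    """Return {contig: [(subject, evalue, bitscore), ...]} with the best hits.

    Assumes tab-separated rows in the standard 12-column BLAST tabular
    layout: query id in column 1, subject id in column 2, E-value in
    column 11 and bit score in column 12.
    """
    best = {}  # contig -> {subject: (evalue, bitscore)}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip comment lines and malformed rows
        contig, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > evalue_cutoff:  # assumption: the cut-off is inclusive
            continue
        per_contig = best.setdefault(contig, {})
        previous = per_contig.get(subject)
        # A subject can occur on several rows (one per alignment); keep its best row.
        if previous is None or (evalue, -bitscore) < (previous[0], -previous[1]):
            per_contig[subject] = (evalue, bitscore)

    ranked = {}
    for contig, subjects in best.items():
        rows = [(s, e, b) for s, (e, b) in subjects.items()]
        rows.sort(key=lambda hit: (hit[1], -hit[2]))  # lowest E-value first
        ranked[contig] = rows[:max_hits]
    return ranked


# Made-up example rows: subjB fails the E-value cut-off and is dropped.
demo = io.StringIO(
    "contig1\tsubjA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200\n"
    "contig1\tsubjB\t35.0\t60\t39\t0\t1\t180\t5\t64\t0.5\t30\n"
)
demo_result = top_hits(demo)
print(demo_result)
```

Keeping a list of hits per contig, rather than a single best hit, is what allows several GO terms or taxonomic affinities to be attached to one contig when the hits disagree.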

GO term and KEGG analysis

Using the top five hits from the BLASTX search described above, each contig was assigned a list of associated protein GI numbers. GI numbers were converted to Entrez Gene IDs using the gene2accession conversion file from NCBI. The gene2go file was then used to obtain the relevant GO annotation for the top five BLASTX hits to each Pocillopora contig, and the GO term(s) were then associated with the respective contig. Protein GI numbers were cross-referenced to species-specific KEGG pathways [3]. Using KEGG's lists of annotated plants and animals, these pathways were organized into corresponding lists, and generalized KEGG pathway IDs were obtained. KEGG pathway analysis was then performed, and individual contigs were mapped to different biochemical pathways using iPath [4]. The top five hits, rather than the single best hit, were used to increase the probability of finding a hit that would allow each contig to be paired with its corresponding GO category and with the existing KEGG data. For photosynthesis, for example, the top hits were to Symbiodinium sequences rather than plant sequences. Because Symbiodinium is not represented in the KEGG database, relying on those hits alone would have made the contigs appear not to represent enzymes involved in photosynthesis when, in fact, they did.
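The GI-to-GO chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two lookup tables stand in for the contents of the gene2accession and gene2go files and are assumed to be already loaded into dictionaries, the function name `annotate_contigs` is hypothetical, and all identifiers in the example (`gi_A`, `gene_1`, and so on) are made up.

```python
def annotate_contigs(contig_hits, gi_to_gene, gene_to_go):
    """Attach the union of GO terms from all mapped hits to each contig.

    contig_hits: {contig: [protein GI, ...]}   top BLASTX hits per contig
    gi_to_gene:  {protein GI: Gene ID}         stand-in for gene2accession
    gene_to_go:  {Gene ID: {GO term, ...}}     stand-in for gene2go
    """
    annotations = {}
    for contig, gis in contig_hits.items():
        terms = set()
        for gi in gis:
            gene_id = gi_to_gene.get(gi)
            if gene_id is not None:  # hits without a Gene ID contribute nothing
                terms.update(gene_to_go.get(gene_id, ()))
        annotations[contig] = sorted(terms)
    return annotations


# Toy data with made-up identifiers; gi_C and gi_D have no Gene ID mapping.
contig_hits = {"contig1": ["gi_C", "gi_A", "gi_B"], "contig2": ["gi_D"]}
gi_to_gene = {"gi_A": "gene_1", "gi_B": "gene_2"}
gene_to_go = {
    "gene_1": {"GO:0015979"},
    "gene_2": {"GO:0015979", "GO:0006950"},
}

annotations = annotate_contigs(contig_hits, gi_to_gene, gene_to_go)
print(annotations)
```

In the toy data the best hit for contig1 (`gi_C`) has no Gene ID, yet the contig still receives GO terms from its lower-ranked hits, which is the motivation given in the text for using five hits instead of one.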

References:

  1. Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Muller WE, Wetter T, Suhai S: Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res 2004, 14(6):1147-1159.
  2. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K et al: The Pfam protein families database. Nucleic Acids Res 2010, 38(Database issue):D211-222.
  3. Kanehisa M: The KEGG database. Novartis Found Symp 2002, 247:91-101; discussion 101-103, 119-128, 244-252.
  4. Letunic I, Yamada T, Kanehisa M, Bork P: iPath: interactive exploration of biochemical pathways and networks. Trends Biochem Sci 2008, 33(3):101-103.