An international group of researchers, in collaboration with DOE's Joint Genome Institute, has sequenced the genome of P. tricornutum using whole genome shotgun (WGS) sequencing. The clone of P. tricornutum that was sequenced is CCAP1055/1 and is available from the Culture Collection of Algae and Protozoa (http://www.ccap.ac.uk). This clone represents a monoclonal culture derived from a fusiform cell in May 2003 from strain CCCP632, which was originally isolated in 1956 off Blackpool (U.K.). It has been maintained in culture continuously in F/2 medium. The 27.4 Mb genome assembly contains 33 chromosomes and 55 scaffolds.
In 2021, Oxford Nanopore Technologies long-read sequencing was used to update and validate the quality and contiguity of the P. tricornutum genome. Although repetitive DNA sequences cause problems for current genome assembly algorithms, the long reads made it possible to resolve previously uncertain genomic regions and to further characterize complex structural variation.
The genome of P. tricornutum was annotated using the JGI annotation pipeline, which combines several gene prediction, annotation and analysis tools. A total of 10,402 gene models were predicted; 86% of the genes are supported by ESTs, and 60-65% show homology to proteins in SwissProt.
There are two parts to the P. tricornutum genome sequence assembly and annotation reported here: the Phatr2 "finished chromosomes" and the Phatr2_bd "unmapped sequence". The finished chromosomes consist of the finished genome sequence that could be reliably assembled into chromosomes. The "unmapped sequence" consists of assembled scaffolds that could neither be mapped to finished chromosomes nor assigned to organelles, but that could be aligned to P. tricornutum ESTs which were not represented in the finished chromosomes.
The current Phaeodactylum tricornutum gene set is a reannotation with 12,089 gene models. The models were predicted with the MAKER2 annotation pipeline, in which existing gene models, expression data and protein sequences from related species were used to train the SNAP and Augustus gene prediction programs. The inputs were:
10,402 gene models from a previous genebuild from JGI
13,828 non-redundant ESTs
42 libraries of RNA-Seq generated using Illumina technology
49 libraries of RNA-Seq data generated under various iron conditions using SOLiD technology
93,206 Bacillariophyta ESTs from dbEST, 22,502 Bacillariophyta and 118,041 Stramenopiles protein sequences from UniProt.
Using mass spectrometry-based proteomics data, approximately 8,300 Phatr2 genes were confirmed, and 606 novel proteins, 506 revised genes and 94 splice variants were identified.
Comparison of cells grown in replete conditions, collected at 4, 8, 20 and 36 h, with nitrate-starved cells collected at 4, 8 and 20 h, and with cells subjected to dark treatment for 8 h, nocodazole treatment for 20 h, or phosphate starvation for 36 h.
The first whole-genome methylome was obtained by digestion with the methyl-sensitive endonuclease McrBC followed by hybridization to a McrBC-chip tiling array of the P. tricornutum genome. Bisulfite deep sequencing was then used to compare DNA methylation in low and replete nitrogen conditions.
Five histone post-translational modifications (PTMs) were identified using mass spectrometry: H3K4me2, H3K9me2, H3K27me3, H4K91Ac and H3K9me3. Three of them were used to compare histone marks in low and replete nitrogen conditions.
Repeats were collectively found to contribute ~3.4 Mb (12%) of the assembly, including transposable elements (TEs), unclassified and tandem repeats, as well as fragments of host genes.
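The repeat fraction quoted above follows directly from the assembly size (3.4 Mb of 27.4 Mb is about 12%). As a minimal sketch of how such a figure can be derived from an annotation, the Python below merges overlapping repeat intervals so that no base is counted twice and then divides by the assembly length. The function name `covered_bases`, the 0-based half-open coordinate convention and the toy intervals are assumptions made for illustration; they are not taken from the annotation pipeline itself.

```python
from collections import defaultdict


def covered_bases(intervals):
    """Count bases covered by at least one interval.

    `intervals` is an iterable of (seqid, start, end) tuples using
    0-based, half-open coordinates (an assumption of this sketch).
    Overlapping or nested intervals on the same sequence are merged
    so that each base is counted once.
    """
    by_seq = defaultdict(list)
    for seqid, start, end in intervals:
        by_seq[seqid].append((start, end))

    total = 0
    for spans in by_seq.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current merged span.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Toy example with hypothetical sequence names and coordinates.
toy_repeats = [("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 10, 20)]
toy_covered = covered_bases(toy_repeats)

# Fraction implied by the figures quoted in the text (3.4 Mb of 27.4 Mb).
reported_percent = round(100 * 3.4 / 27.4, 1)
```

Merging before summing matters because TE fragments, tandem repeats and gene fragments can overlap; summing raw interval lengths would overstate the repeat content.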
Five small RNA libraries within the 18 to 36 nt RNA fraction were extracted from cells grown under different conditions: Normal Light (NL), High Light (HL), Low Light (LL), Dark (D), and Iron starvation (−Fe).
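The 18 to 36 nt size window can be illustrated with a short Python sketch that keeps only reads within that length range and tabulates their length distribution per library. The function names and the toy reads are hypothetical; only the size bounds and the library labels come from the text.

```python
from collections import Counter


def size_select(reads, min_len=18, max_len=36):
    """Keep reads whose length lies within [min_len, max_len] nt."""
    return [r for r in reads if min_len <= len(r) <= max_len]


def length_profile(reads):
    """Return a Counter mapping read length to number of reads."""
    return Counter(len(r) for r in reads)


# Toy reads (made up for illustration), keyed by library label.
libraries = {
    "NL": ["A" * 17, "ACGT" * 5, "A" * 36, "A" * 37],
    "HL": ["ACGT" * 6, "AC" * 5],
}

profiles = {
    name: length_profile(size_select(reads))
    for name, reads in libraries.items()
}
```

Comparing the resulting length profiles across libraries is one simple way to see whether a condition shifts the small RNA population toward a particular size class.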
In this study of responses to phosphorus (P) depletion, long intergenic non-protein-coding RNAs (lincRNAs) were defined as sequences at least 200 nucleotides long with a predicted open reading frame (ORF) of no more than 100 amino acids.
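The two thresholds in this definition can be expressed as a simple filter. The sketch below is not the method used in the study: it assumes that an ORF runs from an in-frame ATG to the next in-frame stop codon, that both strands are scanned, that ORFs lacking a stop codon are ignored, and that ORF length counts the start codon but not the stop. It also does not test the "intergenic" criterion, which requires comparison against gene annotations. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_aa(seq):
    """Length (in amino acids) of the longest complete ATG..stop ORF.

    Scans all six reading frames. Incomplete ORFs that run off the
    end of the sequence are not counted (an assumption of this sketch).
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            codons_in_orf = None  # None while outside an ORF
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codons_in_orf is None:
                    if codon == "ATG":
                        codons_in_orf = 1
                elif codon in STOP_CODONS:
                    best = max(best, codons_in_orf)
                    codons_in_orf = None
                else:
                    codons_in_orf += 1
    return best


def is_lincrna_candidate(seq, min_len=200, max_orf_aa=100):
    """Apply the length (>= 200 nt) and ORF (<= 100 aa) thresholds."""
    return len(seq) >= min_len and longest_orf_aa(seq) <= max_orf_aa
```

In practice a dedicated ORF predictor or coding-potential tool would replace `longest_orf_aa`; the point here is only that a transcript must pass both the length test and the ORF test to qualify.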
The file below contains supporting information for the Phatr3 genes: corresponding IDs (Phatr2, NCBI...), KEGG, GO, domains, targeting predictions and evolutionary origins.