Phylogeny.fr is designed to provide a high-performance platform that transparently chains programs relevant to phylogenetic analysis into a comprehensive and flexible pipeline. Although phylogeny aficionados will find most of their favorite tools and can run sophisticated analyses, the primary philosophy of Phylogeny.fr is to help biologists with no experience in phylogeny analyze their data in a robust way.
The Phylogeny.fr platform offers a phylogeny pipeline which can be executed through three main modes:
The "One Click mode" targets users that do not wish to deal with program and parameter selection. By default, the pipeline is already set up to run and connect programs recognized for their accuracy and speed (MUSCLE for multiple alignment and PhyML for phylogeny) to reconstruct a robust phylogenetic tree from a set of sequences.
In the "Advanced mode", the Phylogeny.fr server proposes the succession of the same programs but users can choose the steps to perform (multiple sequence alignment, phylogenetic reconstruction, tree drawing) and the options of each program.
The "A la carte mode" offers the possibility of running and testing more alignment and phylogeny programs, such as MUSCLE, ClustalW, T-Coffee, PhyML, BioNJ and TNT.
Alternatively, users have the possibility to run the different programs separately.
This is a "default" mode which proposes a pipeline already set up to run and connect programs recognized for their accuracy and speed (MUSCLE for multiple alignment, optionally Gblocks for alignment curation, PhyML for phylogeny and finally TreeDyn for tree drawing) to reconstruct a robust phylogenetic tree from a set of sequences.
 
   
All users have to do is copy and paste their set of sequences in FASTA format (or upload a FASTA file) and click the Submit button. The system does the rest, running every program with its default parameters. Users can still choose whether to use Gblocks to eliminate poorly aligned positions and divergent regions by ticking the corresponding checkbox on the form page. At the end of the analysis, the server displays a publication-quality image of the phylogenetic tree.
| Step | Program used | Settings | Notes |
|---|---|---|---|
| Alignment | MUSCLE 3.7 | | Several studies, and especially the BAliBASE benchmark, showed that MUSCLE achieved the highest ranking of any method at the time of publication. |
| Alignment refinement | Gblocks 0.91b | | This step is optional. Gblocks eliminates poorly aligned positions and divergent regions (removes alignment noise). Parameters are set to their default values in Gblocks. These are rather stringent; e.g. all positions with gaps are removed. |
| Phylogeny | PhyML 3.0 | | PhyML was shown to be at least as accurate as other existing phylogeny programs using simulated data, while being one order of magnitude faster. HKY fits well in most cases as it models the main features of DNA substitutions: the transition/transversion ratio and unequal base frequencies. LG has been shown to be the best amino-acid replacement matrix to date. Gamma-distributed rates are mandatory in most (if not all) analyses, and using invariant sites generally improves (and never degrades) the fit. |
| Tree rendering | TreeDyn 198.3 | | TreeDyn offers many tree customization options compared to other tree rendering tools, especially for tree annotations. The initial output tree is rooted using the mid-point rooting method (performed by Retree from the PHYLIP package), but the user can reroot the tree using our dynamic tree editing interface. |
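
The note on Gblocks defaults says that all positions with gaps are removed. The sketch below illustrates only that one consequence on a toy alignment. It is not the Gblocks algorithm, which also applies conservation and block-length criteria; the function name `drop_gapped_columns` and the example data are assumptions made for illustration.

```python
def drop_gapped_columns(alignment):
    """Return a copy of the alignment without any column that contains '-'.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if not rows:
        return {}
    # Keep a column only if no sequence has a gap at that position.
    keep = [i for i in range(len(rows[0])) if all(row[i] != "-" for row in rows)]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


toy = {"a": "AC-GT", "b": "ATGGA", "c": "A-GCT"}
print(drop_gapped_columns(toy))  # {'a': 'AGT', 'b': 'AGA', 'c': 'ACT'}
```

Columns 2 and 3 of the toy alignment each contain a gap, so only three of the five columns survive. This is why the defaults are described as stringent.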
  The pipeline is the same as the "One Click" mode but is flexible
  enough to allow users to select which steps to perform. In this manner, the
  input data can be a set of non-aligned sequences in FASTA format, an
  alignment of multiple sequences in FASTA, PHYLIP or Clustal format, or a
  tree in NEWICK format.
  Users are provided with options to set the parameters of the different
  programs of the pipeline.
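
Because the entry point depends on the kind of input (unaligned sequences, an alignment, or a tree), it can help to check what a file looks like before submitting it. The following is a minimal sketch that guesses the format from the first non-blank line, using only the markers described in the format sections of this page. It is not how the server detects formats; the function name `guess_format` and the returned labels are assumptions.

```python
def guess_format(text):
    """Guess a sequence, alignment or tree format from its first non-blank line."""
    stripped = text.lstrip()
    first = stripped.splitlines()[0] if stripped else ""
    if first.startswith(">"):
        return "fasta"
    if first.startswith("CLUSTAL"):
        return "clustal"
    if first.upper().startswith("#NEXUS"):
        return "nexus"
    if first.startswith("LOCUS"):
        return "genbank"
    if first.startswith("ID "):
        return "embl"
    parts = first.split()
    # A PHYLIP header holds two integers: sequence count and alignment length.
    if len(parts) == 2 and all(p.isdigit() for p in parts):
        return "phylip"
    # Assumption: a Newick tree starts with an opening parenthesis.
    if first.startswith("("):
        return "newick"
    return "unknown"


print(guess_format(">Homo sapiens\nMEVEAV"))   # fasta
print(guess_format(" 7 100\nT25  ACTATTGAAA"))  # phylip
print(guess_format("(A,B,(C,D));"))             # newick
```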
 
Furthermore, the system lets users check the results of each step before launching the next program, so that they can adjust the parameters for a given task. This is done by checking the "Step by step" option.
In an "All at once" run, once the pipeline has finished, users have access to detailed reports for each analysis step through separate result tabs.
 
   
At the end of the analysis, users can view a schematic representation of the workflow, with details of the software options as well as the references for the selected tools. This summary of the run is available through the "Overview" tab once the analysis is finished. This functionality is also available in the "One Click mode".
 
In a "Step by step" analysis, users can check and edit the results of each step before launching the next program.
 
The server offers the possibility to run other alignment and phylogeny programs than those preselected in the "One Click" and "Advanced" modes:
 
A fast BLAST search on Gigablaster lets you quickly explore the neighbors of your sequence. Paste a single sequence, run BLAST and explore its homologous sequences. The system facilitates the selection of homologous sequences, based on a 'quick-and-dirty' phylogenetic representation using BLAST results and an estimator of the final multiple alignment length.
 
| Program | Function | Input | Output | Speed | Current Limitations | Use | 
|---|---|---|---|---|---|---|
| Blastall 2.2.17 | Sequence searching | Raw, FASTA | FASTA | Fast | None | Advanced and "A la Carte" | 
| MUSCLE 3.7 | Multiple alignment | FASTA, EMBL/Uniprot, GenBank, PAUP*/Nexus | FASTA, Clustal, PHYLIP | Fast | <200 nucleic sequences, <6000 sites; <200 protein sequences, <2000 sites | All modes; large dataset |
| T-Coffee 5.56 | Multiple alignment | FASTA, EMBL/Uniprot, GenBank, PAUP*/Nexus | FASTA, Clustal, PHYLIP | Very slow | <50 nucleic sequences, <2000 sites; <50 protein sequences, <2000 sites | "A la Carte"; small dataset |
| 3DCoffee 5.56 | Multiple alignment using structural information | FASTA, EMBL/Uniprot, GenBank, PAUP*/Nexus | FASTA, Clustal, PHYLIP | Very slow | <50 nucleic sequences, <2000 sites; <50 protein sequences, <2000 sites | "A la Carte"; small dataset |
| ClustalW 2.0.3 | Multiple alignment | FASTA, EMBL/Uniprot, GenBank, PAUP*/Nexus | FASTA, Clustal, PHYLIP | Fast | <200 nucleic sequences, <4000 sites; <200 protein sequences, <2000 sites | "A la Carte"; large dataset |
| Gblocks 0.91b | Alignment refinement | FASTA, Clustal, PHYLIP, EMBL, PAUP*/Nexus | FASTA, Clustal, PHYLIP | Fast | None | All modes; large dataset |
| PhyML 3.0 | Phylogeny using maximum likelihood | FASTA, Clustal, PHYLIP, EMBL, PAUP*/Nexus | Newick | Fast to slow | (sequence size) × (number of taxa)² < 100,000,000 | All modes; medium to large dataset |
| TNT 1.1 | Phylogeny using parsimony | FASTA, Clustal, PHYLIP, EMBL, PAUP*/Nexus | Newick | Fast to slow | <200 sequences, <6000 sites | "A la Carte"; medium to large dataset |
| FastDist/Protdist + BioNJ / Neighbor (PHYLIP) 3.67 | Phylogeny using distances | FASTA, Clustal, PHYLIP, EMBL, PAUP*/Nexus | Newick | Fast | <5000 taxa for BioNJ; <500 taxa for Neighbor | "A la Carte"; large dataset |
| Bootstrap with PhyML and TNT | Estimation of clade support | FASTA, Clustal, PHYLIP, EMBL, PAUP*/Nexus | Newick | Very slow | <500 replicates with PhyML; <1000 replicates with TNT | Advanced and "A la Carte"; small to medium datasets |
| Bootstrap with distance methods | Estimation of clade support | FASTA, Clustal, PHYLIP, EMBL, PAUP*/Nexus | Newick | Slow | <1000 replicates with FastDist; <1000 replicates with Protdist | "A la Carte"; medium to large datasets |
| TreeDyn 198.3 | Tree rendering | Newick, New Hampshire Extended, PAUP*/Nexus | Newick, PNG, PS, PDF, SVG, TGF | Fast | None | All modes |
| Drawgram 3.67 | Various tree shapes rendering | Newick, New Hampshire Extended, PAUP*/Nexus | PNG, PS, PDF | Fast | None | "A la Carte" |
| Drawtree 3.67 | Unrooted tree rendering | Newick, New Hampshire Extended, PAUP*/Nexus | PNG, PS, PDF | Fast | None | "A la Carte" |
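
The PhyML limit in the table is an arithmetic bound rather than a fixed pair of numbers, so it is worth working through once. The sketch below assumes that "sequence size" means the number of alignment sites and that the taxon count is squared; the function name `within_phyml_limit` is hypothetical.

```python
PHYML_LIMIT = 100_000_000  # bound quoted in the table above


def within_phyml_limit(n_sites, n_taxa):
    """Return True if (sites) x (taxa squared) stays under the quoted bound."""
    return n_sites * n_taxa ** 2 < PHYML_LIMIT


# 2,000 sites and 200 taxa: 2000 * 200**2 = 80,000,000, under the bound
print(within_phyml_limit(2000, 200))  # True
# 6,000 sites and 200 taxa: 6000 * 200**2 = 240,000,000, over the bound
print(within_phyml_limit(6000, 200))  # False
```

Under these assumptions, an alignment of 6,000 sites for 200 taxa would exceed the PhyML bound, even though the same site and sequence counts appear as the limits for nucleic sequences with MUSCLE.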
The FASTA format can be used for a single sequence, multiple sequences or an alignment. Each sequence begins with a single header line giving the sequence name (and optionally a description), followed by lines of sequence data. The header line must start with a greater-than (">") symbol in the first column.
>Homo sapiens
MEVEAVCGGAGEVEAQDSDPAPAFSKAPGSAGHYELPWVEKYRPVKLNEIVGNEDTVSRLEVFAREGNVP
NIIIAGPPGTGKTTSILCLARALLGPALKDAMLELNASNDRGIDVVRNKIKMFAQQKVTLPKGRHKIIIL
DEADSMTDGAQQALRRTMEIYSKTTRFALACNASDKIIEPIQSRCAVLRYTKLTDAQILTRLMNVIEKER
VPYTDDGLEAIIFTAQGDMRQALNNLQSTFSGFGFINSENVFKVCDEPHPLLVKEMIQHCVNANIDEAYK
ILAHLWHLGYSPEDIIGNIFRVCKTFQMAEYLKLEFIKEIGYTHMKIAEGVNSLLQMAGLLARLCQKTMA
PVAS
>Arabidopsis thaliana
MASSSSTSTGDGYNEPWVEKYRPSKVVDIVGNEDAVSRLQVIARDGNMPNLILSGPPGTGKTTSILALAH
ELLGTNYKEAVLELNASDDRGIDVVRNKIKMFAQKKVTLPPGRHKVVILDEADSMTSGAQQALRRTIEIY
SNSTRFALACNTSAKIIEPIQSRCALVRFSRLSDQQILGRLLVVVAAEKVPYVPEGLEAIIFTADGDMRQ
ALNNLQATFSGFSFVNQENVFKVCDQPHPLHVKNIVRNVLESKFDIACDGLKQLYDLGYSPTDIITTLFR
IIKNYDMAEYLKLEFMKETGFAHMRICDGVGSYLQLCGLLAKLSIVRETAKAP
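
The header-then-sequence layout described above can be read with a few lines of code. This is a minimal sketch, not the parser used by the server; the function name `parse_fasta` and the toy records are assumptions, and the whole header line (name plus optional description) is used as the key.

```python
def parse_fasta(text):
    """Map each header line (without '>') to its concatenated sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts at every '>' in the first column.
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


example = """>seq1 first toy sequence
MEVEAV
CGGAGE
>seq2
MASSSS
"""
print(parse_fasta(example))
# {'seq1 first toy sequence': 'MEVEAVCGGAGE', 'seq2': 'MASSSS'}
```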
  The GenBank format can be used for a single sequence or multiple sequences.
  Each sequence entry begins with a line containing the word "LOCUS",
  which gives the short name of the sequence, followed by several annotation
  lines. The start of each sequence is marked by a line containing the word
  "ORIGIN" and the end of each sequence is marked by two slashes
  ("//").
LOCUS       CAA36839                 152 aa            linear   PRI 14-NOV-2006
DEFINITION  calmodulin [Homo sapiens].
ACCESSION   CAA36839
VERSION     CAA36839.1  GI:825635
DBSOURCE    embl accession X52606.1
            embl accession X52607.1
            embl accession X52608.1
ORIGIN      
        1 madqlteeqi aefkeafslf dkdgdgtitt kelgtvmrsl gqnpteaelq dminevdadd
       61 lpgngtidfp efltmmarkm kdtdseeeir eafrvfdkdg ngyisaaelr hvmtnlgekl
      121 tdeevdemir eadidgdgqv nyeefvqmmt ak
//
LOCUS       CAA59418                 149 aa            linear   PLN 18-APR-2005
DEFINITION  calmodulin [Macrocystis pyrifera].
ACCESSION   CAA59418
VERSION     CAA59418.1  GI:728609
DBSOURCE    embl accession X85091.1
ORIGIN      
        1 madqlteeqi aefkeafslf dkdgdgtitt kelgtvmrsl gqnpteaelq dminevdadg
       61 ngtidfpefl tmmarkmkdt dseeeiieaf kvfdkdgngf isaaelrhim tnlgekltde
      121 evdemiread idgdgqinye efvkmmmak
//
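
The LOCUS / ORIGIN / "//" markers are enough to pull names and sequences out of such a file. The sketch below is a simplified reader that ignores all annotation lines; the function name `parse_genbank` and the toy record are assumptions, and a full GenBank parser would handle many more fields.

```python
def parse_genbank(text):
    """Return {locus_name: sequence} from GenBank flat-file text."""
    records = {}
    name = None
    in_origin = False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            name = line.split()[1]
            records[name] = ""
            in_origin = False
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
            name = None
        elif in_origin and name is not None:
            # Drop the leading position number and the spaces between blocks.
            records[name] += "".join(line.split()[1:])
    return records


toy = "\n".join([
    "LOCUS       TOY00001                  12 aa            linear   UNK",
    "DEFINITION  toy record.",
    "ORIGIN      ",
    "        1 madqlteeqi ae",
    "//",
])
print(parse_genbank(toy))  # {'TOY00001': 'madqlteeqiae'}
```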
  The EMBL format can be used for a single sequence or multiple sequences.
  Each sequence starts with an identifier line beginning with
  "ID ", followed by several annotation lines. The start of each
  sequence is marked by a line beginning with "SQ",
  and the end of each sequence is marked by two slashes ("//").
ID   Homo sapiens; AA; UNK; 354 AA.
XX
AC   unknown;
XX
DE   
XX
FH   Key             Location/Qualifiers
FH
XX
SQ   Sequence 354 BP; 35 A; 8 C; 23 G; 18 T; 270 other;
     meveavcgga geveaqdsdp apafskapgs aghyelpwve kyrpvklnei vgnedtvsrl        60
     evfaregnvp niiiagppgt gkttsilcla rallgpalkd amlelnasnd rgidvvrnki       120
     kmfaqqkvtl pkgrhkiiil deadsmtdga qqalrrtmei yskttrfala cnasdkiiep       180
     iqsrcavlry tkltdaqilt rlmnvieker vpytddglea iiftaqgdmr qalnnlqstf       240
     sgfgfinsen vfkvcdephp llvkemiqhc vnanideayk ilahlwhlgy spediignif       300
     rvcktfqmae ylklefikei gythmkiaeg vnsllqmagl larlcqktma pvas             354
//
ID   Arabidopsis thaliana; AA; UNK; 333 AA.
XX
AC   unknown;
XX
DE   
XX
FH   Key             Location/Qualifiers
FH
XX
SQ   Sequence 333 BP; 27 A; 6 C; 21 G; 18 T; 261 other;
     masssststg dgynepwvek yrpskvvdiv gnedavsrlq viardgnmpn lilsgppgtg        60
     kttsilalah ellgtnykea vlelnasddr gidvvrnkik mfaqkkvtlp pgrhkvvild       120
     eadsmtsgaq qalrrtieiy snstrfalac ntsakiiepi qsrcalvrfs rlsdqqilgr       180
     llvvvaaekv pyvpegleai iftadgdmrq alnnlqatfs gfsfvnqenv fkvcdqphpl       240
     hvknivrnvl eskfdiacdg lkqlydlgys ptdiittlfr iiknydmaey lklefmketg       300
     fahmricdgv gsylqlcgll aklsivreta kap                                    333
//
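
The same marker-based approach works for EMBL entries, with "ID" and "SQ" playing the roles of "LOCUS" and "ORIGIN". This is a minimal sketch under the assumption that the entry name is the text between "ID" and the first semicolon; the function name `parse_embl` and the toy entry are hypothetical.

```python
def parse_embl(text):
    """Return {entry_name: sequence} from EMBL flat-file text."""
    records = {}
    name = None
    in_seq = False
    for line in text.splitlines():
        if line.startswith("ID"):
            # Assumption: the name is everything before the first ';'.
            name = line[2:].split(";")[0].strip()
            records[name] = ""
            in_seq = False
        elif line.startswith("SQ"):
            in_seq = True
        elif line.startswith("//"):
            in_seq = False
            name = None
        elif in_seq and name is not None:
            # Keep the residue blocks and skip the trailing position counter.
            blocks = [tok for tok in line.split() if not tok.isdigit()]
            records[name] += "".join(blocks)
    return records


toy = "\n".join([
    "ID   Toy sequence; AA; UNK; 14 AA.",
    "XX",
    "SQ   Sequence 14 BP;",
    "     meveavcgga geve        14",
    "//",
])
print(parse_embl(toy))  # {'Toy sequence': 'meveavcggageve'}
```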
  PHYLIP is an alignment format.
  The first line contains the number of sequences and their length (in
  characters), separated by blanks.
  Each following line contains a sequence name followed by the sequence in
  blocks of 10 characters; in later blocks the names are omitted.
 7 100
T25          ACTATTGAAA GAAGGGGGTT CCTAGATATC TGCGAGTATA ATCGTGCTTG
T16          ATTAATCAAA GTAGGCGGGG CGGCCGTAGA TGCTAAGAAA ATCGAGTTCG
T27          ATTAATCAAA GTAGGCAGGG CGGCCGTAGA TGCTAAGAAA ATCGAGTTCG
T1           GTTAACCGAA GTAGGCGGAA CGGACGTATA TGCGATTAAA ATCGAGTTCG
T19          GTTAACCGAA GTAGGCGGAA CGGACGTATA TGCGATTAAA ATCGAGTTCG
T35          ATTAATCAAA GCAGGCGGTC CGGACGTATA TCCTAATAAA ATCGAGTTCG
T56          ATTAATCAAA GTAGGCGGTC CGGCCGAATA TGCGAATAAA ATCGAGTTCG
             GTCTCCTATC GATGCGCATC GGACCGAGAG GCTCTCCAGC CATGTGGACG
             GTCACCTCCC ATTGGGCAGC AGATCGCTAG GCTCTTTAGC CAGGTGGACG
             GTCACCTCCC ATTGCGCAGC AGATCGCTAG GCTCTTTAGC CAGGCGGACG
             GACACCTTCC AGGGCGCAGC AGATCGCGAG GCTTTCTAAC CAGGTGGACG
             GACACCTTCC AGGGCGCAGC AGATCGCGAG GCTTTCTAGC CAGGTGGACG
             GTCACCTCCC AGGGCGCAGA AGATCGCGAG GCTCTCCAGC CAGGTGGACG
             GTAACCTCCC AGTCCGCAGA AGATCGCGAG GCTCTCCAGC CAGGGGGACG
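
An interleaved alignment like the one above can be reassembled by cycling through the sequences block by block. The sketch below assumes that names contain no spaces and are separated from the sequence by whitespace (strict PHYLIP uses a fixed 10-character name field instead); the function name `parse_phylip_interleaved` is hypothetical, and the sample reuses the start of the first two sequences shown above.

```python
def parse_phylip_interleaved(text):
    """Return {name: sequence} from an interleaved PHYLIP alignment."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seq, n_sites = (int(x) for x in lines[0].split())
    names, seqs = [], []
    for i, line in enumerate(lines[1:]):
        tokens = line.split()
        if i < n_seq:
            # First block: name followed by sequence blocks.
            names.append(tokens[0])
            seqs.append("".join(tokens[1:]))
        else:
            # Later blocks: sequence only, in the same order as the names.
            seqs[i % n_seq] += "".join(tokens)
    if any(len(s) != n_sites for s in seqs):
        raise ValueError("sequence length does not match the header")
    return dict(zip(names, seqs))


sample = "\n".join([
    " 2 20",
    "T25          ACTATTGAAA",
    "T16          ATTAATCAAA",
    "             GTCTCCTATC",
    "             GTCACCTCCC",
])
print(parse_phylip_interleaved(sample))
# {'T25': 'ACTATTGAAAGTCTCCTATC', 'T16': 'ATTAATCAAAGTCACCTCCC'}
```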
  Clustal is an alignment format.
  The word "CLUSTAL" is on the first line of the file. The alignment
  is written in blocks of a fixed length. Every line of a block starts with a
  sequence name (maximum of 10 characters), followed by at least one space character.
  The sequence is then displayed in upper or lower case; "-" denotes
  gaps.
A count of the total number of residues may be shown at the end of each line.
Below each block of residues, an additional line shows the degree of conservation for each site.
CLUSTAL W (1.83) multiple sequence alignment
aboA            -NLFV-ALYDFVASGDNTLSITKGEKLRV-------LGYNHNG-------EWCEA--QTK 42
ycsB            KGVIY-ALWDYEPQNDDELPMKEGDCMTI-------IHREDEDEI-----EWWWA--RLN 45
pht             -GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFSDGQEARPEEIGWLNGYNETT 59
vie             ---------DRVRKKSG--AAWQGQIVGW---------YCTNLTP----EGYAVESEAHP 36
ihvA            ------NFRVYYRDSRD--PVWKGPAKLL---------WKGEG-------AVVIQ---DN 33
                             .         *                                    
aboA            NGQGWVPSNYITPVN------ 57
ycsB            DKEGYVPRNLLGLYP------ 60
pht             GERGDFPGTYVEYIGRKKISP 80
vie             GSVQIYPVAALERIN------ 51
ihvA            SDIKVVPRRKAKIIRD----- 49
                .     *              
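
The block structure described above can be read by joining, for each name, the chunks found in successive blocks. This is a minimal sketch that assumes conservation lines start with whitespace and that names contain no spaces; the function name `parse_clustal` and the shortened sample are assumptions.

```python
def parse_clustal(text):
    """Return {name: aligned_sequence} from Clustal-formatted text."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header")
    records = {}
    for line in lines[1:]:
        # Blank lines and conservation lines (leading whitespace) hold no residues.
        if not line.strip() or line[0].isspace():
            continue
        tokens = line.split()
        if len(tokens) < 2:
            continue
        # tokens[0] is the name, tokens[1] the chunk; a trailing count is ignored.
        name, chunk = tokens[0], tokens[1]
        records[name] = records.get(name, "") + chunk
    return records


sample = "\n".join([
    "CLUSTAL W (1.83) multiple sequence alignment",
    "",
    "aboA            -NLFV-ALYD 8",
    "ycsB            KGVIY-ALWD 9",
    "                      **   ",
    "",
    "aboA            NGQGW 13",
    "ycsB            DKEGY 14",
])
print(parse_clustal(sample))
# {'aboA': '-NLFV-ALYDNGQGW', 'ycsB': 'KGVIY-ALWDDKEGY'}
```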
The Nexus format can be used for multiple sequences, alignments, distance matrices and trees. It starts with "#NEXUS". It is described in detail in Maddison et al., "NEXUS: An Extensible File Format for Systematic Information".
#NEXUS
BEGIN TAXA;
DIMENSIONS ntax=5;
TAXLABELS 1aboA 2ycsB 3pht 4vie 5ihvA;
END;
BEGIN UNALIGNED;
DIMENSIONS ntax=5;
FORMAT datatype=Protein gap=-;
MATRIX
1aboA NLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVN
2ycsB KGVIYALWDYEPQNDDELPMKEGDCMTIIHREDEDEIEWWWARLNDKEGYVPRNLLGLYP
3pht GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFSDGQEARPEEIGWLNGYNETTGERGDFPGTYVEYIGRKKISP
4vie DRVRKKSGAAWQGQIVGWYCTNLTPEGYAVESEAHPGSVQIYPVAALERIN
5ihvA NFRVYYRDSRDPVWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRD
;
END;
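
As a small illustration of the block structure, the taxon names can be read from the TAXLABELS command of the TAXA block. The sketch below assumes unquoted labels without spaces and makes no attempt to parse the other blocks; the function name `nexus_taxlabels` is hypothetical.

```python
import re


def nexus_taxlabels(text):
    """Return the list of labels from the first TAXLABELS command."""
    if not text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("not a NEXUS file")
    # The command runs from the keyword to the next semicolon.
    match = re.search(r"TAXLABELS\s+(.*?);", text, flags=re.IGNORECASE | re.DOTALL)
    return match.group(1).split() if match else []


sample = "#NEXUS BEGIN TAXA; DIMENSIONS ntax=5; TAXLABELS 1aboA 2ycsB 3pht 4vie 5ihvA; END;"
print(nexus_taxlabels(sample))
# ['1aboA', '2ycsB', '3pht', '4vie', '5ihvA']
```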
