Georg Steinert

marine ecology and systematics

RAxML Tutorial

Introduction

This is a very short example of how to run a RAxML analysis. Keep in mind that this is not a detailed guide; it is more of a personal note about my approach to working with RAxML.

Useful links and additional information:

The Exelixis Lab

RAxML 7.0.4 Manual

RAxML Google Group

raxmlGUI - a graphical front-end for RAxML

wxRAxML - another graphical front-end for RAxML

Comprehensive description of RAxML commands

Bodega Phylogenetics Wiki - RAxML Tutorial

In contrast to MrBayes, which can be controlled with the convenient MrBayes block at the end of a Nexus file, RAxML is a purely command-line tool for phylogenetic analysis (apart from the two GUI add-ons, see links above). In this short tutorial I will describe how to compile different RAxML versions under Linux (Ubuntu). Furthermore, I will explain step by step how to analyse a sequence data set and how to set up a partition file for multiple substitution models.

Installing RAxML

After downloading RAxML 7.2.8 from the Exelixis Lab homepage and unpacking the archive into a folder of your choice, you can compile different versions of RAxML by running one of the following commands in a terminal window from within that folder:

#without SSE3 intrinsics

make -f Makefile.gcc

#with SSE3 intrinsics

make -f Makefile.SSE3.gcc

#Pthreads version without SSE3 intrinsics

make -f Makefile.PTHREADS.gcc

#Pthreads version with SSE3 intrinsics

make -f Makefile.SSE3.PTHREADS.gcc

#moreover, MPI, SSE3.MPI, and hybrid MPI/Pthreads versions (with and without SSE3) are available:

make -f Makefile.MPI.gcc

make -f Makefile.SSE3.MPI.gcc

make -f Makefile.HYBRID.gcc

make -f Makefile.SSE3.HYBRID.gcc

If you have a desktop computer or notebook with at least two cores (which is very common) and your CPU supports SSE3, you should compile the SSE3.PTHREADS version. It speeds up the computation by using the SSE3 intrinsics and several cores of your CPU (the number of threads is set with the -T option, see below). After compiling this version, you should have an executable named "raxmlHPC-SSE3-PTHREADS" or similar. You can rename it (e.g. to "raxml") to save typing when entering RAxML commands.
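The choice between the Makefiles above can be automated. The following is a minimal sketch, not part of RAxML: the function names choose_makefile and has_sse3 are hypothetical, and it assumes a Linux system, where SSE3 support is reported as the "pni" flag in /proc/cpuinfo.

```shell
#!/bin/sh
# Sketch: pick one of the RAxML 7.2.8 Makefiles listed above.
# Hypothetical helpers; assumes Linux, where /proc/cpuinfo lists SSE3 as "pni".

# Prints the Makefile name; $1 = number of cores, $2 = "yes" if SSE3 is available.
choose_makefile() {
    cores=$1
    sse3=$2
    if [ "$cores" -ge 2 ]; then
        if [ "$sse3" = "yes" ]; then
            echo "Makefile.SSE3.PTHREADS.gcc"
        else
            echo "Makefile.PTHREADS.gcc"
        fi
    else
        if [ "$sse3" = "yes" ]; then
            echo "Makefile.SSE3.gcc"
        else
            echo "Makefile.gcc"
        fi
    fi
}

# Prints "yes" if the cpuinfo file ($1, default /proc/cpuinfo) has the pni flag.
has_sse3() {
    if grep -qw pni "${1:-/proc/cpuinfo}" 2>/dev/null; then
        echo "yes"
    else
        echo "no"
    fi
}
```

From within the unpacked source folder you could then run make -f "$(choose_makefile "$(nproc)" "$(has_sse3)")" and rename the resulting executable as described above.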

Setting up the analysis

In the following example I will use an imaginary Phylip alignment, roseobacter.phy. RAxML 7.2.8 only reads Phylip files, but this should be no problem: you can easily use Seaview or other alignment tools to export your Fasta or Nexus files to Phylip format.
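If no alignment editor is at hand, an aligned Fasta file can also be converted on the command line. This is a rough sketch with a hypothetical function name (fasta_to_phylip); it assumes the sequences are already aligned and that each taxon name is the first word of its header line. It writes relaxed Phylip (name, space, sequence), not the strict 10-character format.

```shell
#!/bin/sh
# Sketch: convert an aligned FASTA file ($1) to sequential relaxed PHYLIP on stdout.
# Assumptions: sequences are aligned (equal length) and the name is the first
# word of each header, directly after ">".
fasta_to_phylip() {
    awk '
        /^>/ {
            n++
            name[n] = substr($1, 2)   # first word of the header, without ">"
            seq[n] = ""
            next
        }
        { gsub(/[ \t\r]/, ""); seq[n] = seq[n] $0 }   # join wrapped sequence lines
        END {
            len = length(seq[1])
            for (i = 2; i <= n; i++)
                if (length(seq[i]) != len) {
                    print "error: " name[i] " has length " length(seq[i]) ", expected " len > "/dev/stderr"
                    exit 1
                }
            print n, len                          # PHYLIP header: taxa, sites
            for (i = 1; i <= n; i++) print name[i], seq[i]
        }
    ' "$1"
}
```

A typical call would be fasta_to_phylip roseobacter.fasta > roseobacter.phy (the Fasta file name is hypothetical).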

Once the raxml executable and the roseobacter.phy alignment are in the same folder, open a terminal, change to that directory, and run the following command:

./raxml -s roseobacter.phy -n roseobacter_boot -m GTRGAMMA -x 1979 -f a -o OG_AY75694 -T 2 -#1000 > roseobacter_log.txt

That's it. Depending on the size of your alignment, the RAxML version, and the number of bootstrap replicates, the analysis can take quite a while.

Let's take a look at the individual options:

./raxml = the (renamed) RAxML executable.

-s roseobacter.phy = specifies the name of the sequence file, in this example roseobacter.phy

-n roseobacter_boot = specifies the run name, which is appended to the names of all output files, in this example roseobacter_boot

-m GTRGAMMA = specifies the substitution model, here GTR with gamma-distributed rate heterogeneity among sites. There is a huge variety of options; you will find further information about the different models in the links above and in the slightly outdated manual of version 7.0.4 (please read the note about some differences between 7.2.8 and 7.0.4 on phylo.org).

-x 1979 = specifies an integer number (the random seed) and turns on rapid bootstrapping

-f a = specifies the algorithm. Again, you have many options. In this case I chose a rapid bootstrap analysis followed by a search for the best-scoring ML tree in the same run.

-o OG_AY75694 = specifies the outgroup, in this case a single taxon named OG_AY75694. You can also specify several outgroup taxa, separated by commas.

-T 2 = specifies the number of threads (PTHREADS version only). For example, on a CPU with two cores you should not run more than two threads at the same time.

-#1000 = specifies the number of bootstrap replicates, here 1000.

> roseobacter_log.txt = redirects the screen output to a log file, here roseobacter_log.txt. This is a shell redirection, not a RAxML option.
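Since a misspelled outgroup name or too many threads only shows up once the run has started, a quick pre-flight check can save time. This is a minimal sketch with hypothetical helper names (check_outgroups, safe_threads); it assumes a sequential Phylip file with the taxon name as the first word of each line after the header.

```shell
#!/bin/sh
# Sketch: checks to run before a long RAxML analysis.
# Assumption: sequential PHYLIP, one "name sequence" line per taxon after the header.

# Returns 0 if every comma-separated outgroup name ($2) is a taxon in the PHYLIP file ($1).
check_outgroups() {
    phy=$1
    missing=0
    for og in $(echo "$2" | tr ',' ' '); do
        if ! awk -v og="$og" 'NR > 1 && $1 == og { found = 1 } END { exit !found }' "$phy"; then
            echo "outgroup not found in $phy: $og" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Prints the value for -T: the requested number of threads ($1), capped at the core count ($2).
safe_threads() {
    requested=$1
    cores=$2
    if [ "$requested" -gt "$cores" ]; then
        echo "$cores"
    else
        echo "$requested"
    fi
}
```

For the example above, check_outgroups roseobacter.phy OG_AY75694 should return without an error message, and -T could be given as "$(safe_threads 2 "$(nproc)")".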

When the analysis has finished, you will find the following files:

RAxML_info.roseobacter_boot = information about the analysis

roseobacter_log.txt = the screen output, essentially the same information as in RAxML_info

RAxML_bootstrap.roseobacter_boot = all bootstrap trees

RAxML_bestTree.roseobacter_boot = best-scoring ML tree

RAxML_bipartitions.roseobacter_boot = best-scoring ML tree with bootstrap support values, which is the most interesting result to me

Now just rename the tree file with support values to "RAxML_bipartitions.roseobacter_boot.tre" and/or simply open the tree file(s) with FigTree, TreeView, or TreeView X. The finished tree could look like this:

[Figure: RAxML ML tree]
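The check for the output files and the renaming step can be combined. This is a small sketch with a hypothetical function name (collect_results), assuming it is run in the folder where RAxML wrote its output; it copies rather than renames the bipartitions tree, so the original file stays in place.

```shell
#!/bin/sh
# Sketch: after a rapid bootstrap run (-f a), check that the expected output
# files exist for a run name ($1, as given with -n) and copy the bipartitions
# tree to a .tre file for tree viewers.
collect_results() {
    run=$1
    status=0
    for prefix in RAxML_info RAxML_bootstrap RAxML_bestTree RAxML_bipartitions; do
        if [ ! -s "$prefix.$run" ]; then
            echo "missing or empty: $prefix.$run" >&2
            status=1
        fi
    done
    if [ "$status" -eq 0 ]; then
        cp "RAxML_bipartitions.$run" "RAxML_bipartitions.$run.tre"
    fi
    return "$status"
}
```

For the example run, collect_results roseobacter_boot would create RAxML_bipartitions.roseobacter_boot.tre.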

Individual per-partition branch lengths

If you want to run an analysis on a partitioned data set, like the Mytilus data set from my MrBayes tutorial, you only need to set up a separate partition file. In this example we will split the Mytilus sequence alignment (1259 bp long) into two gene fragments, COI and VD1. To do so, open your text editor and write into an empty file:

DNA, coi=1-399

DNA, vd1=400-1259

Then save the file (e.g. as "mytilus_partition") and add the following options to the command line to run a partitioned analysis:

-q mytilus_partition = specifies the name of the partition file

-M = estimates individual per-partition branch lengths. The branch lengths for the individual partitions are saved to separate files, and a weighted average of the branch lengths is computed using the lengths of the two partitions.
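The partition file can also be written and checked from the terminal before starting the run. This is a sketch with hypothetical function names (write_mytilus_partition, check_partition_coverage, run_partitioned) and a hypothetical alignment file name (mytilus.phy); the coverage check only handles simple "DNA, name=start-end" lines, not codon strides. The remaining options in run_partitioned are simply those of the roseobacter example.

```shell
#!/bin/sh
# Sketch: write, check, and use the two-gene partition file from the example.
# Assumptions: "DNA, name=start-end" lines without "\3" strides; file names are hypothetical.

# Writes the COI/VD1 partition file to $1.
write_mytilus_partition() {
    cat > "$1" <<'EOF'
DNA, coi=1-399
DNA, vd1=400-1259
EOF
}

# Prints "ok" if the ranges in partition file $1 cover positions 1..$2
# exactly once; otherwise reports the first problem and returns 1.
check_partition_coverage() {
    awk -F'=' -v total="$2" '
        NF == 2 {
            split($2, r, "-")
            start = r[1] + 0; stop = r[2] + 0
            if (stop > max) max = stop
            for (i = start; i <= stop; i++) count[i]++
        }
        END {
            if (max > total) { print "range ends at " max ", beyond " total; exit 1 }
            for (i = 1; i <= total; i++)
                if (count[i] != 1) { print "position " i " covered " count[i] + 0 " times"; exit 1 }
            print "ok"
        }
    ' "$1"
}

# Runs the partitioned analysis: $1 = alignment, $2 = partition file, $3 = run name.
# Set RAXML to use a differently named executable (default ./raxml).
run_partitioned() {
    "${RAXML:-./raxml}" -s "$1" -n "$3" -m GTRGAMMA -q "$2" -M \
        -x 1979 -f a -T 2 -#1000
}
```

In practice: write_mytilus_partition mytilus_partition, then check_partition_coverage mytilus_partition 1259 (which should print ok), then run_partitioned mytilus.phy mytilus_partition mytilus_part.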

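You can also define partitions by codon position. Assuming the whole alignment is protein-coding and in frame from position 1, a partition file that combines first and second codon positions and keeps third positions separate would look like this: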

DNA, codon1codon2 = 1-1259\3,2-1259\3

DNA, codon3 = 3-1259\3
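In this notation, "start-end\3" selects every third site from start to end. The sketch below (hypothetical function names codon_partitions and stride_count) generates the same two lines for any alignment length and counts how many sites each stride selects, again assuming the alignment is in frame from position 1.

```shell
#!/bin/sh
# Sketch: print a codon-position partition file for an alignment of length $1,
# grouping 1st+2nd positions and keeping 3rd positions separate.
codon_partitions() {
    len=$1
    printf 'DNA, codon1codon2 = 1-%s\\3,2-%s\\3\n' "$len" "$len"
    printf 'DNA, codon3 = 3-%s\\3\n' "$len"
}

# Number of sites selected by "start-end\step": $1 = start, $2 = end, $3 = step.
stride_count() {
    echo $(( ($2 - $1) / $3 + 1 ))
}
```

For the 1259 bp Mytilus alignment, stride_count gives 420 sites each for 1-1259\3 and 2-1259\3 and 419 sites for 3-1259\3, which together account for all 1259 positions.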

That's it. If you have any questions or comments, please feel free to contact me.
