marcottelab/Gene-Ages

Name: Gene-Ages

Owner: The Marcotte Lab

Description: Consensus ages for genes

Created: 2016-03-01 19:06:25.0

Updated: 2017-10-31 17:32:52.0

Pushed: 2016-05-19 22:40:36.0

Size: 11629

Language: Jupyter Notebook

README

Gene-Ages

A consensus approach to estimating gene ages for model organisms

This repository contains scripts, data, and IPython notebooks in support of our manuscript:

“Towards Consensus Gene Ages” Benjamin J. Liebeskind, Claire D. McWhite, and Edward M. Marcotte

If you use any of the information or code in this repository, please cite the paper above.

How old is my gene?

If you study a certain gene or gene family, you might be interested in knowing how many other organisms share orthologs of that gene. Or maybe you want to annotate genomic datasets with gene ages to get an idea of how deep in evolutionary time different pathways were assembled. If so, look no further!

We took orthology calls from 13 popular orthology inference algorithms and estimated consensus gene ages for a variety of model eukaryotes. Because orthology inference is notoriously difficult, we also annotated our datasets with various error terms so that you can propagate uncertainty through your downstream analyses.
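To make the idea of a consensus call and its uncertainty concrete, here is a toy sketch, not the procedure used in the manuscript (which, among other things, trims outlying databases before making the final call). It assumes each database contributes one age call per gene, takes the most common call as the "consensus", and measures disagreement with Shannon's entropy in bits (log base 2 is an assumption here). The age-category labels and the `toy_consensus` function are hypothetical.

```python
from collections import Counter
import math


def toy_consensus(age_calls):
    """Return (most common age call, Shannon entropy in bits) for one gene.

    Toy illustration only: a simple majority vote, not the manuscript's method.
    """
    if not age_calls:
        raise ValueError("need at least one age call")
    counts = Counter(age_calls)
    total = len(age_calls)
    # Majority vote; Counter.most_common breaks ties by first-seen order
    consensus, _ = counts.most_common(1)[0]
    # Shannon entropy H = -sum(p * log2(p)) over the distribution of calls
    entropy = sum(-(n / total) * math.log2(n / total) for n in counts.values())
    return consensus, entropy


# Hypothetical age calls for one gene from four orthology databases
calls = ["Opisthokonta", "Opisthokonta", "Eukaryota", "Opisthokonta"]
print(toy_consensus(calls))
```

Here three of four databases agree, so the consensus is "Opisthokonta" with an entropy of about 0.81 bits; if every database agreed, the entropy would be 0.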

Organisms with gene-age information

You can find consensus tables for the following organisms in the Main/ directory. They are named main_.csv

| Organism | UniProt ID |
| --- | --- |
| Anopheles gambiae (Mosquito) | ANOGA |
| Bos taurus (Cattle) | BOVIN |
| Branchiostoma floridae (Lancelet) | BRAFL |
| Caenorhabditis elegans (Worm) | CAEEL |
| Candida albicans | CANAL |
| Canis lupus familiaris (Dog) | CANFA |
| Gallus gallus (Chicken) | CHICK |
| Ciona intestinalis (Tunicate) | CIOIN |
| Cryptococcus neoformans | CRYNJ |
| Danio rerio (Zebrafish) | DANRE |
| Drosophila melanogaster (Fly) | DROME |
| Homo sapiens (Human) | HUMAN |
| Ixodes scapularis (Tick) | IXOSC |
| Macaca mulatta (Rhesus macaque) | MACMU |
| Monosiga brevicollis (Choanoflagellate) | MONBE |
| Monodelphis domestica (Opossum) | MONDO |
| Mus musculus (Mouse) | MOUSE |
| Nematostella vectensis (Sea anemone) | NEMVE |
| Neurospora crassa (Bread mold) | NEUCR |
| Ornithorhynchus anatinus (Platypus) | ORNAN |
| Pan troglodytes (Chimp) | PANTR |
| Phaeosphaeria nodorum (Wheat fungus) | PHANO |
| Rattus norvegicus (Rat) | RAT |
| Saccharomyces cerevisiae (Budding yeast) | YEAST |
| Schistosoma mansoni (Blood fluke) | SCHMA |
| Schizosaccharomyces pombe (Fission yeast) | SCHPO |
| Sclerotinia sclerotiorum (White mold) | SCLS1 |
| Takifugu rubripes (Pufferfish) | TAKRU |
| Ustilago maydis (Corn smut/Huitlacoche) | USTMA |
| Xenopus tropicalis (Frog) | XENTR |
| Yarrowia lipolytica | YARLI |

Age-categories

These files contain the following information:

Error statistics

| Name | Description |
| --- | --- |
| NumDBsContributing | How many databases/algorithms contribute to the final estimate. More is better. |
| NumDBsFiltered | How many databases/algorithms were trimmed out. Fewer is better. |
| entropy | Shannon's entropy over the final age-call distribution. Lower is better. |
| NodeError | Average patristic distance between age calls before filtering. Lower is better. |
| Bimodality | How bimodal the age call is (see manuscript). Lower is better. |
| HGT_flag | Whether this gene was flagged as a recent horizontal gene transfer. |
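A minimal sketch of keeping only the more confident calls from one of the Main/ tables, using Python's standard library. It assumes the CSV headers match the column names above and that HGT_flag is written as a text boolean such as "True"/"False"; the thresholds, the `filter_confident_genes` helper, and the `GeneID` column in the example are hypothetical, so check the real file headers before using it.

```python
import csv
import io


def filter_confident_genes(rows, min_dbs=10, max_entropy=1.0, drop_hgt=True):
    """Keep rows whose error statistics pass user-chosen thresholds.

    The default thresholds are illustrative, not values recommended by the authors.
    """
    kept = []
    for row in rows:
        # float() tolerates counts written either as "12" or "12.0"
        if float(row["NumDBsContributing"]) < min_dbs:
            continue
        if float(row["entropy"]) > max_entropy:
            continue
        # Assumption: HGT_flag is stored as a text boolean ("True"/"False" or 1/0)
        if drop_hgt and row["HGT_flag"].strip().lower() in {"true", "1", "1.0"}:
            continue
        kept.append(row)
    return kept


# Tiny in-memory stand-in for a Main/ table (hypothetical genes and values)
example = io.StringIO(
    "GeneID,NumDBsContributing,NumDBsFiltered,entropy,NodeError,Bimodality,HGT_flag\n"
    "geneA,12,1,0.2,0.1,0.0,False\n"
    "geneB,5,6,1.8,2.3,0.4,False\n"
    "geneC,13,0,0.0,0.0,0.0,True\n"
)
confident = filter_confident_genes(csv.DictReader(example))
print([row["GeneID"] for row in confident])
```

With the defaults, only geneA survives: geneB has too few contributing databases and geneC is flagged as a recent HGT. To read a real table, pass `csv.DictReader(open(path))` for the file of interest in Main/.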

Replicating the analysis

We provide scripts and IPython notebooks if you're interested in replicating the analysis or running it again with different parameters. You should run the code using the scripts in CannedScripts/. Raw orthology prediction alignments are available from https://github.com/qfo/OrthologTables/. Here's a flowchart showing how the scripts in CannedScripts/, some of the IPython notebooks in Notebooks/, and the output files in Data/ are related:

Flowchart

And because you can now add emojis to GitHub markdown, I will :black_joker:


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.