Bio Info SIG: Using Genome Annotations for Comparative Genomics



    Abstract:

    Whole-genome sequencing and annotation have now given us a fairly complete set of genes for many organisms. The systematic annotation of the human genome, for example, has produced around 25,000 complex gene models consisting of introns, exons, 5' and 3' UTRs, alternatively spliced transcripts, and more. How best to package and manipulate these annotations in silico remains an open question. One thing is certain, however: the traditional in silico representations of genes, protein and transcript FASTA files, are poor containers for these complex data and place undesirable limitations on their analysis. Producing portable, machine-readable descriptions of genome annotations is no trivial matter, but turning genes into an electronic commodity that can be easily distributed to third parties is essential for the growth of biotech and academia alike.

    Several academic projects are currently under way to facilitate the production of machine-readable descriptions of genome annotations. I'll touch upon these, but I'll focus primarily on a software library that we have developed at the Berkeley Drosophila Genome Project for manipulating genome annotations. I'll explain how we are using this code to investigate the evolution of gene structure, particularly the evolution of introns. I'll also discuss some more practical applications, describing how we are using this code to help us take genome annotations 'to the bench' and our attempts to use them to design wet-lab experiments automatically.


    About the Speaker:

    Mark Yandell is a Senior Scientist at the Berkeley Drosophila Genome Project (BDGP), where he leads a team of researchers applying in silico and wet-bench techniques to comparative genomics. Prior to joining BDGP, Mark was at Celera Genomics, where he wrote much of the software used by Celera to annotate and analyze the Drosophila, human, mouse, and mosquito genomes.

    Before joining Celera he was at the Genome Sequencing Center at Washington University, where he pursued post-doctoral studies in computational biology, genome annotation and SNP discovery. Mark received his PhD in Molecular, Cellular and Developmental Biology from the University of Colorado, Boulder.


    Event Logistics:


    Hanson Bridgett
    333 Market Street, 21st Floor
    San Francisco, CA, 94105

    Note: Location provided courtesy of Hanson, Bridgett, Marcus, Vlahos, Rudy, LLP.


    6:30 - 7:00 p.m. Registration and Networking
    7:00 - 9:00 p.m. Presentation


    $15 at the door for non-SDForum members
    No charge for SDForum members
    Please call 408.494.8378 for student memberships
    No charge for TiE Members during November and December
    No registration required