Student Presentations |
---|
Date: November 14, 2006 Speaker: Samuel Angiuoli Abstract: Numerous ontologies have taken hold across biology and medicine, providing data models for describing a complex array of data types. This presentation surveys four popular ontologies under active development in different domains of biomedicine and bioinformatics, covering the scope, design, and management of each. The Gene Ontology and Sequence Ontology are used for genome annotation and have been developed through an open, collaborative process with extensive community involvement. These ontologies are now managed by the newly formed National Center for Biomedical Ontology, which is providing standards, tools, and a repository for community-built ontologies. Another community-built ontology is the MGED Ontology, developed by the MGED consortium for describing microarray experiments. The MGED Ontology describes experimental design, protocols, and biomaterials, and follows the MAGE data model. In the domain of biomedicine, the UMLS integrates a number of ontologies for biomedical and health-related concepts and is built and managed by the NLM. The Metathesaurus within the UMLS provides relationships between terms from a number of established ontologies, including SNOMED, ICD-9-CM, and MeSH. By providing a mapping between source ontologies, the UMLS attempts to integrate existing ontologies, identify synonymous concepts, and provide a common data format while preserving the content and structure of each source ontology. Reading: Optional:
Date: November 14, 2006 Speaker: Elena Zheleva Title: Mapping and Linking of Ontologies Abstract: Ontology mapping and alignment are necessary to provide interoperability between data contributed by independent sources. Lexical analysis provides a tool for discovering pairs of entities, from the same or different ontologies, that refer to the same concept. I will present an overview of the efforts under way at the National Center for Biomedical Ontology related to mapping and aligning ontologies, and an evaluation of three lexical methods for ontology mapping. Reading:
Date: November 21, 2006 Speaker: Asad Sayeed Title: Machine learning and text mining approaches to data integration in bioinformatic contexts Abstract: In my talk, I will present the major points of two papers that apply machine learning and text mining techniques to biological databases with an eye towards data integration. The first paper discusses a system that labels records from protein databases with Gene Ontology codes. This system was submitted to the BioCreative text mining evaluation and used simple text mining techniques, such as n-gram models. The second paper delves more deeply into the details of biological applications, using support vector machines trained on multiple datasets to predict the relationships between transcription factors and binding sites. Reading: Optional:
Date: November 21, 2006 Speaker: Inbal Yahav Title: Text mining in life science Abstract: Biomedical text plays a fundamental role in knowledge discovery in life science. Although information retrieval or text searching is useful, it is not sufficient for finding specific facts and relations. Managing the increasing volume, complexity, and specialization of the knowledge expressed in these texts is therefore very challenging. In my presentation I will mainly discuss the challenges, methods, and architecture of Unstructured Information Management, as offered by an IBM research group. Reading:
Date: November 28, 2006 Speaker: Michael Schatz Title: Managing SNP data and the HapMap Project Abstract: The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies, and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia, and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate the development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention. Reading:
Date: November 28, 2006 Speaker: Yao Wu Title: Ranking target objects for gene queries Abstract: Web navigation plays an important role in exploring public interconnected data sources such as life science data. A navigational query in the life science graph produces a result graph that is a layered directed acyclic graph (DAG). Traversing the result paths in this graph reaches a target object set (TOS). The challenge in ranking the target objects is to provide recommendations that reflect the relative importance of each retrieved object as well as its relevance to the specific query posed by the scientist. We present a metric, layered graph PageRank (lgPR), that ranks target objects based on the link structure of the result graph. lgPR is a modification of PageRank; it avoids random jumps in order to respect the path structure of the result graph. We also outline a metric, layered graph ObjectRank (lgOR), that extends the ObjectRank metric to layered graphs. We then present an initial evaluation of lgPR. We perform experiments on a real-world graph of life sciences objects from NCBI and report on the ranking distribution produced by lgPR. We compare lgPR with PageRank. To understand the characteristics of lgPR, an expert compared the top K target objects (publications in the PubMed source) produced by lgPR with those produced by a word-based ranking method that uses text features extracted from an external source (such as Entrez Gene) to rank publications. Reading: Optional:
|
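
The last abstract describes lgPR only in outline: a PageRank-style score computed on a layered DAG, without random jumps. As a rough illustration of that idea, and not the authors' actual definition, the Python sketch below pushes score forward layer by layer, with each node splitting its score equally among its outgoing links. The function name `layered_rank`, the uniform starting scores, the equal-split rule, and the toy gene/protein/publication graph are all assumptions made for this example.

```python
from collections import defaultdict


def layered_rank(layers, edges):
    """Score the last layer of a layered DAG by forward propagation.

    Hypothetical sketch: this is NOT the published lgPR formula, only a
    PageRank-like walk with no random jumps (no damping or teleport term).

    layers: list of lists of node ids, ordered from the first (query)
            layer to the last (target) layer.
    edges:  dict mapping a node id to the list of its successors in the
            next layer.
    Returns a dict mapping each node in the last layer to its score.
    """
    score = defaultdict(float)

    # Assumption: score starts uniformly distributed over the first layer.
    for node in layers[0]:
        score[node] = 1.0 / len(layers[0])

    # Visit layers in order so a node's score is final before it is split.
    for layer in layers[:-1]:
        for node in layer:
            successors = edges.get(node, [])
            if not successors:
                continue  # dead end: this node's score reaches no target
            share = score[node] / len(successors)
            for nxt in successors:
                score[nxt] += share

    return {node: score[node] for node in layers[-1]}


# Toy result graph: one gene, two proteins, two publications.
layers = [["gene1"], ["prot1", "prot2"], ["pub1", "pub2"]]
edges = {
    "gene1": ["prot1", "prot2"],
    "prot1": ["pub1"],
    "prot2": ["pub1", "pub2"],
}
ranking = layered_rank(layers, edges)
for pub in sorted(ranking, key=ranking.get, reverse=True):
    print(pub, ranking[pub])
```

In this toy graph `pub1` is reachable along two result paths and `pub2` along one, so `pub1` ends up with the higher score. Because there is no teleport term, a target object can only receive score along paths that actually exist in the result graph, which is the property the abstract attributes to lgPR.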