Welcome!
The Medbio Lab hosted the Family Physicians Inquiries Network (FPIN) Information System Delivery Meeting from March 25 to March 27, 2007. Cerner Corporation and MedBio DL Lab are initiating a research project on integrating content-based medical image retrieval with electronic medical records. The Shumaker Graduate Fellowship is accepting applications. Please contact Dr. Shyu for detailed information. Four Lab members presented their research at AMIA 2006, held 11/13/06-11/22/06.
Knowledge-Driven High-Dimensional Indexing for Content-Based Retrieval
Knowledge-driven Content-based High-dimensional Indexing
In today’s digital age, digital cameras, digital medical imaging, bioinformatics, biometrics and other sources are providing an ever-expanding supply of digital content (media) for databases. When accessing media databases, it is no longer sufficient to limit users to searching meta-data, such as names, dates of acquisition, and other textual information. The state of the art now relies on developing new methods for content-based retrieval. More precisely, given a database of specialized media, such as high-resolution computed tomography (HRCT) scans of lungs, we must be able to supply an example lung scan as a query to retrieve similar images. This content-based image retrieval (CBIR) is just one example of the more general content-based retrieval methods that we are developing. In addition to HRCT lung images, we have developed and applied content-based retrieval methods to protein tertiary structures and high-resolution satellite imagery. Content-based access to non-textual data sources continues to be a challenging research area that spans numerous fields such as computer vision, pattern recognition, databases, indexing, image processing, information retrieval and others.
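As a rough illustration of query-by-example retrieval, the sketch below ranks database objects by the distance between their feature vectors and the feature vector of the query. It assumes that features have already been extracted from each media object; the object names, the three-dimensional feature vectors, and the use of Euclidean distance are hypothetical choices made for the example and are not taken from our systems.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_example(database, query_features, k=3):
    """Return the ids of the k database objects closest to the query.

    database maps an object id to its (hypothetical) feature vector.
    This is a brute-force linear scan with no index structure.
    """
    ranked = sorted(
        database.items(),
        key=lambda item: euclidean(item[1], query_features),
    )
    return [obj_id for obj_id, _ in ranked[:k]]


# Toy database: made-up feature vectors standing in for extracted content.
database = {
    "scan_a": [0.10, 0.80, 0.30],
    "scan_b": [0.90, 0.20, 0.70],
    "scan_c": [0.15, 0.75, 0.35],
    "scan_d": [0.50, 0.50, 0.50],
}

# Features of the example scan supplied as the query (also made up).
result = query_by_example(database, [0.12, 0.78, 0.32], k=2)
print(result)
```

A linear scan like this touches every object for every query, which is what motivates a dedicated high-dimensional index for large databases.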
We are furthering the development of content-based retrieval by exploiting domain knowledge associated with media in the database. For example, HRCT lung scans have associated diagnoses, protein tertiary structures have structural classifications from SCOP, and regions-of-interest (ROI) in satellite imagery have identifiable visual elements. When building content-based indexes over a database of HRCT images, every database object has an assigned diagnosis. This diagnosis is domain-specific knowledge encoded by a domain expert (radiologist). Our HRCT lung content-based retrieval system (WebHIQS) exploited this full domain knowledge to achieve fast and accurate retrievals from the database. In other domains of content-based retrieval, it is not feasible to have complete domain knowledge of all database objects. For this reason, the initial version of ProteinDBS exploited partial knowledge of a small portion of the database to estimate knowledge of the remaining database objects. Given both the partial knowledge and the estimated knowledge, all the database objects were indexed, leading to the world’s first real-time protein tertiary structure retrieval engine. Finally, in the extreme case, database content knowledge may be too high-level to sufficiently encode for indexing purposes. Our satellite imagery content-based retrieval system, GeoIRIS, contains over 43 GB of high-resolution panchromatic and multi-spectral imagery. Regions of interest in the imagery may simultaneously contain forested areas, roadways, manmade objects, and other visual-perceptual features. Not all permutations of the visual contents can be sufficiently enumerated as domain knowledge, so we exploit well-developed methods of automatic clustering of content in the high-dimensional space. These cluster assignments then provide synthetic domain knowledge, enabling our fast and efficient content-based indexing.
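The idea of using cluster assignments as synthetic domain knowledge can be sketched with a small k-means routine, where each object's cluster id plays the role of a class label for later index construction. This is only an illustration under stated assumptions: the text above does not name the clustering method used, so k-means, the two-dimensional feature vectors, and the fixed initial centroids are all choices made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def kmeans_labels(points, initial_centroids, iterations=10):
    """Assign each point a cluster id to serve as a synthetic label.

    k-means is an assumed stand-in for whichever clustering method is used.
    """
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels


# Made-up feature vectors forming two visually obvious groups.
points = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
synthetic_labels = kmeans_labels(points, [points[0], points[2]])
print(synthetic_labels)
```

The resulting labels can then be fed to any label-driven index construction in place of expert-assigned classes.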
To exploit domain knowledge in content-based indexing, we have developed the Entropy Balanced Statistical (EBS) k-d tree. The EBS is specifically designed as a large-scale multi-dimensional index for content-based retrieval. The tree is grown over a database using decision tree induction, but with a dual induction constraint: reducing entropy from parent to children, and seeking relative balance in the entropy of the children. Our motivation is to avoid greedy decisions that sacrifice the entropy of one node for exceptional gain in the other. Furthermore, we have developed a complete file structure that is conducive to generating large ranked result sets from a query, navigating through the leaf structure in the high-dimensional space to increase retrieval precision. As a derivative of the EBS, we have developed the Entropy Balanced Bitmap (EBB) tree. This index allows us to efficiently index over 800,000 objects that have been extracted from high-resolution satellite imagery and encoded into bitmaps of 32 x 32 bits. By comparison, if we attempted to use traditional bitmap indexing, the number of leaves would be 2^32, which would require 2^32 - 1 internal nodes. This is clearly inefficient, as our number of objects is currently less than 2^20.
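The dual constraint described above can be sketched as a split-scoring function over one feature dimension. The sketch rejects any split that fails to reduce entropy and penalizes splits whose children have very different entropies. The particular score (information gain minus a weighted entropy difference) and the `balance_weight` parameter are assumptions made for illustration; they are not the published EBS criterion.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def score_split(values, labels, threshold, balance_weight=0.5):
    """Score a candidate split; None means the split is rejected.

    Assumed scoring: entropy gain minus a penalty for unbalanced
    child entropies. balance_weight is a hypothetical tuning knob.
    """
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return None
    h_left, h_right = entropy(left), entropy(right)
    n = len(labels)
    weighted = len(left) / n * h_left + len(right) / n * h_right
    gain = entropy(labels) - weighted
    if gain <= 0:
        return None  # children must reduce entropy relative to the parent
    return gain - balance_weight * abs(h_left - h_right)


def best_split(values, labels, balance_weight=0.5):
    """Try midpoints between distinct values; return (threshold, score)."""
    candidates = sorted(set(values))
    best = None
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        score = score_split(values, labels, threshold, balance_weight)
        if score is not None and (best is None or score > best[1]):
            best = (threshold, score)
    return best


# One feature dimension with made-up values and class labels.
values = [1, 2, 3, 4]
labels = ["a", "a", "b", "b"]
split = best_split(values, labels)
print(split)
```

In this toy case the midpoint split yields two pure children with equal entropy, so it scores highest; a split that isolates a single object leaves the other child mixed and is penalized for the imbalance.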
In addition to the core content-based indexing, numerous supporting technologies have been developed:
- Various image processing & feature extractions for different media content
- Clustering and Partial knowledge clustering approaches
- Supporting content-based retrieval infrastructure
- Novel multi-object spatial relationship extraction and representation for indexing
- Publications:
Pin-Hao Chi, Grant Scott, and Chi-Ren Shyu. A fast protein structure retrieval system using image-based distance matrices and multidimensional index, in Proc. of IEEE Fourth Symposium on Bioinformatics and Bioengineering, Taichung, Taiwan 2004
Pin-Hao Chi, Grant Scott, and Chi-Ren Shyu. A fast protein structure retrieval system using image-based distance matrices and multidimensional index, in International Journal of Software Engineering and Knowledge Engineering, Vol. 15, No. 3, Special Issue on Software and Knowledge Engineering Support in Bioinformatics 2005; 527-545
Grant Scott and Chi-Ren Shyu. EBS k-d Tree: An Entropy Balanced Statistical k-d Tree for Image Databases with Ground-Truth Labels, in Proc. of the International Conference of Image and Video Retrieval 2003
Chi-Ren Shyu, Pin-Hao Chi, Grant Scott, and D. Xu. ProteinDBS - A content-based retrieval system for protein structure database, in Nucleic Acids Research, Vol. 32, July 2004; W572-W575
Grant Scott, Matt Klaric, and Chi-Ren Shyu. Modeling Multi-Object Spatial Relationships for Satellite Image Database Indexing and Retrieval, in Lecture Notes in Computer Science (LNCS), Vol. 3568, Singapore, July 2005; 247-256
Grant Scott and Chi-Ren Shyu. Knowledge Driven Multidimensional Indexing Structure for Biomedical Media Database Retrieval, in IEEE Transactions on Information Technology in Biomedicine, Vol. 11, No. 3, May 2007; 320-331
Chi-Ren Shyu, Matt Klaric, Grant Scott, Adrian Barb, Curt Davis, and Kannappan Palaniappan. GeoIRIS: Geospatial Information Retrieval and Indexing System - Content Mining, Semantics Modeling, and Complex Queries, in IEEE Transactions on Geoscience and Remote Sensing, Special Issue on Image Mining, Vol. 45, No. 4, April 2007; 839-852