Research Objectives

In my research projects, I propose robust statistical models that leverage images, text, and user-created tags to improve the performance of image annotation and retrieval. Specifically, my research addresses three research questions:

·         How can more robust and effective image representations be obtained to bridge the “semantic gap”?

In automatic image annotation and retrieval, a major challenge is bridging the “semantic gap” [Smeulders 2000] between low-level image features and high-level semantic meaning. Specifically, this requires identifying a set of image features that preserves the semantic consistency of image content. State-of-the-art image representation approaches either describe image content by its global spatial layout [Henderson and Hollingworth 1999], [Oliva and Torralba 2001], [Siagian and Itti 2007], or describe images with a saliency model (e.g., salient regions and key-points) [Matas 2002], [Sivic 2003], [Lowe 2004], [Yang 2007], [Zhang 2007], [Jiang 2007], [Chen 2009]; each approach has its own advantages and drawbacks. Instead of treating the two approaches separately, our approach uses the saliency model (salient regions and key-points) as a complement to the spatial layout model. The motivation comes from human visual perception: it allows very rapid holistic analysis of an image that provides a coarse context of the scene (the spatial layout model), yet it also gives rise to a small set of candidate salient locations in the scene (the saliency model) that need to be examined more intensively.
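One way to picture the complementary design is to concatenate a holistic layout vector with a saliency-weighted bag-of-visual-words histogram. The sketch below is hypothetical and is not the representation developed in this research: the grid-mean layout cue is a crude stand-in for GIST-style descriptors, key-points are assumed to arrive already quantised into visual-word ids with saliency scores, and the names `layout_descriptor`, `saliency_bovw`, `combined_representation`, and the weight `alpha` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

def l1_normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-negative vector so its entries sum to 1 (all zeros stays zeros)."""
    total = sum(v)
    return [x / total for x in v] if total > 0 else [0.0] * len(v)

def layout_descriptor(image: List[List[float]], grid: int = 2) -> List[float]:
    """Holistic cue: mean intensity of each cell in a grid x grid partition.
    A crude, hypothetical stand-in for GIST-style spatial-layout descriptors."""
    h, w = len(image), len(image[0])
    feats = []
    for gi in range(grid):
        for gj in range(grid):
            cell = [image[r][c]
                    for r in range(gi * h // grid, (gi + 1) * h // grid)
                    for c in range(gj * w // grid, (gj + 1) * w // grid)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats

def saliency_bovw(keypoints: List[Tuple[int, float]], vocab_size: int) -> List[float]:
    """Saliency cue: bag-of-visual-words histogram in which every key-point
    votes for its visual-word id with a weight equal to its saliency score."""
    hist = [0.0] * vocab_size
    for word_id, saliency in keypoints:
        hist[word_id] += saliency
    return hist

def combined_representation(image, keypoints, vocab_size, alpha=0.5, grid=2):
    """Concatenate the normalised layout part (weight alpha) with the
    normalised saliency part (weight 1 - alpha). Hypothetical fusion rule."""
    layout = [alpha * x for x in l1_normalize(layout_descriptor(image, grid))]
    salient = [(1 - alpha) * x
               for x in l1_normalize(saliency_bovw(keypoints, vocab_size))]
    return layout + salient

img = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
kps = [(0, 0.5), (2, 0.25), (0, 0.25)]  # (visual-word id, saliency score)
print(combined_representation(img, kps, vocab_size=3))
# [0.0, 0.25, 0.25, 0.0, 0.375, 0.0, 0.125]
```

Normalising each part before weighting keeps one cue from dominating simply because its raw values are on a larger scale; `alpha` then trades off holistic scene context against salient local detail.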

·         How can image content be associated with text descriptions?

The second step of the problem is to uncover latent semantic topics from the co-occurrence patterns of image content and the corresponding text descriptions. High-quality text descriptions of images play a vital role as training and benchmarking data for developing and evaluating an automatic image annotation system. The first issue for this research question is therefore to build a benchmark dataset for training an automatic image annotation and retrieval system. In our approach, we propose to enrich the image hierarchy of the ImageNet dataset with comprehensive text descriptions from Wikipedia.

The second issue is to propose an effective model of the correlation between images and text descriptions. The data mining and information retrieval community has long used probabilistic topic models to study this correlation. In particular, the Correspondence LDA (CorrLDA) model [Blei 2003], originally proposed for automatic image annotation, provides a natural way to learn latent topics from text words and other entities (such as image features). The model enforces a high degree of correspondence between word topics and entity topics. It first generates a latent topic for each text word, resulting in a document-level mixture of word topics; this mixture is then reused as the composition of entity topics, which supervises the generation of the associated entities and creates a direct connection between word and entity topics. Most recent extensions of CorrLDA, including more sophisticated correspondence topic models for other kinds of entities (such as protein entities [Ahmed 2009], visual words, and ontology-based biomedical concepts [Chen 2009]), still follow a generative process similar to that of the original CorrLDA model.
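The CorrLDA-style generative process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PTC model or any published implementation: entities are assumed to be discrete ids (e.g., visual words) with per-topic multinomial distributions `gamma`, and `generate_document`, `beta`, and `gamma` are hypothetical names. The sketch follows the ordering given in the paragraph above (word topics first; each entity copies the topic of a uniformly chosen word). In Blei and Jordan's original image-annotation formulation the roles are reversed, with image regions generated first and each caption word copying a region's topic.

```python
import random
from typing import Dict, List, Sequence

def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw a point on the simplex by normalising independent Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def sample_categorical(probs: Sequence[float], rng: random.Random) -> int:
    """Return an index drawn with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(alpha, beta, gamma, n_words, n_entities, seed=0) -> Dict[str, list]:
    """Generate one document under a CorrLDA-style process (illustrative only).

    alpha    : Dirichlet prior over the K topics
    beta[k]  : distribution over the text vocabulary for topic k
    gamma[k] : distribution over entity ids (e.g. visual words) for topic k
    """
    if n_entities > 0 and n_words == 0:
        raise ValueError("entities need at least one word topic to copy")
    rng = random.Random(seed)
    theta = sample_dirichlet(alpha, rng)                          # document-level topic mixture
    z = [sample_categorical(theta, rng) for _ in range(n_words)]  # one topic per text word
    words = [sample_categorical(beta[k], rng) for k in z]
    entities, y = [], []
    for _ in range(n_entities):
        idx = rng.randrange(n_words)      # choose one of the document's words uniformly
        y.append(idx)
        entities.append(sample_categorical(gamma[z[idx]], rng))  # reuse that word's topic
    return {"theta": theta, "z": z, "words": words, "y": y, "entities": entities}
```

Because each entity's topic is copied from a word topic, an entity can only be drawn from a topic that already appears among the document's words; this is the direct connection between word and entity topics that the text describes.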
Based on the CorrLDA model, we propose a Probabilistic Topic-Connection (PTC) model, which enables more effective and robust modeling of the co-existing image features and annotations. 

·         How can contextual information be incorporated into the final image annotation and retrieval system?

Online photo-sharing websites such as Flickr.com allow users to create tags for images. User-created tags tend to be noisy, in the sense that they may not relate directly to the image content, and typically only a few of the many possible tags are added to each image. Despite this noise, tags remain a useful additional feature that helps users share, organize, and retrieve images. On image-sharing websites, the users who tag images are usually image searchers as well. Popular tags applied to similar visual content tend to reflect a genuine consensus among users about those resources (that is, users are likely to agree on the same tag in the same scenario), while other tags reflect individual users’ particular perspectives.
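The consensus idea can be illustrated with a simple neighbour-voting heuristic. This is a swapped-in illustration, not the model proposed in this research: visual similarity is assumed to be Euclidean distance between precomputed feature vectors, and `tag_consensus`, `Photo`, and `k` are hypothetical names. It requires Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Photo = Tuple[Sequence[float], Set[str]]  # (visual feature vector, user-created tags)

def tag_consensus(query_feat: Sequence[float], query_tags: Set[str],
                  collection: List[Photo], k: int = 3) -> Dict[str, float]:
    """Hypothetical neighbour-voting score for each tag of a query photo.

    A tag gets one vote from every one of the k visually nearest photos that
    also carries it; the number of votes expected from the tag's overall
    frequency in the collection is subtracted, so globally popular tags are
    not rewarded just for being common.
    """
    if not collection:
        return {t: 0.0 for t in query_tags}
    neighbours = sorted(collection, key=lambda p: math.dist(query_feat, p[0]))[:k]
    votes = Counter(t for _, tags in neighbours for t in tags)
    freq = Counter(t for _, tags in collection for t in tags)
    n, k_eff = len(collection), len(neighbours)
    return {t: votes[t] - k_eff * freq[t] / n for t in query_tags}

photos = [
    ([0.0, 0.0], {"beach", "sea"}),
    ([0.1, 0.0], {"beach", "holiday"}),
    ([0.0, 0.1], {"beach", "me"}),
    ([5.0, 5.0], {"city", "holiday"}),
    ([5.1, 5.0], {"city", "night"}),
    ([5.0, 5.1], {"city", "holiday"}),
]
scores = tag_consensus([0.05, 0.05], {"beach", "holiday", "me"}, photos, k=3)
for tag, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(tag, s)
# beach 1.5
# me 0.5
# holiday -0.5
```

Here "beach" is carried by every visually similar photo and scores highest; "holiday" is frequent across the whole collection but not specific to this visual content, so the frequency correction pushes it below zero; "me", used by a single user, earns only a small score.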

 


Research Projects

My research plan centers on the design and development of effective data mining algorithms for image annotation and retrieval. The proposed work comprises a set of novel models, methods, techniques, and algorithms that represent image content and integrate user context information into the model. This section also addresses related work, research issues and tasks, the technical approach, and evaluation in real-world applications and in bioinformatics research.

Perspective Hierarchical Dirichlet Process for User-Tagged Image Modeling
by X. Chen, X. Hu, Y. An, Z. Xiong, T. He and E.K. Park
CIKM 2011
A Probabilistic Topic-Connection Model for Automatic Image Annotation
by X. Chen, X. Hu, Z. Zhou, C. Lu, G. Rosen, T. He and E.K. Park
CIKM 2010
Probabilistic Models for Topic Learning from Images and Captions in Online Biomedical Literatures
by X. Chen, C. Lu, Y. An and P. Achananuparp
CIKM 2009
Spatial Weighting for Bag-of-Visual-Words and Its Application in Content-Based Image Retrieval
by X. Chen, X. Hu and X. Shen
PAKDD 2009
 
Inferring Functional Groups from Microbial Gene Catalogue with Probabilistic Topic Models
by X. Chen, T. He, X. Hu, Y. An and X. Wu
BIBM 2011
 
Scene Understanding from Ground View Videos
by X. Chen, Z. Zhou