Audio Visual Challenge
Stefan Rüger
The Open University, UK
Yosi Mass
IBM Research, Israel
Introduction
With the rising popularity of rich media services such as Flickr, YouTube, and Jumpcut, new
challenges in large-scale multimedia information retrieval have emerged. Addressing them means
relying not only on meta-data but also on content-based information retrieval, combined with the
collective knowledge of users and the geo-referenced meta-data captured during the creation
process. It is also envisioned that multimedia search in mobile environments and on P2P networks
will take off on a large scale.
This workshop followed four previous SIGIR workshops on multimedia information retrieval
(1998, 1999, 2003, 2005) and aimed to address and explore new challenges in multimedia information retrieval by bringing researchers and practitioners together. We encouraged submission
and participation not only from the core information retrieval community but also from
researchers in databases, multimedia, and image processing, in order to cross-fertilize information retrieval research.
In response to the call for papers, 16 submissions were received. Each submission was reviewed
by at least three members of the program committee, whom we would like to thank for their
contribution to this workshop. Based on their recommendations, the workshop organizers
selected the 7 highest-scoring articles for publication and presentation at the workshop.
The articles are grouped around three themes: querying multimedia, content-based multimedia
retrieval, and social media mining and meta-data. Each author was allocated a 25-minute time
slot for the presentation, of which 5 minutes were reserved for discussion.
Prior to the regular paper sessions, a keynote speech was given by Dr Wei-Ying Ma on the
topic of The Challenges and Opportunities of Mining Billions of Web Images for Search and
Online Applications.
To further stimulate the interactive character of the workshop, we allowed each participant to
propose a subject to talk about for a maximum of 10 minutes during the Speakers Corner. The
output of the Speakers Corner was used to set the direction of the brainstorm session, which
allowed for the discussion of new challenges in multimedia information retrieval.
In the remainder of this report, we summarize the event following the order of the workshop
program. All articles and the full proceedings are available online at
http://www.yr-bcn.es/events/mir2007-workshop/.
The keynote speech, entitled The Challenges and Opportunities of Mining Billions of Web Images for Search and Online Applications, was delivered by Dr Wei-Ying Ma, a principal
researcher at Microsoft Research Asia, based in Beijing. As a Research Area Manager, he
leads a team of researchers working to advance the state of the art in Web search and
data mining.
In his speech, he argued that although content-based image retrieval has been studied for
decades, most commercial search engines still rely on text information to index Web images.
This is because of the many fundamental limitations in current content-based image retrieval
technologies when applied to Web-scale data and a lack of business incentives to rely on image
content for online advertising. In this talk, he discussed the technical hurdles that exist when
we attempt to build systems to analyze and index billions of Web images based on content. He
presented several important Internet applications that have the potential to take off and make
significant impacts if we find ways to overcome existing technical hurdles.
Speakers Corner
In response to the call for contributions to the Speakers Corner, six participants took
the opportunity to share their thoughts with the other workshop participants.
Jussi Karlgren - Use cases in multi-media information access
Jussi Karlgren presented the CHORUS coordination action and argued for the informed creation of use cases as a focus for service design, system development, and algorithm evaluation.
He stated that evaluation is dear to the hearts of all of us who come to a SIGIR conference. We
know about relevance assessment, precision, and recall, but how these arguably bloodless target
metrics translate into system evaluation exercises that are not only reliable but also valid
is non-trivial. This is especially so for systems in new areas such as multimedia retrieval, where
the usage scenarios may map less obviously onto established patterns.
Mor Naaman - Social Media: Changing the image of multimedia
Mor Naaman argued that the advent of media-sharing sites like Flickr and YouTube has drastically increased the volume of community-contributed multimedia resources available on the web.
These collections have a previously unimagined depth and breadth, and have generated new opportunities and new challenges for multimedia research. How do we analyze, understand, and
extract patterns from these new collections? How do we use this analysis to improve current
applications and introduce new ones? These questions were discussed and a demo was shown.
Laura Hollink - Bringing semantics into multimedia retrieval
As stated by Laura Hollink, two very different research fields are now slowly growing towards each
other: the semantic web and (content-based) image retrieval. This combination has the potential
to improve retrieval performance and open up new ways of searching. Large bodies of structured knowledge are available on the web. Annotations of images (e.g., tags, manual annotations,
and automatically detected concepts) can be linked to these ontologies to take advantage of
their semantics: queries can be answered even when no directly matching annotation is found,
and relations between tags can be used for browsing.
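The idea of answering a query without a directly matching annotation can be illustrated with a minimal sketch. It is not taken from the talk: the tiny "broader concept" hierarchy, the tag sets, the file names, and the function names below are all hypothetical, and a real system would link tags to an existing ontology rather than a hand-written dictionary.

```python
# Toy "is-a" hierarchy mapping a concept to its broader concept.
# All concept names here are hypothetical illustrations.
BROADER = {
    "poodle": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
}

# Hypothetical image annotations (tags per image).
ANNOTATIONS = {
    "img1.jpg": {"poodle", "park"},
    "img2.jpg": {"cat"},
    "img3.jpg": {"car"},
}


def ancestors(concept):
    """Return the concept together with all broader concepts above it."""
    found = {concept}
    while concept in BROADER:
        concept = BROADER[concept]
        found.add(concept)
    return found


def search(query):
    """Return images with a tag equal to, or narrower than, the query concept."""
    return sorted(
        image
        for image, tags in ANNOTATIONS.items()
        if any(query in ancestors(tag) for tag in tags)
    )
```

In this sketch, a query for "mammal" matches the images tagged "poodle" and "cat" although neither carries the tag "mammal" itself; the same hierarchy could also drive browsing from a tag to its broader or narrower neighbours.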
Yosi Mass - Search in audio-visual content using P2P information retrieval
As the coordinator of the SAPIR European project, Yosi Mass observed that Web search today
is dominated by search giants such as Google, Yahoo, and MSN, which deploy a centralized approach to indexing and use text-only indexes enriched by page-rank algorithms. Consequently,
while it is possible to search for audio-visual content, the search is limited to associated text and
metadata annotations. Supporting real content-based audio-visual search requires media-specific
understanding and extremely high CPU utilization, which would not scale in today's centralized
solutions. He claimed that large-scale, distributed P2P architectures will make it possible to search
audio-visual content using the query-by-example paradigm.
Stefan Rüger - Advertisement
Stefan announced that the Knowledge Media Institute of The Open University is hiring talented
researchers to address new challenges in audiovisual search.
Mark Sanderson - External meta-data to enhance multimedia search
Mark Sanderson pointed out that the problems of multimedia search can be mitigated by the use of
externally gathered metadata. Images geo-referenced with GPS data, for instance, open up a
number of improvements. As one example, researchers at Berkeley have queried weather
databases to determine the conditions at the time and place where a photograph was taken, and so
guess quite accurately what weather conditions are pictured in the photograph.
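A minimal sketch of this kind of lookup is given below. It is not the Berkeley researchers' method: the observation records, the 50 km radius, and the function names are assumptions made purely for illustration, with a small in-memory list standing in for a weather database.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

# Hypothetical weather observations: (latitude, longitude, time, condition).
OBSERVATIONS = [
    (37.87, -122.27, datetime(2007, 7, 1, 12), "fog"),
    (37.87, -122.27, datetime(2007, 7, 1, 16), "sunny"),
    (34.05, -118.24, datetime(2007, 7, 1, 12), "sunny"),
]


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def guess_weather(lat, lon, taken_at, max_km=50.0):
    """Guess the weather in a photo from its GPS position and timestamp.

    Keeps the observations within max_km of the photo (an assumed radius)
    and returns the condition of the one closest in time, or None if no
    observation is near enough.
    """
    nearby = [
        obs for obs in OBSERVATIONS
        if distance_km(lat, lon, obs[0], obs[1]) <= max_km
    ]
    if not nearby:
        return None
    closest = min(nearby, key=lambda obs: abs(obs[2] - taken_at))
    return closest[3]
```

For a photograph taken a few kilometres from the first station at 15:00, the sketch returns the 16:00 observation ("sunny"); the guessed condition could then be attached to the image as an additional searchable annotation.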
Brainstorm Session
Following the trend set during the Speakers Corner, the brainstorm session was used
to clarify many of the open issues and controversial statements raised by the
participants. The session was highly interactive and, according to the participants, a good
way to wrap up a workshop that they perceived as highly successful.
Acknowledgments
We would like to thank the members of the program committee for their contributions to the
material. We also extend our sincere thanks to SIGIR, the keynote speaker, all the paper
presenters, the speakers of the Speakers Corner, and all 36 participants, who jointly made this
workshop an outstanding event.
This workshop was supported by the European Community under the Information Society
Technologies (IST) priority of the 6th Framework Programme for R&D projects: SEMEDIA
(IST-FP6-045032), PHAROS (IST-FP6-045035), Tripod (IST-FP6-045335), and SAPIR (IST-FP6-045128).