Building Scalable and Smart Multimedia Applications on the Semantic Web

Doctoral Thesis / Dissertation, 2009

277 Pages, Grade: 1,0



I Scope and Foundations

1 Introduction
1.1 Motivation
1.2 Problem Definition
1.2.1 Performance and Scalability Issues in Distributed Metadata Sources
1.2.2 Efficient and Effective Representation of Multimedia Metadata
1.2.3 Scalable Multimedia Metadata Deployment on the Semantic Web
1.3 Reader’s Guide
1.4 What this work is NOT about

2 Related and Existing Work
2.1 Semantic Web Applications
2.1.1 Projects & Activities
2.2 Multimedia Applications
2.2.1 Smart Multimedia Content
2.2.2 Multimedia Metadata Deployment
2.2.3 Semantic Web Multimedia Applications—SWMA
2.2.4 Projects & Activities
2.3 Scalability and Expressivity
2.3.1 Infrastructure Level
2.3.2 Application Level
2.3.3 Projects & Activities

3 Multimedia Metadata
3.1 Multimedia Container Formats
3.1.1 eXtensible HyperText Markup Language–(X)HTML
3.1.2 Scalable Vector Graphics–SVG
3.1.3 Synchronized Multimedia Integration Language–SMIL
3.1.4 eXtensible 3D—X3D
3.2 Aspects of Multimedia Metadata
3.2.1 An Attempt at a Definition
3.2.2 Types of Metadata
3.2.3 Scope of Metadata
3.3 Multimedia Metadata Formats
3.3.1 Metadata for Still Images
3.3.2 Metadata for describing Audio Content
3.3.3 Metadata for describing Audio-Visual Content
3.3.4 Multimedia Content Description Interface–MPEG-
3.3.5 Formats For Describing Specific Domains Or Workflows
3.3.6 Interoperability

4 Semantic Web
4.1 Logic and the Semantic Web
4.1.1 Knowledge Representation
4.1.2 Description Logics (DL)
4.1.3 Logic Programming (LP)
4.1.4 Integrating DL and LP
4.2 Semantic Web Vision
4.2.1 Synopsis
4.2.2 Current State
4.2.3 Future
4.2.4 Related Fields
4.3 Semantic Web Stack
4.3.1 Encoding & Addressing
4.3.2 Data Structure and Exchange
4.3.3 Data Model
4.3.4 Ontologies, Rules & Query
4.3.5 Trust & Data Provenance
4.3.6 Semantic Web Issues
4.4 Semantic Web Vocabularies
4.4.1 Generic Vocabularies
4.4.2 Social Vocabularies
4.4.3 Spatio-temporal Vocabularies
4.4.4 Other Vocabularies
4.5 Linked Data
4.6 Web
4.6.1 Web 2.0: Ajax & Mashups
4.6.2 Metadata in HTML
4.6.3 Web 2.0 + Semantic Web = Web 3.0?
4.7 Conclusion

II Methods and Requirements

5 Creating Smart Content Descriptions
5.1 Information Flow and Media Semantic Web Stack
5.2 Extraction vs. Annotation
5.2.1 Extraction
5.2.2 Annotation
5.3 How To Deal with the Semantic Gap
5.3.1 Low-level Feature Based Approach
5.3.2 Model-based Approach
5.3.3 Semantic Web Approach
5.3.4 Hybrid Approach
5.4 Multimedia Ontology Engineering
5.4.1 Methodologies
5.4.2 Ontology Engineering Tools
5.4.3 Review of Existing Multimedia Ontologies

6 Scalable yet Expressive Content Descriptions
6.1 Introduction
6.2 Motivation and Scenarios
6.3 Requirements for the Description of Multimedia Assets
6.4 Environment Analysis: The Semantic Web
6.5 Multimedia Assets on the Semantic Web
6.6 Formal Descriptions of Multimedia Assets
6.6.1 Ontology Languages
6.6.2 Rules
6.6.3 Comparing Formal Descriptions Regarding the Requirements
6.7 Conclusions

III SWMA Engineering

7 Rationale & Common Concepts
7.1 The Semantic Web Stack regarding SWMA
7.2 Design Principles and Common Concepts
7.2.1 Occam’s Razor
7.2.2 Follow-your-nose
7.2.3 Reuse & Layering
7.3 Expressivity on the Semantic Web
7.4 Scalability on the Semantic Web
7.5 Conclusion

8 A Performance and Scalability Metric for Virtual RDF Graphs
8.1 Motivation
8.2 Related and Existing Work
8.3 Virtual RDF Graphs
8.3.1 Types Of Sources
8.3.2 Characteristics Of Sources
8.4 A Metric for Virtual RDF Graphs
8.5 Conclusion
8.6 Acknowledgements

9 Media Semantics Mapping
9.1 Environment
9.2 Related Work
9.3 Media Semantics Mapping
9.3.1 Data and Metadata
9.3.2 Media Semantics
9.3.3 Spaces of Abstraction
9.3.4 Built-in rules
9.3.5 User-defined rules
9.3.6 The MSM Knowledge Base
9.4 Applying the Media Semantics Mapping
9.5 Mapping the NM2 Workflow to the Canonical Model
9.5.1 The NM2 Workflow
9.5.2 Authoring Of Non-linear Stories
9.5.3 Example NM2 Productions
9.5.4 Lessons Learned
9.5.5 The NM2 Workflow in Terms of Canonical Processes
9.6 Discussion

10 Efficient Multimedia Metadata Deployment
10.1 Motivation
10.1.1 Last Mile of Multimedia Metadata Deployment
10.1.2 Related Work
10.1.3 Design Principles
10.2 Use Cases
10.2.1 Use Case: Annotate and Share Photos Online
10.2.2 Use Case: Purchasing Music Online
10.2.3 Use Case: Describing the Structure of a Video
10.2.4 Use Case: Publishing Professional Content with Metadata
10.2.5 Use Case: Expressing and Using Complex Rights Information
10.2.6 Use Case: Detailed Description of Large Media Assets
10.2.7 Use Case: Cultural Heritage
10.2.8 Derived Requirements from the Use Cases
10.3 RDFa-deployed Multimedia Metadata
10.3.1 ramm.x Vocabulary
10.3.2 ramm.x extensions
10.3.3 Processing ramm.x Descriptions
10.4 Examples
10.4.1 Deploying a Still Image along with Exif Metadata
10.4.2 An Example from Cultural Heritage
10.5 Conclusion and Future Work

IV Conclusion and Outlook

11 Conclusions

12 Outlook
12.1 Semantic Web multimedia applications now and in 10 years' time
12.1.1 Emerging Metadata
12.1.2 Advanced Annotation Techniques
12.1.3 Interactive Media
12.2 Future Work
12.2.1 Meshups and More
12.2.2 Multimedia and the Web of Data

V Addendum

A Sources
A.1 RDF Source Codes
A.1.1 Minimalistic Media Ontology Example
A.2 Program Source Code
A.2.1 Performance and Scalability Metric Showcase
A.3 Diagrams
A.3.1 Media Semantics Mapping

B Author’s Contribution
B.1 Publications
B.2 Projects
B.2.1 Media Production
B.2.2 Media Analysis
B.2.3 Other Activities
B.3 Academic Activities
B.4 Activities
B.4.1 W3C participation
B.4.2 ramm.x initiative
B.4.3 Related to MPEG-7

C Reference Material
C.1 Multimedia Ontologies
C.1.1 aceMedia Visual Descriptor Ontology
C.1.2 Mindswap Image Region Ontology
C.1.3 Music Ontology Specification
C.1.4 Kanzaki Audio Ontology
C.1.5 Core Ontology for Multimedia (COMM)
C.2 Multimedia Annotation Tools




List of Figures

1.1 A visual guide through the thesis

2.1 An exemplary Semantic Web application

2.2 The Semantic Gap in Multimedia Content Description

2.3 Deploying multimedia formats on the Semantic Web

2.4 Examples of Semantic Web Multimedia Applications

2.5 Semantic Web Multimedia Applications (SWMA)

3.1 Example HTML page as a multimedia container

3.2 Example SMIL document as a multimedia container

3.3 Example X3D document as a multimedia container

3.4 A sample video clip described using MPEG-7

3.5 The MPEG-21 REL

4.1 Architecture of a KR system based on Description Logics

4.2 A sample DL knowledge base

4.3 DL and LP overlapping

4.4 The W3C Semantic Web Stack

4.5 URIs, URLs and URNs

4.6 A simple multimedia ontology in RDF-S/OWL

4.7 Social Vocabularies Orchestration

4.8 The Linking Open Data dataset at the time of writing

4.9 A sample Exhibit document using JSON

4.10 Web 2.0: Ajax, Mashups and more

4.11 An overview on microformats

4.12 An exemplary tag-based environment (

4.13 An exemplary XHTML+RDFa version of a FOAF document

4.14 Human-Computer Communication Model in the Web 3.0 context

4.15 A Proposed Web 3.0 Architecture

5.1 Flow of Information in Multimedia Applications on the Semantic Web

5.2 The Semantic Web Stack in the Realm of Multimedia Applications

5.3 Extraction & Annotation Yielding High Quality Metadata

6.1 WSML Variants

7.1 The Semantic Web Stack in the context of this work

7.2 A sample RDF graph and basic RDF-S inference

8.1 The Semantic Web Stack: Focus of chapter 8

8.2 A real-world setup for a Semantic Web application

8.3 The RDF representation pyramid

8.4 PSIMeter Showcase Screenshot

8.5 PSIMeter: Metric in dependency on Query Type

8.6 PSIMeter: Metric for a fixed query

9.1 The Semantic Web Stack: Focus of chapter 9

9.2 The NM2 system architecture

9.3 The NM2 Toolkit v3.1 (July 2007)

9.4 Challenge of finding matching media assets in NM2

9.5 The Media Semantics Modelling Spaces

9.6 Transitions in A/V-essence

9.7 Applying a user-defined rule on a production ontology

9.8 The NM2 workflow

9.9 Preview a narrative of a non-linear media production

10.1 The Semantic Web Stack: Focus of chapter 10

10.2 Exif metadata visualizations

10.3 Metadata in the iTunes music store

10.4 Asset offered at BBC Motion Gallery and metadata

10.5 Textual rights information for an image offered at Getty Images

10.6 NBA content on YouTube and metadata published with it

10.7 A cultural heritage newspaper scan in an XHTML container document

10.8 Multimedia metadata deployment on the Web

10.9 The ramm.x core vocabulary at a glance

10.10 Processing ramm.x descriptions

10.11 A sample still image with embedded Exif metadata

10.12 Processing ramm.x on a still image with Exif metadata

12.1 hAudio example

12.2 Usage of RDFa in flickr

12.3 Google’s Image Labeler

12.4 International Remix

12.5 CaMiCatzee’s system architecture

A.1 NM2 core ontology

A.2 Temporal Annotations in the NM2 core ontology

List of Tables

2.1 Scalability in selected domains

3.1 Scope of Metadata regarding their Functional Type

4.1 Description Logics Axioms

4.2 Terminology for LP clause types

4.3 Dublin Core Elements

6.1 Comparison of Formal Descriptions for Media Assets

8.1 Query Types

9.1 Overview on the Media Semantics Mapping built-in rules

C.1 An overview of Multimedia Annotation Tools

List of Listings

3.1 A sample SVG markup

3.2 A sample SMIL markup

3.3 A sample X3D markup

3.4 Excerpt of an exemplary MPEG-7 document

3.5 An exemplary MPEG-21 license

4.1 A sample DL knowledge base

4.2 Example RDF statements

4.3 Example SPARQL query

4.4 A sample SKOS document

4.5 A sample FOAF document

4.6 A sample DOAP document

4.7 An excerpt of a sample JSON document

9.1 The F/C-Space mapping rule

9.2 Exemplary transition rule

9.3 An exemplary user-defined rule

10.1 XHTML source code excerpt of the deployed media asset

10.2 Extracted RDF from an historical newspaper page

10.3 Querying the embedded RDF metadata of the newspaper scan

12.1 Resulting triples from the hAudio example

A.1 RDF source code of the minimal media ontology (T-Box)

A.2 RDF source code of the minimal media ontology (A-Box)

A.3 Java source code of the PSIMeter application

Part I Scope and Foundations

Chapter 1 Introduction

“Vade mecum.”

(Latin phrase)

When “a message from Chad and Steve”[1] reached the YouTube community in early October 2006, people would ask: Why is Google going to put 1.65 billion dollars[2] on the counter? Without being on the board of Google it is hard to tell, though the core of the story is obvious: it is the multimedia, stupid!

Two fundamental types of resources are at odds on the Web: textual resources and multimedia resources, or more specifically audio-visual content, such as a PNG still image, an MP3 music clip, or an AVI video clip. While for textual resources an array of research [85; 285] and tools are available[3], multimedia issues w.r.t. the Semantic Web have not yet been widely addressed. In this work we focus on multimedia resources or, to be more precise, on their description and the respective usage of these descriptions.

1.1 Motivation

The demand for real-world applications on the Semantic Web is steadily increasing. Simultaneously, existing Web applications handling millions of multimedia assets are starting to take advantage of Semantic Web technologies [237]. Although research activity in the media semantics area has increased over the past five to ten years, several core problems are still not satisfactorily solved: effectively and efficiently accessing distributed data sources, dealing with the Semantic Gap in multimedia content descriptions, and deploying media asset descriptions at Web scale. These and related issues stemming from real-world requirements may be one of the reasons for the still largely academic reputation of Semantic Web (multimedia) applications.

Different parameters may influence the performance and the functionality of a Semantic Web multimedia application (SWMA). When attempting to build such scalable and smart applications, one has to research manifold aspects of multimedia metadata generation, representation, and consumption. A multidimensional analysis is necessary to identify the requirements for a successful utilisation of media assets on the Semantic Web.

Regarding access to distributed data sources, it can be noted that the RDFising process, i.e., the conversion of non-RDF data into RDF, has not yet been widely researched. Some practical work has been reported, such as [264]. However, performance and scalability issues have by and large been neglected so far.
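To make the notion of RDFising concrete, the following minimal sketch turns rows of a tabular source into triples. It is an illustration under stated assumptions, not the approach of [264]: the base URI `http://example.org/`, the column names, and the helper `rdfise_rows` are hypothetical, and literals are emitted untyped in an N-Triples-like notation.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def rdfise_rows(rows: List[Dict[str, str]], base: str, key: str) -> List[Triple]:
    """Map each row to triples: the `key` column mints the subject URI,
    every other column becomes a predicate under the (hypothetical) `base`."""
    triples: List[Triple] = []
    for row in rows:
        subject = f"<{base}asset/{row[key]}>"
        for column, value in row.items():
            if column == key:
                continue  # the key is already encoded in the subject URI
            predicate = f"<{base}vocab#{column}>"
            triples.append((subject, predicate, f'"{value}"'))  # untyped literal
    return triples

# Hypothetical sample data standing in for a non-RDF metadata source.
rows = [
    {"id": "42", "title": "Sunset", "format": "image/png"},
    {"id": "43", "title": "Jazz Intro", "format": "audio/mpeg"},
]
for triple in rdfise_rows(rows, "http://example.org/", "id"):
    print(" ".join(triple), ".")
```

Even this naive mapping produces one triple per non-key cell, so its output grows with rows times columns, which is where performance and scalability start to matter.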

Furthermore, automated understanding of multimedia content is an issue in Semantic Web multimedia applications. This issue is often referred to as the “Semantic Gap”, which, following Smeulders [271], is

... the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation.

Although substantial research efforts have been undertaken, a generic, domain-independent solution to the problem is not at hand. Inferring from a set of low-level features, such as colour and shape, that these actually stand for (that is, “mean”) a certain entity in a domain, for example a “tree”, is a non-trivial task.

Most of the activities or projects addressing the Semantic Gap are little more than research prototypes using toy data sets. While the focus is often put on the expressivity of the description, aspects such as performance and scalability, extensibility, and interoperability have still not been widely addressed. Studer et al. [284] recently claimed:

Another challenge is to manage the expressivity-scalability trade-off of reasoning over declarative knowledge, enabling reasoning over large-scale distributed knowledge bases for suitably expressive knowledge representations. Automated knowledge acquisition will typically yield knowledge that’s uncertain—for example, fuzzy or probabilistic. Such knowledge must be represented and reasoned with in an adequate and scalable way. As knowledge from distributed knowledge bases is aggregated, a deeper semantics can emerge, letting intelligent agents discover patterns across people, roles, and tasks.

This work aims at addressing the expressivity-scalability trade-off in the realm of multimedia applications operating on the Semantic Web. The following example illustrates how easily one may run into trouble when dealing with a detailed description of audio-visual content.

Example 1.1 (Low-level feature description of a media asset with RDF).

A video clip with a duration of one hour is described with MPEG-7. Several visual low-level features ($F$), such as colour, shape, and texture, are extracted for a number of spatial segments ($S$) per key frame ($K$). A multimedia ontology is then used to represent the MPEG-7 descriptors formally (on the basis of RDF); an average number of RDF triples is assumed for each descriptor ($T_D$). An estimate of the resulting RDF graph size then is $F \cdot K \cdot S \cdot T_D$. Let us assume that we want to capture 10 features, some 1000 key frames exist, 10 spatial segments are marked up per key frame, and 10 triples are required per descriptor. This yields a total RDF graph size of $10 \cdot 1000 \cdot 10 \cdot 10 = 10^6$, i.e., one million triples, just for describing the low-level features of one hour of video footage.
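The estimate is a plain product, so it grows linearly in each parameter. The short Python sketch below merely reproduces the arithmetic of Example 1.1; the function name is ours, and the 100-hour archive is a hypothetical extrapolation under the same assumptions.

```python
def estimate_graph_size(features: int, key_frames: int,
                        segments: int, triples_per_descriptor: int) -> int:
    """Estimated RDF graph size F * K * S * T_D, as in Example 1.1."""
    return features * key_frames * segments * triples_per_descriptor

# Parameters of Example 1.1: F = 10, K = 1000, S = 10, T_D = 10.
one_hour = estimate_graph_size(10, 1000, 10, 10)
print(f"{one_hour:,} triples per hour of footage")  # 1,000,000 triples per hour of footage

# Hypothetical extension: an archive of 100 such one-hour clips.
print(f"{100 * one_hour:,} triples for 100 hours")  # 100,000,000 triples for 100 hours
```

Doubling any single factor, e.g., marking up 20 instead of 10 segments per key frame, doubles the graph.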

Finally, even if the above-mentioned issue were resolved, another open issue remains: the deployment of multimedia metadata along with the content in the context of the Semantic Web. To the best of our knowledge, no proposal exists that addresses performance and scalability while also enabling formal descriptions of the multimedia resources.

1.2 Problem Definition

Several issues arise when building Semantic Web multimedia applications; based on a thorough analysis (cf. Chapter 6), we identify three issues as the most significant regarding scalability and expressivity:

- performance and scalability issues in distributed multimedia metadata sources,
- efficient and effective representation of multimedia vocabularies and instances, and
- scalable multimedia metadata deployment on the Semantic Web.

The following sections describe each of the above-listed areas of research in greater detail and formulate the corresponding research questions. The reader is invited to note that although the three selected research areas are not strongly interrelated, they share a recurrent theme: they all focus on both effectiveness and efficiency, hence the title of this thesis, Building Scalable and Smart Multimedia Applications on the Semantic Web. The three areas may thus be seen as orthogonal, each addressing a different aspect of the design and implementation of Semantic Web multimedia applications.

1.2.1 Performance and Scalability Issues in Distributed Metadata Sources

A Semantic Web multimedia application needs to process RDF-based metadata stemming from a range of sources. When accessing and processing distributed metadata sources on the RDF level, the application has to deal with real-world limitations such as bandwidth, downtimes, etc.

While a Semantic Web agent may not care where the triples come from, the human user who instructed the agent to carry out a task may well care how long a certain operation takes.
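A back-of-the-envelope sketch illustrates the user's perspective. It assumes three hypothetical sources with made-up latencies (not measurements, and not the metric developed in Chapter 8): querying them one after another costs roughly the sum of the latencies, querying them concurrently costs roughly the maximum, and in both cases a slow source contributing few triples can dominate the waiting time.

```python
from typing import List, Tuple

# (source name, round-trip latency in seconds, number of triples returned)
# All names and values are hypothetical, chosen for illustration only.
Source = Tuple[str, float, int]

def completion_times(sources: List[Source]) -> Tuple[float, float]:
    """Return (sequential, concurrent) completion time, ignoring processing cost."""
    latencies = [latency for _, latency, _ in sources]
    return sum(latencies), max(latencies)

sources = [
    ("local-triple-store", 0.05, 120_000),
    ("remote-sparql-endpoint", 0.8, 5_000),
    ("rdfised-html-page", 2.5, 300),  # few triples, but the slowest source
]
sequential, concurrent = completion_times(sources)
total_triples = sum(n for _, _, n in sources)
print(f"{total_triples} triples; sequential {sequential:.2f} s, concurrent {concurrent:.2f} s")
```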

Research Questions. What are the characteristics of (multimedia) data sources available on the (Semantic) Web? How can these be RDFised efficiently? What are practical performance and scalability indicators?

Scope. For this problem, we assume that we deal with global descriptions of multimedia assets. For the performance and scalability indicator, a static, simple setup is assumed; the indicator should be evaluated in a multimedia Web application.

1.2.2 Efficient and Effective Representation of Multimedia Metadata

When building Semantic Web multimedia applications, the content being dealt with has to be described appropriately. To do so, a description language has to meet a range of requirements. It has to be expressive enough to represent objects, events, and relations. There must be ways to assign descriptions to temporal and spatial segments. The granularity of the content description has to be adjustable. The language has to deal with concrete data types in all their forms (scalars, vectors, and matrices).

To find a tradeoff between expressivity and scalability, several aspects should be taken into account[4]:

- The granularity of the description usually has an impact on the size of the resulting description. The discriminator here is that of scope: an audio clip might be described globally in terms of genre (this MP3 file is a Jazz clip), or there might be a detailed description of the wave shape, energy, etc. for a certain time period (from time code X to time code Y the following parameters have been extracted: vector of signal parameters).
- The required inferential capabilities of the system influence the choice of the representation. If no or only simple queries are expected (return all documents that are mono and less than 1 min of playtime), a simple metadata format (such as ID3 for music) might be sufficient. When advanced, even domain-specific retrieval operations are required (find me contemplative scenes with at least two people in them), formally grounded languages such as Description Logics or rule-based languages are usually a good choice.
- The usage of the content, which may further be differentiated into:
- number of users (limited group vs. Web-scale)
- content delivery (streaming, interactive, off-line)
- metadata deployment (embedded vs. referenced)
- access mode (broadcast vs. point-to-point)
- read-only vs. read/write
- personalisation of content
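For the simple end of this spectrum, a flat, ID3-like record and a plain attribute filter suffice, as the following Python sketch shows. The record fields and sample titles are hypothetical; the point is that no inference is involved, whereas a query for contemplative scenes with at least two people would need concepts that simply do not appear as fields.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AudioRecord:
    """Flat, ID3-like metadata record (field names are hypothetical)."""
    title: str
    genre: str
    channels: int      # 1 = mono, 2 = stereo
    duration_s: float  # playtime in seconds

def mono_and_short(records: List[AudioRecord], max_s: float = 60.0) -> List[str]:
    """Simple attribute filter: 'all documents that are mono and under 1 min'."""
    return [r.title for r in records if r.channels == 1 and r.duration_s < max_s]

library = [
    AudioRecord("Interview snippet", "Speech", 1, 45.0),
    AudioRecord("Late-night jazz session", "Jazz", 2, 643.0),
    AudioRecord("Station jingle", "Jingle", 1, 12.5),
]
print(mono_and_short(library))  # ['Interview snippet', 'Station jingle']
```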

The reader is invited to note that no single language currently covers all the above-mentioned aspects. Whereas MPEG-7, e.g., is a good choice for representing low-level features, it fails to support the engineer in modelling high-level semantics. OWL, on the other hand, is quite expressive but lacks built-ins for complex concrete data types (such as matrices), temporal descriptions, and support for multimedia description in general[5].
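The lack of a matrix datatype forces a modelling choice, sketched below in Python with hypothetical `ex:` property names. A matrix can be packed into a single opaque literal, which a reasoner cannot look into, or spelled out cell by cell, which keeps the values addressable but multiplies the number of triples; the latter is one way the per-descriptor triple count of a low-level description grows.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

def matrix_as_literal(subject: str, m: List[List[float]]) -> List[Triple]:
    """Pack the whole matrix into one opaque string literal (rows separated by ';')."""
    text = ";".join(",".join(str(v) for v in row) for row in m)
    return [(subject, "ex:matrix", f'"{text}"')]

def matrix_as_cells(subject: str, m: List[List[float]]) -> List[Triple]:
    """Spell the matrix out: one cell node carrying row, column, and value."""
    triples: List[Triple] = []
    for i, row in enumerate(m):
        for j, v in enumerate(row):
            cell = f"{subject}_cell_{i}_{j}"
            triples.extend([
                (subject, "ex:hasCell", cell),
                (cell, "ex:row", f'"{i}"'),
                (cell, "ex:column", f'"{j}"'),
                (cell, "ex:value", f'"{v}"'),
            ])
    return triples

m = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
print(len(matrix_as_literal("ex:descriptor1", m)))  # 1
print(len(matrix_as_cells("ex:descriptor1", m)))    # 36
```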

Research Questions. How can (formal) descriptions of multimedia content be represented effectively and efficiently? Is there a trade-off between scalability and expressivity and, if so, where?

Scope. A closed-world scenario is assumed; we focus on spatio-temporal descriptions of audio-visual material. Common multimedia metadata formats such as MPEG-7 should be taken into account.

1.2.3 Scaleable Multimedia Metadata Deployment on the Semantic Web

Many multimedia metadata formats, such as Exif or MPEG-7, are available to describe what a multimedia asset is about, who has produced it, etc. With the advent of User Generated Content, be it blogs, Wikis, etc., a need for deploying these multimedia metadata (M3) formats in (X)HTML pages can be identified. Another motivation stems from the professional content realm, where detailed descriptions of cross-media content are required, along with rights management.

Again, in the context of building Semantic Web multimedia applications, one key question regarding the deployment of the metadata is how to enable existing multimedia metadata formats to enter the Semantic Web in order to make them accessible to Semantic Web agents capable of handling RDF-based metadata.

Research Questions. How can existing multimedia metadata formats be deployed effectively and efficiently on the Semantic Web? What are the use cases?

Scope. It is assumed that the reusability of existing material should be maximised. We assume a prototypical implementation to be sufficient as a proof of concept. The deployment description should be available as a vocabulary.

1.3 Reader’s Guide

The thesis at hand is roughly structured into five parts as shown in Fig. 1.1.


Figure 1.1: A visual guide through the thesis.

- Part I introduces the foundations and lists existing and related work;
- Part II discusses methods and requirements regarding scalable, yet expressive multimedia content descriptions;
- Part III addresses the three core issues of engineering Semantic Web multimedia applications identified in the problem definition;
- Part IV contains conclusions and considers future directions regarding Semantic Web multimedia applications;
- Part V (Appendix) gathers sources and the author’s contributions.

Readers familiar with both multimedia metadata and Semantic Web technologies may choose to skip Part I and start directly with Part II. The core of the thesis is Part III, as it addresses the research questions given earlier in this chapter (cf. section 1.2).

Note that a detailed account of the research this thesis is built upon is given later in section B.1 (Appendix B). This work was accompanied by the author’s activities within the W3C. The author was active in the first Multimedia Semantics Incubator Group (MMSEM-XG) in 2006/2007. Further, the author has been active in the Semantic Web Deployment Working Group (ongoing), focusing on the RDFa specification[6], more specifically on the use cases, test cases, and the implementation report. Finally, the author has been active in the LinkingOpenData[7] project, realising the formalisation and interlinking of statistical data [125] and proposing a new interlinking method [144; 143].

In the following a detailed reader’s guide—on the chapter level—is given:

Part I–Scope and Foundations: Introduces the foundations and sums up existing work. The goal is to make the reader familiar with the problem domain.
- Introduction: Gives a motivation and defines the research questions.
- Related and Existing Work: Related work is discussed and critically reviewed.
- Multimedia Metadata: Foundations of multimedia metadata (M3) are explained.
- Semantic Web: Semantic Web basics are explained.

Part II–Methods and Requirements: Constitutes theoretical elaborations on scalable yet expressive multimedia content descriptions.
- Creating Smart Content Descriptions: Elaborates on how multimedia content descriptions are created (from extraction to ontology engineering).
- Scalable yet Expressive Content Descriptions: Introduces requirements for scalable yet expressive multimedia content descriptions.

Part III–SWMA Engineering: Addresses three core issues of engineering Semantic Web multimedia applications.
- Rationale & Common Concepts: Lists basic design principles and defines common concepts.
- A Performance and Scalability Metric for Virtual RDF Graphs: Addresses issues w.r.t. the access of distributed metadata sources.
- Media Semantics Mapping: Addresses issues regarding the Semantic Gap in media descriptions.
- Efficient Multimedia Metadata Deployment: Addresses multimedia metadata deployment issues.

Part IV–Conclusion and Outlook: Discusses lessons learned and future directions.
- Concluding Remarks: The work is reviewed and discussed.
- Outlook: A number of possible developments regarding SWMA are presented.

Part V–Appendix: Gathers sources and the author’s contributions.
- Sources: Lists sources of RDF graphs and applications in the context of this work.
- Author’s Contribution: Lists the author’s contributions in the realm of this thesis.
- Reference Material: Offers a collection of good practice material for SWMA.
- Glossary: Gives a short explanation of terms used in this work.

1.4 What this work is NOT about

This thesis does not attempt to define a solution for the semantic description of multimedia content. Defining such a formal specification, i.e., an ontology, would contradict the genuine idea behind ontologies: to be based on an agreement of domain experts. The reader is invited to note the plural form; the restriction is due to the fact that ontologies are based on a shared understanding of a domain rather than to the author being unable or unwilling to perform such a task.

Not in the scope of this thesis are multimedia content issues such as compression, codecs, etc. Further, issues such as data access and delivery (caching, streaming, broadband, etc.) or access control (ACLs, etc.) are not in the primary scope of the thesis. However, we refer to such issues where they have a significant impact on the issues within the scope of this work.

Chapter 2 Related and Existing Work

“In statu nascendi.”

(Latin phrase)

The title of this work, Building Scalable and Smart Multimedia Applications on the Semantic Web, contains terms that have to be clarified and put into context before one is able to go into greater detail. For quite a lot of these terms, no general definition is available. Where appropriate, a definition in the context of the thesis is given. Hence, this chapter discusses the following terms, along with their interpretation in the context of the work at hand:

- “Semantic Web Applications”, cf. section 2.1
- “Multimedia Applications”, cf. section 2.2
- “Scalability and Expressivity”, cf. section 2.3

For each of the phrases, an explanation is given and relevant existing and related work is discussed. Where applicable, research projects are listed as examples; the author has participated in some of them.

As an aside, it is worth noting that each technology undergoes certain phases, ranging from foundational academic research to practical exploitation. At the time of writing, Semantic Web technologies are, according to Gartner’s Hype Cycle[1], in the so-called “Technology Trigger” phase. This first phase of a Hype Cycle is the breakthrough, product launch, or other event that generates significant press and interest. While from the infrastructural point of view a lot of work has already been done (annotations, languages, services, etc.), practical aspects such as the scalability of metadata have not been widely addressed. However, a range of activities can be noticed in this field, be it grassroots or educational and outreach activities.

2.1 Semantic Web Applications

In 2007, the Semantic Web Challenge[2] (SWC) was held for the fifth time in a row. The SWC, an event for demonstrating practical progress towards achieving the vision of the Semantic Web, is organized in conjunction with the International Semantic Web Conference (ISWC). It serves several purposes, namely it (i) illustrates to society what the Semantic Web can provide, (ii) gives researchers an opportunity to showcase their work and compare it to others, and (iii) stimulates current research towards a higher final goal by showing the state of the art every year.

To ensure a certain level of comparability, the SWC has listed a number of minimal requirements that a Semantic Web application must meet in order to participate in the challenge. These criteria are outlined and discussed in the following.

1. The meaning of data has to play a central role.
- Meaning must be represented using formal descriptions,
- Data must be manipulated/processed in interesting ways to derive useful information, and
- This semantic information processing has to play a central role in achieving things that alternative technologies cannot do as well, or at all.

2. The information sources...
- Should have diverse ownerships (i.e. there is no control of evolution),
- Should be heterogeneous (syntactically, structurally, and semantically), and
- Should contain real world data, i.e. are more than toy examples.

3. It is required that all applications assume an open world, i.e., assume that the information is never complete.

Discussing the above listed criteria, we note the following w.r.t. the scope of this work.

Regarding the “meaning of data”: Though “formal” is phrased quite liberally, in the context of the Semantic Web the languages of choice are in practice limited to RDF-based ones, such as OWL and the like.

Regarding the “information sources”: Firstly, the requirement that the sources have diverse ownership is needed to demonstrate the Web characteristic; cf. Definition 2.2. Secondly, asking for real-world data rather than for constructed, limited toy examples directly supports the central concern of this thesis.

Regarding the “open world assumption”: Given reasoning at Web scale, this is a non-trivial requirement; Fensel and van Harmelen [98] recently elaborated on this issue.
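The difference between the two assumptions can be shown with a toy membership test over a set of triples. This Python sketch (hypothetical `ex:` identifiers, no real reasoner involved) answers `False` under a closed world for any statement not in the graph, but only `None` (unknown) under an open world; note that Section 1.2.2 deliberately assumes a closed world for its scope.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

def ask_closed_world(graph: Set[Triple], t: Triple) -> bool:
    """Closed-world assumption: whatever is not stated is false."""
    return t in graph

def ask_open_world(graph: Set[Triple], t: Triple) -> Optional[bool]:
    """Open-world assumption: a missing statement is unknown (None), not false.
    This toy version has no negation, so it never returns False."""
    return True if t in graph else None

g = {("ex:clip1", "ex:genre", "ex:Jazz")}
q = ("ex:clip1", "ex:genre", "ex:Blues")
print(ask_closed_world(g, q))  # False
print(ask_open_world(g, q))    # None
```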

We subscribe to the above-stated view on the requirements for Semantic Web applications and additionally point out that a Semantic Web application is, after all, a Web application. The lessons learned in this area should be taken into account as well. Well-known infrastructure, processes, and methodologies [3] for handling content and metadata should be utilised. Consequently, before we give a definition of what is to be understood by a Semantic Web application, we define a Web application as follows[4].

Definition 2.1 (Web Application).

A Web Application is a software program that meets the following minimal requirements:

- It is based on the Hypertext Transfer Protocol (HTTP) [169] and Uniform Resource Identifiers (URIs) [33; 60][5];
- For human agents, the primary presentation format is the Hypertext Markup Language (X)HTML[6] [330];
- For software agents, the primary interface is REST-compliant [100] or may be based on Web services[7] —cf. SOAP [276], WSDL [55], and UDDI [301];
- The application operates on the Internet;
- The number of (concurrent) users is undetermined.
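One common way to serve both kinds of agents from the same HTTP URI is content negotiation on the `Accept` request header. The following Python sketch is a simplified, hypothetical helper rather than a complete, standards-conformant implementation: it honours `q` values for the listed media types, does not expand wildcards, and falls back to HTML. Offering RDF/XML to software agents is an assumption that anticipates Definition 2.2.

```python
from typing import Tuple

# Media types this (hypothetical) application can produce for a resource URI.
OFFERED: Tuple[str, ...] = ("text/html", "application/xhtml+xml", "application/rdf+xml")

def choose_representation(accept: str, default: str = "text/html") -> str:
    """Return the offered media type with the highest q-value in an Accept header.

    Simplifications: wildcards such as */* are not expanded, and ties keep
    the type listed first in the header.
    """
    best, best_q = default, 0.0
    for part in accept.split(","):
        fields = part.strip().split(";")
        media_type = fields[0].strip().lower()
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type in OFFERED and q > best_q:
            best, best_q = media_type, q
    return best

print(choose_representation("text/html,application/xhtml+xml;q=0.9"))  # text/html
print(choose_representation("application/rdf+xml"))                    # application/rdf+xml
print(choose_representation("*/*"))                                     # text/html
```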

Note that where “primary” is used in Definition 2.1, it is possible and likely that other rendering formats (such as PDF[8]) or protocols (for example XMPP[9]) may be offered by a Web application in addition to the ones mentioned. Note as well that the last characteristic affects both the scalability and the performance of a Web application.

In the next step we give—based on Definition 2.1 and motivated by the requirements of the Semantic Web Challenge—a definition of a Semantic Web application.

Definition 2.2 (Semantic Web Application).

A Semantic Web Application is a Web application that, in addition to the requirements listed in Definition 2.1, meets the following minimal requirements:

- The metadata (metadata sets) used in the Web application must be machine readable and machine interpretable[10], i.e., it is based on the Resource Description Framework RDF [203][11];
- A set of formal vocabularies—potentially based on OWL [239]—is used to capture the domain of discourse[12] ; at least one of the utilised vocabularies and/or metadata sets has to be proven not to be under (full) control of the Semantic Web application maintainer;
- SPARQL [253] should be used for querying, and RIF [260] may be utilised for exchanging rules.
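Definition 2.2 asks for an RDF data model and SPARQL querying. At the core of SPARQL evaluation lies the matching of a basic graph pattern, i.e., a set of triple patterns containing variables, against a set of triples. The following minimal Python sketch illustrates only that idea; it is not a SPARQL implementation (no parser, no distinction between URIs and literals, no OPTIONAL or FILTER), and the example.org data as well as all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

EX = "http://example.org/"          # hypothetical namespace for the example data
FOAF = "http://xmlns.com/foaf/0.1/"


def is_var(term: str) -> bool:
    """Variables are written SPARQL-style with a leading '?' (this toy code
    would wrongly treat a literal starting with '?' as a variable)."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different term
        elif p_term != t_term:
            return None      # constant does not match
    return extended


def evaluate_bgp(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns (a basic graph pattern)
    by joining the solutions pattern by pattern."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = [
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", "Bob"),
    (EX + "alice", FOAF + "name", "Alice"),
]

# Corresponds roughly to:
#   SELECT ?name WHERE { ex:alice foaf:knows ?x . ?x foaf:name ?name }
solutions = evaluate_bgp(graph, [
    (EX + "alice", FOAF + "knows", "?x"),
    ("?x", FOAF + "name", "?name"),
])
print([s["?name"] for s in solutions])  # ['Bob']
```

Each pattern is joined with the solutions of the previous ones in a nested loop; real SPARQL engines replace this loop with index lookups and join reordering.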

The restriction that a (Semantic) Web application is expected to operate on the Internet is to ensure that Intranet (or, for the sake of correctness, Intraweb) applications utilising (Semantic) Web technologies are not understood as (Semantic) Web applications in the narrower sense per se. The reader is invited to note that this requirement is a matter of the control over the data and the schemas rather than a question of the sheer size of the deployed application.

Another very important aspect of Semantic Web Applications was paraphrased by Ora Lassila in his keynote at the Scandinavian Conference on AI (SCAI) 2006 [193]:

Any specific problem (typically) has a specific solution that does not require Semantic Web technologies.

Q: Why then is the Semantic Web so attractive?

A: For future-proofing. Semantic Web can be a solution to those problems and situations that we are yet to define.

It was also Lassila who brought the notion of “serendipity” into this discussion: serendipity in interoperability (how to interoperate with systems we knew nothing about at design time?), serendipity in information reuse (accessible semantics of the information), and serendipity in information integration (can information from independent sources be combined?).

As an exemplary Semantic Web application[13], we discuss mle, the mailing list explorer [148] in the following[14].


Figure 2.1: An exemplary Semantic Web application.

Following and understanding discussions on mailing lists is a prevalent task for executives and policy makers who want to get an impression of their company’s image. However, existing solutions providing a Web-based archive require substantial manual effort to search for or filter certain information. With mle (cf. Fig. 2.1) we propose a new way to automatically process mailing list archives. The tool is realised based on two Semantic Web technologies: firstly, SIOC (Section 4.4.2) is utilised as the primary vocabulary for describing posts, people, and topics; secondly, the RDF metadata is deployed by embedding it in the Web page, encoded in XHTML+RDFa (see Section 4.6.2).
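To illustrate this deployment style, the following sketch renders one mailing list post as an XHTML+RDFa fragment using SIOC and Dublin Core terms (sioc:Post, sioc:has_creator, sioc:content, dcterms:title). It is not mle’s actual output; the function name and all example.org URIs are hypothetical, and the markup follows RDFa 1.0 conventions (prefixes declared via xmlns attributes).

```python
from html import escape

SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"


def post_to_rdfa(post_uri: str, title: str, creator_uri: str, content: str) -> str:
    """Render one mailing-list post as an XHTML+RDFa (1.0) fragment.

    Prefixes are declared with xmlns:* attributes, as RDFa 1.0 requires;
    all values are escaped so that message text cannot break the markup."""
    return (
        f'<div xmlns:sioc="{SIOC}" xmlns:dcterms="{DCTERMS}"\n'
        f'     about="{escape(post_uri)}" typeof="sioc:Post">\n'
        f'  <h2 property="dcterms:title">{escape(title)}</h2>\n'
        f'  <span rel="sioc:has_creator" resource="{escape(creator_uri)}"></span>\n'
        f'  <div property="sioc:content">{escape(content)}</div>\n'
        f'</div>'
    )


fragment = post_to_rdfa(
    "http://example.org/lists/dev/msg-0042",   # hypothetical post URI
    "Re: release schedule <draft>",
    "http://example.org/users/alice",          # hypothetical user URI
    "Fine with me & the team.",
)
print(fragment)
```

An RDFa-aware agent would read from this fragment that the post is a sioc:Post, obtain its title and content as literals, and follow the sioc:has_creator link to the author, while a browser simply renders the heading and the text.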

2.1.1 Projects & Activities

In the following, projects and activities dealing with how to build, enhance, or utilise Semantic Web applications are discussed. The discipline of Semantic Web application building is quite a new one[15]; hence, selecting appropriate tools is an elaborate task[16].

A good starting point for applications of ontologies is the Handbook on Ontologies [279]. Fensel et al. [97] describe areas for application of the Semantic Web, focusing on knowledge management and electronic commerce.

At the time of writing, the W3C Semantic Web Education and Outreach Interest Group[17] seeks to develop strategies and materials to increase awareness among the Web community of the need for and the benefits of the Semantic Web, and to educate the Web community regarding related solutions and technologies.

The following is a short overview of prominent (research) projects operating in the Semantic Web application domain:

- SWAD-Europe (Semantic Web Advanced Development in Europe). Project goals:

The SWAD-Europe project aims to support W3C’s Semantic Web initiative in Europe, providing targeted research, demonstrations and outreach to ensure Semantic Web technologies move into the mainstream of networked computing. The project aims to support the development and deployment of W3C Semantic Web specifications through implementation, research and testing activities. Semantic Web Advanced Development for Europe (SWAD-Europe) aims to play a key role in the evolution of the Semantic Web, through education and outreach to developers, organisations and content creators; through Open Source implementation and testing, and through pre-consensus technology development to drive and inform the creation of new Semantic Web standards.

- Knowledge Web

Mission statement:

The mission of Knowledge Web is to strengthen the European industry and service providers in one of the most important areas of current computer technology: Semantic Web enabled E-work and E-commerce. The project concentrates its efforts around the outreach of this technology to industry. Naturally, this includes education and research efforts to ensure the durability of impact and support of industry.

- SIMILE (Semantic Interoperability of Metadata and Information in unLike Environments)

In the project’s own words, SIMILE ...

... seeks to enhance inter-operability among digital assets, schemata / vocabularies / ontologies, metadata, and services. A key challenge is that the collections which must inter-operate are often distributed across individual, community, and institutional stores. We seek to be able to provide end-user services by drawing upon the assets, schemata/vocabularies/ontologies, and metadata held in such stores.

[...] The project also aims to implement a digital asset dissemination architecture based upon web standards. The dissemination architecture will provide a mechanism to add useful “views” to a particular digital artifact (i.e. asset, schema, or metadata instance), and bind those views to consuming services.

- CAS (CS AKTive Space). The project description states that:

CAS is an integrated Semantic Web application which provides a way to explore the UK Computer Science Research domain across multiple dimensions for multiple stakeholders, from funding agencies to individual researchers.

- SIOC (Semantically-Interlinked Online Communities)

The homepage states that SIOC

provides methods for interconnecting discussion methods such as blogs, forums and mailing lists to each other. It consists of the SIOC ontology, an open-standard machine readable format for expressing the information contained both explicitly and implicitly in internet discussion methods, of SIOC metadata producers for a number of popular blogging platforms and content management systems, and of storage and browsing/searching systems for leveraging this SIOC data.

The reader is invited to note that SIOC has been submitted to the W3C for standardisation and, given its widespread use, is likely to become a recommendation soon.

- Semantic Web Search Engines

Although the amount of RDF-based data available on the Web is rather limited compared to the overall size of the Web, dedicated search engines and indexers are already available[18]. Typically, they operate on the triple level, i.e., they index triples along with their provenance information. Among these search engines (SE), the following are widely used:

- Sindice (DERI): a semantic indexer;
- Falcon (ISW China): a Semantic Web search engine;
- Swoogle (UMBC): a mature, hybrid SE with some 6 million URIs indexed;
- Zitgist LLC’s PTSW: a web service archiving the location of recently updated RDF documents on the Web.
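Operating on the triple level with provenance can be pictured as an inverted index from each RDF term to the triples mentioning it, together with the document in which each triple was found. The sketch below is a toy built on that assumption; it does not describe how Sindice, Falcon, Swoogle, or PTSW are actually implemented, and the class name and example documents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class ProvenanceIndex:
    """Toy inverted index from RDF terms to (triple, source document) pairs."""

    def __init__(self) -> None:
        self._by_term: Dict[str, Set[Tuple[Triple, str]]] = defaultdict(set)

    def add_document(self, source: str, triples: Iterable[Triple]) -> None:
        # Index every position (subject, predicate, object) of every triple,
        # remembering which document the triple came from.
        for triple in triples:
            for term in triple:
                self._by_term[term].add((triple, source))

    def lookup(self, term: str) -> List[Tuple[Triple, str]]:
        """All (triple, source) pairs mentioning `term`, sorted for stable output."""
        return sorted(self._by_term.get(term, set()))

    def sources_for(self, term: str) -> List[str]:
        """Documents that say something about `term`."""
        return sorted({source for _, source in self.lookup(term)})


EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

index = ProvenanceIndex()
index.add_document("http://example.org/alice.rdf", [(EX + "alice", FOAF + "knows", EX + "bob")])
index.add_document("http://example.net/bob.rdf", [(EX + "bob", FOAF + "name", "Bob")])

print(index.sources_for(EX + "bob"))
# ['http://example.net/bob.rdf', 'http://example.org/alice.rdf']
```

Keeping the source next to each triple is what lets such an engine answer not only “what is said about this URI?” but also “who says it?”.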

2.2 Multimedia Applications

In this section we first explain the term smart multimedia content in the context of this work. Then, multimedia metadata deployment issues are discussed. Further, the notion of a Multimedia Application on the Semantic Web is defined. Finally, the section concludes with a review of recent projects and activities illustrating the current status of Semantic Multimedia Applications [177; 152].

2.2.1 Smart Multimedia Content

In the area of smart multimedia content handling [289], the past few years of research have produced notable output. This section summarizes known approaches to bridge the Semantic Gap and highlights activities in this area. For a comprehensive discussion of the state of the art, and a proposal for a research agenda including open research issues in the development of the Semantic Web from the perspective of hypermedia research, the reader is referred to [311].

A body of research is available [129; 84; 336; 116; 334] dealing with the Semantic Gap. However, a generic, domain-independent solution to the problem is not at hand. An example instance of the Semantic Gap is depicted in Fig. 2.2 and described in Example 2.1.

Example 2.1 (Semantic Gap Example).

Take for example some visual content source, such as a video clip, depicting a soccer ball. Further, assume we have low-level features, say, colour and shape, which describe the content throughout. This setup is shown in Fig. 2.2.

The question now is if and how it is possible to map the two available low-level features shape=circular and colour={black, white}, occurring in a certain region, to the (logical) concept soccer ball. If it is possible, the question is under which circumstances it can be realised. The ultimate goal would be to handle the generic case, i.e., without any further knowledge about the domain.
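The following sketch makes Example 2.1 concrete: a hand-written rule maps the features shape=circular and colour={black, white} to the concept soccer ball, but only when the domain is known to be soccer. Without that domain knowledge, the very same features could describe, say, a black-and-white clock face, which is exactly why the generic case remains open. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RegionFeatures:
    """Low-level features extracted for one image region (cf. Fig. 2.2)."""
    shape: str
    colours: FrozenSet[str]


def label_region(region: RegionFeatures, domain: Optional[str]) -> Optional[str]:
    """Map low-level features to a high-level concept with a hand-written rule.

    The rule is only justified because the domain is known; without that
    knowledge the same features fit many other concepts."""
    if (domain == "soccer"
            and region.shape == "circular"
            and region.colours == frozenset({"black", "white"})):
        return "soccer ball"
    return None  # generic case: the Semantic Gap remains open


ball_like = RegionFeatures("circular", frozenset({"black", "white"}))
print(label_region(ball_like, "soccer"))  # soccer ball
print(label_region(ball_like, None))      # None
```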

Based on the critical reviews of both MPEG-7 and OWL, and the analysis of their respective interoperability, in [310] and [229], we can state that there is a certain degree of freedom in choosing markup for a certain task. From a methodological point of view, a number of approaches may be taken to realise smart media content descriptions[19]; they are reviewed in the following.

- The purist approach, where either a metadata format such as MPEG-7 or a logic-based approach (e.g. a description based on OWL) is assumed to fulfil the task. An example of a logic language extended for multimedia retrieval can be found in [209];
- The integration approach, that tries to embed or translate (parts of) one language into the other—a prominent exponent is [171]; a recent example can be found in [107];
- The layer approach, also known as the “principle of subsidiarity”, where each vocabulary is used in the appropriate realm. For example, [290] is a promising research work that represents this approach. Related work can also be found in [299];


Figure 2.2: The Semantic Gap in Multimedia Content Description.

- Finally, it is possible to invent a new vocabulary, as proposed in [170] and independently in [12], which is not advisable in terms of interoperability[20].

Standardisation. Since 2005, considerable work on multimedia content description and understanding has been carried out in the World Wide Web Consortium (W3C) [317]. The Multimedia Task Force of the Semantic Web Best Practices and Deployment Working Group[21] has elaborated on multimedia markup for a while. In early 2006 another W3C activity was launched, the Multimedia Semantics Incubator Group [315]. Its mission was to ...

... show metadata interoperability can be achieved by using the Semantic Web technologies to integrate existing multimedia metadata standards. Thus, the goal of the XG is NOT to invent new multimedia metadata formats, but to leverage and combine existing approaches

The author of this thesis has been active in this group since then and has contributed to various deliverables, such as the Incubator Group report “Multimedia Vocabularies on the Semantic Web” [141].

The scope of the Moving Picture Experts Group (MPEG)[22] has recently been extended from signal coding only to multimedia metadata, processes and applications. The “Multimedia Content Description Interface” (MPEG-7) standard [222; 220; 221; 218] specifies the description of multimedia content, integrating content structure (e.g. shots of video, regions of image), low-level visual and audio features and high-level descriptions (e.g. production information). The high-level descriptors allow linking external thesauri or knowledge bases and thus the integration of media-oriented content descriptions with Semantic Web technologies. MPEG-7 profiles have been proposed as subsets for certain application areas to reduce the interoperability problem caused by the comprehensiveness and generality of the MPEG-7 standards. The Detailed Audiovisual Profile (DAVP) [19], the first MPEG-7 profile with formal semantics of the description elements (in order to solve the interoperability problem), has been developed with contributions from the author of this thesis[23].

While standardisation efforts are emerging and already produce first substantial results, the term “smart multimedia content” is still not uniformly defined; it has even been (mis)used in marketing slang. To avoid confusion about what is meant by smart multimedia content, we define it in the context of this work as follows:

Definition 2.3 (Smart Multimedia Content—SMC).

Smart Multimedia Content is multimedia content along with metadata enabling interoperable and advanced operations across systems. Typically, SMC has two characteristics:

- The metadata can be available both in terms of low-level features, as well as formal domain descriptions;

- It is self-descriptive (see also “The Self-Describing Web”[24]).

The reader is invited to note that the definition of SMC is deliberately kept quite vague. This is due to the nature of SMC: many forms of SMC may exist, and many technologies may be utilised to realise it. Hence, the above definition can be seen as a lowest common denominator. Notable efforts regarding SMC have been reported from the mobile devices area and from ubiquitous computing.

2.2.2 Multimedia Metadata Deployment

To the best of our knowledge, research on multimedia metadata deployment has not been widely performed. Current approaches are either not specific to multimedia or do not scale to the size of the Web. In the following, we discuss available proposals and highlight their issues.

The W3C’s Protocol for Web Description Resources (POWDER) Working Group currently works on a very powerful, but rather generic standard[25] to facilitate the publication of descriptions of multiple resources, such as all those available from a Web site.

Adobe’s Extensible Metadata Platform (XMP)[26], which is primarily used for PDF documents (but is also usable with other formats, such as JPEG, PNG, etc.), shares some of our objectives. The Open Archives Initiative (OAI) has published the Compound Information Objects draft [191], a specification dealing with the publication of aggregations of distinct information units.

The Upper Mapping and Binding Exchange Layer (UMBEL) specification[27] is a high-level subject layer for mapping various ontologies with simple binding mechanisms for any structured formalism. Simply stated, UMBEL is both a high-level reference “bag of subjects” and a set of light-weight mechanisms for binding to Web ontologies via proxies for those subjects. For linked datasets, the semantic sitemaps extension [65] is available, providing basic deployment descriptions on the data access level. We are currently working on a proposal labelled voiD, the “Vocabulary of Interlinked Datasets”[28], allowing the description of linked datasets on the content level.
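As an illustration of a content-level dataset description, the sketch below serialises a minimal voiD description as Turtle. It uses the term names of the voiD vocabulary as later published (void:Dataset, void:sparqlEndpoint, void:exampleResource, void:vocabulary); the proposal at the time of writing may have differed, and the dataset, endpoint, and resource URIs as well as the function names are hypothetical.

```python
from typing import List

VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"


def turtle_string(text: str) -> str:
    """Quote a plain literal for Turtle (escapes backslashes and double quotes)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def void_description(dataset: str, title: str, sparql_endpoint: str,
                     example_resource: str, vocabularies: List[str]) -> str:
    """Serialise a minimal voiD description of one linked dataset as Turtle."""
    statements = [
        f"<{dataset}> a void:Dataset",
        f"    dcterms:title {turtle_string(title)}",
        f"    void:sparqlEndpoint <{sparql_endpoint}>",
        f"    void:exampleResource <{example_resource}>",
    ] + [f"    void:vocabulary <{v}>" for v in vocabularies]
    header = f"@prefix void: <{VOID}> .\n@prefix dcterms: <{DCTERMS}> .\n\n"
    # Predicate-object pairs about the same subject are separated by ';',
    # and the whole statement is terminated by '.'.
    return header + " ;\n".join(statements) + " ."


doc = void_description(
    "http://example.org/datasets/photos",       # hypothetical dataset URI
    "Photo annotations",
    "http://example.org/sparql",                # hypothetical endpoint
    "http://example.org/photos/42",
    ["http://xmlns.com/foaf/0.1/", "http://purl.org/dc/terms/"],
)
print(doc)
```

Such a description tells an agent where to query the data and which vocabularies to expect before it fetches a single triple of the dataset itself.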

However, the main issue with multimedia metadata deployment is the so-called “last mile”.

The Last Mile. For communications providers, the last mile is the final leg of delivering connectivity to a customer. Equally, in business the last mile describes the process of getting any deliverable to the final consumer. In the case of multimedia metadata deployment, the last mile is the delivery of multimedia metadata to the end-user, i.e., a Semantic Web agent able to “understand” RDF. In this work, multimedia metadata deployment is assigned a precise meaning:

Definition 2.4 (Multimedia Metadata Deployment).

Multimedia Metadata Deployment is the packaging and the delivery of the metadata along with a multimedia asset. Regarding the Semantic Web, at least two requirements need to be fulfilled:

- The data model of the deployment has to be RDF;
- While typical container formats such as SMIL, SVG, and PDF may be used, there has to be at least one deployment path that works with (X)HTML.
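The (X)HTML deployment path required by Definition 2.4 means that a Semantic Web agent at the end of the last mile must be able to read RDF back out of a Web page. The following Python sketch, using only the standard library’s html.parser module, extracts triples from a deliberately tiny subset of RDFa. It is an illustration under strong simplifying assumptions, not a conforming RDFa processor; a real agent should use a complete RDFa parser, and the class name, the example page, and its URIs are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class TinyRDFaExtractor(HTMLParser):
    """Extract triples from a small RDFa subset: @about sets the subject,
    @typeof yields an rdf:type triple, @rel + @resource link two resources,
    and @property takes the element's text as a literal. CURIEs such as
    'dcterms:title' are NOT expanded, and nested @property elements are
    not supported."""

    def __init__(self) -> None:
        super().__init__()
        self.triples: List[Tuple[Optional[str], str, str]] = []
        # One (subject, pending literal property) entry per open element.
        self._stack: List[Tuple[Optional[str], Optional[str]]] = []
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        inherited = self._stack[-1][0] if self._stack else None
        subject = a.get("about", inherited)
        if "typeof" in a:
            self.triples.append((subject, "rdf:type", a["typeof"]))
        if "rel" in a and "resource" in a:
            self.triples.append((subject, a["rel"], a["resource"]))
        prop = a.get("property")
        if prop is not None:
            self._text = []  # start collecting the literal value
        self._stack.append((subject, prop))

    def handle_data(self, data):
        self._text.append(data)

    def handle_endtag(self, tag):
        subject, prop = self._stack.pop()
        if prop is not None:
            self.triples.append((subject, prop, "".join(self._text).strip()))


page = """<div xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:dcterms="http://purl.org/dc/terms/"
     about="http://example.org/photos/42.jpg" typeof="foaf:Image">
  <img src="http://example.org/photos/42.jpg" alt="Sunset" />
  <span property="dcterms:title">Sunset over the harbour</span>
  <span rel="dcterms:creator" resource="http://example.org/people/alice"></span>
</div>"""

extractor = TinyRDFaExtractor()
extractor.feed(page)
extractor.close()
for triple in extractor.triples:
    print(triple)
```

The same page serves both players of the basic setup: a human user sees the image and its title, while the agent obtains the type, title, and creator of the media object as triples.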

In Fig. 2.3 the basic setup regarding multimedia deployment on the Semantic Web is given. Typically, several distinct players are involved, namely (i) human users consuming a Web page and its embedded media objects, (ii) Semantic Web agents consuming the RDF-based metadata describing the media objects, further (iii) Semantic Web languages and ontologies, and finally (iv) multimedia metadata formats.


Figure 2.3: Deploying multimedia formats on the Semantic Web.

2.2.3 Semantic Web Multimedia Applications—SWMA

Two kinds of applications can roughly be distinguished, namely (i) Web applications and (ii) multimedia applications. As stated above, Semantic Web applications are a subset of Web applications. In the intersection of Semantic Web Applications and Multimedia Applications, we finally find the so-called “Semantic Web Multimedia Applications” (SWMA), as depicted in Fig. 2.5.

Based on Definitions 2.2 and 2.3, it is now possible to define:

Definition 2.5 (Semantic Web Multimedia Application—SWMA).

A Semantic Web Multimedia Application (SWMA) is a Semantic Web Application (cf. Def. 2.2) dealing with smart multimedia content (cf. Def. 2.3). The SWMA may support sharing, creating, manipulating, or delivering the multimedia content; at least one of the following characteristics applies:

- The application deals with spatio-temporal issues w.r.t. the content description;
- The application deals with the Semantic Gap.
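Spatio-temporal issues typically arise when an annotation refers to a region of a video frame during a time interval rather than to the whole asset. The sketch below models such an annotation as an axis-aligned box plus a half-open time interval and asks which concepts are annotated at a given point and time. The class, the pixel and second units, and the concept URIs are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List

SOCCER_BALL = "http://example.org/concepts/SoccerBall"  # hypothetical concept URIs
PLAYER = "http://example.org/concepts/Player"


@dataclass(frozen=True)
class SpatioTemporalRegion:
    """An axis-aligned box (pixels) that is visible during [start, end) seconds."""
    x: int
    y: int
    width: int
    height: int
    start: float
    end: float
    concept: str

    def contains(self, px: int, py: int, t: float) -> bool:
        return (self.start <= t < self.end
                and self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def concepts_at(regions: List[SpatioTemporalRegion], px: int, py: int, t: float) -> List[str]:
    """Concepts whose annotated region covers pixel (px, py) at time t."""
    return [r.concept for r in regions if r.contains(px, py, t)]


regions = [
    SpatioTemporalRegion(100, 50, 40, 40, 12.0, 15.5, SOCCER_BALL),
    SpatioTemporalRegion(60, 20, 120, 200, 10.0, 20.0, PLAYER),
]
print(concepts_at(regions, 110, 60, 13.0))  # both the ball and the player
print(concepts_at(regions, 110, 60, 16.0))  # only the player; the ball region has ended
```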

State-of-the-art examples of Semantic Web multimedia applications[29] are the Podcast Pinpointer[30], the MultimediaN E-Culture demo[31], and FOAFing-the-Music[32]. The latter two are depicted in Fig. 2.4.


(a) MultimediaN E-Culture Demo. (b) FOAFing-the-Music.

Figure 2.4: Examples of Semantic Web Multimedia Applications.

While the MultimediaN E-Culture Demo (Fig. 2.4(a)) focuses on artworks, supporting a range of vocabularies (AAT, ULAN, WordNet, etc.), FOAFing-the-Music (Fig. 2.4(b)) is a semantic music recommender system based on a user’s FOAF profile. Both SWMAs successfully took part in the 2006 Semantic Web Challenge (first and second prize).


Figure 2.5: Semantic Web Multimedia Applications (SWMA).

2.2.4 Projects & Activities

A range of projects (EU-funded research projects, national programmes, and international projects) focuses on smart media; a selective overview is given in the following. Note: the author of this work has been or still is active in the projects marked with Þ.

- aceMedia

Citing [189]:

[...] an approach for knowledge and context-assisted content analysis and reasoning based on a multimedia ontology infrastructure is presented. [...] In aceMedia, ontologies will be extended and enriched to include lowlevel audiovisual features, descriptors and behavioural models in order to support automatic content annotation. This approach is part of an integrated framework consisting of: user-oriented design, knowledge-driven content processing and distributed system architecture. The overall objective of aceMedia is the implementation of a novel concept for unified media representation: the Autonomous Content Entity (ACE), which has three layers: content, its associated metadata, and an intelligence layer.

- Þ K-Space (Knowledge Space of Semantic inference for automatic annotation and retrieval of multimedia content)

From the project description we learn that ...

K-Space integrates leading European research teams to create a Network of Excellence in semantic inference for semi-automatic annotation and retrieval of multimedia content. The aim is to narrow the gap between content descriptors that can be computed automatically by current machines and algorithms, and the richness and subjectivity of semantics in high-level human interpretations of audiovisual media: The Semantic Gap.

- MUSCLE (Multimedia Understanding through Semantics, Computation and Learning) describes its goals as follows:

MUSCLE aims at creating and supporting a pan-European Network of Excellence to foster close collaboration between research groups in multimedia datamining on the one hand, and machine learning on the other [...]

- Þ NM2 (New Millennium, New Media)

The project homepage says:

NM2 unites leading media and technology experts from across Europe to develop compelling new media genres, which utilise the unique characteristics of broadband networks. The project is creating new production tools for the media industry that allow the easy production of interactive non-linear broadband media genres, which can be personalised to suit the preferences of the individual viewer. Viewers are able to interact directly with the medium and influence what they see and hear according to their personal choices and tastes.

- Þ SALERO (Semantic AudiovisuaL Entertainment Reusable Objects). The project claims that ...

SALERO aims at making cross media-production for games, movies and broadcast faster, better and cheaper by combining computer graphics, language technology, semantic web technologies as well as content based search and retrieval.

- REVEAL THIS[33] (REtrieval of VidEo And Language for The Home user in an Information Society)

The project homepage explains the scope as follows:

REVEAL THIS addresses a basic need underlying content organisation, filtering, consumption and enjoyment by developing content processing systems that will help European citizens keep up with the explosion of digital content scattered over different platforms (radio, TV, World Wide Web, etc), different media (speech, text, image, video) and different languages. People should be spending most of their leisure time enjoying the content, not searching for it. REVEAL THIS aims at developing content processing technology able to capture, semantically index, categorise and cross-link multiplatform, multimedia and multilingual digital content, as well as provide the system user with semantic search, retrieval, summarisation and translation functionalities.

- FilmEd

The project is described in [266]; its self-description states:

The FilmEd project’s original aim was to provide the tertiary education sector with broadband access to high quality and unique film and video content stored within Australian moving image archives to enhance curriculum based programs concerned with screen literacy, film and media studies, journalism and Australian culture and history. A prototype called Vannotea has been developed which enables the collaborative indexing, annotation and discussion of audiovisual content over high bandwidth networks. It enables geographically distributed groups connected across broadband networks (GrangeNet) to perform real time collaborative sharing indexing, discussion and annotation of high quality digital film/video and images (and shortly 3D objects).

- MAENAD (Multimedia Access across Enterprise Networks and Domains). Quoting the project description:

The objectives of this project are to develop an underlying data model, metadata mapping schemas (RDF, XML), metadata generators, metadata repositories, query languages, search interfaces and search engines which can provide solutions to the problems of resource discovery, preservation, delivery and management. Resource Discovery of single-medium atomic digital objects has advanced in the past 5 years due to the development of metadata standards such as Dublin Core which provides semantic interoperability for textual documents and MPEG-7 which will provide the same for audio, video and audiovisual documents. However the future will lead to many more compound multimedia documents on the web which combine text, image, audio and video in rich complex structured documents in which temporal, spatial, structural and semantic relationships exist between the components. The problems of indexing, archiving, searching, browsing, retrieving and managing these kinds of structured dynamic documents are infinitely more complex than the resource discovery of simple atomic textual documents.

2.3 Scalability and Expressivity

A good starting point for discussing scalability and expressivity issues is the OWL Use Cases and Requirements document [242], which states:

Expressivity determines what can be said in the language, and thus determines its inferential power and what reasoning capabilities should be expected in systems that fully implement it. An expressive language contains a rich set of primitives that allow a wide variety of knowledge to be formalized. A language with too little expressivity will provide too few reasoning opportunities to be of much use and may not provide any contribution over existing languages.

Expressivity. A good place to start the discussion on expressivity is the work of Levesque and Brachman [198]. They examine computational limits on automated reasoning and their effect on knowledge representation. The conclusion they draw is that there is a tradeoff between the expressiveness of a representational language and its computational tractability: when one limits what can be expressed in a knowledge base, its implications become more manageable computationally. Restricting the logical form of a knowledge base can lead to very specialized forms of inference. Well-known and practically relevant forms are the relational (database) form [258], Description Logics (cf. section 4.1.2), and the logic-program form (cf. section 4.1.3).
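To make the logic-program form tangible, the following sketch evaluates function-free Horn rules (Datalog) bottom-up until no new facts can be derived. Because such rules never introduce new constants, evaluation terminates, and for a fixed rule set the number of derivable facts is bounded polynomially in the number of constants, which is one reason why this restricted form stays tractable. The predicates partOf and depicts as well as the facts are hypothetical; the naive fixpoint loop is chosen for clarity, not efficiency.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, ...]            # (predicate, arg1, arg2, ...)
Rule = Tuple[Atom, List[Atom]]    # (head, body); variables start with '?'


def unify(pattern: Atom, fact: Atom, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Match a body atom against a ground fact, extending `binding`."""
    if len(pattern) != len(fact):
        return None
    extended = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if extended.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return extended


def body_bindings(body: List[Atom], facts: Set[Atom]) -> List[Dict[str, str]]:
    """All variable bindings that satisfy every atom of a rule body."""
    bindings: List[Dict[str, str]] = [{}]
    for atom in body:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := unify(atom, f, b)) is not None]
    return bindings


def forward_chain(facts: Set[Atom], rules: List[Rule]) -> Set[Atom]:
    """Naive bottom-up evaluation: apply all rules until no new fact appears.
    Head variables must occur in the body (range-restricted rules)."""
    known = set(facts)
    while True:
        derived = {tuple(b.get(term, term) for term in head)
                   for head, body in rules
                   for b in body_bindings(body, known)}
        if derived <= known:
            return known
        known |= derived


facts = {
    ("partOf", "shot1", "scene1"),
    ("partOf", "scene1", "video1"),
    ("depicts", "shot1", "SoccerBall"),
}
rules = [
    # partOf is transitive
    (("partOf", "?a", "?c"), [("partOf", "?a", "?b"), ("partOf", "?b", "?c")]),
    # whatever a segment depicts, the enclosing segment depicts as well
    (("depicts", "?w", "?c"), [("depicts", "?s", "?c"), ("partOf", "?s", "?w")]),
]
closure = forward_chain(facts, rules)
print(sorted(closure - facts))
# [('depicts', 'scene1', 'SoccerBall'), ('depicts', 'video1', 'SoccerBall'), ('partOf', 'shot1', 'video1')]
```

Practical engines use semi-naive evaluation, which only joins with facts derived in the previous round, to avoid recomputing the same derivations.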

Dixon et al. [79] have discussed issues associated with systems evolution in decentralised organisations. They have proposed a five-layer model of information expressivity that provides a theoretical framework for classifying the system variants.

James Hendler sketched the main motivation for expressivity in [153]:

However, I argue that semantic web techniques can, and must, go much further. The first use of ontologies on the web for this purpose is pretty straightforward—by creating the service advertisements in an ontological language, tools could use the hierarchy (and property restrictions) to find matches via the class/subclass properties or other semantic links. For example, someone looking to buy roses might find florists (who sell flowers) even if there were no exact match that served the purpose. Using, for example, description logic (or other inferential means), the user could even find categorizations that weren’t explicit. So, for example, specifying a search for animals that were of “size = small” and “type = friendly,” the user could end up finding the Pet Shop Mary is working for, which happens to specialize in hamsters and gerbils.
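The first, straightforward use Hendler describes, finding florists when looking for roses, needs nothing more than subsumption along a named class hierarchy. The following sketch shows that matching step under simple assumptions: a hypothetical hierarchy and hypothetical service advertisements, each naming the class of goods offered. It does not cover the description-logic part of the quote (finding classes via restrictions such as size = small), which requires a DL reasoner.

```python
from typing import Dict, List, Set

# Hypothetical class hierarchy: class -> set of direct superclasses.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "Rose": {"Flower"},
    "Tulip": {"Flower"},
    "Flower": {"Plant"},
    "Hamster": {"SmallAnimal"},
}


def superclasses(cls: str, hierarchy: Dict[str, Set[str]]) -> Set[str]:
    """Reflexive-transitive closure of subclass-of, computed by depth-first search."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(hierarchy.get(current, ()))
    return seen


def find_services(requested: str, adverts: Dict[str, str],
                  hierarchy: Dict[str, Set[str]]) -> List[str]:
    """Services whose advertised class subsumes the requested class."""
    acceptable = superclasses(requested, hierarchy)
    return sorted(name for name, offered in adverts.items() if offered in acceptable)


adverts = {                     # service name -> class of goods it advertises
    "City Florist": "Flower",
    "Garden Centre": "Plant",
    "Pet Shop": "SmallAnimal",
}
print(find_services("Rose", adverts, SUBCLASS_OF))  # ['City Florist', 'Garden Centre']
```

No service advertises roses explicitly; the matches arise only because the hierarchy states that every rose is a flower and every flower a plant.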

[200] has investigated the expressive power and parsing complexity of a formalism originally designed for displaying formal propositions and proofs in natural language, the so-called Grammatical Framework. Recently, a survey of the usage of ontology languages and their expressivity has been performed [321].

Scalability. While expressivity is defined relatively sharply, the term scalability is used to address a range of issues in different domains. A generic definition is not available, and one would hardly make sense given the differing interpretations in various domains. For example, Hill [154] stated in 1990 in the context of microprocessor systems that

[...] I first examine formal definitions of scalability, but I fail to find a useful, rigorous definition of it. I then question whether scalability is useful and conclude by challenging the technical community to either (1) rigorously define scalability or (2) stop using it to describe systems.

Table 2.1 lists some examples of the usage of the term scalability and the issues connected with it.


Table 2.1: Scalability in selected domains.

Some research regarding scalability has already been reported in the literature. Most importantly, Bondi [44] recently elaborated on scalability issues on a generic level; he considers four types of scalability: load scalability, space scalability, space-time scalability, and structural scalability:

- Load scalability. If a system has the ability to function gracefully, i.e., without undue delay and without unproductive resource consumption or resource contention at light, moderate, or heavy loads while making good use of available resources.
- Space scalability. If its memory requirements do not grow to intolerable levels as the number of items it supports increases.
- Space-time scalability. If a system continues to function gracefully as the number of objects it encompasses increases by orders of magnitude.
- Structural scalability. If its implementation or standards do not impede the growth of the number of objects it encompasses, or at least will not do so within a chosen time frame.
The attributes given above form the basis of the analysis in chapter 6.
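Bondi’s space-time scalability can be illustrated by how the cost of a single lookup develops as the number of stored objects grows by orders of magnitude. The sketch below counts the triples examined by a linear scan and by a subject index on a synthetic workload (n triples, ten per subject). Counting operations instead of measuring wall-clock time keeps the result deterministic; the workload and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def make_triples(n: int) -> List[Triple]:
    """Synthetic workload: n triples spread over n // 10 subjects (10 each)."""
    subjects = max(1, n // 10)
    return [(f"s{i % subjects}", "p", f"o{i}") for i in range(n)]


def scan_lookup(triples: List[Triple], subject: str) -> Tuple[List[Triple], int]:
    """Linear scan: every stored triple is examined."""
    hits = [t for t in triples if t[0] == subject]
    return hits, len(triples)


def build_index(triples: List[Triple]) -> Dict[str, List[Triple]]:
    """Subject index; costs extra memory proportional to the number of triples."""
    index: Dict[str, List[Triple]] = defaultdict(list)
    for t in triples:
        index[t[0]].append(t)
    return index


def index_lookup(index: Dict[str, List[Triple]], subject: str) -> Tuple[List[Triple], int]:
    """Hash lookup: apart from one probe, only the matching triples are touched."""
    hits = index.get(subject, [])
    return hits, len(hits)


for n in (1_000, 10_000, 100_000):
    triples = make_triples(n)
    scan_hits, scanned = scan_lookup(triples, "s0")
    index_hits, touched = index_lookup(build_index(triples), "s0")
    assert scan_hits == index_hits  # both strategies return the same answer
    print(f"n={n:>7}: scan examines {scanned:>7} triples, index touches {touched}")
```

The scan cost grows linearly with n while the index cost stays at ten, but the index itself needs memory proportional to n, which is exactly the kind of trade-off addressed by space scalability.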

2.3.1 Infrastructure Level

On the infrastructure level, such as RDF stores, reasoning facilities, and the like, a number of research activities can be listed. Their outcomes are mostly benchmarks, evaluations, and guides on how to implement the infrastructure in an optimal way.

RDF stores. Some practical research w.r.t. triple stores has been reported from the SIMILE[34] project [196]. A fairly complete survey of RDF storage systems, with special attention to scalability, is available as a deliverable of the Semantic Web Advanced Development for Europe (SWAD-Europe) project [25]. Wielemaker et al. [325] outline a Prolog-based infrastructure for loading and saving RDF triples, elementary reasoning with triples, and visualization. A predecessor of the infrastructure described there has been used in applications for ontology-based annotation of multimedia objects. The library aims at fast parsing, fast access and scalability for fairly large but not unbounded applications of up to 40 million triples. In [120], Guo et al. present an evaluation of four knowledge base systems with respect to their use in large OWL applications. The datasets used range from 15 OWL files totalling 8MB to 999 files totalling 583MB. They evaluated two memory-based systems and two systems with persistent storage. The conclusion of the work is that existing systems need to place a greater emphasis on scalability. For a criticism of benchmarks, we invite the reader to refer to [324].
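One common design idea behind scalable triple stores is to keep the same triples in several permutation indexes, so that any triple pattern with a bound term can be answered by lookups instead of a full scan. The following in-memory sketch keeps SPO, POS, and OSP indexes. It is not the design of any system cited above; persistent stores typically use sorted or B-tree indexes on disk rather than hash tables, and the class name and example data are hypothetical.

```python
from collections import defaultdict
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def _nested():
    return defaultdict(set)


class TripleStore:
    """In-memory triple store with three permutation indexes (SPO, POS, OSP)."""

    def __init__(self) -> None:
        self.spo = defaultdict(_nested)   # s -> p -> {o}
        self.pos = defaultdict(_nested)   # p -> o -> {s}
        self.osp = defaultdict(_nested)   # o -> s -> {p}

    def add(self, s: str, p: str, o: str) -> None:
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Set[Triple]:
        """Answer a triple pattern (None = variable) from an index whose
        leading term is bound; .get() is used so lookups never create entries."""
        if s is not None:                                   # S?? / SP? / SPO / S?O via SPO
            by_p = self.spo.get(s, {})
            preds = [p] if p is not None else list(by_p)
            return {(s, pp, oo) for pp in preds for oo in by_p.get(pp, ())
                    if o is None or oo == o}
        if p is not None:                                   # ?P? / ?PO via POS
            by_o = self.pos.get(p, {})
            objs = [o] if o is not None else list(by_o)
            return {(ss, p, oo) for oo in objs for ss in by_o.get(oo, ())}
        if o is not None:                                   # ??O via OSP
            return {(ss, pp, o) for ss, preds in self.osp.get(o, {}).items()
                    for pp in preds}
        return {(ss, pp, oo) for ss, by_p in self.spo.items()   # ??? needs a full scan
                for pp, objs in by_p.items() for oo in objs}


store = TripleStore()
store.add("ex:photo1", "dcterms:creator", "ex:alice")
store.add("ex:photo2", "dcterms:creator", "ex:alice")
store.add("ex:photo1", "dcterms:title", '"Sunset"')
print(sorted(store.match(p="dcterms:creator", o="ex:alice")))
```

The price for answering every pattern without a scan is storing each triple three times, again a trade-off between space and time.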

Reasoning. Wache et al. have published related research on Scalability Techniques for Reasoning with Ontologies [318]. In [214], practical results w.r.t. representing and reasoning about incomplete information are presented. In [282], a context-based theoretical approach to Contextual Reasoning in a Semantic Web KB, its implementation, and the associated testing results are presented. Another work on reasoning worth mentioning in the realm of ubiquitous computing is [232]. Heflin [151] proposes in his PhD thesis “... to use reasoning methods that are not sound and complete”, which is sensible due to the Open World Assumption, and further ...

... as an alternative to description logics is to use Horn logic. It has been shown that although Horn-logic and the most common description logics can express things the other cannot, neither is more expressive than the other.

Software & Data Engineering. The development of Semantic Web applications from an object-oriented programmer’s point of view is discussed in [188]. Alba et al. [7] report on IBM’s Semantic Super Computing platform, which has been designed to ingest, augment, store, index and support queries on billions of documents. They describe the challenges and lessons learned in the areas of solution design, hardware, operations, middleware, algorithms, and testing.

2.3.2 Application Level

The application level covers the What? rather than the How?, and it is the level mainly within the scope of this thesis.

Multimedia/Hypermedia. For the multimedia realm, however, little research exists. One of the few works is [190], which deals with scalability on the data rather than on the metadata level. Another is [9], which focuses on scalability “in both the data and application domains” with an industrial hypermedia system as the testbed.

Semantic Web Applications. In [132], Hartmann and Sure describe their contribution to the 2003 Semantic Web Grand Challenge, which realizes semantics-based search and access facilities for information represented by semantic portals. Such portals typically provide knowledge about a specific domain and rely on ontologies to structure and exchange this knowledge. The authors claim that their approach has the following benefits: (i) a significant reduction of content maintenance overhead, (ii) knowledge accessibility for both human and machine agents atop existing information sources, and (iii) suitability for productive environments.

2.3.3 Projects & Activities

In recent years, a number of projects have addressed scalability issues, implicitly or explicitly. In the following, we discuss some prominent examples of research projects in this area.

- REOL (Reasoning for Expressive Ontology Languages) states its project goals as follows:

The primary goal of this project is to develop techniques that address the expressivity requirements of various applications. This is to be achieved by a synergy between two previously disjoint techniques. Currently, tableaux algorithms are the state-of-the-art for reasoning with DL ontologies. However, in recent years, great progress has been made in designing resolution-based algorithms for reasoning with ontologies. These two types of algorithms seem to enjoy two complementary properties: tableaux are model-building calculi that seem to perform well on satisfiable problems, whereas resolution is a refutation calculus that seems to perform well on unsatisfiable problems. The main idea of this project is to extend both calculi and provide a common framework for integrating them.

- REWERSE (Reasoning on the Web with Rules and Semantics).

The homepage of the project gives the following overview:

The community networked and structured by REWERSE will (i) develop a coherent and complete, yet minimal, collection of inter-operable reasoning languages for advanced Web systems and applications; (ii) test these languages on context-adaptive Web systems and Web-based decision support systems selected as test-beds for proof-of-concept purposes; (iii) bring the proposed languages to the level of open pre-standards amenable to submissions to standardisation bodies such as the W3C.

- MOSES (Modular and Scalable Environment for the Semantic Web) states its project goals as follows:

The only way to make the Semantic Web a success is a bottom up approach, enabling it to emerge from the aggregation of locally organized knowledge fragments of varying size. To demonstrate this approach MOSES will create a small but scalable ontology-based Knowledge Management System and an ontology-based search engine that will accept queries and produce answers in natural language.

The main goals of the scalable environment will be to demonstrate that it is possible to upgrade ontological systems from new contents, either creating new knowledge domains or incorporating new knowledge into a pre-existing domain. This requires extracting structured knowledge from plain content.

- Þ SCALEX (Scalable Exhibition Server).

The project goals are stated as follows:

SCALEX is an easy to use toolbox for museums and companies that deal with the creation of digital content. With SCALEX it is possible to combine digital content, as for example texts, images, videos and audios, with real exhibition objects. In addition to that SCALEX also supports the creation of purely virtual exhibitions. The presentation of the digital media is directly coupled to the interests of the specific visitor. Exhibitions that are enhanced with digital media open up new interaction possibilities and thereby offer the visitors a completely new experience during exhibition visits.

- DIET (Decentralised Information Ecosystems Technology) states:

The project will involve the theoretical study, implementation and validation of a novel information management framework which will use ecosystem metaphors to turn the global information infrastructure into an open, adaptive, scalable and stable environment for service provision. Initially this will involve the design of an overall framework in which infohabitants - entities which can process information - can interact and coexist in societies set in an information environment.

Note: the author of this work has been, or still is, active in the projects marked with Þ.

While generic scalability and expressivity issues have been researched in an array of projects, practical issues are often neglected. The Semantic Web, as an extension of the Web, is by definition a “scaling entity”. Real success in terms of user loyalty, community, and market is only possible if one is able to offer solutions that scale to the size of the Web.

Chapter 3 Multimedia Metadata

“Tabula rasa.”

(Latin phrase)

This chapter gives an overview of multimedia metadata formats relevant in the context of the Web; it addresses still images, audio, audio-visual content, and multimedia container formats. Based on work performed by the author and colleagues within the W3C Multimedia Semantics Incubator Group (MMSEM-XG) [315], diverse aspects of multimedia metadata formats are discussed herein.

We will, however, not discuss the basic multimedia content formats, for example the PNG format[1] for still images or the Ogg format[2] for videos, used on the (Semantic) Web to represent and deliver audio-visual content. For an overview and a sound discussion of these basic multimedia formats and their capabilities with respect to metadata, the interested reader is invited to refer to [105] and [63].

3.1 Multimedia Container Formats

In contrast to the basic multimedia content formats mentioned above, container formats are able to host a range of media, typically including text. Multimedia container formats may differ in their capabilities for arranging and presenting the content. This representational aspect can be further differentiated into (i) a spatial dimension (2D, 3D, layout, etc.) and (ii) a temporal dimension (synchronisation, parallel playout, etc.). Another feature relevant in the context of container formats is support for manipulating the content (or at least its components) dynamically. Last but not least, support for metadata handling is of great importance, and not only in the context of the work at hand.

The formats discussed in the following have been selected due to their alignment with the Semantic Web, viz. they meet some minimal requirements[3], such as being based on XML and utilising URIs.

3.1.1 eXtensible HyperText Markup Language–(X)HTML

(X)HTML is the family name for the group of languages that form the lingua franca of the World Wide Web[4]. While HTML 4.01 [168] is the latest revision of the SGML-based branch of hypertext dialects on which the Web is based, XHTML [330; 331] is now taking over the role of the Web's workhorse. Figure 3.1 depicts an HTML page conceived as a simple multimedia container. It demonstrates how multimedia assets such as still images can be embedded into a page and, in conjunction with links, form a simple hypermedia document.


Figure 3.1: Example HTML page as a multimedia container.

Though (X)HTML is in widespread use, it has some serious limitations. It only allows for a spatial arrangement of multimedia assets; temporal aspects can only be handled using scripting in combination with the (X)HTML Document Object Model (DOM).
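The following Python sketch, using only the standard library, treats a small XHTML page as a container and lists the assets it embeds (img) or links to (a). The page and file names are hypothetical. Note that the result is merely a set of references: nothing in the markup states when or in which order the assets are presented, which reflects the purely spatial nature of (X)HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one embedded still image, one linked video.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Holiday 2007</title></head>
  <body>
    <h1>Holiday 2007</h1>
    <p><img src="beach.png" alt="The beach"/></p>
    <p>Watch the <a href="sunset.ogg">sunset video</a>.</p>
  </body>
</html>"""


def media_assets(xhtml: str) -> list[tuple[str, str]]:
    """Return (kind, URI) pairs in document order for embedded and linked assets."""
    root = ET.fromstring(xhtml)
    assets = []
    for el in root.iter():
        tag = el.tag.split("}")[-1]  # strip the '{namespace}' prefix
        if tag == "img" and "src" in el.attrib:
            assets.append(("embedded", el.attrib["src"]))
        elif tag == "a" and "href" in el.attrib:
            assets.append(("linked", el.attrib["href"]))
    return assets


print(media_assets(PAGE))  # [('embedded', 'beach.png'), ('linked', 'sunset.ogg')]
```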

(X)HTML and Metadata

Whereas earlier versions of (X)HTML took a global view of metadata[5], the newest generation, XHTML 2.0 [331], aims at a sound basis for integrating metadata. The author of this thesis participates in this effort [5], known as XHTML+RDFa; for further details, see section 3.2.3 below.
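As a rough illustration of the XHTML+RDFa idea, the Python sketch below extracts statements from an XHTML fragment that uses the RDFa attributes about, property and content. It implements only a small, simplified subset of RDFa processing, assumed here for illustration: the subject is inherited from the nearest ancestor carrying about, content overrides the element text, and CURIEs such as dc:title are not expanded to full URIs. A conformant RDFa processor handles considerably more (e.g. rel, rev, typeof and chaining). The fragment and its URI are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical XHTML+RDFa fragment describing a photo.
DOC = """<div xmlns="http://www.w3.org/1999/xhtml"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://example.org/photos/beach.png">
  <span property="dc:title">The beach</span>
  <span property="dc:creator" content="Alice">by A.</span>
</div>"""


def extract(el: ET.Element, subject: Optional[str] = None,
            out: Optional[list] = None) -> list:
    """Collect (subject, property, literal) triples; simplified RDFa subset."""
    if out is None:
        out = []
    subject = el.get("about", subject)  # inherit subject from nearest 'about'
    prop = el.get("property")
    if prop is not None and subject is not None:
        # 'content' overrides the element's text as the literal value
        value = el.get("content", "".join(el.itertext()).strip())
        out.append((subject, prop, value))
    for child in el:
        extract(child, subject, out)
    return out


for triple in extract(ET.fromstring(DOC)):
    print(triple)
```

The point of the design is that the human-readable text ("by A.") and the machine-readable value ("Alice") can coexist in the same element.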

3.1.2 Scalable Vector Graphics–SVG

SVG [314] is a modularized language for describing two-dimensional vector and mixed vector/raster graphics in XML. It allows for describing scenes with vector shapes (e.g. paths consisting of straight lines and curves), text, and multimedia (e.g. still images, video, audio). These objects can be grouped, transformed, styled, and composited into previously rendered objects.

SVG files are compact and provide high-quality graphics on the Web, in print, and on resource-limited handheld devices. In addition, SVG supports scripting and animation, which makes it well suited for interactive, data-driven, personalized graphics. SVG is based on the download-and-play concept. SVG also has a mobile profile, SVG Tiny, which is a subset of SVG.


Listing 3.1: A sample SVG markup.

The code of a sample SVG document[6] is depicted in listing 3.1. Note that, although this is not its primary purpose, SVG can be used as a general-purpose media container format[7].
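The scene model described above (grouping, transforming and styling vector shapes and text) can be sketched with Python's standard xml.etree.ElementTree module. The dimensions, path data and text are arbitrary choices for illustration; the group element g carries a transform and a fill that apply to both shapes it contains.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialise SVG as the default namespace


def q(tag: str) -> str:
    """Qualify a tag name with the SVG namespace."""
    return f"{{{SVG_NS}}}{tag}"


def build_scene() -> ET.Element:
    svg = ET.Element(q("svg"), {"width": "200", "height": "120", "version": "1.1"})
    # Group: transform and fill are inherited by both child shapes
    g = ET.SubElement(svg, q("g"), {"transform": "translate(10,10)", "fill": "steelblue"})
    ET.SubElement(g, q("rect"), {"x": "0", "y": "0", "width": "80", "height": "50"})
    # Path made of straight lines (L) and a quadratic curve (Q)
    ET.SubElement(g, q("path"), {"d": "M 100 0 L 180 0 Q 140 60 100 0 Z"})
    text = ET.SubElement(svg, q("text"), {"x": "10", "y": "100"})
    text.text = "A tiny SVG scene"
    return svg


print(ET.tostring(build_scene(), encoding="unicode"))
```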

Metadata included with SVG content is specified within metadata elements[8], with contents from other XML namespaces such as Dublin Core or RDF. The specification states: “Individual industries or individual content creators are free to define their own metadata schema but are encouraged to follow existing metadata standards and use standard metadata schema wherever possible to promote interchange and interoperability. If a particular standard metadata schema does not meet your needs, then it is usually better to define an additional metadata schema in an existing framework such as RDF and to use custom metadata schema in combination with standard metadata schema, rather than totally ignore the standard schema.”
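The recommended practice, embedding Dublin Core properties as RDF inside the SVG metadata element, can be illustrated with the following Python sketch. It assumes the simple, common case of rdf:Description elements with property child elements; it is not a complete RDF/XML parser (typed nodes, property attributes and nested descriptions are ignored), and the document and its URI are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "svg": "http://www.w3.org/2000/svg",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical SVG document with Dublin Core metadata expressed in RDF/XML.
SVG_DOC = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://example.org/logo.svg">
        <dc:title>Example logo</dc:title>
        <dc:creator>Alice</dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <circle cx="50" cy="50" r="40" fill="orange"/>
</svg>"""


def dublin_core(svg_text: str) -> dict:
    """Map (subject, DC property) to its literal value, read from <metadata>."""
    root = ET.fromstring(svg_text)
    result = {}
    for desc in root.iterfind("svg:metadata/rdf:RDF/rdf:Description", NS):
        subject = desc.get(f"{{{NS['rdf']}}}about")
        for prop in desc:
            if prop.tag.startswith(f"{{{NS['dc']}}}"):  # keep DC properties only
                name = prop.tag.split("}")[1]
                result[(subject, name)] = (prop.text or "").strip()
    return result


print(dublin_core(SVG_DOC))
```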

The actual deployment of SVG+RDF, however, is rather disillusioning. For example, a Google search of the form filetype:svg rdf in early September 2007 yielded only some 500 hits.

3.1.3 Synchronized Multimedia Integration Language–SMIL

SMIL[9] is an XML-based W3C Recommendation [272] for describing interactive multimedia presentations. The language allows for describing the temporal behaviour of a multimedia presentation, associating hyperlinks with media objects, and describing the layout of the presentation on a screen.

Components based on SMIL are used for integrating timing into XHTML [330] and into SVG [314]. SMIL components may have different media types, such as audio, video, image, or text. The begin and end times of components can be specified relative to events in other media components. For example, in a slide show, a particular slide is displayed when the narrator in the audio starts talking about it. Hyperlinks embedded in the presentation allow for random navigation through the presentation.
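The slide-show example can be expressed with SMIL syncbase values such as begin="slide1.end" or begin="narration.begin+45s". The Python sketch below resolves such values into absolute begin and end times. It is a deliberately simplified assumption: only <id>.begin and <id>.end with an optional positive offset in seconds are supported, every item has an explicit duration, and par/seq nesting, user-interaction events and negative offsets are ignored. The item identifiers and durations are hypothetical.

```python
import re

# Hypothetical presentation: slides synchronised with a narration track.
ITEMS = {
    "narration": {"begin": "0s", "dur": 60.0},
    "slide1": {"begin": "narration.begin", "dur": 20.0},
    "slide2": {"begin": "slide1.end", "dur": 25.0},
    "slide3": {"begin": "narration.begin+45s", "dur": 15.0},
}

SYNCBASE = re.compile(r"^(\w+)\.(begin|end)(?:\+(\d+(?:\.\d+)?)s)?$")
OFFSET = re.compile(r"^(\d+(?:\.\d+)?)s$")


def resolve(items: dict) -> dict:
    """Return {id: (begin, end)} in seconds, resolving syncbase references."""

    def begin_of(item_id: str, visiting: frozenset) -> float:
        if item_id in visiting:
            raise ValueError(f"cyclic timing dependency at {item_id!r}")
        spec = items[item_id]["begin"]
        m = OFFSET.match(spec)
        if m:  # plain clock value, e.g. "0s"
            return float(m.group(1))
        m = SYNCBASE.match(spec)
        if not m:
            raise ValueError(f"unsupported begin value {spec!r}")
        ref, event, offset = m.group(1), m.group(2), float(m.group(3) or 0)
        ref_begin = begin_of(ref, visiting | {item_id})
        base = ref_begin if event == "begin" else ref_begin + items[ref]["dur"]
        return base + offset

    times = {}
    for item_id, item in items.items():
        b = begin_of(item_id, frozenset())
        times[item_id] = (b, b + item["dur"])
    return times


print(resolve(ITEMS))
# {'narration': (0.0, 60.0), 'slide1': (0.0, 20.0), 'slide2': (20.0, 45.0), 'slide3': (45.0, 60.0)}
```

Tracking the items currently being resolved makes cyclic definitions (a begins when b ends, b begins when a ends) fail loudly instead of recursing forever.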


Figure 3.2: Example SMIL document as a multimedia container.

Microsoft and others proposed a SMIL-based variant of HTML, the Timed Interactive Multimedia Extensions for HTML (HTML+TIME)[10]; nevertheless, the dissemination of SMIL is still somewhat limited. This might be rooted in the complexity inherent in multimedia presentations. An exemplary SMIL markup[11] is depicted in listing 3.2 on page 35.

SMIL and Metadata

With the metainformation module[12], SMIL now supports, in addition to the meta element (from SMIL 1.0), the description of metadata using the Resource Description Framework (RDF) Model and Syntax (cf. section 4.3.3).
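The difference between the SMIL 1.0 meta element (flat name/value pairs) and the metainformation module (RDF statements inside a metadata element) is illustrated by the following Python sketch, which reads both from a SMIL 2.0 document. The document, file names and values are hypothetical, and only rdf:Description elements with property child elements are handled; a full RDF/XML parser would be needed in general.

```python
import xml.etree.ElementTree as ET

NS = {
    "smil": "http://www.w3.org/2001/SMIL20/Language",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Hypothetical SMIL 2.0 presentation carrying both kinds of metadata.
SMIL_DOC = """<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <meta name="title" content="Holiday slide show"/>
    <metadata>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
        <rdf:Description rdf:about="">
          <dc:creator>Alice</dc:creator>
        </rdf:Description>
      </rdf:RDF>
    </metadata>
  </head>
  <body>
    <par>
      <audio id="narration" src="narration.ogg"/>
      <img id="slide1" src="beach.png" begin="narration.begin" dur="20s"/>
    </par>
  </body>
</smil>"""


def smil_metadata(text: str) -> dict:
    """Return SMIL 1.0 meta pairs and RDF (subject, property, value) triples."""
    head = ET.fromstring(text).find("smil:head", NS)
    # SMIL 1.0 style: flat name/value pairs without further structure
    meta = {m.get("name"): m.get("content") for m in head.iterfind("smil:meta", NS)}
    # Metainformation module: RDF statements inside <metadata>
    rdf = [
        (d.get(f"{{{NS['rdf']}}}about"), p.tag.split("}")[1], (p.text or "").strip())
        for d in head.iterfind("smil:metadata/rdf:RDF/rdf:Description", NS)
        for p in d
    ]
    return {"meta": meta, "rdf": rdf}


print(smil_metadata(SMIL_DOC))
```

An empty rdf:about refers to the current document, so the RDF statement describes the presentation itself; unlike the flat meta pair, it can reuse a shared vocabulary such as Dublin Core.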





[4] For a detailed discussion on these aspects, the reader is invited to refer to Chapter 6.

[5] We note that the ongoing work regarding OWL 2 has not been taken into consideration; see also http: //



[1] A Hype Cycle is a graphic representation of the maturity, adoption and business application of specific technologies.


[3] e.g.

[4] See also for characteristics of Web applications.

[5] See section 4.3.1 for details





[10] For a discussion on this issue the reader is invited to refer to [312, Section 1.1]

[11] Section 4.3.3

[12] See section 4.3.4 for details

[13] Demonstrated at the Semantic Web Challenge 2007

[14] The application is available at

[15] To coordinate so-called Semantic Web engineers, the author of this thesis founded a social site dedicated to the exchange on this issue. This social network is open to everyone interested in this area; it can be found at

[16] The interested reader is referred to a repository at W3C, giving an overview of tools and environments:



[19] The reader is invited to refer to Chapter 5 for a detailed discussion on this issue.

[20] See also













[33] In 2006, a workshop in the framework of the LREC2006 conference was organized by REVEAL THIS in Genoa, Italy. With the author's contribution, [256] was presented at the Crossing Media for Improved Information Access workshop.




[3] The reader is invited to refer to [128] for further discussions on this topic.

[4] Cf.

[5] See for example

[6] From

[7] For an example see, e.g.,

[8] See section 21 of the SVG Recommendation at

[9] pronounced as “smile”


[11] From


University of Graz
multimedia, Web of Data, Semantic Web, linked data, RDF, URI, ramm.x
Dr. Michael Hausenblas (Author), 2009, Building Scalable and Smart Multimedia Applications on the Semantic Web, Munich, GRIN Verlag.

