USEWOD2011 - 1st International Workshop on Usage Analysis and the Web of Data


co-located with the 20th International World Wide Web Conference (WWW2011)

Hyderabad, India, March 28th, 2011, 8:30 - 17:00

USEWOD2011 is part of the USEWOD workshop series.


Best Paper Award sponsor: LATC project

Data Challenge sponsor: GridLine

Keynote Speaker

Dr. Markus Strohmaier, TU Graz

News

  • 13 May 2011: We have set up a public mailing list to keep you up to date with future developments of the USEWOD challenge dataset, the next USEWOD workshop, etc. The list is open, so you can sign up and contribute to the discussion!
  • 11 May 2011: Good news, everyone! A workshop report on USEWOD2011 has been published in the June 2011 issue of SIGIR Forum. In it you will find a summary of the workshop and each presentation, some details about the data challenge, etc.
  • 19 April 2011: About three weeks have already passed since USEWOD2011, but we still want to take the opportunity to thank everyone again for a great workshop and add a few closing words.
    We had some great sessions with presentations and very active discussions (congratulations again to the two award-winning papers!). The spontaneous round table at the end was especially inspiring. We will work towards turning USEWOD into a regular workshop series and making the USEWOD dataset a competitive reference dataset. To that end, we will set up a lightweight infrastructure for you to access the dataset and stay updated. We will post news about this here and elsewhere, so stay tuned.
    Also, RDF metadata for the workshop is online over on SWDF. And finally, we have added the permanent links to the online proceedings on arxiv.org for all papers to the programme below.
    See you again next year!
  • 09 March 2011: We are very happy to announce that Markus Strohmaier, a renowned researcher in the field of social streams and tagging, will attend USEWOD2011 to give the keynote at the beginning of the workshop!
  • 25 February 2011: The date for USEWOD2011 has now been set to Monday, 28/03. Also, the review process is underway, and we expect to have the results out soon. Good news: there will be an option for the authors of accepted papers to still get the WWW early bird registration rate!
  • 09 February 2011: We are very happy to announce that there will be two awards for USEWOD submissions! GridLine will sponsor our data challenge, and the LATC project will sponsor the best paper award for regular submissions.
  • 08 February 2011: We have extended the submission deadline for USEWOD by one week. The new deadline is now the 15th of February!
  • 23 December 2010: As an early Christmas present, we are happy to announce that the USEWOD2011 Challenge Dataset is now online. Head over to the challenge page for more information.
  • 19 December 2010: Workshop website up and running.
  • 12 December 2010: Very happy to announce that USEWOD has been accepted as one out of 8 workshops for this year's WWW (from a total of 20 submissions)!

Invited Speaker

Dr. Markus Strohmaier will give the keynote talk at USEWOD2011 on the topic of Social Computation for the Web of Data: Motivation, Examples, and Outlook. Dr. Strohmaier is currently an assistant professor at the Knowledge Management Institute, Faculty of Computer Science at Graz University of Technology. He is a renowned expert on social streams and social tagging systems, and has given many talks and published extensively in this area.

For the abstract of Dr. Strohmaier's talk, please refer to the program below.

Program and Accepted Papers

When What Slides
08:30-09:00 Welcome and Opening
09:00-10:00 Social Computation for the Web of Data: Motivation, Examples, and Outlook
(Keynote by Markus Strohmaier)
Abstract:
Today, the early World Wide Web of documents is evolving into a web of data, where data is stored in different kinds of databases, including unstructured, semi-structured and structured repositories. This talk explores the role of social computation, i.e. the combination of social behavior and algorithmic computation, for linked data. What distinguishes social-computational systems from other types of systems is the unprecedented involvement of data about user behavior, goals and motivations in the software system’s structure. What can be observed in social-computational systems is that the interaction between a user and the system is mediated by the aggregation of explicit or implicit data from other users. This is the case with systems where, for example, user data is used to suggest search terms (e.g. Google Autosuggest), to recommend products (e.g. Amazon recommendations), to aid navigation (e.g. tag-based navigation) or to filter content (e.g. Digg.com). This makes social-computational systems a novel class of software systems, unique in the sense that potentially essential system properties and functions are dynamically influenced by aggregate user behavior. In this talk, I will link social-computational concepts to the web of data, and illustrate my ideas with examples from current research projects.
10:00-13:00 Morning Tea and Coffee
User Modeling Combining Access Logs, Page Content and Semantics (full paper)
Blaz Fortuna, Dunja Mladenic, and Marko Grobelnik
Abstract:
The paper proposes an approach to modeling users of large Web sites based on combining different data sources: access logs and content of the accessed pages are combined with semantic information about the Web pages, the users and the accesses of the users to the Web site. The assumption is that we are dealing with a large Web site providing content to a large number of users accessing the site. The proposed approach represents each user by a set of features derived from the different data sources, where some feature values may be missing for some users. It further enables user modeling based on the provided characteristics of the targeted user subset. The approach is evaluated on real-world data where we compare performance of the automatic assignment of a user to a predefined user segment when different data sources are used to represent the users.
Towards an Automated Query Modification Assistant (full paper)
Vera Hollink and Arjen De Vries
Abstract:
Users who need several queries before finding what they need can benefit from an automatic search assistant that provides feedback on their query modification strategies. We present a method to learn from a search log which types of query modifications have and have not been effective in the past. The method analyses query modifications along two dimensions: a traditional term-based dimension and a semantic dimension, for which queries are enriched with linked data entities. Applying the method to the search logs of two search engines, we identify six opportunities for a query modification assistant to improve search: modification strategies that are commonly used, but that often do not lead to satisfactory results.
Mining User Comment Activity for Detecting Forum Spammers in YouTube (short paper)
Ashish Sureka
Abstract:
Research shows that comment spamming (comments which are unsolicited, unrelated, abusive, hateful, commercial advertisements, etc.) in online discussion forums has become a common phenomenon in Web 2.0 applications, and there is a strong need to counter or combat comment spamming. We present a method to automatically detect comment spammers in the forums of YouTube (a popular video-sharing website and the largest of its kind). The proposed technique is based on mining the comment activity log of a user and extracting patterns (such as the time interval between subsequent comments, or the presence of exactly the same comment across multiple unrelated videos) indicating spam behavior. We perform an empirical analysis on data crawled from YouTube and demonstrate that the proposed method is effective for the task of comment spammer detection.
U-Sem: Semantic Enrichment, User Modeling and Mining Usage Data on the Social Web (short paper)
Fabian Abel, Ilknur Celik, Claudia Hauff, Laura Hollink and Geert-Jan Houben
Abstract:
With the growing popularity of Social Web applications, more and more user data is published on the Web every day. Our research focuses on investigating ways of mining data from such platforms that can be used for modeling users and for semantically augmenting user profiles. This process can enhance adaptation and personalization in various adaptive Web-based systems. In this paper, we present the U-Sem people modeling service, a framework for the semantic enrichment and mining of people's profiles from usage data on the Social Web. We explain the architecture of our people modeling service and describe its application in an adult e-learning context as an example.
12:00-13:30 Lunch
13:30-13:45 Introduction to the USEWOD data challenge
From Linked Data to Relevant Data - Time is the Essence (full paper)
Markus Kirchberg, Ryan Ko and Bu Sung Lee
Winner of the Best Paper Award, sponsored by LATC!
Abstract:
The Semantic Web initiative puts emphasis not primarily on putting data on the Web, but rather on creating links in a way that both humans and machines can explore the Web of data. When such users access the Web, they leave a trail as Web servers typically maintain a history of requests. Web usage mining approaches, which try to determine what users are interested in, have been studied since the beginning of the Web given the log's huge potential for purposes such as resource annotation, personalization, forecasting etc. However, the impact of any such efforts has not really gone beyond generating statistics detailing who, when, and how Web pages maintained by a Web server were visited.
An Empirical Study of Real-World SPARQL Queries (challenge paper)
Mario Arias Gallego, Javier D. Fernández, Miguel A. Martínez-Prieto and Pablo De La Fuente
Winner of the Data Challenge Award, sponsored by GridLine!
Abstract:
Understanding how users tailor their SPARQL queries is crucial when designing query evaluation engines or fine-tuning RDF stores with performance in mind. In this paper we analyze 3 million real-world SPARQL queries extracted from logs of the DBpedia and SWDF public endpoints. We aim to find the most used language elements from both syntactical and structural perspectives, paying special attention to triple patterns and joins, since they are among the most expensive SPARQL operations in the evaluation phase. We have determined that most of the queries are simple and include few triple patterns and joins, with Subject-Subject, Subject-Object and Object-Object being the most common join types. The graph patterns are usually star-shaped and, although triple pattern chains exist, they are generally short.
14:35-14:45 Announcement of challenge winner and best paper awards
14:45-15:00 Idea jam
15:00-15:30 Breakout session with coffee and tea
15:30-16:30 Usage data hacking session
16:30-17:00 Presentation of results and discussion

Workshop Overview and Goals

This workshop will investigate the synergy between semantics and semantic-web technology on the one hand and analysis and mining of usage data on the other hand. The two fields are a promising combination. First, semantics can be used to enhance the analysis of usage data. Usage logs contain information that can help to better understand users or to adapt a system to a user’s needs and preferences. Now that more and more explicit knowledge is represented on the Web, in the form of ontologies, folksonomies, or linked data, the question arises how these semantics can be used to aid large scale web usage analysis and mining. Second, usage data analysis can enhance semantic resources as well as Semantic Web applications. Traces of users can be used to evaluate, adapt or personalize Semantic Web applications. Since logs record real-life users, they provide an opportunity to create gold standards for search or recommendation tools. In addition, logs can form valuable resources from which knowledge (e.g., in the form of ontologies or thesauri) can be extracted bottom-up.

Also, the emerging Web of Data demands a re-evaluation of existing usage mining techniques; new ways of accessing information enabled by the Web of Data imply the need to develop or adapt algorithms, methods, and techniques to analyze and interpret the usage of Web data instead of Web pages. An important question at this time is how the Web of Data is being used: how are datasets being accessed by human users and how by machines, what kinds of queries are being performed, and what can we learn about the usage of semantic applications?

The primary goals of this workshop are to foster a new community of researchers from various fields who share an interest in usage mining and semantics, and to create a roadmap for future research in this direction.

Topics of Interest (including, but not limited to)

We welcome work that shows how the research areas combine: how semantic resources and techniques can be used to strengthen usage data analysis and, vice versa, how usage data can enhance semantic tools and applications. Within these boundaries, we keep the scope broad. We welcome contributions using any form of semantic information, from formal ontologies to linked data and folksonomies. All records of user actions are considered usage logs; we do not limit ourselves to any format or method of collection of usage information. This ranges from traditional content-consumption logs to various forms of content-production logs, i.e. navigation, application-related transactions, queries, tagging, editing, and similar activities.

  • Analysis and mining of usage logs of semantic resources and applications.
  • Inferring semantic information from usage logs.
  • Methods and tools for semantic analysis of usage logs.
  • Representing and enriching usage logs with semantic information.
  • Usage-based evaluation methods and frameworks; gold standards for evaluation of semantic web applications.
  • Specifics and semantics of logs for content-consumption and content-creation.
  • Using semantics for recommendation, personalization and adaptation.
  • Usage-based recommendation, personalization and adaptation of semantic web applications.
  • Exploiting usage logs for semantic search.
  • Data sharing, privacy, and privacy-protecting policies and techniques.


We invite full paper submissions, as well as short papers and challenge papers (for more information about the challenge, see the USEWOD 2011 Data Challenge section below). Papers must consist of original, unpublished research and must not be under review by another conference, journal, or workshop. Authors of accepted submissions will be invited to present their work at the workshop, and at least one author of each paper must register for the workshop.

Format requirements for the submission of papers are:

  • Full Papers: max. 8 pages
  • Short Papers: max. 4 pages
  • Challenge Papers: max. 4 pages

All accepted papers will be included in the online workshop proceedings, which will be published via CEUR. All papers must be prepared in ACM format.

Metadata about all papers, including title, abstract, authors and author affiliations, will also be made available publicly at http://data.semanticweb.org.

To submit a paper, please log on to the USEWOD2011 page on EasyChair.

A copy of the call for papers is available here.

There will also be a best paper award, kindly sponsored by the LATC project!

USEWOD 2011 Data Challenge

In addition to regular papers, we will release a large dataset (several GB) of usage data (server log files) from two major Linked Open Data sources: Semantic Web Dog Food (data.semanticweb.org) and DBpedia (dbpedia.org). Participants are invited to present interesting analyses, applications, alignments, etc. for these datasets, and to submit their findings as a Data Challenge paper. The best Data Challenge paper will get a prize. For more information, check out the Data Challenge Page.

The USEWOD 2011 data challenge is sponsored by GridLine.
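
As a possible starting point for working with server log files of this kind, here is a minimal Python sketch that counts SPARQL query forms found in log lines. It rests on several assumptions that are not confirmed on this page: that the logs follow the Apache Combined Log Format, that SPARQL requests are sent to a path ending in /sparql, and that the query text travels in a `query` URL parameter. The function names and the sample lines are hypothetical and made up for illustration; they are not taken from the dataset.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumption: Apache Combined Log Format. The real challenge logs may differ.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def extract_sparql(line):
    """Return the decoded SPARQL query carried by a log line, or None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    parts = urlsplit(match.group("target"))
    # Assumption: the endpoint path ends in /sparql.
    if not parts.path.endswith("/sparql"):
        return None
    queries = parse_qs(parts.query).get("query")
    return queries[0] if queries else None


def query_form(query):
    """Crude heuristic: first SPARQL query form keyword found, else 'OTHER'."""
    found = re.search(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", query, re.IGNORECASE)
    return found.group(1).upper() if found else "OTHER"


def count_query_forms(lines):
    """Count query forms over all lines that carry a SPARQL query."""
    counts = Counter()
    for line in lines:
        query = extract_sparql(line)
        if query is not None:
            counts[query_form(query)] += 1
    return counts


# Made-up sample lines for illustration only (not from the dataset).
SAMPLE = [
    '0.0.0.0 - - [01/Jan/2011:00:00:01 +0000] "GET /sparql?query=SELECT+%3Fs+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D HTTP/1.1" 200 512 "-" "ExampleAgent/1.0"',
    '0.0.0.0 - - [01/Jan/2011:00:00:02 +0000] "GET /sparql?query=ASK+%7B+%3Fs+%3Fp+%3Fo+%7D HTTP/1.1" 200 64 "-" "ExampleAgent/1.0"',
    '0.0.0.0 - - [01/Jan/2011:00:00:03 +0000] "GET /resource/Hyderabad HTTP/1.1" 303 0 "-" "ExampleAgent/1.0"',
]

if __name__ == "__main__":
    print(count_query_forms(SAMPLE))
```

The keyword search is deliberately simple and can misclassify queries whose prefixes or literals contain one of the keywords; a proper SPARQL parser would be the more robust choice for a real analysis.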

Important Dates

February 15th, 2011   Submission deadline (extended!)
February 8th, 2011   Original submission deadline (superseded by the extension)
March 2nd, 2011   Acceptance notification
March 18th, 2011   Camera-ready Version
March 28th, 2011   Workshop


Organising Committee (alphabetically)

Programme Committee (alphabetically)

  • Chris Bizer, Freie Universität Berlin, DE
  • Pablo Castells, Universidad Autónoma de Madrid, ES
  • Stefan Gradmann, Humboldt Universität Berlin, DE
  • Paul Groth, VU Amsterdam, NL
  • Tom Heath, Talis, UK
  • Michiel Hildebrand, VU Amsterdam, NL
  • Andreas Hotho, University of Kassel, DE
  • Geert-Jan Houben, Delft University of Technology, NL
  • Eero Hyvönen, University of Helsinki, FI
  • Antoine Isaac, Europeana, NL
  • Hideo Joho, University of Tsukuba, JP
  • Jaap Kamps, University of Amsterdam, NL
  • Yiannis Kompatsiaris, Informatics and Telematics Institute, GR
  • Wessel Kraaij, TNO, NL
  • Ruben Lara, Telefónica I+D, ES
  • Peter Mika, Yahoo! Research Barcelona, ES
  • Dunja Mladenic, Jozef Stefan Institute, SI
  • Johan Oomen, Netherlands Institute for Sound and Vision, NL
  • Jacco van Ossenbruggen, Centre for Mathematics and Computer Science (CWI), NL
  • Maarten de Rijke, Universiteit van Amsterdam, NL
  • Marta Sabou, MODUL University Vienna, AT
  • Evan Sandhaus, NY Times, USA
  • Guus Schreiber, VU Amsterdam, NL
  • Fabrizio Silvestri, Istituto di Scienze e Tecnologie dell'Informazione, IT
  • Markus Strohmaier, Graz University of Technology, AT
  • Theodora Tsikrika, Centre for Mathematics and Computer Science (CWI), NL
  • Arjen de Vries, Delft University of Technology, NL