ESWC 2016 Call for Challenge: Semantic Publishing Challenge 2016

Wed Feb 24 09:47:56 CET 2016

** apologies for cross-posting **

==== Call for Challenge: Semantic Publishing ====

Challenge Website: 
https://github.com/ceurws/lod/wiki/SemPub2016
Challenge hashtag: #SemPub2016
Challenge Chairs:
- Angelo Di Iorio (Department of Computer Science and 
Engineering, University of Bologna, IT)
- Anastasia Dimou (Data Science Lab, Ghent University, BE)
- Christoph Lange (Enterprise Information Systems, 
University of Bonn / Fraunhofer IAIS, DE)
- Sahar Vahdati (Enterprise Information Systems, 
University of Bonn, DE)
Challenge Coordinators: Stefan Dietze (L3S, Germany) 
and Anna Tordai (Elsevier, Netherlands)

13th Extended Semantic Web Conference (ESWC) 2016
Dates: May 29th - June 2nd, 2016
Venue: Heraklion, Crete, Greece
Hashtag: #eswc2016
Feed: @eswc_conf
Site: http://2016.eswc-conferences.org
General Chair: Harald Sack (Hasso Plattner Institute 
(HPI), Germany)

MOTIVATION AND OBJECTIVES

This is the next iteration of the successful Semantic 
Publishing Challenge of ESWC 2014 and 2015. We continue 
pursuing the objective of assessing the quality of 
scientific output, evolving the dataset bootstrapped in 
2014 and 2015 to take into account the wider ecosystem of 
publications. To achieve that, this year's challenge 
focuses on refining and enriching an existing linked open 
dataset about workshops, their publications and their 
authors. Aspects of refining and enriching include 
extracting deeper information from the HTML and PDF 
sources of the workshop proceedings volumes and enriching 
this information with knowledge from existing datasets. 
Thus, a combination of broadly investigated technologies 
in the Semantic Web field, such as Information Extraction 
(IE), Natural Language Processing (NLP), Named Entity 
Recognition (NER), link discovery, etc., is required to 
deal with the challenge's tasks.

TARGET AUDIENCE
The Challenge is open to everyone from industry and 
academia.

TASKS
We ask challengers to automatically annotate a set of 
multi-format input documents and to produce a Linked Open 
Dataset (LOD) that 
fully describes these documents, their context, and 
relevant parts of their content. The evaluation will 
consist of evaluating a set of queries against the 
produced dataset to assess its correctness and 
completeness. The primary input dataset is the LOD that 
has been extracted from the CEUR-WS.org workshop 
proceedings using the winning extraction tools of the 2014 
and 2015 challenges, plus its full original HTML and PDF 
source documents. In addition, the challenge uses (as 
linking targets) existing LOD on scholarly publications. 
The input dataset will be split into two parts: a training 
dataset and an evaluation dataset, which will be disclosed a 
few days before the submission deadline. Participants will 
be asked to run their tool on the evaluation dataset and 
to produce the final Linked Dataset and the output of the 
queries on that dataset.

The Challenge includes three tasks:

= Task 1: Extraction and assessment of workshop 
proceedings information in HTML =
Participants are required to extract information from a 
set of HTML tables of contents published in CEUR-WS.org 
workshop proceedings. The extracted information is 
expected to answer queries about the quality of these 
workshops, for instance by measuring growth, longevity, 
etc. The task is an extension of Task 1 of the 2014 and 
2015 Challenges: we will reuse the most challenging 
quality indicators from last year's challenge, define 
others more precisely, and add completely new ones. The 
winning solutions reached an F-measure of 0.64 in 2014 and 
0.66 in 2015, which shows improvement but leaves a lot of 
room for better information extraction.
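
As a rough illustration of the kind of extraction Task 1 
involves, the following Python sketch pulls paper titles and 
author lists out of an HTML table of contents. The class 
names CEURTITLE and CEURAUTHORS and the sample markup are 
assumptions made for this example; real proceedings volumes 
vary in their markup, and this is not the method of any 
previous winning tool.

```python
from html.parser import HTMLParser


class TocParser(HTMLParser):
    """Collect title/author text from spans marked with assumed class names."""

    def __init__(self):
        super().__init__()
        self.papers = []
        self._field = None  # field fed by the span currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        css_class = dict(attrs).get("class")
        if css_class == "CEURTITLE":  # assumed marker for a paper title
            self.papers.append({"title": "", "authors": ""})
            self._field = "title"
        elif css_class == "CEURAUTHORS" and self.papers:  # assumed marker
            self._field = "authors"

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.papers[-1][self._field] += data


# Hypothetical input, not taken from an actual proceedings volume.
sample = """
<ul>
  <li><a href="paper1.pdf"><span class="CEURTITLE">A Sample Paper</span></a>
      <span class="CEURAUTHORS">Ada Example, Bob Sample</span></li>
</ul>
"""

parser = TocParser()
parser.feed(sample)
papers = [
    (p["title"], [name.strip() for name in p["authors"].split(",")])
    for p in parser.papers
]
print(papers)
```

A real solution would additionally have to cope with volumes 
that lack such markers and would map the extracted values to 
RDF.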

= Task 2: Extracting information from the PDF full text of 
the papers =
Participants are required to extract information from the 
textual content of the papers (in PDF). That information 
should describe the organization of the paper and should 
provide a deeper understanding of the context in which it 
was written. In particular, the extracted information is 
expected to answer queries about the internal organization 
of sections, tables and figures, and about the authors' 
affiliations, research institutions and funding 
sources. The task mainly requires PDF mining techniques and 
some NLP.

= Task 3: Interlinking =
Participants are required to interlink the CEUR-WS.org 
linked dataset with relevant datasets already existing in 
the LOD cloud. Task 3 can be accomplished as an entity 
interlinking/instance matching task that addresses both 
interlinking data from the output of the other tasks and 
interlinking the CEUR-WS.org linked dataset with external 
datasets. Moreover, as triples are generated from 
different sources and due to different activities, 
tracking provenance information becomes increasingly 
important.
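
A minimal sketch of instance matching, assuming that 
entities are compared by their normalized labels and that 
links are expressed as owl:sameAs triples. Neither the 
matching strategy nor the link predicate is prescribed by 
the challenge, and all URIs and names below are 
hypothetical.

```python
import unicodedata

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(name):
    """Lower-case, strip accents and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def interlink(local, external):
    """Yield (local_uri, external_uri) pairs whose normalized labels are equal."""
    index = {}
    for uri, label in external.items():
        index.setdefault(normalize(label), []).append(uri)
    for uri, label in local.items():
        for target in index.get(normalize(label), []):
            yield uri, target


def to_ntriples(links):
    """Serialize the links as N-Triples lines using owl:sameAs."""
    return [f"<{s}> <{OWL_SAME_AS}> <{o}> ." for s, o in links]


# Hypothetical entities: URI -> label.
local = {"http://example.org/ceur/person/jose-garcia": "José  García"}
external = {
    "http://example.org/ext/author/42": "Jose Garcia",
    "http://example.org/ext/author/43": "Anna Smith",
}

triples = to_ntriples(interlink(local, external))
print("\n".join(triples))
```

Exact label matching is only a baseline: it confuses 
different people who share a name and misses spelling 
variants, so a competitive approach would combine several 
properties and record the provenance of each generated 
link.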

EVALUATION
In each task, the participants will be asked to refine and 
extend the initial CEUR-WS.org Linked Open Dataset, by 
information extraction or link discovery, i.e. they will 
produce an RDF graph. To validate the RDF graphs produced, 
a number of queries in natural language will be specified, 
together with their expected results in CSV format. 
Participants are asked to submit both their dataset and a 
translation of the natural language queries into SPARQL 
queries that work on that dataset. A few days before the 
deadline, a set of queries will be specified and used for 
the final evaluation. Participants are then asked to run 
these queries on their dataset and to submit the produced 
output in CSV. Precision, recall and F-measure will be 
calculated by comparing each query's result set with the 
expected query result from a manually built gold 
standard. Participants' overall performance in a task will 
be defined as the 
average F-measure over all queries of the task, with all 
queries having equal weight. For computing precision and 
recall, an automated tool developed for the 2015 challenge 
will be used; this tool will be publicly available during 
the training phase.
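
The scoring described above can be sketched in Python as 
follows. This is not the official evaluation tool. It 
assumes that each CSV file has a header row, that rows are 
compared as exact tuples, and that F-measure means the 
balanced F1 score; the sample data is made up.

```python
import csv
import io


def read_result_set(csv_text):
    """Parse a CSV query result into a set of row tuples (header row skipped)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return {tuple(cell.strip() for cell in row) for row in rows[1:] if row}


def precision_recall_f1(produced, expected):
    """Compare a produced result set with the gold-standard result set."""
    true_positives = len(produced & expected)
    precision = true_positives / len(produced) if produced else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def task_score(query_results):
    """Average F-measure over all queries of a task, each with equal weight."""
    f_measures = [precision_recall_f1(p, e)[2] for p, e in query_results]
    return sum(f_measures) / len(f_measures) if f_measures else 0.0


# Hypothetical results for two queries (produced output vs. gold standard).
gold_q1 = read_result_set("workshop\nhttp://example.org/w1\nhttp://example.org/w2\n")
mine_q1 = read_result_set("workshop\nhttp://example.org/w1\nhttp://example.org/w3\n")
gold_q2 = read_result_set("paper,year\np1,2014\n")
mine_q2 = read_result_set("paper,year\np1,2014\n")

score = task_score([(mine_q1, gold_q1), (mine_q2, gold_q2)])
print(score)  # 0.75
```

In this example the first query scores 0.5 (one of two 
produced rows is correct and one of two expected rows is 
found) and the second scores 1.0, so the task score is 0.75.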

FEEDBACK AND DISCUSSION
A discussion group is open for participants to ask 
questions and to receive updates about the challenge: 
mailto:sempub-challenge at googlegroups.com. Participants are 
invited to subscribe to this group as soon as possible and 
to communicate their intention to participate. They are 
also invited to use this channel to discuss problems in 
the input dataset and to suggest changes.

HOW TO PARTICIPATE
Participants are required to submit:
* Abstract: no more than 200 words.
* Description: It should explain the details of the 
automated annotation system, including why the system is 
innovative, how it uses Semantic Web technology, what 
features or functions the system provides, what design 
choices were made and what lessons were learned. The 
description should also summarize how participants have 
addressed the evaluation tasks. An outlook towards how the 
data could be consumed is appreciated but not strictly 
required. Papers must be submitted in PDF format, 
following the style of Springer's Lecture Notes in 
Computer Science (LNCS) series 
(http://www.springer.com/computer/lncs/lncs+authors), and 
not exceeding 12 pages in length. Submissions in RASH 
format 
(http://cs.unibo.it/save-sd/rash/documentation/index.html) 
and Linked Research 
(https://github.com/csarven/linked-research) are also 
accepted as long as the final camera-ready version 
conforms to Springer's requirements.
* The Linked Open Dataset produced by their tool on the 
evaluation dataset (as a file or as a URL, in Turtle or 
RDF/XML).
* A set of SPARQL queries that work on that LOD and 
correspond to the natural language queries provided as 
input
* The output of these SPARQL queries on the evaluation 
dataset (in CSV format)

Participants will also be asked to submit their tool 
(source and/or binaries, or a link these can be downloaded 
from, or a web service URL) for verification purposes. 
Further submission instructions will be published on the 
challenge wiki.

All submissions should be provided via the submission 
system linked from the homepage.

JUDGING AND PRIZES
After a first round of review, the Program Committee and 
the chairs will select a number of submissions conforming 
to the challenge requirements that will be invited to 
present their work. Submissions accepted for presentation 
will receive constructive reviews from the Program 
Committee and will be included in the Springer CCIS 
series. A selection of the best challenge papers will be 
published in the Satellite Event proceedings (a separate 
Springer LNCS Volume) of ESWC 2016.

Six winners will be selected. For each task we will 
select:
* best performing tool, awarded to the paper that achieves 
the highest score in the evaluation
* most original approach, selected by the Challenge 
Committee through the reviewing process

IMPORTANT DATES
* January 20, 2016: Publication of the full description of 
tasks, rules and queries; publication of the training 
dataset
* February 28, 2016: Publication of the evaluation tool
* March 11, 2016: Paper submission
* March 31, 2016: Deadline for remarks on the 
training dataset and the evaluation tool
* April 8, 2016: Notification and invitation to submit 
task results
* April 24, 2016: Conference camera-ready
* May 11, 2016: Publication of the evaluation dataset 
details
* May 13, 2016: Results submission
* May 29 - June 2, 2016: Challenge days

NOTE: Accepted papers will be included in the Conference 
USB stick. After the conference, participants will be able 
to add data about the evaluation and to finalize the 
camera-ready for the final proceedings.

PROGRAM COMMITTEE
* Aliaksandr Birukou, Springer Verlag, Heidelberg, Germany
* Lukasz Bolikowski, University of Warsaw, Poland
* Kai Eckert, University of Mannheim, Germany
* Maxim Kolchin, ITMO University, Saint Petersburg, Russia
* Phillip Lord, Newcastle University, UK
* Philipp Mayr, GESIS, Germany
* Jodi Schneider, University of Pittsburgh, USA
* Selver Softic, Graz University of Technology, Austria
* Ruben Verborgh, Ghent University - iMinds, Belgium
* Michael Wagner, Schloss Dagstuhl, Leibniz Center for 
Computer Science, Germany

We are inviting further members.