[SemPub17]: Call for Challenge (Semantic Publishing Challenge 2017)
Sahar Vahdati
vahdati at iai.uni-bonn.de
Wed Feb 1 23:02:30 CET 2017
ESWC 2017 Call for Challenge: Semantic Publishing Challenge 2017
==== Call for Challenge: Semantic Publishing ====
Challenge Website: https://github.com/ceurws/lod/wiki/SemPub2017
Challenge hashtag: #SemPub2017, #SemPub
Challenge Chairs:
* Angelo Di Iorio (University of Bologna, IT)
* Anastasia Dimou (Ghent University / imec, BE)
* Christoph Lange (EIS, University of Bonn / Fraunhofer IAIS, DE)
* Sahar Vahdati (EIS, University of Bonn, DE)
Challenge Coordinators:
* Mauro Dragoni (Fondazione Bruno Kessler, IT)
* Monica Solanki (University of Oxford, UK)
14th Extended Semantic Web Conference (ESWC) 2017
Dates: May 28th - June 1st, 2017
Venue: Portorož, Slovenia
Hashtag: #eswc2017
Feed: @eswc_conf
Site: http://2017.eswc-conferences.org
General Chair: Eva Blomqvist (Linköping University, SE)
MOTIVATION AND OBJECTIVES
As in 2016, 2015 and 2014, the goal is to facilitate measuring the
excellence of papers, people and scientific venues by data analysis.
Instead of considering publication venues as single and independent
units, we focus on their explicit and implicit connections, interlinking
and evolution. We achieve that thanks to the primary data source we are
using, which is highly relevant for computer science: the CEUR-WS.org
workshop proceedings, which have accumulated 1,800 proceedings volumes
with around 30,000 papers over 20 years and thus cover the majority of
workshops in computer science. We go beyond the tasks of the 2016
challenge in two ways: (1) refining and extending the set of
quality-related data to be extracted and (2) linking and exploiting
existing Linked Open Data sources about authors, publications, topics,
events and communities. The best data produced in the 2017 challenge
will be published at CEUR-WS.org or as a separate Linked Open Dataset,
interlinked with the official CEUR-WS.org LOD and with the rest of the
Linked Open Data Cloud.
DATASET
The primary dataset is the Linked Open Dataset that has been extracted
from the CEUR-WS.org workshop proceedings (HTML tables of contents and
PDF papers) using the extraction tools that won the previous
challenges, plus the full original PDF source documents (for extracting
further information). The most recent workshop proceedings metadata
have explicitly been released under the CC0 open data license; for the
older proceedings, CEUR-WS.org has permission to make that data
accessible.
In addition to the primary dataset, we use (as linking targets)
existing Linked Open Datasets containing related information: the
computer science proceedings LOD recently announced by Springer, the
brand-new LOD of OpenAIRE covering all EU-funded open access
publications, Springer LD, DBLP, ScholarlyData (a refactoring of the
Semantic Web Dog Food), COLINDA, and further datasets available under
open licenses.
The evaluation dataset will comprise around 100 selected PDF full-text
papers from these workshops. As last year, the training dataset,
together with the expected results of queries against it, will be
distinct from the evaluation dataset. Both datasets will respect the
diversity of the CEUR-WS.org workshop proceedings volumes with regard
to content structure and quality.
TASKS
Our challenge invites submissions addressing one or more of three
tasks, which are independent of each other but conceptually connected
by taking into account increasingly more contextual information. Some
tasks include sub-tasks, but participants compete in a task as a whole.
They are encouraged to address all sub-tasks (even partially) to
increase their chances of winning.
Task 1: Extracting information from the tables in papers
Participants are required to extract information from the tables of the
papers (in PDF). Extracting content from tables is a difficult task,
which has been tackled by different researchers in the past. Our focus
is on tables in scientific papers and on solutions for re-publishing
structured data as LOD. Tables will be collected from CEUR-WS.org
publications, and participants will be required to identify their
structure and content. The task thus requires PDF mining and data
processing techniques.
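Purely as an illustration of the PDF mining involved, the sketch below
pulls raw table cells out of a single paper; the pdfplumber library and
the file name are assumptions, not a prescribed approach.

    # Illustrative Task 1 starting point: extract raw table cells from one
    # CEUR-WS paper PDF so they can later be cleaned and re-published as LOD.
    # pdfplumber and the file name are assumptions, not requirements.
    import pdfplumber

    with pdfplumber.open("Vol-1234-paper1.pdf") as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            for table in page.extract_tables():
                header, *rows = table  # first row often holds column labels
                print(f"page {page_no}: columns = {header}")
                for row in rows:
                    print(dict(zip(header, row)))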
Task 2: Extracting information from the full text of the papers
Participants are required to extract information from the textual
content of the papers (in PDF). That information should describe the
organization of the paper and should provide a deeper understanding of
the content and the context in which it was written. In particular, the
extracted information is expected to answer queries about the internal
organization of sections, tables, figures and about the authors’
affiliations and research institutions. The task mainly requires PDF
mining techniques and some NLP processing.
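As a rough, non-prescriptive illustration of this kind of text mining,
the following sketch recovers numbered section headings from a paper's
extracted text; the heading pattern, the pdfplumber library and the
file name are assumptions, and real tools would combine layout analysis
with NLP.

    # Illustrative Task 2 sketch: list numbered section headings found in
    # the extracted text of a paper. The regex and file name are assumptions.
    import re
    import pdfplumber

    heading = re.compile(r"^(\d+(?:\.\d+)*)\s+([A-Z].{2,80})$")

    with pdfplumber.open("Vol-1234-paper1.pdf") as pdf:
        for page in pdf.pages:
            for line in (page.extract_text() or "").splitlines():
                match = heading.match(line.strip())
                if match:
                    print(match.group(1), match.group(2))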
Task 3: Interlinking
Participants are required to interlink the CEUR-WS.org linked dataset
with relevant datasets already existing in the LOD Cloud. Task 3 can be
approached as an entity interlinking/instance matching task that covers
both interlinking data from the output of the other tasks and
interlinking the CEUR-WS.org linked dataset – as produced in previous
editions of this challenge – with external datasets. Moreover, as
triples are generated from different sources and through different
activities, tracking provenance information becomes increasingly
important.
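For illustration only, the sketch below links papers across two
datasets by exact title match and records owl:sameAs links with rdflib;
the file names, the dcterms:title property and the matching heuristic
are assumptions, and real submissions would use more robust matching
and also attach provenance to the generated links.

    # Illustrative Task 3 sketch: link CEUR-WS papers to an external dataset
    # by exact (case-insensitive) title match and record owl:sameAs links.
    # File names, the title property and the heuristic are assumptions.
    from rdflib import Graph
    from rdflib.namespace import DCTERMS, OWL

    ceur = Graph().parse("ceur-ws.ttl", format="turtle")
    external = Graph().parse("external.ttl", format="turtle")

    def titles(graph):
        # map normalized title -> paper URI
        return {str(title).strip().lower(): paper
                for paper, title in graph.subject_objects(DCTERMS.title)}

    links = Graph()
    ceur_titles, ext_titles = titles(ceur), titles(external)
    for title, paper in ceur_titles.items():
        if title in ext_titles:
            links.add((paper, OWL.sameAs, ext_titles[title]))

    links.serialize("links.ttl", format="turtle")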
EVALUATION
In each task, the participants will be asked to refine and extend the
initial CEUR-WS.org Linked Open Dataset, by information extraction or
link discovery, i.e. they will produce an RDF graph. To validate the RDF
graphs produced, a number of natural language queries will be
specified, together with their expected results in CSV format.
Participants are asked to submit both their dataset and SPARQL
translations of the input natural language queries that work on that
dataset. A few days before the deadline, a set of queries will be
specified and used for the final evaluation. Participants are then
asked to run these queries on their dataset and to submit the produced
output in CSV format. Precision,
recall and F-measure will be calculated by comparing each query’s result
set with the expected query result from a gold standard built manually.
Participants’ overall performance in a task will be defined as the
average F-measure over all queries of the task, with all queries having
equal weight. For computing precision and recall, the same automated
tool as for previous SemPub challenges will be used; this tool will be
publicly available during the training phase. We reserve the right to
disqualify participants whose dataset dumps are different from what
their information extraction tools create from the source data, who are
not using the core vocabulary, or whose SPARQL queries implement
something different from the natural language queries given in the task
definitions. The winners of each task will receive awards, as in
previous years.
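To make the scoring concrete, the following simplified sketch treats
each query's submitted CSV rows and gold-standard rows as sets,
computes precision, recall and F-measure per query, and averages the
F-measures; it is not the official evaluation tool, and the query
identifiers and file names are illustrative only.

    # Simplified sketch of the scoring scheme described above.
    import csv

    def rows(path):
        with open(path, newline="") as f:
            return {tuple(cell.strip() for cell in row) for row in csv.reader(f)}

    def f_measure(submitted, gold):
        correct = len(submitted & gold)
        precision = correct / len(submitted) if submitted else 0.0
        recall = correct / len(gold) if gold else 0.0
        return (2 * precision * recall / (precision + recall)
                if precision + recall else 0.0)

    queries = ["Q1", "Q2", "Q3"]  # placeholder query identifiers
    scores = [f_measure(rows(f"{q}-submitted.csv"), rows(f"{q}-gold.csv"))
              for q in queries]
    print("average F-measure:", sum(scores) / len(scores))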
TARGET AUDIENCE
The Challenge is open to people from industry and academia with
diverse expertise, who can participate in all tasks or focus on
specific ones. Tasks 1 and 2 address an audience with a background in
mapping, information extraction, information retrieval and NLP, and
invite previous years' participants to refine their tools as well as
new teams. Task 3 additionally addresses the wider interlinking
audience, without at the same time excluding other participants from
the challenge. Task 3 invites new participants as well as participants
from Tasks 1 and 2.
FEEDBACK AND DISCUSSION
A discussion group is open for participants to ask questions and to
receive updates about the challenge: sempub-challenge at
googlegroups.com. Participants are invited to
subscribe to this group as soon as possible and to communicate their
intention to participate. They are also invited to use this channel to
discuss problems in the input dataset and to suggest changes.
HOW TO PARTICIPATE
Participants are first required to submit:
* Abstract: no more than 200 words.
* Description: It should explain the details of the automated annotation
system, including why the system is innovative, how it uses Semantic Web
technology, what features or functions the system provides, what design
choices were made and what lessons were learned. The description should
also summarize how participants have addressed the evaluation tasks. An
outlook towards how the data could be consumed is appreciated but not
strictly required. The description should be submitted as a 5-page
document.
If accepted, the participants are invited to submit their task results.
In this second phase they are required to submit:
* The Linked Open Dataset produced by their tool on the evaluation
dataset (as a file or as a URL, in Turtle or RDF/XML).
* A set of SPARQL queries that work on that LOD and correspond to the
natural language queries provided as input (see the sketch after this
list).
* The output of these SPARQL queries on the evaluation dataset (in CSV
format).
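For illustration, here is a minimal sketch of how the last two
deliverables fit together, assuming Python with rdflib; the file names,
the prefix and the query itself are placeholders, since the actual
natural language queries and vocabulary are given in the task
definitions.

    # Minimal sketch: load the produced Turtle dataset, run one SPARQL query
    # (the translation of a natural language query), and write its result as
    # CSV. File names, the prefix and the query are placeholders.
    import csv
    from rdflib import Graph

    graph = Graph().parse("ceur-ws-output.ttl", format="turtle")

    query = """
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?paper ?title WHERE { ?paper dcterms:title ?title . }
    """

    with open("query1-output.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["paper", "title"])
        for row in graph.query(query):
            writer.writerow([str(value) for value in row])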
Accepted papers will be included on the conference USB stick. After
the conference, participants will be able to add data about the
evaluation and to finalize the camera-ready version for the final
proceedings.
The final papers must not exceed 15 pages in length.
Papers must be submitted in PDF format, following the style of
Springer's Lecture Notes in Computer Science (LNCS) series
(http://www.springer.com/computer/lncs/lncs+authors). Submissions in
semantically structured HTML, e.g. in the RASH
(http://cs.unibo.it/save-sd/rash/documentation/index.html), or dokieli
(https://github.com/linkeddata/dokieli) formats are also accepted as
long as the final camera-ready version conforms to Springer's
requirements (LaTeX/Word + PDF).
Further submission instructions will be published on the challenge wiki
if required.
All submissions should be provided via the submission system
https://easychair.org/conferences/?conf=sempub17.
NOTE: At least one author per accepted submission will have to register
for the ESWC Conference, in order to be eligible for the prizes and to
have the paper included in the proceedings.
JUDGING AND PRIZES
After the first round of review, the Program Committee and the chairs
will select a number of submissions conforming to the challenge
requirements, whose authors will be invited to present their work.
Submissions accepted for presentation will receive constructive
reviews from the Program Committee and will be included in the
Springer post-proceedings of ESWC.
Six winners will be selected from those teams who participate in the
challenge at ESWC. For each task we will select:
* best performing tool, awarded to the submission that achieves the
highest score in the evaluation
* best paper, selected by the Program and Challenge Committee
IMPORTANT DATES
* January 29, 2017: Publication of tasks, rules and queries description
* January 29, 2017: Publication of the training dataset
* February 10, 2017: Publication of the evaluation tool
* March 10, 2017: Paper submission (5-page document)
* April 7, 2017: Notification and invitation to submit task results
* April 7, 2017: Test data (and other participation tools) published
* April 23, 2017: Conference camera-ready paper submission (5-page
document)
* May 11, 2017: Publication of the evaluation dataset details
* May 13, 2017: Results submission
* May 30 - June 1: Challenge days
* June 30, 2017: Camera-ready paper for the challenge post-proceedings
(12-page document)