[InetBib] CFP 3rd Workshop on Scholarly Document Processing: SDP@COLING2022
- Date: Fri, 25 Mar 2022 20:07:53 +0000
- From: "Mayr-Schlegel, Philipp via InetBib" <inetbib@xxxxxxxxxx>
- Subject: [InetBib] CFP 3rd Workshop on Scholarly Document Processing: SDP@COLING2022
Dear colleagues,
You are invited to participate in the 3rd Workshop on Scholarly Document
Processing (SDP 2022) to be held at COLING 2022 (October 12-17, 2022). The SDP
2022 workshop will consist of a Research track and six Shared Tasks. The call
for research papers is described below, and more details can be found on our
website, http://www.sdproc.org/.
Papers must follow the COLING format and conform to the COLING Submission
Guidelines.
The paper submission site will be provided on the workshop website shortly.
Website: http://www.sdproc.org/
Twitter: https://twitter.com/sdproc
Mailing list: https://groups.google.com/g/sdproc-updates
CfP: https://sdproc.org/2022/cfp.html
** Call for Research Papers **
Introduction
Although scientific literature plays a major part in research and
policy-making, these texts represent an underserved area of NLP. NLP can play a
role in addressing research information overload, identifying disinformation
and its effect on people and society, and enhancing the reproducibility of
science. The unique challenges of processing scholarly documents necessitate
the development of specific methods and resources optimized for this domain.
The Scholarly Document Processing (SDP) workshop provides a venue for
discussing these challenges and bringing together stakeholders from different
communities including computational linguistics, text mining, information
retrieval, digital libraries, scientometrics, and others to develop and present
methods and resources in support of these goals.
This workshop builds on the success of prior workshops: the 1st SDP workshop
held at EMNLP 2020, the 2nd SDP workshop held at NAACL 2021, and the 1st and 2nd
SciNLP workshops held at AKBC 2020 and 2021. In addition to having broad appeal
within the NLP community, we hope the SDP workshop will attract researchers
from other relevant fields including meta-science, scientometrics, data mining,
information retrieval, and digital libraries, bringing together these disparate
communities within ACL.
Topics of Interest
We invite submissions from all communities demonstrating usage of and
challenges associated with natural language processing, information retrieval,
and data mining of scholarly and scientific documents. Relevant tasks include
(but are not limited to):
* Representation learning
* Information extraction
* Summarization
* Language generation
* Question answering
* Discourse modeling and argumentation mining
* Network analysis
* Bibliometrics, scientometrics, and altmetrics
* Reproducibility
* Peer review
* Search and indexing
* Datasets and resources
* Document parsing
* Text mining
* Research infrastructure and others.
We specifically invite research on important and/or underserved areas, such as:
* Identifying/mitigating scientific disinformation and its effects on
public policy and behavior
* Reducing information overload through summarization and aggregation of
information within and across documents
* Improving access to scientific papers through multilingual scholarly
document processing
* Improving research reproducibility by connecting scientific claims to
evidence such as data, software, and cited claims
** Submission Information **
Authors are invited to submit long and short papers presenting unpublished, original
work. Submissions will be subject to a double-blind peer-review process.
Accepted papers will be presented by the authors at the workshop either as a
talk or a poster. All accepted papers will be published in the workshop
proceedings (proceedings from previous years can be found here:
https://aclanthology.org/venues/sdp/).
The submissions must be in PDF format and anonymized for review. All
submissions must be written in English and follow the COLING 2022 formatting
requirements: https://coling2022.org/Cpapers
We follow the same policies as COLING 2022 regarding preprints and
double-submissions. The anonymity period for SDP 2022 is from June 13 to August
22.
Long paper submissions: up to 9 pages of content, plus unlimited references.
Short paper submissions: up to 4 pages of content, plus unlimited references.
Final versions of accepted papers will be allowed 1 additional page of content
so that reviewer comments can be taken into account.
More details about submissions are available on our website:
http://www.sdproc.org/. To receive updates, please join our mailing list:
https://groups.google.com/g/sdproc-updates or follow us on Twitter:
https://twitter.com/sdproc
** Important Dates (Main Research Track) **
All paper submissions due - July 11, 2022
Notification of acceptance - August 22, 2022
Camera-ready papers due - September 5, 2022
Workshop - October 16/17, 2022
** SDP 2022 Keynote Speakers **
We are excited to have several keynote speakers at SDP 2022. The following
speakers have been confirmed (others will be announced later).
1. Min-Yen Kan, NUS, Singapore (https://www.comp.nus.edu.sg/~kanmy/)
2. Sophia Ananiadou, University of Manchester, UK, who will discuss her
recent work on uncertainty and negation, summarisation, and citation graphs
(https://www.research.manchester.ac.uk/portal/sophia.ananiadou.html)
3. TBA
** SDP 2022 Shared Tasks **
SDP 2022 will host six exciting shared tasks. More information about all shared
tasks is provided on the workshop website:
https://sdproc.org/2022/sharedtasks.html
A separate CfP will follow for each shared task.
Multi Perspective Scientific Document Summarization:
Generating summaries of scientific documents is known to be a challenging task.
The majority of existing work on summarization assumes a single best gold
summary for each document. Having only one gold summary limits our ability to
evaluate the quality of summarization systems, as writing summaries is a
subjective activity. At the same time, annotating multiple gold summaries for
scientific documents can be extremely expensive, as it requires domain experts
to read and understand long documents. This shared task will enable the
exploration of methods for generating multi-perspective summaries.
We introduce a novel summarization corpus, leveraging data from scientific peer
reviews to capture diverse perspectives from the reader's point of view. More
information coming soon at: https://github.com/guyfe/Mup
LongSumm 2022: Generation of Long Summaries for Scientific Documents.
Most work on scientific document summarization focuses on generating
relatively short summaries. Such summaries resemble abstracts and cannot cover
all the salient information conveyed in a given scientific text. Writing longer
summaries requires expertise and a deep understanding of a scientific domain,
as can be found in some researchers' blogs. This shared task leverages blog
posts created by researchers in the NLP and Machine Learning communities that
summarize scientific articles, using these posts as reference summaries. The
corpus for this task includes a training set of 1705 extractive summaries and
531 abstractive summaries of NLP and Machine Learning papers.
More information at: https://github.com/guyfe/LongSumm
SV-Ident 2022: Survey Variable Identification in Social Science Publications.
In this shared task, we focus on concepts specific to social science
literature, namely survey variables. Survey variable mention identification in
texts can be seen as a multi-label classification problem: given a sentence in
a document and a list of unique variables (from a reference vocabulary of
survey variables), the task is to classify which variables, if any, are
mentioned in the sentence. This task is organized by the VAriable Detection,
Interlinking, and Summarization (VADIS) project. Further details:
https://vadis-project.github.io/sv-ident-sdp2022/
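To make the multi-label formulation concrete, here is a minimal sketch using
scikit-learn; the sentences, variable IDs, and one-vs-rest baseline are
illustrative assumptions, not the organizers' data or setup.

    # Minimal sketch of the multi-label formulation described above.
    # Sentences and variable IDs are invented; the real data and
    # reference vocabulary come from the SV-Ident organizers.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer

    sentences = [
        "Respondents reported their monthly household income.",
        "Participants rated their trust in parliament on a 7-point scale.",
        "The weather was sunny during data collection.",
    ]
    # Each sentence carries zero or more survey-variable IDs.
    labels = [["v_income"], ["v_trust_parliament"], []]

    X = TfidfVectorizer().fit_transform(sentences)
    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(labels)

    # One binary classifier per variable in the vocabulary.
    model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    model.fit(X, Y)
    predicted = mlb.inverse_transform(model.predict(X))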
MSLR 2022: Multi-document summarization for medical literature reviews
In the context of medicine, systematic literature reviews constitute the
highest-quality evidence used to inform clinical care. However, reviews are
expensive to produce manually; (semi-)automation via NLP may facilitate faster
evidence synthesis without sacrificing rigor. Toward this end, we are running a
shared task to study the generation of multi-document summaries in this domain.
We make use of two datasets: 1) MS^2: consisting of 20k reviews (citing 470K
studies) from the biomedical literature (https://github.com/allenai/ms2), and
2) Cochrane Conclusions: derived from over 4500 Cochrane reviews
(https://github.com/bwallace/RCT-summarization-data). Each submission is judged
against a gold review summary using ROUGE and the evidence-inference-based
divergence metric defined in the MS^2 paper. We also
encourage contributions that extend this task and dataset, e.g., by proposing
scaffolding tasks, methods for model interpretability, and especially, improved
automated evaluation methods in this domain. More information:
https://sdproc.org/2022/sharedtasks.html#mslr
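As a rough illustration of the ROUGE part of the evaluation, a candidate
summary can be scored against a gold review summary with the open-source
rouge-score package; the summaries below are invented, and this is not
necessarily the organizers' exact scoring setup.

    # Illustrative ROUGE scoring with the open-source `rouge-score`
    # package (pip install rouge-score). Summaries are invented; the
    # official evaluation also uses the MS^2 divergence metric.
    from rouge_score import rouge_scorer

    gold = "Exercise programs reduce fall risk in older adults."
    candidate = "Structured exercise appears to lower fall risk among the elderly."

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                      use_stemmer=True)
    scores = scorer.score(gold, candidate)
    for name, result in scores.items():
        print(f"{name}: P={result.precision:.3f} R={result.recall:.3f} "
              f"F1={result.fmeasure:.3f}")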
Scholarly Knowledge Graph Generation
With the demise of the widely used Microsoft Academic Graph (MAG) at the end of
2021, the scholarly document processing community is facing a pressing need to
replace MAG with an open-source, community-supported service. A number of
challenging data processing tasks are essential for the scalable creation of a
comprehensive scholarly graph, i.e., a graph of entities including but not
limited to research papers, their authors, research organizations, and research
themes. This shared task will evaluate three key sub-tasks involved in the
generation of a scholarly graph: 1) document deduplication, i.e. identifying
and linking different versions of the same scholarly document, 2) extracting
research themes, and 3) affiliation mining, i.e., linking research papers or
their metadata to the organizational entities that produced them. Test and
evaluation data will be supplied by the CORE aggregator (https://core.ac.uk/).
Pre-register your team at https://forms.gle/7nduU6meseEpv9i69, and we will keep
you posted with competition updates and timelines. More information:
https://sdproc.org/2022/sharedtasks.html#skgg
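To give a feel for the deduplication sub-task, here is a deliberately naive
sketch that clusters records by normalized title; the records are invented,
and real systems would combine stronger signals (DOIs, author lists, abstracts)
over the CORE data.

    # Naive sketch of document deduplication: cluster records whose
    # normalized titles agree. Records are invented; test data will be
    # supplied by the CORE aggregator.
    import re
    from collections import defaultdict

    records = [
        {"id": 1, "title": "Scholarly Document Processing: A Survey"},
        {"id": 2, "title": "Scholarly document processing -- a survey."},
        {"id": 3, "title": "An Unrelated Paper"},
    ]

    def normalize(title):
        # Lowercase and collapse punctuation/whitespace for fuzzy matching.
        return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

    clusters = defaultdict(list)
    for rec in records:
        clusters[normalize(rec["title"])].append(rec["id"])

    duplicates = [ids for ids in clusters.values() if len(ids) > 1]
    print(duplicates)  # [[1, 2]]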
DAGPap22: Detecting automatically generated scientific papers
There are increasing reports that research papers can be written by computers,
which raises a number of concerns. In this challenge, we explore the state of
the art in detecting automatically generated papers. We frame the detection
problem as a binary classification task: given an excerpt of text, label it as
either human-written or machine-generated. To this end, we will provide a
corpus of automatically written papers, as well as documents collected by our
publishing and editorial teams. As a control, we will provide a corpus of
openly accessible human-written papers from the same scientific domains as
these documents. We also encourage contributions that aim to extend this dataset with
other computer-generated scientific papers, or papers that propose valid
metrics to assess automatically generated papers against those written by
humans.
More information will be made available at
https://sdproc.org/2022/sharedtasks.html#dagpap
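As a minimal sketch of the binary classification framing, the task could be
set up as follows; the excerpts and the TF-IDF plus logistic-regression
baseline are illustrative assumptions, not the organizers' baseline.

    # Minimal sketch of the human-vs-machine classification framing,
    # using a TF-IDF + logistic-regression baseline. Excerpts are
    # invented; the real corpus will be released by the organizers.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    excerpts = [
        "We propose a novel attention mechanism for long documents.",
        "The the method of the results shows the novel novel outcome.",
        "Our ablation study isolates the effect of each component.",
        "In consequence, the algorithm of the data is very performed.",
    ]
    labels = [0, 1, 0, 1]  # 0 = human-written, 1 = machine-generated

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression())
    clf.fit(excerpts, labels)
    print(clf.predict(["The results of the method is performed novel."]))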
** Organizing Committee **
Arman Cohan, Allen Institute for AI, Seattle, USA
Guy Feigenblat, Piiano, Israel
Dayne Freitag, SRI International, San Diego, USA
Tirthankar Ghosal, Charles University, Czech Republic
Drahomira Herrmannova, Elsevier, USA
Petr Knoth, Open University, UK
Kyle Lo, Allen Institute for AI, Seattle, USA
Philipp Mayr, GESIS -- Leibniz Institute for the Social Sciences, Germany
Robert M. Patton, Oak Ridge National Laboratory, USA
Michal Shmueli-Scheuer, IBM Research AI, Haifa Research Lab, Israel
Anita de Waard, Elsevier, USA
Lucy Lu Wang, Allen Institute for AI, Seattle, USA
--
Dr. Philipp Mayr
Team Leader Information & Data Retrieval
GESIS - Leibniz Institute for the Social Sciences
Unter Sachsenhausen 6-8, D-50667 Köln, Germany
Tel: + 49 (0) 221 / 476 94 -533
Email: philipp.mayr@xxxxxxxxx
Web: http://www.gesis.org
List information at http://www.inetbib.de.