
[InetBib] CFP 3rd Workshop on Scholarly Document Processing: SDP@COLING2022

Dear colleagues,

You are invited to participate in the 3rd Workshop on Scholarly Document 
Processing (SDP 2022) to be held at COLING 2022 (October 12-17, 2022). The SDP 
2022 workshop will consist of a Research track and six Shared Tasks. The call 
for research papers is described below, and more details can be found on our 
website, http://www.sdproc.org/.

Papers must follow the COLING format and conform to the COLING Submission 

The paper submission site will be provided on the workshop website shortly.

Website: http://www.sdproc.org/

Twitter: https://twitter.com/sdproc

Mailing list: https://groups.google.com/g/sdproc-updates

CfP: https://sdproc.org/2022/cfp.html

** Call for Research Papers **


Although scientific literature plays a major part in research and 
policy-making, these texts represent an underserved area of NLP. NLP can play a 
role in addressing research information overload, identifying disinformation 
and its effect on people and society, and enhancing the reproducibility of 
science. The unique challenges of processing scholarly documents necessitate 
the development of specific methods and resources optimized for this domain. 
The Scholarly Document Processing (SDP) workshop provides a venue for 
discussing these challenges and bringing together stakeholders from different 
communities including computational linguistics, text mining, information 
retrieval, digital libraries, scientometrics, and others to develop and present 
methods and resources in support of these goals.

This workshop builds on the success of prior workshops: the 1st SDP workshop 
held at EMNLP 2020, the 2nd SDP workshop held at NAACL 2021, and the 1st and 2nd 
SciNLP workshops held at AKBC 2020 and 2021. In addition to having broad appeal 
within the NLP community, we hope the SDP workshop will attract researchers 
from other relevant fields including meta-science, scientometrics, data mining, 
information retrieval, and digital libraries, bringing together these disparate 
communities within ACL.

Topics of Interest

We invite submissions from all communities demonstrating usage of and 
challenges associated with natural language processing, information retrieval, 
and data mining of scholarly and scientific documents. Relevant tasks include 
(but are not limited to):

  *   Representation learning
  *   Information extraction
  *   Summarization
  *   Language generation
  *   Question answering
  *   Discourse modeling and argumentation mining
  *   Network analysis
  *   Bibliometrics, scientometrics, and altmetrics
  *   Reproducibility
  *   Peer review
  *   Search and indexing
  *   Datasets and resources
  *   Document parsing
  *   Text mining
  *   Research infrastructure and others.

We specifically invite research on important and/or underserved areas, such as:

  *   Identifying/mitigating scientific disinformation and its effects on 
public policy and behavior
  *   Reducing information overload through summarization and aggregation of 
information within and across documents
  *   Improving access to scientific papers through multilingual scholarly 
document processing
  *   Improving research reproducibility by connecting scientific claims to 
evidence such as data, software, and cited claims

** Submission Information **

Authors are invited to submit full and short papers with unpublished, original 
work. Submissions will be subject to a double-blind peer-review process. 
Accepted papers will be presented by the authors at the workshop either as a 
talk or a poster. All accepted papers will be published in the workshop 
proceedings (proceedings from previous years can be found here: 

The submissions must be in PDF format and anonymized for review. All 
submissions must be written in English and follow the COLING 2022 formatting 
requirements: https://coling2022.org/Cpapers

We follow the same policies as COLING 2022 regarding preprints and 
double-submissions. The anonymity period for SDP 2022 is from June 13 to August 

Long paper submissions: up to 9 pages of content, plus unlimited references.

Short paper submissions: up to 4 pages of content, plus unlimited references.

Final versions of accepted papers will be allowed 1 additional page of content 
so that reviewer comments can be taken into account.

More details about submissions are available on our website: 
http://www.sdproc.org/. To receive updates, please join our mailing list:   
https://groups.google.com/g/sdproc-updates or follow us on Twitter: 

** Important Dates (Main Research Track) **

All paper submissions due - July 11, 2022

Notification of acceptance - August 22, 2022

Camera-ready papers due - September 5, 2022

Workshop - October 16/17, 2022

** SDP 2022 Keynote Speakers **

We are excited to have several keynote speakers at SDP 2022. The following 
speakers have been confirmed (others will be announced later).

  1.  Min-Yen Kan, NUS, Singapore (https://www.comp.nus.edu.sg/~kanmy/)
  2.  Sophia Ananiadou, University of Manchester, UK, who will discuss her 
recent work on uncertainty and negation, summarisation, and citation graphs
  3.  TBA

** SDP 2022 Shared Tasks **

SDP 2022 will host six exciting shared tasks. More information about all shared 
tasks is provided on the workshop website: 
 Each shared task will follow up with a separate CfP.

Multi Perspective Scientific Document Summarization:

Generating summaries of scientific documents is known to be a challenging task. 
The majority of existing work in summarization assumes a single best gold 
summary for each given document. Having only one gold summary negatively 
impacts our ability to evaluate the quality of summarization systems as writing 
summaries is a subjective activity. At the same time, annotating multiple gold 
summaries for scientific documents can be extremely expensive as it requires 
domain experts to read and understand long scientific documents. This shared 
task will enable exploring methods for generating multi-perspective summaries. 
We introduce a novel summarization corpus, leveraging data from scientific peer 
reviews to capture diverse perspectives from the reader's point of view. More 
information coming soon at: https://github.com/guyfe/Mup

LongSumm 2022: Generation of Long Summaries for Scientific Documents.

Most of the work on scientific document summarization focuses on generating 
relatively short summaries. Such a short summary resembles an abstract and 
cannot cover all the salient information conveyed in a given scientific text. 
Writing longer summaries requires expertise and a deep understanding of a 
scientific domain, as can be found in some researchers' blogs. This shared task 
leverages blog posts written by researchers in the NLP and machine learning 
communities that summarize scientific articles, and uses these posts as 
reference summaries. The corpus for this task includes a training set that 
consists of 1705 extractive summaries and 531 abstractive summaries of NLP and 
machine learning scientific papers.

More information at: https://github.com/guyfe/LongSumm

SV-Ident 2022: Survey Variable Identification in Social Science Publications.

In this shared task, we focus on concepts specific to social science 
literature, namely survey variables. Survey variable mention identification in 
texts can be seen as a multi-label classification problem: Given a sentence in 
a document, and a list of unique variables (from a reference vocabulary of 
survey variables), the task is to classify which variables, if any, are 
mentioned in each sentence. This task is organized by the VAriable Detection, 
Interlinking, and Summarization (VADIS) project. Further details: 
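
The multi-label framing described above can be illustrated with a minimal 
sketch. It assumes a naive keyword-matching baseline; the vocabulary entries, 
variable IDs, and the function name are hypothetical and are not taken from the 
shared task data or its official baselines.

```python
from typing import Dict, List


def identify_variables(sentence: str, vocabulary: Dict[str, List[str]]) -> List[int]:
    """Return one 0/1 label per vocabulary variable (multi-label output).

    A label is 1 if any trigger phrase of that variable occurs in the sentence.
    """
    text = sentence.lower()
    return [
        int(any(phrase.lower() in text for phrase in phrases))
        for phrases in vocabulary.values()
    ]


# Hypothetical reference vocabulary: variable ID -> trigger phrases.
VOCAB = {
    "v1_trust_parliament": ["trust in parliament"],
    "v2_life_satisfaction": ["life satisfaction", "satisfied with life"],
    "v3_household_income": ["household income"],
}

labels = identify_variables(
    "Respondents with higher household income reported greater life satisfaction.",
    VOCAB,
)
print(dict(zip(VOCAB, labels)))
```

A sentence may receive several positive labels or none at all, which is what 
distinguishes this setup from single-label classification.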

MSLR 2022: Multi-document summarization for medical literature reviews

In the context of medicine, systematic literature reviews constitute the 
highest-quality evidence used to inform clinical care. However, reviews are 
expensive to produce manually; (semi-)automation via NLP may facilitate faster 
evidence synthesis without sacrificing rigor. Toward this end, we are running a 
shared task to study the generation of multi-document summaries in this domain. 
We make use of two datasets: 1) MS^2, consisting of 20K reviews (citing 470K 
studies) from the biomedical literature (https://github.com/allenai/ms2), and 
2) Cochrane Conclusions, derived from over 4500 Cochrane reviews 
(https://github.com/bwallace/RCT-summarization-data). Each submission is 
evaluated against a gold review summary using ROUGE and the 
evidence-inference-based divergence metric defined in the MS^2 paper. We also 
encourage contributions that extend this task and dataset, e.g., by proposing 
scaffolding tasks, methods for model interpretability, and especially, improved 
automated evaluation methods in this domain. More information: 
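
To give a rough idea of what a ROUGE comparison against a gold summary 
measures, here is a simplified sketch of ROUGE-1 F1 (unigram overlap). It 
assumes lowercasing and whitespace tokenization, and it is not the official 
scorer used by the shared task; the function name and example sentences are 
illustrative only.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection clips each token count to the smaller of the two.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1(
    "the drug reduced mortality",
    "the drug reduced mortality in adults",
)
print(round(score, 3))
```

Because this kind of metric only counts surface word overlap, it cannot tell 
whether a generated summary reports the same direction of effect as the gold 
review, which is one motivation for the complementary divergence metric.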

Scholarly Knowledge Graph Generation

With the demise of the widely used Microsoft Academic Graph (MAG) at the end of 
2021, the scholarly document processing community is facing a pressing need to 
replace MAG with an open-source, community-supported service. A number of 
challenging data processing tasks are essential for the scalable creation of a 
comprehensive scholarly graph, i.e., a graph of entities involving but not 
limited to research papers, their authors, research organizations, and research 
themes. This shared task will evaluate three key sub-tasks involved in the 
generation of a scholarly graph: 1) document deduplication, i.e., identifying 
and linking different versions of the same scholarly document, 2) extracting 
research themes, and 3) affiliation mining, i.e., linking research papers or 
their metadata to the organizational entities that produced them. Test and 
evaluation data will be supplied by the CORE aggregator (https://core.ac.uk/). 
Pre-register your team here: https://forms.gle/7nduU6meseEpv9i69 and we'll keep 
you posted with competition updates and timelines. More information: 

DAGPap22: Detecting automatically generated scientific papers

There are increasing reports that research papers can be written by computers, 
which presents a series of concerns. In this challenge, we explore the state of 
the art in detecting automatically generated papers. We frame the detection 
problem as a binary classification task: given an excerpt of text, label it as 
either human-written or machine-generated. To this end, we will provide a 
corpus of automatically written papers, as well as documents collected by our 
publishing and editorial teams. As a control, we will provide a corpus of 
openly accessible human-written papers from the same scientific domains. 
We also encourage contributions that aim to extend this dataset with 
other computer-generated scientific papers, or papers that propose valid 
metrics to assess automatically generated papers against those written by 
humans.

More information will be made available at 

** Organizing Committee **

Arman Cohan, Allen Institute for AI, Seattle, USA

Guy Feigenblat, Piiano, Israel

Dayne Freitag, SRI International, San Diego, USA

Tirthankar Ghosal, Charles University, Czech Republic

Drahomira Herrmannova, Elsevier, USA

Petr Knoth, Open University, UK

Kyle Lo, Allen Institute for AI, Seattle, USA

Philipp Mayr, GESIS -- Leibniz Institute for the Social Sciences, Germany

Robert M. Patton, Oak Ridge National Laboratory, USA

Michal Shmueli-Scheuer, IBM Research AI, Haifa Research Lab, Israel

Anita de Waard, Elsevier, USA

Lucy Lu Wang, Allen Institute for AI, Seattle, USA

Dr. Philipp Mayr
Team Leader Information & Data Retrieval

GESIS - Leibniz Institute for the Social Sciences
Unter Sachsenhausen 6-8,  D-50667 Köln, Germany
Tel: + 49 (0) 221 / 476 94 -533
Email: philipp.mayr@xxxxxxxxx
Web: http://www.gesis.org

List information at http://www.inetbib.de.