How can you teach a computer coreference faster? [ACL 2022 Research Talk]

27.04.2022
Coreference resolution is a fundamental NLP task in which a model must determine which spans of text refer to the same entity. Models for this task are typically developed on the OntoNotes dataset. They are publicly available but may not generalize quickly to new, low-resource domains. In our paper, we mitigate this problem with active learning, sampling a small subset of data for annotators to label. While active learning is well-defined for classification tasks, its application to coreference resolution is neither well-defined nor fully understood. Our work explores which types of spans to label according to different sources of model uncertainty. We also investigate whether to label coreference between spans in the same document or across different contexts. In our experiments, we run simulations to compare different active learning strategies and conduct a user study to understand the effect of reading context on labeling. Full paper: https://www.umiacs.umd.edu/~jbg/docs/2022_acl_coref.pdf
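As a rough illustration of the uncertainty-based sampling idea described above (not the paper's actual method), here is a minimal sketch: given a model's predicted antecedent distribution for each mention, an active learner can pick the mentions with the highest entropy for annotation. The mention strings and probability values below are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(span_probs, k):
    """Return the k spans whose predicted antecedent distribution
    has the highest entropy, i.e. where the model is least certain."""
    ranked = sorted(span_probs.items(), key=lambda kv: entropy(kv[1]), reverse=True)
    return [span for span, _ in ranked[:k]]

# Hypothetical antecedent distributions for three mentions in a document.
span_probs = {
    "the company": [0.9, 0.05, 0.05],  # model is fairly confident
    "it":          [0.4, 0.35, 0.25],  # model is very uncertain
    "she":         [0.6, 0.3, 0.1],
}
print(select_most_uncertain(span_probs, 1))  # → ['it']
```

The labeled example for the selected span is then added to the training set, the model is retrained, and the loop repeats until the annotation budget is exhausted.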
