A general framework for information extraction using dynamic span graphs


Abstract

We introduce a general framework for several information extraction tasks that share span representations using dynamically constructed span graphs. The graphs are dynamically constructed by selecting the most confident entity spans and linking these nodes with confidence-weighted relation types and coreferences. The dynamic span graph allows coreference and relation type confidences to propagate through the graph to iteratively refine the span representations. This is unlike previous multi-task frameworks for information extraction, in which the only interaction between tasks is in the shared first-layer LSTM. Our framework significantly outperforms the state of the art on multiple information extraction tasks across multiple datasets reflecting different domains. We further observe that the span enumeration approach is effective at detecting nested span entities, with significant F1 score improvement on the ACE dataset.

1 Introduction

Most Information Extraction (IE) tasks require identifying and categorizing phrase spans, some of which might be nested. For example, entity recognition involves assigning an entity label to a phrase span. Relation Extraction (RE) involves assigning a relation type between pairs of spans. Coreference resolution groups spans referring to the same entity into one cluster. Thus, we might expect that knowledge learned from one task might benefit another.

Most previous work in IE (e.g., Nadeau and Sekine, 2007; Chan and Roth, 2011) employs a pipeline approach, first detecting entities and then using the detected entity spans for relation extraction and coreference resolution. To avoid cascading errors introduced by pipeline-style systems, recent work has focused on coupling different IE tasks, as in joint modeling of entities and relations (Miwa and Bansal, 2016; Zhang et al., 2017), entities and coreferences (Hajishirzi et al., 2013; Durrett and Klein, 2014), joint inference (Singh et al., 2013), or multi-task (entity/relation/coreference) learning (Luan et al., 2018a). These models mostly rely on the first-layer LSTM to share span representations between different tasks and are usually designed for specific domains. In this paper, we introduce a general framework, Dynamic Graph IE (DYGIE), for coupling multiple information extraction tasks through shared span representations that are refined by leveraging contextualized information from relations and coreferences. Our framework is effective in several domains, demonstrating a benefit from incorporating broader context learned from relation and coreference annotations.

Figure 1 shows an example illustrating the potential benefits of entity, relation, and coreference contexts. It is impossible to predict the entity labels for This thing and it from within-sentence context alone. However, the antecedent car strongly suggests that these two entities have a VEH type. Similarly, the fact that Tom is located at Starbucks and Mike has a relation to Tom provides support for the fact that Mike is located at Starbucks.

DYGIE uses multi-task learning to identify entities, relations, and coreferences through shared span representations using dynamically constructed span graphs. The nodes in the graph are dynamically selected from a beam of highly confident mentions, and the edges are weighted according to the confidence scores of relation types or coreferences. Unlike the multi-task method that only shares span representations from the local context (Luan et al., 2018a), our framework leverages rich contextual span representations by propagating information through coreference and relation links. Unlike previous BIO-based entity recognition systems (Collobert and Weston, 2008; Lample et al., 2016; Ma and Hovy, 2016) that assign a text span to at most one entity, our framework enumerates and represents all possible spans to recognize arbitrarily overlapping entities.

Figure 1: A text passage illustrating interactions between entities, relations and coreference links. Some relation and coreference links are omitted.

We evaluate DYGIE on several datasets spanning many domains (including news, scientific articles, and wet lab experimental protocols), achieving state-of-the-art performance across all tasks and domains and demonstrating the value of coupling related tasks to learn richer span representations. For example, DYGIE achieves relative improvements of 5.7% and 9.9% over state of the art on the ACE05 entity and relation extraction tasks, and an 11.3% relative improvement on the ACE05 overlapping entity extraction task.

The contributions of this paper are threefold. 1) We introduce the dynamic span graph framework as a method to propagate global contextual information, making the code publicly available. 2) We demonstrate that our framework significantly outperforms the state of the art on joint entity and relation detection tasks across four datasets: ACE 2004, ACE 2005, SciERC and the Wet Lab Protocol Corpus. 3) We further show that our approach excels at detecting entities with overlapping spans, achieving an improvement of up to 8 F1 points on three benchmarks annotated with overlapping spans: ACE 2004, ACE 2005 and GENIA.

2 Related Work

Previous studies have explored joint modeling (Miwa and Bansal, 2016; Zhang et al., 2017; Singh et al., 2013; Yang and Mitchell, 2016) and multi-task learning (Peng and Dredze, 2015; Peng et al., 2017; Luan et al., 2017a, 2018a) as methods to share representational strength across related information extraction tasks. The most similar to ours is the work in Luan et al. (2018a), which takes a multi-task learning approach to entity, relation, and coreference extraction. In that model, the different tasks share span representations that only incorporate broader context indirectly via the gradients passed back to the LSTM layer. In contrast, DYGIE uses dynamic graph propagation to explicitly incorporate rich contextual information into the span representations.

Entity recognition has commonly been cast as a sequence labeling problem, and has benefited substantially from the use of neural architectures (Collobert et al., 2011; Lample et al., 2016; Ma and Hovy, 2016; Luan et al., 2017b, 2018b). However, most systems based on sequence labeling suffer from an inability to extract entities with overlapping spans. Recently, Katiyar and Cardie (2018) and Wang and Lu (2018) have presented methods enabling neural models to extract overlapping entities, applying hypergraph-based representations on top of sequence labeling systems. Our framework offers an alternative approach, forgoing sequence labeling entirely and simply considering all possible spans as candidate entities.

Neural graph-based models have achieved significant improvements over traditional feature-based approaches on several graph modeling tasks. Knowledge graph completion (Yang et al., 2015; Bordes et al., 2013) is one prominent example. For relation extraction tasks, graphs have been used primarily as a means to incorporate pipelined features such as syntactic or discourse relations (Peng et al., 2017; Song et al., 2018). Christopoulou et al. (2018) model all possible paths between entities as a graph, and refine pair-wise embeddings by performing a walk on the graph structure. All these previous works assume that the nodes of the graph (i.e., the entity candidates to be considered during relation extraction) are predefined and fixed throughout the learning process. In contrast, our framework does not require a fixed set of entity boundaries as an input for graph construction. Motivated by state-of-the-art span-based approaches to coreference resolution (Lee et al., 2017) and semantic role labeling, the model uses a beam pruning strategy to dynamically select high-quality spans, and constructs a graph using the selected spans as nodes.

Many state-of-the-art RE models rely upon domain-specific external syntactic tools to construct dependency paths between the entities in a sentence (Li and Ji, 2014; Xu et al., 2015; Miwa and Bansal, 2016; Zhang et al., 2017). These systems suffer from cascading errors from these tools and are hard to generalize to different domains. To make the model more general, we combine the multi-task learning framework with ELMo embeddings (Peters et al., 2018), without relying on external syntactic tools and risking the cascading errors that accompany them, and improve the interaction between tasks through dynamic graph propagation. While the performance of DYGIE benefits from ELMo, it advances over some systems (Luan et al., 2018a; Sanh et al., 2019) that also incorporate ELMo. The analyses presented here give insights into the benefits of joint modeling.

3 Model

Problem Definition The input is a document represented as a sequence of words $D$, from which we derive $S = \{s_1, \ldots, s_T\}$, the set of all possible within-sentence word sequence spans (up to length $L$) in the document. The output contains three structures: the entity types $E$ for all spans $S$, the relations $R$ for all span pairs $S \times S$ within the same sentence, and the coreference links $C$ for all spans in $S$ across sentences. We consider two primary tasks. First, Entity Recognition is the task of predicting the best entity type label $e_i$ for each span $s_i$. Second, Relation Extraction involves predicting the best relation type $r_{ij}$ for all span pairs $(s_i, s_j)$. We provide additional supervision by also training our model to perform a third, auxiliary task: Coreference resolution. For this task we predict the best antecedent $c_i$ for each span $s_i$.
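As a concrete illustration of the span enumeration described above, here is a minimal Python sketch; the function name and the tokenized-sentence input format are our own illustrative choices, not taken from the released code:

```python
from typing import List, Tuple

def enumerate_spans(sentences: List[List[str]], max_span_width: int) -> List[Tuple[int, int]]:
    """Enumerate all within-sentence spans of up to max_span_width tokens.

    Spans are returned as (start, end) token offsets into the flattened
    document, with end inclusive.
    """
    spans = []
    offset = 0
    for sentence in sentences:
        n = len(sentence)
        for start in range(n):
            for end in range(start, min(start + max_span_width, n)):
                spans.append((offset + start, offset + end))
        offset += n
    return spans

# Example: two short sentences, spans of up to 3 tokens each.
doc = [["Tom", "met", "Mike", "at", "Starbucks"], ["They", "talked"]]
print(len(enumerate_spans(doc, max_span_width=3)))
```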

Our Model We develop a general information extraction framework (DYGIE) to identify and classify entities, relations, and coreference in a multi-task setup. DYGIE first enumerates all text spans in each sentence, and computes a locally-contextualized vector space representation of each span. The model then employs a dynamic span graph to incorporate global information into its span representations, as follows. At each training step, the model identifies the text spans that are most likely to represent entities, and treats these spans as nodes in a graph structure. It constructs confidence-weighted arcs for each node according to its predicted coreference and relation links with the other nodes in the graph. Then, the span representations are refined using broader context from gated updates propagated from neighboring relation types and co-referred entities. These refined span representations are used in a multi-task framework to predict entity types, relation types, and coreference links.

3.1 Model Architecture

In this section, we give an overview of the main components and layers of the DYGIE framework, as illustrated in Figure 2 . Details of the graph construction and refinement process will be presented in the next section.

Figure 2: Overview of our DYGIE model. Dotted arcs indicate confidence weighted graph edges. Solid lines indicate the final predictions.

Token Representation Layer

We apply a bidirectional LSTM over the input tokens. The input for each token is a concatenation of the character representation, GloVe (Pennington et al., 2014) word embeddings, and ELMo embeddings (Peters et al., 2018). The output token representations are obtained by stacking the forward and backward LSTM hidden states.

Span Representation Layer For each span $s_i$, its initial vector representation $g_i^0$ is obtained by concatenating the BiLSTM outputs at the left and right end points of $s_i$, an attention-based soft "headword," and an embedded span width feature, following Lee et al. (2017).
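A minimal PyTorch sketch of this span representation is shown below; the module, parameter names, and dimension choices are illustrative assumptions, and the released implementation may differ in detail:

```python
import torch
import torch.nn as nn

class SpanRepresentation(nn.Module):
    """g_i^0 = [h_start; h_end; soft headword; width embedding]."""

    def __init__(self, hidden_dim: int, max_width: int, width_dim: int = 20):
        super().__init__()
        self.head_scorer = nn.Linear(hidden_dim, 1)       # attention over tokens in the span
        self.width_embedding = nn.Embedding(max_width, width_dim)

    def forward(self, token_states: torch.Tensor, spans: torch.Tensor) -> torch.Tensor:
        # token_states: (num_tokens, hidden_dim) BiLSTM outputs
        # spans: (num_spans, 2) inclusive (start, end) token indices
        scores = self.head_scorer(token_states).squeeze(-1)   # (num_tokens,)
        reps = []
        for start, end in spans.tolist():
            states = token_states[start:end + 1]               # tokens inside the span
            attn = torch.softmax(scores[start:end + 1], dim=0)
            head = attn @ states                               # attention-based soft headword
            width = self.width_embedding(torch.tensor(end - start))
            reps.append(torch.cat([token_states[start], token_states[end], head, width]))
        return torch.stack(reps)                               # (num_spans, span_dim)

# Toy usage: 6 tokens with 8-dimensional BiLSTM states, two candidate spans.
states = torch.randn(6, 8)
spans = torch.tensor([[0, 2], [3, 5]])
print(SpanRepresentation(hidden_dim=8, max_width=10)(states, spans).shape)
```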

Coreference Propagation Layer

The propagation process starts from the span representations $g_i^0$. At each iteration $t$, we first compute an update vector $u_C^t(i)$ for each span $s_i$. Then we use $u_C^t(i)$ to update the current representation $g_i^t$, producing the next span representation $g_i^{t+1}$. By repeating this process $N$ times, the final span representations $g_i^N$ share contextual information across spans that are likely to be antecedents in the coreference graph.

Relation Propagation Layer

The outputs $g_i^N$ from the coreference propagation layer are passed as inputs to the relation propagation layer. Similar to the coreference propagation process, at each iteration $t$ we first compute the update vector $u_R^t(i)$ for each span $s_i$, and then use it to compute $g_i^{t+1}$. Information can be integrated from multiple relation paths by repeating this process $M$ times.

Final Prediction Layer

The refined span representations are passed to a FFNN to produce per-class entity scores for each span $s_i$ and per-class relation scores $P_R(i, j)$ between spans $s_i$ and $s_j$. Entity and relation scores are normalized across the label space, similar to Luan et al. (2018a). For coreference, the scores between span pairs $(s_i, s_j)$ are computed from the coreference graph layer outputs $(g_i^N, g_j^N)$, and then normalized across all possible antecedents.
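As an illustration of the pairwise FFNN scoring step, the sketch below computes per-class relation scores for all span pairs in a beam. It is a hypothetical PyTorch rendering: the two 150-dimensional hidden layers follow the hyperparameters listed in Section 3.3, while the class and variable names are ours.

```python
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    """FFNN mapping a pair of span representations to per-class relation scores."""

    def __init__(self, span_dim: int, num_relations: int, hidden: int = 150):
        super().__init__()
        self.ffnn = nn.Sequential(
            nn.Linear(2 * span_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_relations),
        )

    def forward(self, spans: torch.Tensor) -> torch.Tensor:
        # spans: (b, d) span representations after propagation
        b, d = spans.shape
        pairs = torch.cat(
            [spans.unsqueeze(1).expand(b, b, d), spans.unsqueeze(0).expand(b, b, d)],
            dim=-1,
        )
        return self.ffnn(pairs)   # (b, b, num_relations): scores P_R(i, j)

# Toy usage: 5 beam spans of dimension 44, 7 relation classes.
scores = PairScorer(span_dim=44, num_relations=7)(torch.randn(5, 44))
print(scores.shape)   # torch.Size([5, 5, 7])
```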

3.2 Dynamic Graph Construction And Span Refinement

The dynamic span graph facilitates propagating broader contexts through soft coreference and relation links to refine span representations. The nodes in the graph are spans $s_i$ with vector representations $g_i^t \in \mathbb{R}^d$ for the $t$-th iteration. The edges are weighted by the coreference and relation scores, which are trained according to the neural architecture explained in Section 3.1. In this section, we explain how coreference and relation links can update span representations.
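For intuition, a minimal PyTorch sketch of selecting graph nodes by beam pruning over scored spans might look as follows; the scorer, function name, and fixed beam size are illustrative assumptions, and in practice the beam sizes are tuned per dataset as described in Section 3.3:

```python
import torch
import torch.nn as nn

def select_span_beam(span_reps: torch.Tensor, scorer: nn.Module, beam_size: int) -> torch.Tensor:
    """Keep the indices of the highest-scoring spans as graph nodes.

    span_reps: (num_spans, d) current span representations.
    scorer:    module producing a single "keep this span" confidence per span.
    """
    scores = scorer(span_reps).squeeze(-1)      # (num_spans,)
    k = min(beam_size, span_reps.size(0))
    return torch.topk(scores, k=k).indices      # indices of the beam spans

# Toy usage with a linear scorer standing in for the mention-scoring FFNN.
reps = torch.randn(30, 44)
beam = select_span_beam(reps, nn.Linear(44, 1), beam_size=8)
print(beam.shape)    # torch.Size([8])
```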

Coreference Propagation Similar to Luan et al. (2018a), we define a beam $B_C$ consisting of $b_c$ spans that are most likely to be in a coreference chain. We consider $P_C^t$ to be a matrix of real values that indicate coreference confidence scores between these spans at the $t$-th iteration. $P_C^t$ is of size $b_c \times K$, where $K$ is the maximum number of antecedents considered. In the coreference graph, an edge is single-directional, connecting the current span $s_i$ with each of its potential antecedents $s_j$ in the coreference beam, where $j < i$. The edge between $s_i$ and $s_j$ is weighted by the coreference confidence score at the current iteration, $P_C^t(i, j)$. The span update vector $u_C^t(i) \in \mathbb{R}^d$ is computed by aggregating the neighboring span representations $g_j^t$, weighted by their coreference scores $P_C^t(i, j)$:

$$u_C^t(i) = \sum_{j \in B_C(i)} P_C^t(i, j)\, g_j^t \qquad (1)$$

where $B_C(i)$ is the set of $K$ spans that are potential antecedents of $s_i$, and

$$P_C^t(i, j) = \frac{\exp\big(V_C^t(i, j)\big)}{\sum_{j' \in B_C(i)} \exp\big(V_C^t(i, j')\big)} \qquad (2)$$

$V_C^t(i, j)$ is a scalar score computed by feeding the concatenated span representations $[g_i^t, g_j^t, g_i^t \odot g_j^t]$, where $\odot$ is element-wise multiplication, as input to a FFNN.
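The update in Eqs. (1)-(2) can be sketched in a few lines of PyTorch; this is an illustrative rendering under the assumption that the antecedent scores and beam indices have already been computed:

```python
import torch

def coref_update(g: torch.Tensor, antecedents: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Eqs. (1)-(2): u_C^t(i) = sum_j P_C^t(i, j) * g_j^t.

    g:           (b_c, d)  representations of spans in the coreference beam
    antecedents: (b_c, K)  indices (into the beam) of each span's K candidate antecedents
    v:           (b_c, K)  scalar antecedent scores V_C^t(i, j)
    """
    p = torch.softmax(v, dim=-1)                 # P_C^t(i, j), normalized over antecedents
    neighbors = g[antecedents]                   # (b_c, K, d)
    return (p.unsqueeze(-1) * neighbors).sum(1)  # (b_c, d) update vectors u_C^t(i)

# Toy usage: beam of 6 spans, 4 candidate antecedents each, 44-dimensional representations.
u = coref_update(torch.randn(6, 44), torch.randint(0, 6, (6, 4)), torch.randn(6, 4))
print(u.shape)   # torch.Size([6, 44])
```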

Relation Propagation For each sentence, we define a beam $B_R$ consisting of $b_r$ entity spans that are most likely to be involved in a relation. Unlike the coreference graph, the weights of relation edges capture different relation types. Therefore, for the $t$-th iteration, we use a tensor $V_R^t \in \mathbb{R}^{b_r \times b_r \times L_R}$ to capture the scores of each of the $L_R$ relation types. In other words, each edge in the relation graph connects two entity spans $s_i$ and $s_j$ in the relation beam $B_R$. $V_R^t(i, j)$ is an $L_R$-dimensional vector of relation scores, computed with a FFNN that takes $[g_i^t, g_j^t]$ as input. The relation update vector $u_R^t(i) \in \mathbb{R}^d$ is computed by aggregating neighboring span representations on the relation graph:

$$u_R^t(i) = \sum_{j \in B_R} f\big(V_R^t(i, j)\big)^{\top} A_R \odot g_j^t \qquad (3)$$

where $A_R \in \mathbb{R}^{L_R \times d}$ is a trainable linear projection matrix and $f$ is a non-linear function that selects the most important relations. Because only a small number of entities in the relation beam are actually linked to the target span, propagation among all possible span pairs would introduce too much noise into the new representation. We therefore choose $f$ to be the ReLU function, which removes the effect of unlikely relations by setting all negative relation scores to 0. Unlike coreference connections, two spans linked via a relation are not expected to have similar representations, so the matrix $A_R$ helps to transform the embedding $g_j^t$ according to each relation type.
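One plausible PyTorch rendering of the relation update in Eq. (3) is sketched below; the exact way the projection A_R composes with the neighboring span embeddings is our reading of the description above, not a verified reproduction of the released code:

```python
import torch
import torch.nn.functional as F

def relation_update(g: torch.Tensor, v_r: torch.Tensor, a_r: torch.Tensor) -> torch.Tensor:
    """Eq. (3): u_R^t(i) = sum_j ReLU(V_R^t(i, j))^T A_R ⊙ g_j^t.

    g:   (b_r, d)        representations of spans in the relation beam
    v_r: (b_r, b_r, L_R) relation-type scores V_R^t between beam spans
    a_r: (L_R, d)        trainable projection A_R over relation types
    """
    gates = F.relu(v_r) @ a_r                     # (b_r, b_r, d); ReLU drops unlikely relations
    return (gates * g.unsqueeze(0)).sum(dim=1)    # aggregate over neighbors j for each span i

# Toy usage: beam of 5 spans, 7 relation types, 44-dimensional representations.
u = relation_update(torch.randn(5, 44), torch.randn(5, 5, 7), torch.randn(7, 44))
print(u.shape)   # torch.Size([5, 44])
```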

Updating Span Representations With Gating

To compute the span representations for the next iteration $t \in \{1, \ldots, N + M\}$, we define a gating vector $f_x^t(i) \in \mathbb{R}^d$, where $x \in \{C, R\}$, to determine whether to keep the previous span representation $g_i^t$ or to integrate new information from the coreference or relation update vector $u_x^t(i)$. Formally,

$$f_x^t(i) = g\big(W_x^f\, [g_i^t, u_x^t(i)]\big) \qquad (4)$$
$$g_i^{t+1} = f_x^t(i) \odot g_i^t + \big(1 - f_x^t(i)\big) \odot u_x^t(i),$$

where $W_x^f \in \mathbb{R}^{d \times 2d}$ are trainable parameters and $g$ is an element-wise sigmoid function.
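A minimal PyTorch sketch of the gated update in Eq. (4) follows; it is illustrative, and the released implementation may parameterize the gate slightly differently (for example, with or without a bias term):

```python
import torch
import torch.nn as nn

class GatedUpdate(nn.Module):
    """Eq. (4): interpolate between the previous span representation and the update vector."""

    def __init__(self, d: int):
        super().__init__()
        self.w = nn.Linear(2 * d, d)   # plays the role of W_x^f

    def forward(self, g_prev: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
        # g_prev, u: (num_spans, d)
        f = torch.sigmoid(self.w(torch.cat([g_prev, u], dim=-1)))   # gate f_x^t(i)
        return f * g_prev + (1.0 - f) * u                           # g_i^{t+1}

# Toy usage: 6 spans with 44-dimensional representations and update vectors.
g_next = GatedUpdate(d=44)(torch.randn(6, 44), torch.randn(6, 44))
print(g_next.shape)   # torch.Size([6, 44])
```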

3.3 Training

The loss function is defined as a weighted sum of the log-likelihood of all three tasks:

$$\sum_{(D, R^*, E^*, C^*) \in \mathcal{D}} \Big[\, \lambda_E \log P(E^* \mid C, R, D) + \lambda_R \log P(R^* \mid C, D) + \lambda_C \log P(C^* \mid D) \,\Big] \qquad (5)$$

where $E^*$, $R^*$ and $C^*$ are the gold structures of the entity types, relations and coreference, respectively, and $\mathcal{D}$ is the collection of all training documents $D$. The task weights $\lambda_E$, $\lambda_R$, and $\lambda_C$ are hyperparameters that control the importance of each task. We use a 1-layer BiLSTM with 200-dimensional hidden layers. All the feed-forward functions have 2 hidden layers of 150 dimensions each. We use 0.4 variational dropout (Gal and Ghahramani, 2016) for the LSTMs, 0.4 dropout for the FFNNs, and 0.5 dropout for the input embeddings. The hidden layer dimensions and dropout rates are chosen based on development set performance in multiple domains. The task weights, learning rate, maximum span length, number of propagation iterations and beam size are tuned specifically for each dataset using development data.
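For clarity, the weighted multi-task objective in Eq. (5) amounts to the following toy Python sketch; the per-task log-likelihood terms are assumed to be computed elsewhere, and the negation turns the objective into a loss to minimize:

```python
import torch

def multitask_loss(log_p_entity: torch.Tensor,
                   log_p_relation: torch.Tensor,
                   log_p_coref: torch.Tensor,
                   weights=(1.0, 1.0, 1.0)) -> torch.Tensor:
    """Eq. (5): weighted sum of per-task log-likelihoods, negated to give a loss.

    Each argument is the summed log-likelihood of the gold structure for one task
    on a document; weights are the hyperparameters (lambda_E, lambda_R, lambda_C).
    """
    lam_e, lam_r, lam_c = weights
    return -(lam_e * log_p_entity + lam_r * log_p_relation + lam_c * log_p_coref)

# Toy usage with made-up log-likelihood values and a down-weighted coreference task.
loss = multitask_loss(torch.tensor(-3.2), torch.tensor(-1.7), torch.tensor(-0.9),
                      weights=(1.0, 1.0, 0.3))
print(loss)
```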

4 Experiments

DYGIE is a general IE framework that can be applied to multiple tasks. We evaluate the performance of DYGIE against models from two lines of work: combined entity and relation extraction, and overlapping entity extraction.

4.1 Entity And Relation Extraction

For the entity and relation extraction task, we test the performance of DYGIE on four different datasets: ACE2004, ACE2005, SciERC and the Wet Lab Protocol Corpus. We include the relation graph propagation layer in our models for all datasets. We include the coreference graph propagation layer on the data sets that have coreference annotations available.

Data All four data sets are annotated with entity and relation labels. Only a small fraction of entities (< 3% of total) in these data sets have a text span that overlaps the span of another entity. Statistics on all four data sets are displayed in Table 1 .

Table 1: Datasets for joint entity and relation extraction and their statistics. Ent: Number of entity categories. Rel: Number of relation categories.

The ACE2004 and ACE2005 corpora provide entity and relation labels for a collection of documents from a variety of domains, such as newswire and online forums. We use the same entity and relation types, data splits, and preprocessing as Miwa and Bansal (2016) and Li and Ji (2014). Following the convention established in this line of work, an entity prediction is considered correct if its type label and head region match those of a gold entity. We will refer to this version of the ACE2004 and ACE2005 data as ACE04 and ACE05. Since the domain and mention span annotations in the ACE datasets are very similar to those of OntoNotes (Pradhan et al., 2012), and OntoNotes contains significantly more documents with coreference annotations, we use OntoNotes to train the parameters for the auxiliary coreference task. The OntoNotes corpus contains 3493 documents, averaging roughly 450 words in length.

The SciERC corpus (Luan et al., 2018a) provides entity, coreference and relation annotations for a collection of 500 AI paper abstracts. The dataset defines scientific term types and relation types specially designed for AI domain knowledge graph construction. An entity prediction is considered correct if its label and span match those of a gold entity.

The Wet Lab Protocol Corpus (WLPC) provides entity, relation, and event annotations for 622 wet lab protocols (Kulkarni et al., 2018) . A wet lab protocol is a series of instructions specifying how to perform a biological experiment. Following the procedure in Kulkarni et al. (2018) , we perform entity recognition on the union of entity tags and event trigger tags, and relation extraction on the union of entity-entity relations and entity-trigger event roles. Coreference annotations are not available for this dataset.

Baselines

We compare DYGIE with the current state-of-the-art methods on the different datasets. Miwa and Bansal (2016) provide the current state of the art on ACE04. They construct a Tree LSTM using dependency parse information, and use the representations learned by the tree structure as features for relation classification. Bekoulis et al. (2018) use adversarial training as regularization for a neural model. Zhang et al. (2017) cast joint entity and relation extraction as a table filling problem and build a globally optimized neural model incorporating syntactic representations from a dependency parser. Similar to DYGIE, Sanh et al. (2019) and Luan et al. (2018a) use a multi-task learning framework for extracting entity, relation and coreference labels. Sanh et al. (2019) improved the state of the art on ACE05 using multi-task, hierarchical supervised training with a set of low-level tasks at the bottom layers of the model and more complex tasks at the top layers of the model. Luan et al. (2018a) previously achieved the state of the art on SciERC and use a span-based neural model like our DYGIE. Kulkarni et al. (2018) provide a baseline for the WLPC data set. They employ an LSTM-CRF for entity recognition, following Lample et al. (2016). For relation extraction, they assume the presence of gold entities and train a maximum-entropy classifier using features from the labeled entities.

Results Table 2 shows test set F1 on the joint entity and relation extraction task. We observe that DYGIE achieves substantial improvements on both entity recognition and relation extraction across the four data sets and three domains, all in the realistic setting where no "gold" entity labels are supplied at test time. DYGIE achieves 7.1% and 7.0% relative improvements over the state of the art on NER for ACE04 and ACE05, respectively. For the relation extraction task, DYGIE attains 25.8% relative improvement over SOTA on ACE04 and 13.7% relative improvement on ACE05. For ACE05, the best entity extraction performance is obtained by switching the order between CorefProp and RelProp (RelProp first then CorefProp).

Table 2: F1 scores on the joint entity and relation extraction task on each test set, compared against the previous best systems. * indicates a relation extraction system that takes gold entity boundaries as input.

On SciERC, DYGIE advances the state of the art by 5.9% and 1.9% for relation extraction and NER, respectively. The improvement of DYGIE over the previous SciERC model underscores the ability of coreference and relation propagation to construct rich contextualized representations.

The results from Kulkarni et al. (2018) establish a baseline for IE on the WLPC. In that work, relation extraction is performed using gold entity boundaries as input. Without using any gold entity information, DYGIE improves on the baselines by 16.8% for relation extraction and 2.2% for NER. On the OntoNotes data set used for the auxiliary coreference task with ACE05, our model achieves coreference test set performance of 70.4 F1, which is competitive with the state-of-the-art performance reported in Lee et al. (2017) .

4.2 Overlapping Entity Extraction

There are many applications where the correct identification of overlapping entities is crucial for correct document understanding. For instance, in the biomedical domain, a BRCA1 mutation carrier could refer to a patient taking part in a clinical trial, while BRCA1 is the name of a gene.

We evaluate the performance of DYGIE on overlapping entity extraction in three datasets: ACE2004, ACE2005 and GENIA. Since relation annotations are not available for these datasets, we include the coreference propagation layer in our models but not the relation layer.²

Data Statistics on our three datasets are listed in Table 3. All three have a substantial number (> 20% of total) of overlapping entities, making them appropriate for this task.

Table 3: Datasets for overlapping entity extraction and their statistics. Ent: Number of entity categories. Overlap: Percentage of sentences that contain overlapping entities.

As in the joint case, we evaluate our model on ACE2004 and ACE2005, but here we follow the same data preprocessing and evaluation scheme as Wang and Lu (2018). We refer to these data sets as ACE04-O and ACE05-O. Unlike the joint entity and relation task in Sec. 4.1, where only the entity head span need be predicted, an entity prediction is considered correct in these experiments if both its entity label and its full text span match those of a gold entity. This is a more stringent evaluation criterion than the one used in Section 4.1. As before, we use the OntoNotes annotations to train the parameters of the coreference layer.

The GENIA corpus (Kim et al., 2003) provides entity tags and coreferences for 1999 abstracts from the biomedical research literature. We only use the IDENT label to extract coreference clusters.

Dataset   System                      Entity F1
ACE04-O   Katiyar and Cardie (2018)   72.7
          Wang and Lu (2018)          75.1
          DYGIE                       84.7
ACE05-O   Katiyar and Cardie (2018)   70.5
          Wang and Lu (2018)          74.5
          DYGIE                       82.9
GENIA     Katiyar and Cardie (2018)   73.8
          Wang and Lu (2018)          75.1
          DYGIE                       76.2

Table 4: Performance on the overlapping entity extraction task, compared to previous best systems. We report F1 of extracted entities on the test sets. We use the same data set split and preprocessing procedure as Wang and Lu (2018) for overlapping entity recognition.

Baselines The current state-of-the-art approach on all three data sets is Wang and Lu (2018) , which uses a segmental hypergraph coupled with neural networks for feature learning. Katiyar and Cardie (2018) also propose a hypergraph approach using a recurrent neural network as a feature extractor.

Results Table 4 presents the results of our overlapping entity extraction experiments on the different datasets. DYGIE improves on the state of the art by 11.6% for ACE04-O and 11.3% for ACE05-O. DYGIE also advances the state of the art on GENIA, albeit by a more modest 1.5%. Together these results suggest that DYGIE can be utilized fruitfully for information extraction across different domains with overlapping entities, such as biomedicine.


Table 5: Ablations on the ACE05 development set with different graph propagation setups. −CorefProp ablates the coreference propagation layers, while −RelProp ablates the relation propagation layers. Base is the system without any propagation.

5 Analysis Of Graph Propagation

We use the dev sets of ACE2005 and SciERC to analyze the effect of different model components. Tables 5 and 6 show the effects of graph propagation on entity and relation prediction accuracy, where −CorefProp and −RelProp denote ablating the propagation process by setting $N = 0$ or $M = 0$, respectively. Base is the base model without any propagation. For ACE05, we observe that coreference propagation is mainly helpful for entities; it appears to hurt relation extraction. On SciERC, coreference propagation gives a small benefit on both tasks. Relation propagation significantly benefits both entity and relation extraction in both domains. In particular, a large portion of the sentences in both ACE05 and SciERC contain multiple relation instances across different entities, which is the scenario in which we expect relation propagation to help.

Table 6: Ablations on the SciERC development set with different graph propagation setups. CorefProp has a much smaller effect on entity F1 compared to ACE05.

5.1 Coreference And Relation Graph Layers

Since coreference propagation has a larger effect on entity extraction and relation propagation has a larger effect on relation extraction, the following subsections focus on the effect of coreference propagation on entity extraction and of relation propagation on relation extraction.

5.2 Coreference Propagation And Entities

A major challenge of ACE05 is to disambiguate the entity class of pronominal mentions, which requires reasoning over cross-sentence context. For example, in a sentence from the ACE05 dataset, "One of [them] PER, from a very close friend of [ours] ORG.", it is impossible to identify whether them and ours refer to a person (PER) or an organization (ORG) without reading the previous sentences. We hypothesize that this is a context where coreference propagation can help. Table 7 shows the effect of the coreference layer on entity categorization of pronouns.³ DYGIE obtains a 6.6% improvement on pronoun performance, confirming our hypothesis.

Table 7: Entity extraction performance on pronouns in ACE05. CorefProp significantly increases entity extraction F1 on hard-to-disambiguate pronouns by allowing the model to leverage cross-sentence contexts.

Looking further, Table 8 shows the impact on all entity categories, giving the difference between the confusion matrix entries with and without CorefProp. The frequent confusions associated with pronouns (GPE/PER and PER/ORG, where GPE is a geopolitical entity) greatly improve, but the benefit of CorefProp extends to most categories.

        LOC  WEA  GPE  PER  FAC  ORG  VEH
LOC       5    0   -2   -1    2   -1    0
WEA       0    3    0    0    1   -3   -1
GPE      -3    0   31  -26    3   -7    0
PER       0   -2   -3   18   -1  -26    4
FAC       4   -1    2   -3    2   -5    1
ORG       0    0    0   -8   -1    6    0
VEH       0   -2   -1    2    5   -1    1

Table 8: Difference in the confusion matrix counts for ACE05 entity extraction associated with adding CorefProp.

Of course, there are a few instances where CorefProp causes errors in entity extraction. For example, in the sentence "[They] ORG PER might have been using Northshore...", DYGIE predicted They to be of ORG type because its most confident antecedent is those companies in the previous sentence: "The money was invested in those companies." However, They actually refers to these fund managers earlier in the document, which belongs to the PER category.

In the SciERC dataset, pronouns are uniformly assigned a Generic label, which explains why CorefProp does not have much effect on entity extraction performance.

Figure 3a shows the effect of the number of coreference propagation iterations on the entity extraction task. The coreference layer obtains its best performance at the second iteration ($N = 2$).

³ Pronouns included: anyone, everyone, it, itself, one, our, ours, their, theirs, them, themselves, they, us, we, who.

Figure 3: F1 score of each layer on the ACE development set for different numbers of iterations. $N = 0$ or $M = 0$ indicates that no propagation is performed for that layer.
Figure 4: Relation F1 broken down by number of entities in each sentence. The performance of relation extraction degrades on sentences containing more entities. Adding relation propagation alleviates this problem.

5.3 Relation Propagation Impact

Figure 4 shows relation extraction scores as a function of the number of entities in a sentence, for DYGIE and for DYGIE without relation propagation, on ACE05. The figure indicates that relation propagation achieves a significant improvement in sentences with more entities, where one might expect that using broader context could have more impact. Figure 3b shows the effect of the number of relation propagation iterations on the relation extraction task. Our model achieves the best performance at the second iteration ($M = 2$).

6 Conclusion

We have introduced DYGIE as a general information extraction framework, and have demonstrated that our system achieves state-of-the-art results on entity recognition and relation extraction tasks across a diverse range of domains. The key contribution of our model is the dynamic span graph approach, which enhances interaction across tasks and allows the model to learn useful information from broader context. Unlike many IE frameworks, our model does not require any preprocessing with syntactic tools, and achieves significant improvements across different IE tasks, including entity extraction, relation extraction and overlapping entity extraction. The addition of coreference and relation propagation across sentences adds only a small computational cost to inference; the memory cost is controlled by beam search. These added costs are small relative to those of the baseline span-based model. We welcome the community to test our model on different information extraction tasks. Future directions include extending the framework to encompass more structural IE tasks such as event extraction.

Code and pre-trained models are publicly available at https://github.com/luanyi/DyGIE.

² We use the pre-processed ACE datasets from previous work, for which relation annotations are not available.