Procedural Reading Comprehension with Attribute-Aware Context Flow


Abstract

Procedural texts often describe processes (e.g., photosynthesis and cooking) that happen over entities (e.g., light, food). In this paper, we introduce an algorithm for procedural reading comprehension that translates the text into a general formalism representing a process as a sequence of transitions over entity attributes (e.g., location, temperature). Leveraging pre-trained language models, our model obtains entity-aware and attribute-aware representations of the text by jointly predicting entity attributes and their transitions. Our model dynamically obtains contextual encodings of the procedural text, exploiting information encoded about previous and current states to predict the transition of an attribute, whose value can be identified either as a span of text or from a pre-defined set of classes. Our model achieves state-of-the-art results on two procedural reading comprehension datasets, namely PROPARA and NPN-COOKING.

1. Introduction

Procedural text describes how entities (e.g., fuel, engine) or their attributes (e.g., locations) change throughout a process (e.g., a scientific process or a cooking recipe). Procedural reading comprehension is the task of answering questions about the underlying process in the text (Figure 1). Understanding procedural text requires inferring entity attributes and their dynamic transitions, which might only be implicitly mentioned in the text. For instance, in Figure 1, the creation of mechanical energy in the alternator can be inferred from the second and third sentences.

Figure 1: Example of a procedural text and the predicted attributes and transitions for each entity. Procedural reading comprehension is the task of answering questions about the underlying process. Sample questions in PROPARA tasks are: 'What is the process input?', 'What is the process output?', 'What is the location of the entity?'.

Full understanding of a procedural text requires capturing the full interplay between all components of a process, namely entities, their attributes, and their dynamic transitions. Recent work in understanding procedural text develops domain-specific models for tracking entities in scientific processes or cooking recipes. More recently, Gupta and Durrett [2019b] obtain general entity-aware representations of a procedural text leveraging pre-trained language models, and predict entity transitions from a set of pre-defined classes independently of entity attributes. Pre-defining the set of entity states limits the general applicability of the model, since entity attributes can be arbitrary spans of text. Moreover, entity attributes can be exploited for tracking entity state transitions.

For example, in Figure 1, the location of fuel can be effectively inferred from the text as engine without an explicit mention of a movement transition in the first sentence. Moreover, the phrase converted in the third sentence gives rise to two predicted transitions: the destruction of one type of energy and the creation of another.

In this work, we introduce a general formalism to represent procedural text and develop an end-to-end neural procedural reading comprehension model that jointly identifies entity attributes and transitions, leveraging dynamic contextual encoding of the procedural text. The formalism represents entities, their attributes, and their transitions across time. Our model obtains attribute-aware representations of the procedural text, leveraging a reading comprehension model that jointly identifies entity attributes as a span of text or from a pre-defined set of classes. Our model predicts state transitions given the entity-aware and attribute-aware encodings of the context up to a certain time step, consistently capturing the dynamic flow of contextual encoding through an LSTM model.

Our experiments show that our method achieves state-of-the-art results across the various tasks introduced for the PROPARA dataset, which track entity attributes and their transitions in scientific processes. Additionally, a simple variant of our model achieves state-of-the-art results on the NPN-COOKING dataset.

Our contributions are three-fold: (a) We present a general formalism to model procedural text, which can be adapted to different domains. (b) We develop DYNAPRO, an end-to-end neural model that jointly and consistently predicts entity attributes and their state transitions, leveraging pre-trained language models. (c) We show that our model can be adapted to several procedural reading comprehension tasks using the entity-aware and attribute-aware representations, achieving state-of-the-art results on several diverse tasks.

2. Related Work

Most previous work in reading comprehension [Rajpurkar et al., 2016] focuses on identifying a span of text that answers a given question about a static paragraph. This paper focuses on procedural reading comprehension, which inquires about how the states of entities change over time. Several prior lines of work similarly focus on understanding temporal text in multiple domains. Cooking recipes describe instructions on how ingredients change. bAbI [Weston et al., 2015] is a collection of datasets focusing on understanding narratives and stories. Math word problems [Kushman et al., 2014, Hosseini et al., 2017, Amini et al., 2019, Koncel-Kedziorski et al., 2016] describe how the state of entities changes throughout mathematical procedures. Narrative question answering [Kočiskỳ et al., 2018] requires reasoning about the state of a story over time. The PROPARA dataset [Mishra et al., 2018] is a collection of procedural texts that describe how entities change throughout scientific processes over time, and inquires about several aspects of the process such as entity attributes or state transitions. Several models (e.g., EntNet [Henaff et al., 2017], QRN [Seo et al., 2017], MemNet [Weston et al., 2014]) have also been introduced to track entities in narratives.

The closest work to ours is the line of work focusing on the PROPARA and NPN-COOKING datasets. The neural process network model underlying NPN-COOKING uses an attention-based neural network to find transitions in ingredients. Pro-Local and Pro-Global [Mishra et al., 2018] first identify locations of entities using an entity recognition approach and then use manual rules or the global structure of the procedural text to consistently track entities. The Pro-Struct model leverages manually defined and knowledge-base-driven commonsense constraints to avoid nonsensical predictions (e.g., trees do not move to different locations). KG-MRC [Das et al., 2019] maintains a knowledge graph of entities over time and identifies entity states by predicting the location span with respect to each entity using a reading comprehension model. NCET [Gupta and Durrett, 2019a] introduces a neural conditional random field model to maintain the consistency of state predictions. Most recently, ET BERT [Gupta and Durrett, 2019b] uses transformers to construct an entity-aware representation of each entity and predicts state transitions from a set of pre-defined classes. In this paper, we integrate all of these observations and develop a model that jointly identifies entities, attributes, and transitions over time. Unlike previous work, which is designed to address either attributes or transitions, our model benefits from the clues that are implicitly and explicitly mentioned for both entity attributes and transitions. Leveraging both aspects of procedural reading comprehension leads to a general and adaptive formulation and model that achieves state-of-the-art results on several tasks.

3. Procedural Text Representation

Procedural text is a sequence of sentences describing how entities and their attributes change throughout a process. We introduce a general formalism to represent a procedural text:

Process = (E, A, T)     (1)

where E is the list of entities participating in the process, A is the list of entity attributes, and T is the list of transitions.

Entities are the main elements participating in the process. For example, in the scientific processes described in PROPARA, entities include elements such as energy, fuel, etc.

Figure 2: DYNAPRO takes the procedural context X_k as input and predicts attributes A_{k-1}, A_k and transitions T_k at each time step k. P_{?,-,*} indicates the probability of the location type among nowhere, unknown, and span of text, respectively. The model uses the changes in attribute values from time steps k-1 to k to predict transitions.


In the cooking recipe domain, the entities are ingredients such as milk, flour, etc. The entities can be given as part of the task (as in PROPARA and the cooking domain), or they may need to be inferred from the context (e.g., in math word problems).

Attributes are entity properties that can change over time. We model attributes as functions Attribute(e) = val that assign a value val to an attribute of the entity e. The entity state at each time step is derived by combining all the attribute values of that entity. Attribute values can be either spans of text or drawn from a pre-defined set of classes. For example, in PROPARA an important attribute of an entity is its location, which can be a span of text. The NPN-COOKING dataset introduces several attributes (such as shape and cookedness) for each ingredient. Example attributes for the entities in PROPARA are modeled as follows:

exists(e) ∈ {nowhere, unknown, span of text}
at_loc(e) = l → assigns the location l to entity e

Transitions capture changes in the entity states. More specifically, transitions indicate how entity attributes change over time. We model each transition with an action name and a list of arguments that include the entity and some attribute values. For example, PROPARA consists of four transition types: Create(e, loc), Destroy(e), None(e), and Move(e, loc).
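To make the formalism concrete, the following minimal sketch encodes E, A, and T as plain Python data structures; the class and field names are our own illustrative choices and are not taken from any released implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class EntityState:
    """Attribute values of one entity at one time step."""
    exists: str                      # one of {"nowhere", "unknown", "span"}
    location: Optional[str] = None   # span of text when exists == "span"

@dataclass
class Transition:
    """One transition applied to an entity, e.g. Move(e, loc)."""
    action: str                      # "Create", "Destroy", "Move", or "None"
    entity: str
    arguments: List[str] = field(default_factory=list)  # e.g. destination location

@dataclass
class ProceduralRepresentation:
    """The (E, A, T) representation of a procedural text."""
    entities: List[str]                         # E
    attributes: List[Dict[str, EntityState]]    # A: one dict per time step
    transitions: List[Dict[str, Transition]]    # T: one dict per time step
```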

4. Model

We introduce DYNAPRO, an end-to-end neural architecture that jointly predicts entity attributes and their transitions. Figure 2 depicts an overview of our model. DYNAPRO first obtains the representation of the procedural text corresponding to an entity at each time step (Section 4.1). It then identifies entity attributes for current and previous time steps (Section 4.2) and uses them to develop an attribute-aware representation of the procedural context (Section 4.3). Finally, DYNAPRO uses entity-aware and attribute-aware representations to predict transitions that happen at that time step (Section 4.4).

4.1 Entity-Aware Representation

Given a procedural text S_0 ... S_k ... S_T and an entity e, DYNAPRO encodes the procedural context X_k at each time step k and obtains the entity-aware representation vector R_k(e). The procedural context is formed by concatenating the entity name, a query, and a fragment of the procedural text. The entity name and the query are included in the procedural context to capture an entity-aware representation of the context. Since entity attributes change throughout the process, we form the context at each step k by truncating the procedural text up to the k-th sentence. More formally, the procedural context is defined as:

X_k(e) = [cls] Q_e [C_1] ... [C_n] [sep] S_0 ... S_k [sep]     (2)

where [S_0 ... S_k] is the fragment of the procedural text up to the k-th sentence, Q_e is the entity-aware query (e.g., "Where is e?"), [C_i] are tokens reserved for attribute value classes (e.g., nowhere, unknown), and [cls] and [sep] are special tokens used to capture sentence representations and to separate segments. DYNAPRO then uses a pre-trained language model to encode the procedural context X_k(e) and returns the entity-aware representation R_k(e) = BERT(X_k(e)). Hereafter, for ease of notation, we drop the argument e from the equations.
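As a rough sketch of this step, the snippet below builds a procedural context for an entity and encodes it with a pre-trained BERT model via the HuggingFace transformers library; the query wording and the reserved class tokens are assumptions for illustration rather than the authors' exact input format.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

def entity_aware_representation(sentences, entity, k, class_tokens=("nowhere", "unknown")):
    """Encode X_k(e) and return the token-level representation R_k(e)."""
    query = f"where is {entity}?"                 # entity-aware query Q_e
    fragment = " ".join(sentences[: k + 1])       # S_0 ... S_k, truncated at step k
    # The first segment holds the query and reserved class tokens; the tokenizer
    # adds [CLS] and [SEP] around the two segments.
    first_segment = query + " " + " ".join(class_tokens)
    inputs = tokenizer(first_segment, fragment, return_tensors="pt")
    with torch.no_grad():
        outputs = encoder(**inputs)
    return inputs, outputs.last_hidden_state      # shape: (1, seq_len, hidden_size)
```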

4.2 Attribute Identification

DYNAPRO identifies attribute values for each entity from the entity-aware representation R_k(e) by jointly predicting attribute values from a pre-defined set of classes or extracting them as a text span.

Class Prediction Some attribute values can be identified from a set of pre-defined classes. For instance, the existence attribute of an entity can be identified from {nowhere, unknown, span of text}. Our model predicts a probability distribution P^class_k over the different classes of attribute values given the entity-aware representation R_k.

P^class_k = softmax(f(g(R_k); θ_1))     (3)

where R_k is the entity-aware representation, g is a non-linear function, f is a linear function, and θ_1 are learnable parameters.
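A minimal way to realize Equation (3) is a small classification head over a summary vector of R_k; using the [CLS] position as the summary and the layer sizes below are our assumptions.

```python
import torch
import torch.nn as nn

class ClassHead(nn.Module):
    """Predicts P^class_k over attribute classes from the entity-aware encoding."""
    def __init__(self, hidden_size=768, num_classes=3):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh())  # non-linear g
        self.f = nn.Linear(hidden_size, num_classes)                            # linear f, theta_1

    def forward(self, R_k):
        # R_k: (batch, seq_len, hidden); summarize with the [CLS] token.
        cls_vec = R_k[:, 0]
        return torch.softmax(self.f(self.g(cls_vec)), dim=-1)  # P^class_k
```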

Span Prediction Defining all attribute values a priori limits the general applicability of procedural text understanding. Some attribute values are only mentioned within a span of text. For example, the location of an entity may be mentioned in the text but not covered by a set of pre-defined classes. For span prediction, we follow the standard procedure for phrase extraction in reading comprehension [Seo et al., 2016], which predicts two probability distributions over the start and end tokens of the span.

P^start_k = softmax(f(g(R_k); θ_2)),   P^end_k = softmax(f(g(R_k); θ_3))     (4)

where g is a non-linear function, f is a linear function, and θ_2 and θ_3 are the learnable parameters used to find the probability distributions of the start and end tokens of the span. In order to capture the transitions of entity attributes, our model predicts attributes for both time steps k-1 and k given the procedural context X_k. More specifically, we use Equations 3 and 4 to compute the probability distributions P^class_{k-1}, P^span_{k-1}, P^class_k, and P^span_k.
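A sketch of the span head from Equation (4) is given below; in practice two such heads (one for step k-1 and one for step k) would be applied to the same encoding, which is our reading of how both time steps are predicted.

```python
import torch
import torch.nn as nn

class SpanHead(nn.Module):
    """Predicts start/end distributions of the attribute span over the tokens."""
    def __init__(self, hidden_size=768):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh())  # non-linear g
        self.start = nn.Linear(hidden_size, 1)   # theta_2
        self.end = nn.Linear(hidden_size, 1)     # theta_3

    def forward(self, R_k):
        h = self.g(R_k)                                             # (batch, seq_len, hidden)
        p_start = torch.softmax(self.start(h).squeeze(-1), dim=-1)  # distribution over tokens
        p_end = torch.softmax(self.end(h).squeeze(-1), dim=-1)
        return p_start, p_end                                       # together: P^span_k
```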

4.3 Attribute-Aware Representation

DYNAPRO obtains an attribute-aware representation R^a_k of the context to encode entity attributes and their transitions at each time step k, using the predicted distributions P^span_k and P^class_k for each entity e. The intuition is to assign higher probabilities to the tokens corresponding to the attribute value of the entity at time step k.

R^a_k = R_k ⊙ ( P^span_k + Σ_class P^class_k(class) · m_class )     (5)

where class ∈ {nowhere, unknown, span} ranges over the pre-defined attribute classes, P^class_k and P^span_k denote the probability distributions of attribute values over the pre-defined classes and over the span of text, respectively (computed using Equations 3 and 4), and m_class is a vector that masks out the input tokens that do not correspond to a given class. We model the flow of the context by concatenating the attribute-aware representations for time steps k and k-1:

R^flow_k = [ R^a_{k-1} ; R^a_k ]     (6)
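The following sketch shows one way to implement Equations (5) and (6): per-token weights combine the span distributions with the class probabilities placed on the reserved class tokens (via 0/1 masks), and the weighted encodings of steps k-1 and k are concatenated. The exact combination is our reading of the description, not verified against the authors' code.

```python
import torch

def attribute_aware(R_k, p_start, p_end, p_class, class_masks):
    """R^a_k: re-weight token encodings by predicted attribute probabilities.

    R_k:         (batch, seq_len, hidden)
    p_start/end: (batch, seq_len) span distributions
    p_class:     (batch, num_classes) class distribution
    class_masks: (num_classes, seq_len) 0/1 masks selecting reserved class tokens
    """
    token_weights = p_start + p_end                        # weight on span tokens
    token_weights = token_weights + p_class @ class_masks  # weight on class tokens
    return R_k * token_weights.unsqueeze(-1)

def context_flow(Ra_prev, Ra_curr):
    """Equation (6): concatenate attribute-aware encodings of steps k-1 and k."""
    return torch.cat([Ra_prev, Ra_curr], dim=-1)
```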

4.4 Transition Classification

DYNAPRO predicts attribute transitions from the entity-aware and attribute-aware representations. In order to make smooth transition predictions and avoid redundant transitions, we include a Bi-LSTM layer before the transition classification.

h_k, R^seq_k = BiLSTM(R^flow_k, h_{k-1}),    P^transition_k = softmax(f(R^seq_k; θ_4))     (7)

where h is the hidden vector of the sequential layer, θ_4 are learnable parameters, and R^seq_k is the output of the sequential layer.
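A transition classifier in the spirit of Equation (7) can be sketched as a Bi-LSTM over the per-step features followed by a linear layer; the feature construction and sizes are assumptions.

```python
import torch
import torch.nn as nn

class TransitionClassifier(nn.Module):
    """Predicts P^transition_k for every time step from a sequence of step features."""
    def __init__(self, feature_size, hidden_size=200, num_transitions=4):
        super().__init__()
        self.lstm = nn.LSTM(feature_size, hidden_size,
                            batch_first=True, bidirectional=True)
        self.f = nn.Linear(2 * hidden_size, num_transitions)   # theta_4

    def forward(self, step_features):
        # step_features: (batch, num_steps, feature_size), one vector per time step.
        R_seq, _ = self.lstm(step_features)                     # smoothed over steps
        return torch.softmax(self.f(R_seq), dim=-1)             # (batch, num_steps, 4)
```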

4.5 Inference And Training

Training Our model is trained end-to-end by optimizing the loss function below:

loss = Σ_k ( loss^span_k + loss^class_k + loss^transition_k )     (8)

Each loss term is defined as a cross-entropy loss. loss^span_k and loss^class_k are the losses of the attribute prediction module (span and class prediction), and loss^transition_k is the loss of the transition prediction module at time step k.
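A per-step version of the objective in Equation (8) might look as follows, with one cross-entropy term per prediction head (class, span start/end, transition); summing over time steps gives the total loss. The equal weighting of the terms is an assumption.

```python
import torch.nn.functional as F

def step_loss(class_logits, start_logits, end_logits, transition_logits,
              class_gold, start_gold, end_gold, transition_gold):
    """Cross-entropy losses for one time step; inputs are unnormalized logits."""
    loss_class = F.cross_entropy(class_logits, class_gold)
    loss_span = (F.cross_entropy(start_logits, start_gold)
                 + F.cross_entropy(end_logits, end_gold))
    loss_transition = F.cross_entropy(transition_logits, transition_gold)
    return loss_class + loss_span + loss_transition
```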

Inference At each time step k, the attributes A_k and transitions T_k are predicted from P^span_k, P^class_k, and P^transition_k. The final output of the model consists of two sets of predictions, the attributes A_{0...k} and the transitions T_{0...k}, which are combined to track entities throughout a process given a task-specific objective (more in the implementation details).
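As a hypothetical illustration of the inference step, the snippet below reads off the most probable attribute and transition at one time step; the task-specific combination of the two prediction sets (e.g., favoring transitions for document-level output) would be applied on top of this.

```python
import torch

def decode_step(p_class, p_start, p_end, p_transition, tokens,
                classes=("nowhere", "unknown", "span"),
                transitions=("Create", "Destroy", "Move", "None")):
    """Greedy decoding of A_k and T_k for one entity at one time step."""
    cls = classes[int(torch.argmax(p_class))]
    if cls == "span":
        start, end = int(torch.argmax(p_start)), int(torch.argmax(p_end))
        attribute = " ".join(tokens[start:end + 1])   # attribute value as a text span
    else:
        attribute = cls
    transition = transitions[int(torch.argmax(p_transition))]
    return attribute, transition
```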

5. Experiments

5.1 Datasets

We evaluate our model on the PROPARA dataset introduced by Mishra et al. [2018], which has a vocabulary size of 2.5k. This dataset contains over 400 manually written paragraphs of scientific process descriptions. Each paragraph includes an average of 4.17 entities and 6 sentences. The entities are extracted by experts and the transitions are annotated by crowd-workers.

We additionally evaluate our model on the NPN-COOKING dataset. This corpus contains 65k cooking recipes. Each recipe consists of ingredients that are tracked during the process. Training samples are heuristically annotated by string matching and the dev/test samples are annotated by crowd-workers. We randomly sample from the training recipes that contain ingredients whose location attribute changes.

5.2 Tasks And Metrics

We evaluate DYNAPRO on three tasks in PROPARA and one task in NPN-COOKING.

Document-level predictions This task, introduced by Mishra et al. [2018], evaluates four objectives per entity and process: whether the entity is the (1) input or (2) output of the process, and (3) the moves and (4) the conversions of the entity in the process. The final metric reported for this evaluation is the average precision, recall, and F1 score over all four questions.

Sentence-level predictions This task, also introduced by Mishra et al. [2018], considers questions about the procedural text: Cat-1 asks whether a specific entity is created/destroyed/moved, Cat-2 asks at which time step the entity is created/destroyed/moved, and Cat-3 asks about the location at which the entity is created/moved/destroyed. The evaluation computes the score of all transitions in each category and reports the micro and macro averages of the scores over the three categories.

Action dependencies This task was recently introduced by Mishra et al. [2019] to check whether the actions predicted by a model play a role in the overall dynamics of the procedural paragraph. The final metric reported for this task is the precision, recall, and F1 score of the dependency links, averaged over all paragraphs.

Location prediction in Recipes The task is to identify the location of different entities in the cooking domain. In this domain, the list of attributes is fixed. We evaluate by measuring changes in location and compute F1 and accuracy for attribute prediction.

5.3 Implementation Details

We use the BERT-base implementation from the HuggingFace library [Wolf et al., 2019]. We use the cross-entropy loss function. The learning rate for training is 3e-5 and the training batch size is 8. The hidden size of the sequential layer is set to 1000 for class prediction and 200 for transition prediction.
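For reference, the reported hyperparameters can be collected as below; the optimizer choice and the placeholder model are illustrative assumptions, not the authors' exact training setup.

```python
import torch
import torch.nn as nn

config = {
    "learning_rate": 3e-5,          # reported learning rate
    "batch_size": 8,                # reported training batch size
    "class_hidden_size": 1000,      # sequential-layer size for class prediction
    "transition_hidden_size": 200,  # sequential-layer size for transition prediction
}
model = nn.Linear(768, 4)  # stand-in for the full DYNAPRO model
optimizer = torch.optim.AdamW(model.parameters(), lr=config["learning_rate"])
```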

We use the predicted A_{k-1} to initialize the attribute at time step 0; at any other time step k, we use the A_k predictions to find the attribute value at time step k. For the sentence-level evaluation task introduced in Mishra et al. [2018], consistency is not required, and the inference phase for this task only uses the attribute predictions. For the document-level predictions, we construct the final predictions by favoring the transition predictions; in case of inconsistency, where there is no valid attribute prediction to support the transition, we fall back to the attribute value to deterministically infer the transition. To adapt the results of DYNAPRO to the action dependency task, we post-process the results using heuristics similar to those described in the original task. To adapt DYNAPRO to the NPN-COOKING dataset, we use a 243-way classification to predict attributes, because the attributes are known a priori.

5.4 Results

Table 1 and Table 2 compare DYNAPRO with previous work (detailed in Section 2) on the PROPARA and NPN-COOKING tasks. As shown in the tables, DYNAPRO outperforms the state-of-the-art models in most of the evaluations.

Table 1: Results comparing DYNAPRO to previous state-of-the-art methods on the sentence-level, document-level, and action dependency tasks of PROPARA (test set).
Table 2: F1 and accuracy score on the location prediction task in NPN-COOKING.

Document-Level Task

We observe the most significant gain (3 absolute points in F1) on the document-level tasks, indicating the model's ability to build a global understanding of the procedural text through joint prediction of entity attributes and transitions over time. Overall, DYNAPRO predicts transitions with higher confidence, and hence achieves high precision on most document-level tasks.

Sentence-level task DYNAPRO outperforms the state-of-the-art models on the Ma-Avg and Mi-Avg metrics when comparing the full predictions, and gives comparable results to previous work on change and time step predictions. Note that ET BERT [Gupta and Durrett, 2019b] only predicts actions (Create, Destroy, Move) and does not predict location attributes as spans. DYNAPRO obtains good performance on Cat-1 and Cat-2 prediction while also learning to answer questions with more complex structure. We speculate that our lower numbers on Cat-1 and Cat-2 are due to DYNAPRO's highly confident decisions, which lead to high precision but a lower prediction rate, noting that Cat-1 and Cat-2 evaluate accuracy.

Action Dependency DYNAPRO outperforms all previous work with an F1 score of 43.7. Note that XPAD [Mishra et al., 2019] explicitly favors predicting state changes that result in dependencies across steps, whereas DYNAPRO is only optimized to track entities.

Location prediction in Recipes Finally, a simple variant of DYNAPRO achieves the best performance on the NPN-COOKING dataset, showcasing the importance of encoding the procedural text over time.

Table 3: Ablation study of different components in DYNAPRO by comparing F1 score on PROPARA Document Level task (dev set).

5.5 Ablation Studies And Analyses

In order to better understand the impact of DYNAPRO's components, we evaluate different variants of DYNAPRO in the document-level task of the PROPARA dataset.

• No attribute-aware representation: the model only uses the entity-aware representation in Equation 7 for transition prediction, instead of the attribute-aware representation R^a_k.
• Full procedural input: the model uses the full text of the procedure instead of the truncated text X_k at time step k.

Table 3 shows that removing each component from DYNAPRO hurts performance, indicating that joint prediction of attribute spans, classes, and transitions are all important for procedural reading comprehension. Moreover, the table shows the effect of attribute-aware representations, which incorporate the flow of context by predicting attributes at two consecutive time steps. Finally, the table shows the effect of procedural context modeling, which truncates sentences up to a certain time step rather than considering the full document at each time step. Note that the document-level evaluation in PROPARA requires spans of text to be identified; therefore, span prediction cannot be ablated from DYNAPRO.

#     Sentence                                              Gold         Prediction
1.1   Blood enters the right side of your heart.            heart        right side of your heart
1.2   Blood travels to the lungs.                           lungs        lungs
1.3   Carbon dioxide is removed from the blood.             lungs        lungs
1.4   Blood returns to left side of your heart.             heart        left side of your heart
2.1   Blood travels to the lungs.                           blood        blood
2.2   Carbon dioxide is removed from the blood.             -            ?
3.1   Fuel converts to energy when air and petrol mix.      -            air and petrol
3.2   The car engine burns the mix of air and petrol.       engine       air and petrol
3.3   Hot gas from the burning pushes the pistons.          piston       air and petrol
3.4   The resulting energy powers the crankshaft.           crankshaft   crankshaft

Table 4: Examples of correct and incorrect predictions of DYNAPRO. The entities in the first, second, and third examples are blood, carbon dioxide, and energy, respectively.

5.6 Error Analysis

Qualitative Analyses Table 4 shows three common types of mistakes in the final predictions.

In the first example, DYNAPRO successfully tracks the blood entity while it circulates in the body, yet there is a mismatch in what portion of the text it chooses as the span. In the second example, the model correctly predicts the location of carbon dioxide as blood, but not enough external knowledge is available for the model to predict that this entity gets destroyed. In the third example, the model mistakenly predicts air and petrol as a container for the energy, and since the changes explicitly happen to the container, they are not propagated to the entity.

Inconsistent Transitions We categorize possible inconsistencies in transition predictions into three categories; a minimal consistency check for each category is sketched after the list. (The percentages show how often each inconsistency was observed during inference.)

• Creation (2.0%): When the supporting attribute is predicted to be non-existence or the previous attribute shows that the entity already exists.
• Move (1.5%): When the predicted attribute is not changed from the previous prediction or it refers to a non-existence case.
• Destroy (1.0%): When the predicted attribute for the last time step is non-existence.
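The sketch below spells out one possible reading of these three checks, encoding non-existence as the predicted class "nowhere"; this is our interpretation of the categories, not the authors' exact inference code.

```python
def inconsistent(transition, prev_attr, curr_attr):
    """Flag a transition that conflicts with the predicted attributes."""
    if transition == "Create":
        # Entity should not exist before creation and must exist afterwards.
        return curr_attr == "nowhere" or prev_attr != "nowhere"
    if transition == "Move":
        # Location must change and must not be a non-existence prediction.
        return curr_attr == prev_attr or curr_attr == "nowhere"
    if transition == "Destroy":
        # Entity must exist before it can be destroyed.
        return prev_attr == "nowhere"
    return False
```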

6. Conclusion

We introduce an end-to-end model that benefits from both entity-aware and attribute-aware representations to jointly predict attribute values and their transitions for each entity. We present a general formalism to model procedural texts and introduce a model that translates procedural text into that formalism. We show that entity-aware and temporally-aware construction of the input helps to achieve better entity-aware and attribute-aware representations of the procedural context. Finally, we show how our model can make inferences about state transitions by tracking transitions in attribute values. Our model achieves state-of-the-art results on various tasks over the PROPARA and NPN-COOKING datasets. Future work involves extending our method to automatically identify entities and their attribute types and adapting it to other domains.