PowerTransformer: Unsupervised Controllable Revision for Biased Language Correction


Abstract

Unconscious biases continue to be prevalent in modern text and media, calling for algorithms that can assist writers with bias correction. For example, a female character in a story is often portrayed as passive and powerless (“_She daydreams about being a doctor_”) while a man is portrayed as more proactive and powerful (“_He pursues his dream of being a doctor_”). We formulate **Controllable Debiasing**, a new revision task that aims to rewrite a given text to correct the implicit and potentially undesirable bias in character portrayals. We then introduce PowerTransformer as an approach that debiases text through the lens of connotation frames (Sap et al., 2017), which encode pragmatic knowledge of implied power dynamics with respect to verb predicates. One key challenge of our task is the lack of parallel corpora. To address this challenge, we adopt an unsupervised approach using auxiliary supervision with related tasks such as paraphrasing and self-supervision based on a reconstruction loss, building on pretrained language models. Through comprehensive experiments based on automatic and human evaluations, we demonstrate that our approach outperforms ablations and existing methods from related tasks. Furthermore, we demonstrate the use of PowerTransformer as a step toward mitigating the well-documented gender bias in character portrayal in movie scripts.

1 Introduction

Narratives and news texts often reflect societal biases and stereotypes, such as the traditional gender role that women are passive and submissive (Lakoff, 1973; Fiske, 1993; Fast et al., 2016). The task of controllable text revision, i.e., rephrasing text to a targeted style or framing, can help correct for these biases by altering and equalizing the way people are described. For example, automatically rewriting "Mey daydreamed about being a doctor" as "Mey pursued her dream to be a doctor" portrays Mey with more authority and decisiveness (Figure 1). Such controllable revision methods could be used to help reshape how gender roles are portrayed in media (e.g., through machine-in-the-loop writing systems; Clark et al., 2018).

Figure 1: Examples of using connotation frames (Sap et al., 2017) for controllable revisions to portray characters with more agency and power. In the second example, “Ana strutted” implies that she is more active and decisive, compared to “Ana wandered” which portrays her as aimless and passive.

To edit such biases out of text, a controllable rewriting model faces three key challenges. First, a model should be able to make edits beyond surface-level paraphrasing, as simple paraphrasing will often not adequately debias the underlying events described. For example, Mey's portrayal in Figure 1 carries both overt bias (the choice of action) and subtle bias (the framing of the action), both of which require rewriting to be adequately debiased. Second, a model's debiasing revisions should be purposeful and precise and should not make unnecessary changes to the underlying meaning of the original text. Lastly, since parallel data does not exist, models must learn to revise and debias text without supervised data, precluding straightforward machine translation-style modeling.

We formulate Controllable Debiasing as a new controllable text revision task that aims to correct the implicit and possibly unwanted bias against or towards a specific character portrayed in text ( §2). As shown in Figure 1 (top), we study the portrayal biases through the lens of connotation frames of power and agency (Sap et al., 2017) , which provide pragmatic knowledge about implied power and agency levels projected onto characters by a predicate.

We create POWERTRANSFORMER, an encoder-decoder model that rewrites sentences with a desired portrayal using agency connotation frames (§3). We combine a reconstruction and a paraphrase objective in our model to overcome the lack of parallel supervised data, building on the denoising autoencoder setup from Li et al. (2018a). To steer the revisions, we endow the model with connotation frame knowledge both at training time, using control tokens, and at generation time, using agency-based vocab boosting.

Our findings show that POWERTRANSFORMER is effective at rewriting sentences with desired agency connotations while making only minimal changes to their meaning, as measured through both human and automatic evaluations (§4). We also show that POWERTRANSFORMER significantly outperforms existing stylistic rewriting methods (Prabhumoye et al., 2018; Dathathri et al., 2020) on those metrics. Additionally, through ablation studies, we establish the usefulness of each component of the model, finding benefits from both the joint objective (47% gain in accuracy) and the agency scaling (12% gain in accuracy).

Finally, in §5, we apply Controllable Debiasing to a corpus of modern English movies (Gorinski and Lapata, 2015) as a step towards removing gender bias in character portrayal established by prior work (Sap et al., 2017). Using POWERTRANSFORMER, we revise the movie scripts and significantly increase the agency levels of female characters, thereby reducing the gender bias. Our findings show promise for using modern NLP tools to help mitigate societal biases in text. We release our preprocessed data and code at http://maartensap.com/controllable-debiasing.

2 Controllable Debiasing

Controllable Debiasing is a novel formalization of stylistic rewriting that aims to debias the portrayal of characters through controllable revision. To achieve the desired character portrayal, a system must be able to change the underlying meaning of events, unlike formalizations where full meaning preservation is required (e.g., formality transfer; Rao and Tetreault, 2018). Without this, systems run the risk of merely paraphrasing the biases in text. However, revisions must be precise and avoid unnecessary meaning changes, which can often occur in stylistic rewriting (e.g., reversing the sentiment of a review drastically changes its underlying meaning).

For our new rewriting task of changing portrayal bias, we focus on connotation frames that measure the power and agency ascribed to characters through the actions they take. Connotation frames (Rashkin et al., 2016; Sap et al., 2017) distill implicit relations between a verb, its agent, and its theme. In this work, we use the positive, neutral, and negative agency dimensions, where agency is defined as the capacity to intentionally make changes or act upon one's environment (Dennett, 1989). For example, as illustrated in Figure 1, "X pursued Y" implies that X has positive agency. Using machine-in-the-loop writing systems (e.g., Ghazvininejad et al., 2016, 2017; Clark et al., 2018; Textio), models trained on this task could help authors write news, stories, or movies that portray characters in less biased ways, and thereby help mitigate the negative effects of stereotypical portrayals in media (Behm-Morawitz and Mastro, 2008).

3 POWERTRANSFORMER

We present a new approach for Controllable Debiasing called POWERTRANSFORMER, which addresses two key challenges: the paucity of parallel supervised data for training and the difficulty of incorporating fine-grained control for steering the agency of the output. Our approach (Figure 2) jointly learns to reconstruct partially masked story sentences while also learning to paraphrase from an external corpus of paraphrases (§3.2). At generation time, we also include a boosting method for fine-grained steering towards the desired agency level, as described in §3.3.

Figure 2: Overview of the full POWERTRANSFORMER model. An input sentence is masked for verb tokens indicative of agency. Masked inputs and target agency are used as GPT inputs. We use a joint objective using both paraphrase data and masked input sentences for training. At decoding time, we employ a vocab boosting technique to steer generations towards the target agency.

3.1 Model Overview

POWERTRANSFORMER is an encoder-decoder-style model with an OpenAI-GPT transformer model (Radford et al., 2018) as the base. The input sentence $x$ is converted to a sequence of byte pair encodings (BPE) $\{x_1, \ldots, x_n\}$ and given to the encoder after being scrubbed of its agency markers as described below. To steer the model, we also give the encoder the target agency $t$, which we represent as one of three special tokens corresponding to positive, equal, and negative agency.
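To make the encoder interface concrete, the sketch below shows one plausible way the masked BPE sequence and the control token could be assembled. The special-token names (`<pos>`, `<equal>`, `<neg>`, `<mask>`) are illustrative placeholders, not the model's actual vocabulary entries.

```python
# Illustrative only: special-token names are assumptions, not the paper's.
from typing import List

AGENCY_TOKENS = {"positive": "<pos>", "equal": "<equal>", "negative": "<neg>"}

def build_encoder_input(masked_bpe: List[str], target_agency: str) -> List[str]:
    """Prepend the target-agency control token t to the masked BPE sequence."""
    return [AGENCY_TOKENS[target_agency]] + masked_bpe

tokens = build_encoder_input(
    ["Mey", "<mask>", "about", "being", "a", "doctor", "."], "positive")
print(tokens)
# ['<pos>', 'Mey', '<mask>', 'about', 'being', 'a', 'doctor', '.']
```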

3.2 Joint Objective

We train our model on both a reconstruction and a paraphrasing task, for which the inputs are masked and paraphrased versions of the output, respectively. The joint training objective sums the two losses:

$$\mathcal{L}_{\text{joint}} = \mathcal{L}_{\text{recon}} + \mathcal{L}_{\text{para}} \tag{1}$$

Masking and Reconstructing Inspired by the delete-retrieve-generate model from Li et al. (2018a), this objective teaches the model to recover masked-out agency-associated verbs in sentences. We first assign an agency level to an input sentence by counting verbs in the agency lexicon from Sap et al. (2017). Then, we mask out all verbs indicative of the agency level, replacing them with a special token. In this setup, the target output is the original sentence $x = \{x_1, \ldots, x_n\}$, with the masked sentence $\tilde{x}$ and the target agency level $t$ as inputs. During training, we minimize the cross entropy of the target output sentence given the inputs:

$$\mathcal{L}_{\text{recon}} = -\sum_{i=1}^{n} \log P(x_i \mid x_{<i}, \tilde{x}, t) \tag{2}$$
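The following is a minimal sketch of the masking step described above, assuming a toy agency lexicon keyed by verb lemma and pre-lemmatized input; the lemma-lookup matching and the majority-vote choice of sentence-level agency are our simplifying assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

def mask_agency_verbs(tokens: List[str], lemmas: List[str],
                      lexicon: Dict[str, str],
                      mask_token: str = "<mask>") -> Tuple[List[str], str]:
    """Assign a sentence-level agency label from the lexicon verbs it
    contains, then mask every verb indicative of that level."""
    hits = [(i, lexicon[lem]) for i, lem in enumerate(lemmas) if lem in lexicon]
    if not hits:
        return tokens, "equal"          # no agency verbs found
    level = Counter(label for _, label in hits).most_common(1)[0][0]
    masked = [mask_token if (i, level) in hits else tok
              for i, tok in enumerate(tokens)]
    return masked, level

lexicon = {"daydream": "negative", "pursue": "positive"}
tokens = ["Mey", "daydreamed", "about", "being", "a", "doctor", "."]
lemmas = ["mey", "daydream", "about", "be", "a", "doctor", "."]
print(mask_agency_verbs(tokens, lemmas, lexicon))
# (['Mey', '<mask>', 'about', 'being', 'a', 'doctor', '.'], 'negative')
```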

Paraphrasing To go beyond reconstructing sentences, we add a paraphrasing objective using an out-of-domain paraphrase corpus (§4.1). We extract agency levels for each sentence and its paraphrase and mask out the agency verbs in the input, using the same methods as described above. Here, the inputs are the masked sentence $\tilde{x}$ and the target agency $t$, while the target output $y = \{y_1, \ldots, y_m\}$ is the paraphrase. As with reconstruction, we minimize the cross entropy of the target output given the inputs:

$$\mathcal{L}_{\text{para}} = -\sum_{i=1}^{m} \log P(y_i \mid y_{<i}, \tilde{x}, t) \tag{3}$$

3.3 Agency-Based Vocab Boosting

To provide fine-grained control at generation time, we modify the output logits $l_i \in \mathbb{R}^V$ (where $V$ is the vocabulary size) to boost the likelihood of predicting words with the target agency. The next token probabilities are then computed using the "boosted" logits:

$$P(y_i \mid y_{<i}) = \text{softmax}(l_i + \beta A w)$$

where $A \in \mathbb{R}^{V \times 3}$ is a matrix that represents a 3-dimensional {positive, equal, negative} agency embedding for each token in the vocabulary, $w \in \mathbb{R}^3$ is a one-hot vector denoting the target agency for the output, and $\beta$ is a scalar hyperparameter representing the boosting strength. We create $A$ manually using the verbs in the agency lexicon (Sap et al., 2017). Used only at decoding time, this method effectively increases the likelihood of using a word with the target agency level.
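As a concrete illustration, the NumPy sketch below implements the boosted softmax from the equation above; the toy vocabulary and agency matrix are invented for the example.

```python
import numpy as np

def boosted_next_token_probs(logits: np.ndarray,  # l_i, shape (V,)
                             A: np.ndarray,       # agency matrix, shape (V, 3)
                             w: np.ndarray,       # one-hot target, shape (3,)
                             beta: float = 5.0) -> np.ndarray:
    """P(y_i | y_<i) = softmax(l_i + beta * A @ w)."""
    boosted = logits + beta * (A @ w)
    boosted -= boosted.max()                      # numerical stability
    exp = np.exp(boosted)
    return exp / exp.sum()

# Toy vocabulary of 4 tokens where token 2 carries positive agency.
V = 4
A = np.zeros((V, 3))
A[2, 0] = 1.0                                     # column 0 = positive agency
w = np.array([1.0, 0.0, 0.0])                     # target: positive agency
print(boosted_next_token_probs(np.zeros(V), A, w))  # token 2 dominates
```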

Table 1: Statistics for our main story sentences dataset (ROC) and for the external paraphrase corpus (Para.).
Table 2: Ablation study results on the development set. We present separate metrics for evaluating the change in agency, the meaning preservation, fluency, repetitiveness and diversity of the output (bolding the best performance). (↑) indicates that higher is better and (↓) indicates that lower is better.
Table 3: Performance of different re-writing methods on the neg-to-pos and pos-to-neg subsets of the test set (bolding the best performance). We evaluate the change in agency and the meaning preservation. As secondary metrics, we include fluency, repetitiveness, and diversity of the output.

4 Controllable Debiasing Experiments

In this section, we describe three experiments investigating POWERTRANSFORMER's performance. First, we evaluate our full model and ablated baselines, using automatic metrics to quantify the effectiveness of each modeling component (§4.4). Next, we compare our full model to baselines from related work (§4.5). Lastly, given the limitations of automated metrics for evaluating generations (Mir et al., 2019), we obtain human judgments of model performance through crowdsourcing (§4.6). We additionally include examples of generations in Table 4.

Table 4: Example sentences from our dev. set, along with their revisions from various models and the achieved agency levels (Agency(out)). Examples (a)-(c) should be rewritten from high to low agency, and (d)-(f) from low to high agency. Confirming our quantitative results in Tables 2 and 3, POWERTRANSFORMER (Joint+Boost) is the most effective at making purposeful and precise changes to the input sentences to alter their agency while minimally changing their meaning. Revisions from more models are listed in Table 6 (in the appendix).

4.1 Datasets

In our experiments, we use a dataset of short stories for the reconstruction task and a parallel corpus of paraphrases for both paraphrase and reconstruction tasks. We show data statistics in Table 1 , with additional preprocessing details in Appendix A.

ROC story corpus The main focus of our study is controllable revision of story sentences; therefore, we select sentences from the ROC story corpus (ROC; Mostafazadeh et al., 2016). After extracting agency levels for all sentences from the training stories, we sample roughly equal amounts of all three agency levels, and randomly split sentences into training, development, and test sets.

Paraphrase corpus As additional training data, we use a corpus of automatically aligned paraphrases of TV subtitles (Para.; Creutz, 2018). As with the ROC story corpus, we extract agency levels for each sentence and its paraphrase, then sample roughly equal amounts of pairs across all sentence-paraphrase agency combinations (further details in §A.2). We randomly split the data into 45k train and 10k dev. instances (Table 1).

4.2 Metrics

In addition to human evaluations, we use a variety of automated evaluation metrics to characterize different aspects of performance. We measure the accuracy of the change in agency by comparing the target agency level with that of the output (extracted using the connotation frames lexicon). As a measure of meaning preservation, we use the BERTScore F1 metric (Zhang et al., 2020) to compare the semantic similarity of the input sentence with the machine output. As additional metrics, we measure the fluency, repetitiveness, and diversity of the output. Following previous work (Dai et al., 2019), we measure fluency as perplexity (PPL) of the output sentence using a pre-trained GPT model that has not been fine-tuned for this task. As an additional metric of potential text degeneration, we compute the fraction of output sentences that have a bigram that is repeated two or more times (w/ rep). Finally, we compute the fraction of generations that are unique with respect to the rest of the output, to ensure diverse, input-specific generations (unique).
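For concreteness, here is a small sketch of the two degeneration metrics (w/ rep and unique) as we interpret them; treating "repeated two or more times" as a bigram count of at least two, and whitespace tokenization, are our assumptions, not the paper's documented implementation.

```python
from collections import Counter
from typing import List, Tuple

def has_repeated_bigram(tokens: List[str]) -> bool:
    """True if any bigram occurs two or more times in the sequence."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    return any(count >= 2 for count in bigrams.values())

def repetition_and_uniqueness(outputs: List[str]) -> Tuple[float, float]:
    """Fraction of outputs with a repeated bigram (w/ rep), and fraction
    that are unique strings within the output set (unique)."""
    tokenized = [s.split() for s in outputs]
    w_rep = sum(has_repeated_bigram(t) for t in tokenized) / len(outputs)
    counts = Counter(outputs)
    unique = sum(1 for s in outputs if counts[s] == 1) / len(outputs)
    return w_rep, unique

outs = ["she led the team", "she led the team", "he went home he went home"]
print(repetition_and_uniqueness(outs))  # (0.333..., 0.333...)
```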


4.3 Experimental Setup

We randomly mix the ROC story and paraphrase data, and use the OpenAI GPT language model as our pretrained base. For decoding, we use nucleus sampling with top-p = 0.4 (Holtzman et al., 2020) and a boosting strength of β = 5 (hyperparameters and details in §B.1).
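For reference, the snippet below sketches a standard top-p (nucleus) sampling filter (Holtzman et al., 2020) with the p = 0.4 used here; it is a generic implementation, not our exact decoding code, and would be applied to the boosted probabilities from §3.3.

```python
import numpy as np

def nucleus_sample(probs: np.ndarray, top_p: float = 0.4, rng=None) -> int:
    """Sample a token id from the smallest set of tokens whose cumulative
    probability exceeds top_p."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]                  # tokens by probability
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, top_p)) + 1    # size of the nucleus
    nucleus = order[:cutoff]
    p = probs[nucleus] / probs[nucleus].sum()        # renormalize in nucleus
    return int(rng.choice(nucleus, p=p))

probs = np.array([0.5, 0.3, 0.15, 0.05])
print(nucleus_sample(probs))  # with top_p=0.4, always returns token 0
```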

4.4 Investigating Effectiveness Of Approach

We first establish our model's effectiveness at Controllable Debiasing on our dev. set, and investigate the importance of various components in our approach through ablation analyses. For qualitative analyses, we also show example revisions in Table 4 (and Table 6 in the appendix).

Table 6: Full version of Table 4. Example revisions from various models for sentences from the dev. set. Columns are: the target change in agency from the original to the target agency, the input sentence, the model, generated output, and the actual agency level of the output measured by the connotation frame lexicon.

4.4.1 Ablated Baselines

We first investigate the importance of the reconstruction objective by comparing our joint objective model (Joint) with a model trained with just the paraphrasing objective (without masking; ParaOnly). Then, to quantify the effect of boosting, we compare models with (Boost) and without (noBoost) agency-specific vocab boosting. Note that ParaOnly+noBoost is equivalent to a GPT-based encoder-decoder model, similar to seq2seq frameworks commonly used in paraphrasing tasks (Cao et al., 2017; Li et al., 2018b; Prakash et al., 2016). As a final comparison, we implement a model variant that more closely mirrors the delete-retrieve-generate paradigm (Li et al., 2018a) by adding a "retrieve" step in which we concatenate the transformer input with a verb retrieved from the verb agency lexicon (SupplyVerb): we retrieve the verb from the Sap et al. (2017) lexicon that has the target agency and is most similar to the masked-out verb, where similarity is defined as cosine distance between word embeddings using GloVe 300-d embeddings (Pennington et al., 2014).
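A minimal sketch of the SupplyVerb retrieval step follows, assuming `glove` is a word-to-vector lookup; the 2-d toy vectors stand in for the 300-d GloVe embeddings.

```python
import numpy as np
from typing import Dict

def retrieve_verb(masked_verb: str, target_agency: str,
                  lexicon: Dict[str, str],
                  glove: Dict[str, np.ndarray]) -> str:
    """Return the lexicon verb with the target agency whose embedding is
    most cosine-similar to the masked-out verb's embedding."""
    def cos(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    query = glove[masked_verb]
    candidates = [v for v, agency in lexicon.items()
                  if agency == target_agency and v in glove]
    return max(candidates, key=lambda v: cos(glove[v], query))

# 2-d toy vectors standing in for 300-d GloVe embeddings.
glove = {"daydream": np.array([1.0, 0.0]),
         "pursue": np.array([0.9, 0.1]),
         "achieve": np.array([0.5, 0.8])}
lexicon = {"pursue": "positive", "achieve": "positive"}
print(retrieve_verb("daydream", "positive", lexicon, glove))  # 'pursue'
```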

4.4.2 Results

In Table 2, our results show that the full model (Joint+Boost) yields text revisions with the most accurate target agency and the best meaning preservation. In general, we find that both the joint objective and vocab boosting (Boost) substantially increase the target agency accuracy, as also illustrated in examples (d) and (e) in Table 4. However, unsurprisingly, vocab boosting also slightly lowers fluency, yielding higher perplexities than the non-boosted counterparts. Our results also show that using the joint objective with boosting increases the diversity of output, but causes marginally more repetition of bigrams.


Counterintuitively, our ablations show that supplying a verb to the model as an explicit retrieval step (SupplyVerb) does not improve the agency or meaning metrics and actually hurts the fluency of the output (as measured by higher perplexities). Upon qualitative investigation ( Table 6 in the appendix), the retrieved verb is often related to a different word sense of the masked verb, breaking the grammaticality of the sentence.

4.5 Comparison With External Approaches

To further validate our approach, we compare against two baselines from related style transfer and stylistic generation tasks. As these models were designed for binary style transfer, we only report our baseline and model results on the positive and negative agency portions of our data.



4.5.1 Baselines

BST We compare to the backtranslation style transfer model from Prabhumoye et al. (2018). This model first translates input sentences to a pivot language (preserving the meaning but losing language-specific style), then relies on style-specific decoder-translators for generating the output sentence. We include setup details in §B.3.

PPLM Recent work in controllable generation has introduced PPLM, a new plug-and-play technique with promising results for decoding stylistic text (Dathathri et al., 2020) . This method operates on an underlying neural language model at decoding time. It uses backpropagation from a stylistic discriminator to update the past and present hidden representations to be more consistent with the targeted style or domain. We adapt the approach to controllable revision by replacing the base language model with an autoencoder trained on a reconstruction objective, described in detail in §B.2.

4.5.2 Results

We present results in Table 3. Our experiments show that POWERTRANSFORMER performs better than the baselines overall. Specifically, while the BST revisions obtain slightly higher accuracy on the output agency levels, these revisions have both the lowest diversity and the lowest meaning preservation, suggesting the model ignores the input (Table 4). PPLM shows opposite trends, yielding the lowest accuracy with high meaning preservation and high diversity of generations. As illustrated in Table 4, this model often makes less purposeful and less concise alterations.

4.6 Evaluating With Human Judgements

To validate our automatic evaluations, we collect human judgments of the controllable revisions.

Figure 3: Human judgements of target agency and meaning preservation in POWERTRANSFORMER vs. three other model variants. Selection rates >50% indicate preference towards our model.
Figure 4: Average agency levels (i.e., number of agency verbs) for female characters in original and revised scripts. POWERTRANSFORMER can revise the portrayals of female characters in movies to give them higher positive agency and lower negative agency.

4.6.1 Human Evaluation Task

We design a head-to-head crowdsourcing task on Amazon Mechanical Turk where we ask raters to compare two outputs from different models given the same input sentence and target agency (see Figure 5 in the appendix). We first ask them to judge whether either output is gibberish; then, in two questions, we ask which revision has better targeted agency and which better preserves the meaning of the original sentence. For consistency, each pair is rated by three judges. To ensure the quality of our evaluations, we selected workers who could reliably distinguish high from low agency sentences in a qualification task (see Figure 6 in the appendix).