PowerTransformer: Unsupervised Controllable Revision for Biased Language Correction

Authors

Abstract

Unconscious biases continue to be prevalent in modern text and media, calling for algorithms that can assist writers with bias correction. For example, a female character in a story is often portrayed as passive and powerless (“_She daydreams about being a doctor_”) while a man is portrayed as more proactive and powerful (“_He pursues his dream of being a doctor_”). We formulate **Controllable Debiasing**, a new revision task that aims to rewrite a given text to correct the implicit and potentially undesirable bias in character portrayals. We then introduce PowerTransformer as an approach that debiases text through the lens of connotation frames (Sap et al., 2017), which encode pragmatic knowledge of implied power dynamics with respect to verb predicates. One key challenge of our task is the lack of parallel corpora. To address this challenge, we adopt an unsupervised approach using auxiliary supervision with related tasks such as paraphrasing and self-supervision based on a reconstruction loss, building on pretrained language models. Through comprehensive experiments based on automatic and human evaluations, we demonstrate that our approach outperforms ablations and existing methods from related tasks. Furthermore, we demonstrate the use of PowerTransformer as a step toward mitigating the well-documented gender bias in character portrayal in movie scripts.

1 Introduction

Narratives and news texts often reflect societal biases and stereotypes, such as the traditional gender role that women are passive and submissive (Lakoff, 1973; Fiske, 1993; Fast et al., 2016). The task of controllable text revision, i.e., rephrasing text to a targeted style or framing, can help correct for these biases by altering and equalizing the way people are described. For example, automatically rewriting "Mey daydreamed about being a doctor" as "Mey pursued her dream to be a doctor" portrays Mey with more authority and decisiveness (Figure 1). Such controllable revision methods could be used to help reshape how gender roles are portrayed in media (e.g., through machine-in-the-loop writing systems; Clark et al., 2018).

Figure 1: Examples of using connotation frames (Sap et al., 2017) for controllable revisions to portray characters with more agency and power. In the second example, “Ana strutted” implies that she is more active and decisive, compared to “Ana wandered” which portrays her as aimless and passive.

To edit such biases out of text, a controllable rewriting model faces three key challenges. First, a model should be able to make edits beyond surface-level paraphrasing, as simple paraphrasing will often not adequately debias the underlying events described. For example, Mey's portrayal in Figure 1 carries both overt bias (the choice of action) and subtle bias (the framing of the action), both of which require rewriting to be adequately debiased. Second, a model's debiasing revisions should be purposeful and precise, and should not make unnecessary changes to the underlying meaning of the original text. Lastly, since parallel data does not exist, models must learn to revise and debias text without supervised data, thereby preventing straightforward machine translation-style modelling.

We formulate Controllable Debiasing as a new controllable text revision task that aims to correct the implicit and possibly unwanted bias against or towards a specific character portrayed in text ( §2). As shown in Figure 1 (top), we study the portrayal biases through the lens of connotation frames of power and agency (Sap et al., 2017) , which provide pragmatic knowledge about implied power and agency levels projected onto characters by a predicate.

We create POWERTRANSFORMER, an encoder-decoder model that rewrites sentences with a desired portrayal using agency connotation frames (§3). We combine a reconstruction and a paraphrase objective in our model to overcome the lack of parallel supervised data, building off of the denoising autoencoder setup from Li et al. (2018a). To steer the revisions, we endow the model with connotation frame knowledge both at training time, using control tokens, and at generation time, using agency-based vocab boosting.

Our findings show that POWERTRANSFORMER is effective at rewriting sentences with desired agency connotations while only making minimal changes to their meaning, as measured through both human and automatic evaluations (§4). We also show that POWERTRANSFORMER significantly outperforms existing stylistic rewriting methods (Prabhumoye et al., 2018; Dathathri et al., 2020) on those metrics. Additionally, through ablation studies, we establish the usefulness of each component of the model, finding benefits from both the joint objective (47% gain in accuracy) and the agency scaling (12% gain in accuracy).

Finally, in §5, we apply Controllable Debiasing to a corpus of modern English movies (Gorinski and Lapata, 2015) as a step towards removing the gender bias in character portrayal established by prior work (Sap et al., 2017). Using POWERTRANSFORMER, we revise the movie scripts and significantly increase the agency levels of female characters, thereby reducing the gender bias. Our findings show promise for using modern NLP tools to help mitigate societal biases in text. We release our preprocessed data and code at http://maartensap.com/controllable-debiasing.

2 Controllable Debiasing

Controllable Debiasing is a novel formalization of stylistic rewriting that aims to debias the portrayal of characters through controllable revision. To achieve the desired character portrayal, a system must be able to change the underlying meaning of events, unlike certain formalizations (e.g., politeness transfer; Rao and Tetreault, 2018) where full meaning preservation is required. Without this, systems run the risk of merely paraphrasing the biases in text. However, revisions must be precise and avoid unnecessary meaning changes, which can often occur in stylistic rewriting (e.g., reversing the sentiment of a review drastically changes its underlying meaning).

For our new rewriting task of changing portrayal bias, we focus on connotation frames that measure the power and agency ascribed to characters through the actions they take. Connotation frames (Rashkin et al., 2016; Sap et al., 2017) distill implicit relations between a verb, its agent, and its theme. In this work, we use the positive, neutral, and negative agency dimensions, where agency is defined as the capacity to intentionally make changes or act upon one's environment (Dennett, 1989). For example, as illustrated in Figure 1, "X pursued Y" implies that X has positive agency.¹ Using machine-in-the-loop writing systems (e.g., Ghazvininejad et al., 2016, 2017; Clark et al., 2018; Textio²), models trained on this task could help authors write news, stories, or movies that portray characters in less biased ways, and thereby help mitigate the negative effects of stereotypical portrayals in media (Behm-Morawitz and Mastro, 2008).

3 POWERTRANSFORMER

We present a new approach for Controllable Debiasing called POWERTRANSFORMER, which addresses two key challenges: the paucity of parallel supervised data for training and the difficulty of incorporating fine-grained control for steering the agency of the output. Our approach (Figure 2) jointly learns to reconstruct partially masked story sentences while also learning to paraphrase from an external corpus of paraphrases (§3.2). At generation time, we also include a boosting method for fine-grained steering towards the desired agency level, as described in §3.3.

Figure 2: Overview of the full POWERTRANSFORMER model. An input sentence is masked for verb tokens indicative of agency. Masked inputs and target agency are used as GPT inputs. We use a joint objective using both paraphrase data and masked input sentences for training. At decoding time, we employ a vocab boosting technique to steer generations towards the target agency.

3.1 Model Overview

POWERTRANSFORMER is an encoder-decoder style model with an OpenAI-GPT transformer (Radford et al., 2018) as its base. The input sentence $x$ is converted to a sequence of byte pair encodings (BPE) $\{x_1, \ldots, x_n\}$ and given to the encoder after being scrubbed of its agency markers as described below. To steer the model, we also give the encoder the target agency $t$, which we represent as one of three special tokens, one each for positive, equal, and negative agency.³
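To make the input format concrete, the sketch below assembles such an encoder input from a masked sentence and a target agency label. The control-token strings, the mask token, and the use of the HuggingFace OpenAI-GPT tokenizer are illustrative assumptions, not the exact setup of the released code.

```python
from transformers import OpenAIGPTTokenizer  # pretrained GPT tokenizer (Radford et al., 2018)

# Hypothetical control tokens for the three agency levels; the paper uses one
# special token per level, but the exact strings here are an assumption.
AGENCY_TOKENS = {"pos": "<pos>", "equal": "<equal>", "neg": "<neg>"}

def build_model_input(masked_sentence: str, target_agency: str, tokenizer) -> list:
    """BPE-encode the masked sentence and prepend the target-agency control token."""
    control = AGENCY_TOKENS[target_agency]
    return tokenizer.encode(control) + tokenizer.encode(masked_sentence)

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
tokenizer.add_tokens(list(AGENCY_TOKENS.values()) + ["<mask>"])
print(build_model_input("Mey <mask> about being a doctor.", "pos", tokenizer))
```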

3.2 Joint Objective

We train our model on both a reconstruction and a paraphrasing task, for which inputs are masked and paraphrased versions of the output, respectively.

$$\mathcal{L} = \mathcal{L}_{\text{recon}} + \mathcal{L}_{\text{para}} \quad (1)$$

Masking and Reconstructing Inspired by the delete-retrieve-generate model from Li et al. (2018a), this objective teaches the model to recover masked-out agency-associated verbs in sentences. We first assign an agency level to an input sentence by counting verbs in the agency lexicon from Sap et al. (2017).⁴ Then, we mask out all verbs indicative of the agency level, replacing them with a special token. In this setup, the target output is the original sentence $x = \{x_1, \ldots, x_n\}$, with the masked sentence $\tilde{x}$ and the target agency level $t$ as inputs. During training, we minimize the cross entropy of the target output sentence given the inputs:

$$\mathcal{L}_{\text{recon}} = -\sum_{i=1}^{n} \log P\left(x_i \mid x_{<i}, \tilde{x}, t\right) \quad (2)$$
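The masking step described above can be sketched as follows. The lexicon entries, the majority-vote tie handling, and the default level for sentences without lexicon verbs are toy assumptions, not the actual Sap et al. (2017) lexicon or the paper's exact preprocessing.

```python
from collections import Counter

# Toy stand-in for the agency lexicon: verb lemma -> agency level.
# The real lexicon covers roughly two thousand verbs; these entries are illustrative only.
AGENCY_LEXICON = {"pursue": "pos", "daydream": "neg", "wait": "neg", "take": "pos"}

def sentence_agency(verb_lemmas):
    """Assign the agency level held by the most verbs in the sentence (cf. footnote 4)."""
    counts = Counter(AGENCY_LEXICON[v] for v in verb_lemmas if v in AGENCY_LEXICON)
    # Defaulting to "equal" when no lexicon verbs are present is an assumption.
    return counts.most_common(1)[0][0] if counts else "equal"

def mask_agency_verbs(tokens, lemmas, level):
    """Replace every verb indicative of the assigned agency level with a mask token."""
    return [
        "<mask>" if AGENCY_LEXICON.get(lem) == level else tok
        for tok, lem in zip(tokens, lemmas)
    ]

tokens = ["Mey", "daydreamed", "about", "being", "a", "doctor", "."]
lemmas = ["mey", "daydream", "about", "be", "a", "doctor", "."]
level = sentence_agency(lemmas)                        # -> "neg"
print(level, mask_agency_verbs(tokens, lemmas, level))  # masks "daydreamed"
```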

Paraphrasing To go beyond reconstructing sentences, we add a paraphrasing objective using an out-of-domain paraphrase corpus (§4.1). We extract agency levels for each sentence and its paraphrase and mask out the agency verbs in the input, using the same methods as described above. Here, the inputs are the masked sentence $\tilde{x}$ and the target agency $t$, while the target output $y = \{y_1, \ldots, y_m\}$ is the paraphrase. As with reconstruction, we minimize the cross entropy of the target output given the inputs:

$$\mathcal{L}_{\text{para}} = -\sum_{i=1}^{m} \log P\left(y_i \mid y_{<i}, \tilde{x}, t\right) \quad (3)$$

3.3 Vocab Boosting

To steer generations more finely at decoding time, we modify the output logits at each step ($l_i \in \mathbb{R}^{V}$, where $V$ is the vocabulary size) to boost the likelihood of predicting words with the target agency. The next token probabilities are then computed using the "boosted" logits:

$$P\left(y_i \mid y_{<i}, \tilde{x}, t\right) = \operatorname{softmax}\left(l_i + \beta\, A w\right)$$

where $A$ is an $\mathbb{R}^{V \times 3}$ matrix that represents a 3-dimensional {positive, equal, negative} agency embedding for each token in the vocabulary, $w$ is an $\mathbb{R}^{3}$ one-hot vector denoting the target agency for the output, and $\beta$ is a scalar hyperparameter representing the boosting strength. We create $A$ manually using the verbs in the agency lexicon (Sap et al., 2017).⁵ Used only at decoding time, this method effectively increases the likelihood of using a word with the target agency level.
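A minimal sketch of this boosted decoding step, assuming a PyTorch decoder; the tiny vocabulary and the way A is populated here are purely illustrative.

```python
import torch

def boosted_next_token_probs(logits, A, target_agency_idx, beta=5.0):
    """Apply agency-based vocab boosting at a single decoding step.

    logits : (V,) raw next-token logits l_i from the decoder
    A      : (V, 3) agency embedding matrix built from the agency lexicon
             (columns: positive, equal, negative agency)
    target_agency_idx : which of the 3 agency levels to boost toward
    beta   : boosting strength (the paper reports beta = 5)
    """
    w = torch.zeros(3)
    w[target_agency_idx] = 1.0            # one-hot target agency vector
    boosted = logits + beta * (A @ w)     # add beta * A w to the logits
    return torch.softmax(boosted, dim=-1)

# Toy example with a 6-token vocabulary.
V = 6
logits = torch.randn(V)
A = torch.zeros(V, 3)
A[2, 0] = 1.0   # pretend token 2 is (the first BPE piece of) a positive-agency verb
probs = boosted_next_token_probs(logits, A, target_agency_idx=0, beta=5.0)
```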

Table 1: Statistics for our main story sentences dataset (ROC) and for the external paraphrase corpus (Para.).
Table 2: Ablation study results on the development set. We present separate metrics for evaluating the change in agency, the meaning preservation, fluency, repetitiveness and diversity of the output (bolding the best performance). (↑) indicates that higher is better and (↓) indicates that lower is better.
Table 3: Performance of different re-writing methods on the neg-to-pos and pos-to-neg subsets of the test set (bolding the best performance). We evaluate the change in agency and the meaning preservation. As secondary metrics, we include fluency, repetitiveness, and diversity of the output.

4 Controllable Debiasing Experiments

In this section, we describe three experiments for investigating POWERTRANSFORMER's performance. First, we evaluate the performance of our full model and ablated baselines, using automatic metrics to quantify the effectiveness of each modelling component (§4.4). Next, we compare our full model to baselines from related work (§4.5). Lastly, given the limitations of automated metrics for evaluating generations (Liu et al., 2016; Mir et al., 2019), we obtain human judgments of model performance through crowdsourcing (§4.6). We additionally include examples of generations in Table 4.

Table 4: Example sentences from our dev. set, along with their revisions from various models and the achieved agency levels (Agency(out)). Examples (a)-(c) should be rewritten from high to low agency, and (d)-(f) from low to high agency. Confirming our quantitative results in Tables 2 and 3, POWERTRANSFORMER (Joint+Boost) is the most effective at making purposeful and precise changes to the input sentences to alter their agency while minimally changing their meaning. Revisions from more models are listed in Table 6 (in the appendix).

4.1 Datasets

In our experiments, we use a dataset of short stories for the reconstruction task and a parallel corpus of paraphrases for both paraphrase and reconstruction tasks. We show data statistics in Table 1 , with additional preprocessing details in Appendix A.

ROC story corpus The main focus of our study is controllable revision of story sentences; therefore, we select sentences from the ROC story corpus (ROC; Mostafazadeh et al., 2016). After extracting agency levels for all sentences from the training stories, we sample roughly equal amounts of all three agency levels, and randomly split sentences into training, development, and test sets.⁶

Paraphrase corpus As additional training data, we use a corpus of automatically aligned paraphrases of TV subtitles (Para.; Creutz, 2018). As with the ROC story corpus, we extract agency levels for each sentence and its paraphrase, then sample roughly equal amounts of pairs across all sentence-paraphrase agency combinations (further details in §A.2). We randomly split the data into 45k training and 10k dev. instances (Table 1).⁷

4.2 Metrics

In addition to human evaluations, we also use a variety of automated metrics to characterize different aspects of performance. We measure the accuracy of the change in agency by comparing the target agency level with that of the output (extracted using the connotation frames lexicon). As a measure of meaning preservation, we use the BERTScore F1 metric (Zhang et al., 2020) to compare the semantic similarity of the input sentence with the machine output. As additional metrics, we measure the fluency, repetitiveness, and diversity of the output. Following previous work (Dai et al., 2019), we measure fluency as the perplexity (PPL) of the output sentence under a pretrained GPT model that has not been fine-tuned for this task. As an additional metric of potential text degeneration, we compute the fraction of output sentences that have a bigram repeated two or more times (w/ rep). Finally, we compute the fraction of generations that are unique with respect to the rest of the output, to ensure diverse, input-specific generations (unique).
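As one possible reading of the repetition and uniqueness metrics, the sketch below computes the fraction of outputs containing a repeated bigram and the fraction of unique generations; the exact tokenization and counting conventions of the paper's evaluation scripts are assumptions here.

```python
from collections import Counter

def has_repeated_bigram(tokens):
    """True if any bigram occurs two or more times (the 'w/ rep' metric)."""
    bigrams = list(zip(tokens, tokens[1:]))
    return any(count >= 2 for count in Counter(bigrams).values())

def fraction_with_repetition(outputs):
    """Fraction of output sentences with at least one repeated bigram."""
    return sum(has_repeated_bigram(o.split()) for o in outputs) / len(outputs)

def fraction_unique(outputs):
    """Fraction of generations that appear exactly once in the output set ('unique')."""
    counts = Counter(outputs)
    return sum(1 for o in outputs if counts[o] == 1) / len(outputs)

outs = ["ana strutted to school .", "ana strutted to school .", "mey pursued her dream ."]
print(fraction_with_repetition(outs), fraction_unique(outs))
```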


4.3 Experimental Setup

We randomize the ROC story and paraphrase data, and use the OpenAI GPT language model as our pretrained model. For decoding, we use nucleus sampling with top-p = 0.4 (Holtzman et al., 2020) and a boosting strength of β = 5 (hyperparameters and details in §B.1).
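For reference, one decoding step of nucleus (top-p) sampling with p = 0.4 can be sketched as follows, assuming PyTorch; this is a generic implementation of Holtzman et al. (2020), not the paper's exact decoding code.

```python
import torch

def nucleus_sample(logits, top_p=0.4):
    """Sample one token id from the smallest set of tokens whose cumulative
    probability exceeds top_p (nucleus sampling; Holtzman et al., 2020)."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep the smallest prefix of tokens whose cumulative probability reaches top_p.
    cutoff = int(torch.searchsorted(cumulative, torch.tensor(top_p)).item()) + 1
    kept_probs = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()
    choice = torch.multinomial(kept_probs, num_samples=1)
    return sorted_ids[choice].item()

next_id = nucleus_sample(torch.randn(50257), top_p=0.4)
```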

4.4 Investigating Effectiveness Of Approach

We first establish our model's effectiveness at Controllable Debiasing on our dev. set, and investigate the importance of various components in our approach through ablation analyses. For qualitative analyses, we also show example revisions in Table 4 (and Table 6 in the appendix).

Table 6: Full version of Table 4. Example revisions from various models for sentences from the dev. set. Columns are: the target change in agency from the original to the target agency, the input sentence, the model, generated output, and the actual agency level of the output measured by the connotation frame lexicon.

4.4.1 Ablated Baselines

We first investigate the importance of the reconstruction objective, by comparing our joint objective model (Joint) with a model trained with just the paraphrasing objective (without masking, ParaOnly). Then, to quantify the effect of boosting, we compare models with (Boost) and without (noBoost) agency-specific vocab boosting. Note that ParaOnly+noBoost is equivalent to a GPT-based encoder-decoder model, similar to seq2seq frameworks commonly used in paraphrasing tasks (Cao et al., 2017; Li et al., 2018b; Prakash et al., 2016). As a final comparison, we implement a model variant that more closely mirrors the delete-retrieve-generate paradigm (Li et al., 2018a) by adding a "retrieve" step in which we concatenate the transformer input with a verb retrieved from the verb agency lexicon that is most similar to the masked-out verb (SupplyVerb).⁸

⁸ We retrieve a verb from the Sap et al. (2017) lexicon that has the target agency and is most similar to the masked-out verb, where similarity is defined as cosine distance between word embeddings, using GloVe 300-d embeddings (Pennington et al., 2014).
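The SupplyVerb retrieval step (footnote 8) can be sketched as below; the GloVe file path and the toy lexicon entries are placeholders, and in the actual variant the retrieved verb is concatenated to the transformer input.

```python
import numpy as np

def load_glove(path):
    """Load GloVe vectors into a dict: word -> np.ndarray (e.g. glove.6B.300d.txt)."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

def retrieve_verb(masked_verb, target_agency, agency_lexicon, glove):
    """Return the lexicon verb with the target agency most similar to the masked-out verb."""
    query = glove[masked_verb]
    candidates = [v for v, lvl in agency_lexicon.items()
                  if lvl == target_agency and v in glove]

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    return max(candidates, key=lambda v: cosine(query, glove[v]))

# Toy usage (path and lexicon entries are placeholders):
# glove = load_glove("glove.6B.300d.txt")
# retrieve_verb("daydream", "pos", {"pursue": "pos", "chase": "pos", "wait": "neg"}, glove)
```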

4.4.2 Results

In Table 2, our results show that the full model (Joint+Boost) yields text revisions with the most accurate target agency and the most meaning preservation. In general, we find that both the joint objective and vocab boosting (Boost) substantially increase the target agency accuracy, as also illustrated in examples (d) and (e) in Table 4. However, unsurprisingly, vocab boosting also slightly lowers fluency, yielding higher perplexities than the models' non-boosted counterparts. Our results also show that using the joint objective with boosting increases the diversity of output, but causes marginally more repetition of bigrams.


Counterintuitively, our ablations show that supplying a verb to the model as an explicit retrieval step (SupplyVerb) does not improve the agency or meaning metrics and actually hurts the fluency of the output (as measured by higher perplexities). Upon qualitative investigation ( Table 6 in the appendix), the retrieved verb is often related to a different word sense of the masked verb, breaking the grammaticality of the sentence.

4.5 Comparison With External Approaches

To further validate our approach, we compare against two baselines from related style transfer and stylistic generation tasks. As these models were designed for binary style transfer, we only report our baseline and model results on the positive and negative agency portions of our data.



4.5.1 Baselines

BST We compare to the backtranslation style transfer model from Prabhumoye et al. (2018). This model first translates input sentences to a pivot language (preserving the meaning but losing language-specific style), then relies on style-specific decoder-translators for generating the output sentence. We include set-up details in §B.3.

PPLM Recent work in controllable generation has introduced PPLM, a new plug-and-play technique with promising results for decoding stylistic text (Dathathri et al., 2020) . This method operates on an underlying neural language model at decoding time. It uses backpropagation from a stylistic discriminator to update the past and present hidden representations to be more consistent with the targeted style or domain. We adapt the approach to controllable revision by replacing the base language model with an autoencoder trained on a reconstruction objective, described in detail in §B.2.

4.5.2 Results

We present results in Table 3. Our experiments show that POWERTRANSFORMER performs better than the baselines overall. Specifically, while the BST revisions obtain slightly higher accuracy on the output agency levels, these revisions have both the lowest diversity and the lowest meaning preservation, suggesting the model ignores the input (Table 4). PPLM shows opposite trends, yielding the lowest accuracy with high meaning preservation and high diversity of generations. As illustrated in Table 4, this model often makes less purposeful and less concise alterations.

4.6 Evaluating With Human Judgements

To validate our automatic evaluations, we collect human judgments of the controllable revisions.

Figure 3: Human judgements of target agency and meaning preservation in POWERTRANSFORMER vs. three other model variants. Selection rates >50% indicate preference towards our model.
Figure 4: Average agency levels (i.e., number of agency verbs) for female characters in original and revised scripts. POWERTRANSFORMER can revise the portrayals of female characters in movies to give them higher positive agency and lower negative agency.

4.6.1 Human Evaluation Task

We design a head-to-head⁹ crowdsourcing task on Amazon Mechanical Turk where we ask raters to compare two outputs from different models given the same input sentence and target agency (see Figure 5 in the appendix). We first ask them to judge whether either output is gibberish, then, in two questions, choose which revision has better targeted agency and which better preserves the meaning of the original sentence. For consistency, each pair is rated by three judges. To ensure the quality of our evaluations, we selected workers who could reliably distinguish high from low agency sentences in a qualification task (see Figure 6 in the appendix).

Figure 5: Screenshot of the human evaluation annotation task.
Figure 6: Screenshot of the qualification task and its instructions. In the real task, workers rated three pairs of sentences, but only one is shown here.


For this evaluation, we generate three revisions (one for each target agency level) for a random subset of 100 test examples. We compare the output of our full POWERTRANSFORMER model with two external baselines (PPLM and BST). For further comparison, we also include the most competitive ablated baseline from Table 2 (i.e., Joint+noBoost).

Table 6: Full version of Table 4. Example revisions from various models for sentences from the dev. set, listing the target change in agency (original → target), the input sentence, each model's generated output, and the actual agency level of the output measured by the connotation frame lexicon (shown in brackets).

(a)
  POWERT ParaOnly+Boost: after the party i headed home. [+]
  POWERT Joint+SupplyVerb: after the party i faced home. [-]
  POWERT Joint+noBoost: after the party i stayed home. [-]
  POWERT Joint+Boost: after the party i stayed home. [-]

(b) + → -  Input: A friend asked me to watch her two year old child for a minute.
  PPLM: a friend asked me to watch her two year old child for a minute. [+]
  BST: l didn 't have a word of this , you 're . [-]
  POWERT ParaOnly+noBoost: a friend asked me to watch her two year old child for a minute.
  POWERT ParaOnly+Boost: a friend asked me to watch her two year old child for a minute.
  POWERT Joint+SupplyVerb: a friend told me to watch her two year old child for a minute.
  POWERT Joint+noBoost: a friend needed me to watch her two year old child for a minute. [-]
  POWERT Joint+Boost: a friend needed me to watch her two year old child for a minute. [-]

(c) + → -  Input: After filling in the data it looked quite sharp.
  PPLM: before filling the last question it it it it looked quite sharp. before filling the last question it it [+]
  BST: when the 't you want a word ? [-]
  POWERT ParaOnly+noBoost: after filling in the data it looked quite sharp. [+]
  POWERT ParaOnly+Boost: after filling in the data it seemed quite sharp. [+]
  POWERT Joint+SupplyVerb: after putting in the data it looked quite sharp. [=]
  POWERT Joint+noBoost: after analyzing in the data it looked quite sharp. [=]
  POWERT Joint+Boost: after seeing in the data it seemed quite sharp. [-]

(d) - → +  Input: Allie was failing science class.
  PPLM: Allie was failing science grade. [-]
  BST: do you want me ? [+]
  POWERT ParaOnly+noBoost: allie was failing science class. [-]
  POWERT ParaOnly+Boost: allie was failing science class. [-]
  POWERT Joint+SupplyVerb: allie was ignoring science class. [+]
  POWERT Joint+noBoost: allie was failing science class. [-]
  POWERT Joint+Boost: allie was taking science class. [+]

(e) - → +  Input: Darla wanted a soft drink.
  PPLM: darla wants a hard hard drink. [-]
  BST: don 't take me a man . [+]
  POWERT ParaOnly+noBoost: darla wanted a soft drink. [-]
  POWERT ParaOnly+Boost: darla wanted a soft drink. [-]
  POWERT Joint+SupplyVerb: darla got a soft drink. [+]
  POWERT Joint+noBoost: darla ordered a soft drink. [+]
  POWERT Joint+Boost: darla ordered a soft drink. [+]

(f) - → +  Input: Clint paused on the trail.
  PPLM: clint was on the trail.
  BST: don 't you want me , [-]
  POWERT ParaOnly+noBoost: clint paused on the trail. [-]
  POWERT ParaOnly+Boost: clint stopped on the trail. [+]
  POWERT Joint+SupplyVerb: clint walked on the trail. [+]
  POWERT Joint+noBoost: clint hiked on the trail. [=]
  POWERT Joint+Boost: clint walked on the trail heading down. [+]

4.6.2 Results

In Figure 3, we show the percentages of times in which POWERTRANSFORMER was preferred over the three baseline models.¹⁰ Percentages >50% indicate a preference towards POWERTRANSFORMER.

Overall, the sentence revisions by POWERTRANSFORMER are preferred over all of the baselines in obtaining the desired agency level. For meaning preservation, our model is always selected over BST, mirroring the BertScores in Table 3. The difference is less stark when comparing to PPLM, which sometimes makes no changes or irrelevant changes to the input sentence, and reversed when comparing to the ablated noBoost.

¹⁰ Judgments in our evaluation task had an average pairwise agreement of 75% (Krippendorff's α = .52).

Additionally, BST revisions were marked as gibberish substantially more than those by other models (63% vs. 3-7%). While this seemingly contradicts BST's low perplexity scores, this is in line with previous work showing automatic fluency metrics can favor degenerate, bland, or repetitive language (Holtzman et al., 2020) .

5 Gender Bias In Movies

As a proof-of-concept of Controllable Debiasing, we investigate whether gender biases in portrayals of movie characters can be mitigated using POWERTRANSFORMER.

5.1 Movie Scripts Corpus

We draw our data from the 767 modern English movie scripts collected by Gorinski and Lapata (2015), focusing on the narrations that describe characters and their actions (as opposed to the characters' dialogue utterances). As described in further detail in §C in the appendix, we automatically extract characters and assign them a binary¹¹ gender (man, woman) using a list of highly gendered names (e.g., "Sarah", "William") and a list of gendered words (e.g., "waiter," "waitress"). Following previous work (Ramakrishna et al., 2017; Sap et al., 2017), we assign narration sentences to characters if their name appears in them.
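A minimal sketch of this lookup-based gender assignment, using placeholder name and word lists (the paper uses the public name list in footnote 14 plus a list of gendered role words):

```python
# Placeholder gendered name and word lists for illustration only.
GENDERED_NAMES = {"sarah": "woman", "william": "man"}
GENDERED_WORDS = {"waitress": "woman", "waiter": "man", "policeman": "man"}

def infer_gender(character_mention: str):
    """Return 'man', 'woman', or None if the gender cannot be inferred."""
    for token in character_mention.lower().split():
        if token in GENDERED_NAMES:
            return GENDERED_NAMES[token]
        if token in GENDERED_WORDS:
            return GENDERED_WORDS[token]
    return None

print(infer_gender("SARAH"), infer_gender("THE WAITER"), infer_gender("THE DOCTOR"))
```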

Our corpus contains 16,763 characters from 767 different English movies. Of those characters, 68% are inferred to be men and only 32% to be women,¹² consistent with known gender skews in movie characters (Google, 2017). This bias in representation is also present at the narrative level. Specifically, female characters are mentioned in only $n_{\text{narr},f} = 27$ narrations on average, compared to $n_{\text{narr},m} = 34$ narrations for male characters (Cohen's $|d| = 0.13$, $p < 0.001$). Similarly, compared to their male counterparts, female characters are described with significantly fewer words ($n_{\text{words},f} = 329$, $n_{\text{words},m} = 435$, $|d| = 0.14$, $p < 0.001$) and fewer verbs ($n_{\text{verbs},f} = 41$, $n_{\text{verbs},m} = 54$, $|d| = 0.13$, $p < 0.001$).

5.2 Debiasing Portrayal In Movies

Given the known bias that female characters are portrayed with less agency (Sap et al., 2017), our goal is to re-balance their agency levels to be more on par with those of male characters. Therefore, we revise only the sentences describing female characters to have higher agency, using POWERTRANSFORMER. Then we extract connotation frames of agency for the revised script sentences and aggregate per character. As shown in Figure 4, revisions successfully increase the instances of positive agency of female characters, and decrease their negative agency or passiveness.

We further examine the change in gender association of positive and negative agency, to verify the effectiveness of Controllable Debiasing. We first count all the positive and negative agency verbs used to describe characters (in original or rewritten sentences). Following Sap et al. (2017), we then fit a logistic regression model to quantify the association between characters' gender and their agency levels, controlling for their number of words, verbs, and narrations. For better interpretation of the $\beta$ coefficients, we z-score all the continuous variables. We confirm that, indeed, Controllable Debiasing using POWERTRANSFORMER can reverse the bias in portrayal in movies. In original scripts, male characters were portrayed with significantly higher positive agency ($\beta_{pos} = 1.2$, $p < 0.001$) and lower negative agency ($\beta_{neg} = -0.3$, $p < 0.001$) than female characters. However, our model successfully reverses this gender bias, portraying women with significantly more positive agency ($\beta_{pos} = -62.6$, $p < 0.001$) and significantly less negative agency ($\beta_{neg} = 8.7$, $p < 0.001$).
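The regression analysis can be sketched as below, assuming statsmodels and hypothetical column names for the per-character counts; the paper's actual analysis code and variable coding may differ.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def zscore(col):
    return (col - col.mean()) / col.std()

def gender_agency_association(df: pd.DataFrame):
    """Fit logistic regression: character gender ~ agency counts + length controls.

    Assumed columns: is_male (0/1), n_pos_agency, n_neg_agency,
    n_words, n_verbs, n_narrations (all hypothetical names).
    """
    X = df[["n_pos_agency", "n_neg_agency", "n_words", "n_verbs", "n_narrations"]].apply(zscore)
    X = sm.add_constant(X)
    model = sm.Logit(df["is_male"], X).fit(disp=0)
    return model.params, model.pvalues  # beta coefficients and p-values

# Toy data with the assumed column names, just to show the call pattern.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "is_male": rng.integers(0, 2, 200),
    "n_pos_agency": rng.poisson(5, 200),
    "n_neg_agency": rng.poisson(2, 200),
    "n_words": rng.poisson(400, 200),
    "n_verbs": rng.poisson(50, 200),
    "n_narrations": rng.poisson(30, 200),
})
params, pvals = gender_agency_association(toy)
```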

Our findings on movie scripts show the promise of using Controllable Debiasing to successfully mitigate gender biases in portrayal of characters, which could be extended to other domains (e.g., news or fiction, Fast et al., 2016) . Additionally, future work could consider alternative views of portrayal biases (e.g., "regard" or bias directed at different demographic groups; Sheng et al., 2019; Sap et al., 2020) , or use more holistic views of gender roles (e.g., "masculine default" cultures; Cheryan and Markus, 2020).

6 Related Work

Controllable Debiasing is a new formalization of the unsupervised stylistic rewriting task, contrasting with supervised approaches which benefit from parallel corpora (e.g., Xu et al., 2012, 2015; Rao and Tetreault, 2018; Pryzant et al., 2020). In unsupervised settings, a majority of work has dealt with the dearth of parallel data by using encoder-decoder setups paired with discriminators to disentangle style from content and steer generations (e.g., Shen et al., 2017; Zhang et al., 2018; Fu et al., 2018; Yang et al., 2018; Niu and Bansal, 2018; Romanov et al., 2019; Dai et al., 2019; John et al., 2019) or backtranslation setups (Prabhumoye et al., 2018; Lample et al., 2018). In contrast, Li et al. (2018a) introduce a modular approach (later adapted to transformer models by Sudhakar et al., 2019) that relies on drop-in replacement of attribute markers followed by language correction. POWERTRANSFORMER improves on this approach with an additional out-of-domain paraphrasing objective.

While a majority of existing stylistic rewriting work defines style as sentiment (e.g., on reviews), a notable exception is Nogueira dos Santos et al. (2018), who use stylistic rewriting to make text less hateful or offensive. Similar in spirit, Controllable Debiasing is a novel formalization that aims to address and revise social biases expressed in text, but using the nuanced implications distilled in connotation frames of power and agency instead of binary offensiveness.

Our work also draws inspiration from controllable generation methods (e.g., Koncel-Kedziorski et al., 2016; Hu et al., 2017; Ficler and Goldberg, 2017) . While those methods steer the generation output to contain desired attributes, controllable revision is constrained to revise an input sentence in addition to generating with desired attributes.

7 Conclusion

We introduce a new text revision task of Controllable Debiasing, to help debias the portrayal of characters through the lens of connotation frames of power and agency. To this end, we create POWERTRANSFORMER, a transformer-based encoder-decoder trained on a joint reconstruction and paraphrasing objective. Our approach demonstrates promising results in revising sentences with targeted power and agency, and outperforms ablations and baselines on both automatic and human evaluations. Finally, as a case study, we show the feasibility of Controllable Debiasing for debiasing the portrayal of characters in movie scripts. Our findings highlight the potential of neural models as a tool for editing out social biases in text.

B.1.1 POWERT ParaOnly+None

The learning rate is 1e-5 with the AdamW optimizer, tuned manually in the [1e-6, 1e-3] range over 7 values. We use p = 0.4 for nucleus sampling, with p tuned manually over 5 values in the [0.4, 0.9] range.

B.1.2 POWERT ParaOnly+Static

POWERT ParaOnly+Static loads the trained model from POWERT ParaOnly+None and adds re-scaling to the logits. The re-scaling factor β was tuned manually in the [0, 10] range; we tried 8 values of β and use β = 5 in the final model. We use the same p as POWERT ParaOnly+None.

B.1.3 POWERT Joint+None

Similar to POWERT ParaOnly+None, we train this model for 10 epochs, with each epoch taking approximately an hour. The learning rate is 1e-5 with the AdamW optimizer, tuned manually in the [1e-6, 1e-3] range over 7 values. We use the same p as POWERT ParaOnly+None.

B.1.4 POWERT Joint+Static

POWERT Joint+Static loads the trained model from POWERT Joint+None and adds re-scaling to the logits. The re-scaling factor β was tuned manually in the [0, 10] range; we tried 8 values of β and use β = 5 in the final model. We use the same p as POWERT ParaOnly+None.

B.2 PPLM Details

The PPLM decoding method can be used on top of any model, but the original codebase is for use with a pretrained language model rather than a model for paraphrasing or style transfer. We adapt their technique for this task by replacing the base model in their code with a denoising autoencoder that was trained to reconstruct the input sentence. The denoising autoencoder was implemented using the base GPT-2 model (to fit with their code library and be a similar size to our model). It was trained on our ROC-only training data with a reconstruction objective. To make it a denoising autoencoder, we randomly "drop out" about 50% of the tokens from the context by replacing them with mask tokens. This autoencoder is trained to reconstruct input sentences, but when used with the PPLM decoding method, the input gets dynamically updated to decode a sentence that is similar in meaning but more likely to have positive/negative agency according to a discriminator trained on top of the autoencoder. The PPLM decoding method also has hyperparameters that control the strength of the target label; if set too high, the output can be degenerate. We manually set the hyperparameters to be as strong as possible without producing degenerate text, using a subset of the dev. set as a guide.
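A minimal sketch of the token-dropout corruption used to train this denoising autoencoder; the token ids and the choice of mask id below are illustrative assumptions.

```python
import random

def dropout_tokens(token_ids, mask_id, drop_prob=0.5, seed=None):
    """Corrupt an input by replacing roughly half of its tokens with a mask token,
    so the autoencoder must learn to reconstruct the original sentence."""
    rng = random.Random(seed)
    return [mask_id if rng.random() < drop_prob else t for t in token_ids]

original = [464, 3290, 2497, 262, 1160, 13]   # illustrative BPE ids, not real data
# mask_id here is an arbitrary reserved id; the actual special token is an assumption.
corrupted = dropout_tokens(original, mask_id=50256, drop_prob=0.5, seed=0)
# Training pair: (corrupted -> original) under a reconstruction objective.
```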

B.3 Backtranslation Details

We use the code provided by Prabhumoye et al. (2018) for running this baseline. After lowercasing all the negative and positive agency examples in our training data (ROC and OpusParcus), we translate to French using the machine translation model provided in the code base. This baseline requires training a style classifier (agency) and two decoders (one for each agency level). Since the classifier essentially re-learns the agency lexicon, we do not search for hyperparameters, and simply set a learning rate of 5 and train for 6 epochs. For training the decoders, we perform grid search to find the best hyperparameters. We experiment with learning rates of {0.5, 1, 2, 5}, {2, 3, 5} epochs, a classification-loss weight of {0.5, 1, 2}, and a word-loss weight of {0.5, 1, 2}, and select the configuration with the best word-level accuracy on the dev. set. We use SGD with a batch size of 64 for all experiments, and refer the reader to the code base for other default parameters.

C.1 Extracting Gender From Characters

The movie scripts mention characters in all caps, making it easy to identify and extract them. We then cross-reference the name (or description, for unnamed characters, e.g., "the doorman") with a list of gendered names¹⁴ and gendered words (e.g., "waitress," "policeman," "police woman"). To allow for better rewriting using our model, we split the narratives into sentences (using NLTK's sentence tokenizer; Bird et al., 2009), and assign each sentence to a character if their name appears in the sentence.
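A small sketch of this sentence splitting and character assignment, using NLTK's sentence tokenizer (Bird et al., 2009); the simple substring-matching heuristic shown here is an assumption about the exact matching rule.

```python
import nltk
from nltk.tokenize import sent_tokenize

nltk.download("punkt", quiet=True)

def narration_sentences_for(character: str, narration: str):
    """Split a narration block into sentences and keep those mentioning the character."""
    return [s for s in sent_tokenize(narration) if character.lower() in s.lower()]

narration = "SARAH enters the room. She looks around. SARAH grabs the keys and runs."
print(narration_sentences_for("Sarah", narration))
```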


¹ Future work could explore using the power dimension instead of agency, or alternative operationalizations of biases, e.g., Social Bias Frames (Sap et al., 2020) or regard towards minorities as introduced by Sheng et al. (2019).
² https://textio.com/

³ In earlier experiments, we also provided the original agency as an input to the model during training and decoding, but found that it made little difference in performance.

⁴ For sentences that have multiple verbs, we assign the agency level that the most verbs in the sentence have (e.g., a sentence with two positive agency verbs and one negative agency verb will be assigned positive agency).

⁵ Since our model operates on BPE tokens, we manually set the first BPE token of every tense of every verb to the desired agency. We also experimented with learning A from data, but found no improvement over manually setting it.

⁶ We use a 80:13:7 train, development, test ratio.
⁷ Since this is just additional training data, we do not test our models on this corpus, but do use the dev. set for selecting some hyperparameters.

⁹ We use head-to-head evaluations as those have been shown to be more reliable than scale-rating evaluations (Kiritchenko and Mohammad, 2017).

¹¹ Note that gender is a social construct that goes beyond the man-woman binary (Lorber et al., 1991); however, more inclusive analyses (e.g., with non-binary genders) are not possible given the limited information about the individuals mentioned in our data.
¹² There were 2,597 characters for which the gender could not be inferred.

From http://www.opensubtitles.org

¹⁴ http://www.cs.cmu.edu/Groups/AI/util/areas/nlp/corpora/names/0.html

References

  • Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with bilingual parallel corpora. In ACL.
  • Elizabeth Behm-Morawitz and Dana E Mastro. 2008. Mean girls? The influence of gender portrayals in teen movies on emerging adults' gender-based attitudes and beliefs. Journalism & Mass Communication Quarterly, 85(1):131-146.
  • Ethan Fast, Tina Vachovsky, and Michael S Bernstein. 2016. Shirtless and dangerous: Quantifying linguistic signals of gender bias in an online fiction writing community. In ICWSM.
  • Jessica Ficler and Yoav Goldberg. 2017. Controlling linguistic style aspects in neural language generation. In EMNLP Workshop on Stylistic Variation.
  • Anjalie Field, Gayatri Bhat, and Yulia Tsvetkov. 2019. Contextual affective analysis: A case study of people portrayals in online #metoo stories. In ICWSM.
  • Anjalie Field and Yulia Tsvetkov. 2019. Entity-centric contextual affective analysis. In ACL.
  • Susan T Fiske. 1993. Controlling other people. The impact of power on stereotyping. American Psychologist, 48(6):621-628.
  • Zhenxin Fu, Xiaoye Tan, Nanyun Peng, Dongyan Zhao, and Rui Yan. 2018. Style transfer in text: Exploration and evaluation. In AAAI.
  • Marjan Ghazvininejad, Xing Shi, Yejin Choi, and Kevin Knight. 2016. Generating topical poetry. In EMNLP.
  • Marjan Ghazvininejad, Xing Shi, Jay Priyadarshi, and Kevin Knight. 2017. Hafez: An interactive poetry generation system. In ACL Demonstrations.
  • Sayan Ghosh, Mathieu Chollet, Eugene Laksana, Louis-Philippe Morency, and Stefan Scherer. 2017. Affect-LM: A neural language model for customizable affective text generation. In ACL.
  • Google. 2017. Using technology to address gender bias in film. https://www.google.com/about/main/gender-equality-films/index.html.
  • Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc.
  • Philip Gorinski and Mirella Lapata. 2015. Movie script summarization as graph-based scene extraction. In NAACL.
  • Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2020. The curious case of neural text degeneration. In ICLR.
  • Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P Xing. 2017. Toward controlled generation of text. In ICML.
  • Vineet John, Lili Mou, Hareesh Bahuleyan, and Olga Vechtomova. 2019. Disentangled representation learning for non-parallel text style transfer. In ACL.
  • Svetlana Kiritchenko and Saif Mohammad. 2017. Best-worst scaling more reliable than rating scales: A case study on sentiment intensity annotation. In ACL.
  • Rik Koncel-Kedziorski, Ioannis Konstas, Luke Zettlemoyer, and Hannaneh Hajishirzi. 2016. A theme-rewriting approach for generating algebra word problems. In EMNLP.
  • Robin Lakoff. 1973. Language and woman's place. Language in Society, 2(1):45-79.
  • Guillaume Lample, Sandeep Subramanian, Eric Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, and Y-Lan Boureau. 2018. Multiple-attribute text rewriting. In ICLR.
  • Juncen Li, Robin Jia, He He, and Percy Liang. 2018a. Delete, retrieve, generate: A simple approach to sentiment and style transfer. In NAACL.
  • Zichao Li, Xin Jiang, Lifeng Shang, and Hang Li. 2018b. Paraphrase generation with deep reinforcement learning. In EMNLP.
  • Ziqiang Cao, Chuwei Luo, Wenjie Li, and Sujian Li. 2017. Joint copying and restricted generation for paraphrase. In AAAI.
  • Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In EMNLP.
  • Judith Lorber, Susan A Farrell, et al. 1991. The Social Construction of Gender. Newbury Park, 5.
  • Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In ICLR.
  • Remi Mir, Bjarke Felbo, Nick Obradovich, and Iyad Rahwan. 2019. Evaluating style transfer for text. In NAACL.
  • Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James Allen. 2016. A corpus and cloze evaluation for deeper understanding of commonsense stories. In NAACL. Corpus available at https://www.cs.rochester.edu/nlp/rocstories/.
  • Tong Niu and Mohit Bansal. 2018. Polite dialogue generation without parallel data. TACL.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In EMNLP.
  • Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, and Alan W Black. 2018. Style transfer through back-translation. In ACL. Code available at https://github.com/shrimai/Style-Transfer-Through-Back-Translation.
  • Aaditya Prakash, Sadid A. Hasan, Kathy Lee, Vivek Datla, Ashequl Qadir, Joey Liu, and Oladimeji Farri. 2016. Neural paraphrase generation with stacked residual LSTM networks. In COLING.
  • Reid Pryzant, Richard Diehl Martinez, Nathan Dass, Sadao Kurohashi, Dan Jurafsky, and Diyi Yang. 2020. Automatically neutralizing subjective bias in text. In AAAI.
  • Sapna Cheryan and Hazel Rose Markus. 2020. Masculine defaults: Identifying and mitigating hidden cultural biases. Psychological Review.
  • Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. Unpublished.
  • Anil Ramakrishna, Victor R Martínez, Nikolaos Malandrakis, Karan Singla, and Shrikanth Narayanan. 2017. Linguistic analysis of differences in portrayal of movie characters. In ACL.
  • Sudha Rao and Joel Tetreault. 2018. Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer. In NAACL.
  • Hannah Rashkin, Sameer Singh, and Yejin Choi. 2016. Connotation frames: A data-driven investigation. In ACL.
  • Alexey Romanov, Anna Rumshisky, Anna Rogers, and David Donahue. 2019. Adversarial decomposition of text representation. In NAACL.
  • Cicero Nogueira dos Santos, Igor Melnyk, and Inkit Padhi. 2018. Fighting offensive language on social media with unsupervised text style transfer. In ACL.
  • Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A Smith, and Yejin Choi. 2020. Social bias frames: Reasoning about social and power implications of language. In ACL.
  • Maarten Sap, Marcella Cindy Prasettio, Ari Holtzman, Hannah Rashkin, and Yejin Choi. 2017. Connotation frames of power and agency in modern films. In EMNLP. Connotation frames downloaded from http://maartensap.com/movie-bias/.
  • Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2017. Style transfer from non-parallel text by cross-alignment. In NeurIPS.
  • Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng. 2019. The woman worked as a babysitter: On biases in language generation. In EMNLP.
  • Elizabeth Clark, Anne Spencer Ross, Chenhao Tan, Yangfeng Ji, and Noah A Smith. 2018. Creative writing with a machine in the loop: Case studies on slogans and stories. In IUI.
  • Akhilesh Sudhakar, Bhargav Upadhyay, and Arjun Maheswaran. 2019. Transforming delete, retrieve, generate approach for controlled text style transfer. In EMNLP.
  • Thomas Wolf, L Debut, V Sanh, J Chaumond, C Delangue, A Moi, P Cistac, T Rault, R Louf, M Funtowicz, et al. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. Unpublished.
  • Wei Xu, Chris Callison-Burch, and Courtney Napoles. 2015. Problems in current text simplification research: New data can help. TACL.
  • Wei Xu, Alan Ritter, Bill Dolan, Ralph Grishman, and Colin Cherry. 2012. Paraphrasing for style. In COLING.
  • Zichao Yang, Zhiting Hu, Chris Dyer, Eric P Xing, and Taylor Berg-Kirkpatrick. 2018. Unsupervised text style transfer using language models as discriminators. In NeurIPS.
  • Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. BERTScore: Evaluating text generation with BERT. In ICLR.
  • Ye Zhang, Nan Ding, and Radu Soricut. 2018. SHAPED: Shared-private encoder-decoder for text style adaptation. In NAACL.
  • Mathias Creutz. 2018. Open subtitles paraphrase corpus for six languages. In LREC. Corpus available at http://urn.fi/urn:nbn:fi:lb-201804191.
  • Ning Dai, Jianze Liang, Xipeng Qiu, and Xuanjing Huang. 2019. Style transformer: Unpaired text style transfer without disentangled latent representation. In ACL.
  • Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. 2020. Plug and play language models: A simple approach to controlled text generation. In ICLR.
  • Daniel Clement Dennett. 1989. The Intentional Stance. MIT Press.