“You Are Grounded!”: Latent Name Artifacts in Pre-trained Language Models
Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. We focus on artifacts associated with the representation of given names (e.g., Donald), which, depending on the corpus, may be associated with specific entities, as indicated by next token prediction (e.g., Trump). While helpful in some contexts, grounding also happens in under-specified or inappropriate contexts. For example, endings generated for "Donald is a" substantially differ from those of other names, and often have more negative sentiment than average. We demonstrate the potential effect on downstream tasks with reading comprehension probes where name perturbation changes the model answers. As a silver lining, our experiments suggest that additional pre-training on different corpora may mitigate this bias.
Pre-trained language models (LMs) have transformed the NLP landscape over the last year. State-of-the-art performance across tasks is achieved by fine-tuning the latest LM on task-specific data. LMs provide an effective way to represent contextual information, as well as background knowledge.
What is the nature of this background knowledge? Prior work showed that LMs are, to some extent, able to reconstruct knowledge base facts (Petroni et al., 2019), but others have argued that the ability to generate factually correct text is limited, and that LMs are equally prone to generate the negation of facts ("birds cannot fly") (Logan et al., 2019; Kassner and Schütze, 2019). In a different line of work, and following similar observations for word embeddings, concerns were raised about unwarranted knowledge in the form of gender and racial bias (May et al., 2019; Sheng et al., 2019).
Regarding named entities, their LM-based representations incorporate sentiment (Prabhakaran et al., 2019), which is often transferable across entities via a shared given name (Field and Tsvetkov, 2019). In this work we focus on the representations of given names in pre-trained LMs (Table 1). In a series of experiments we show that, depending on the corpus, some names tend to be grounded to specific entities, even in generic contexts.
Table 1: Pre-trained LMs and whether they are typically used for generation (Gen.) or classification (Cls.).
Model                          Main Corpus Type   Gen.   Cls.
BERT (Devlin et al., 2019)     Wikipedia          ×      ✓
RoBERTa                        Web                ×      ✓
GPT (Radford et al., 2018)     Fiction            ✓      ×
GPT2 (Radford et al., 2019)    Web                ✓      ×
XLNet (Yang et al., 2019)      Web                ✓      ✓
TransformerXL                  Wikipedia          ✓      ×
The most striking effect involves politicians in GPT2. For example, the name Donald: 1) predicts Trump as the next token with high probability; 2) yields generated endings of "Donald is a" that are easily distinguishable from those of any other given name; 3) yields endings whose sentiment is substantially more negative; and 4) carries a bias that can potentially propagate to downstream tasks.
Although these results are expected, their extent is surprising. Biased name representations may have adverse effects on downstream models, just as in social bias: imagine a CV screening system rejecting a candidate named Donald because of the negative sentiment associated with his name. Our experiments may be used to evaluate the extent of name artifacts in future LMs.¹
2 Last Name Prediction
As an initial demonstration of the tendency of pre-trained LMs to ground given names to prominent named entities in the media, we examine the next-word probabilities assigned by the LM. If high probability is placed on a named entity's last name conditioned on observing their given name (e.g., P(Trump|Donald) = 0.99), we take this as evidence that the LM is, in effect, interpreting the first-name mention as a reference to the named entity. We note that this is a lower bound on evidence for grounding: while it is reasonable to assume that nearly all mentions of, e.g., "Hillary Clinton" in text are references to (the entity) Hillary Clinton, other references may use different strings ("Hillary Rodham Clinton," "H.R.C.," or just "Hillary"). We also note that the LM is not constrained to generate a last name but may instead select one of many other linguistically plausible continuations. For demonstrative purposes, we select 8 given names of named entities frequently appearing in the U.S. news media, which are also common in the general population.² Table 2 presents the most likely next word under the GPT2-XL LM conditioned on a prompt ending with a given name (see supplement for systematic experimental results).
Table 3: Top 10 most predictable names from the "is a" endings for each model. Bold entries mark given names that appear frequently in the media. Bottom: mean and std of scores.
Prompting the LM with the given name only, we observe that the most likely next word (greedy decoding) is the last name of a prominent named entity in all but one case (Elizabeth). In three cases, the corresponding probability is well over 50% (Clinton, Trump, Sanders), and in one case the LM generates the full name of a white supremacist, Richard B. Spencer.
Due to the contextual nature of LMs, the prompt type affects the last-name probabilities. Intuitively, generating the last name of an entity seems appropriate and expected in news-like contexts ("A new report from CNN says that [NAME]") but less so in more personal contexts ("I want to introduce you to my best friend, [NAME]"). Indeed, Table 2 demonstrates that grounding effects are strongest in news-like contexts; however, these effects are still clearly present across all contexts, appropriate or not, for more prominent named entities in the U.S. media (Donald, Hillary, and Bernie).
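The grounding test in this section can be illustrated with a minimal sketch. It assumes the LM's next-token scores (logits) for a prompt ending in a given name are already available; the toy logits, the `LAST_NAMES` mapping, and all function names below are hypothetical and are not taken from the paper's code or from any model output.

```python
import math

def softmax(logits):
    """Turn raw next-token scores into a probability distribution."""
    top = max(logits.values())
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def greedy_next_token(logits):
    """Return the most likely next token and its probability (greedy decoding)."""
    probs = softmax(logits)
    token = max(probs, key=probs.get)
    return token, probs[token]

def grounding_rate(logits_by_name, last_names):
    """Fraction of given names whose greedy continuation is the entity's last name."""
    hits = sum(
        greedy_next_token(logits_by_name[name])[0] == last_names[name]
        for name in last_names
    )
    return hits / len(last_names)

# Hypothetical logits for two prompts, invented for illustration only.
toy_logits = {
    "Donald": {"Trump": 9.0, "is": 3.0, "and": 2.5, "Duck": 2.0},
    "Elizabeth": {"Warren": 2.0, "is": 2.6, "and": 2.2, "II": 1.5},
}
LAST_NAMES = {"Donald": "Trump", "Elizabeth": "Warren"}

token, prob = greedy_next_token(toy_logits["Donald"])
rate = grounding_rate(toy_logits, LAST_NAMES)
```

With a real LM, the logits would come from a forward pass on each prompt type (minimal, news-like, informal), which is how the context sensitivity described above could be measured.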
3 Given Name Recovery
Given a text discussing a certain person, can we recover their (masked) given name? Our hypothesis was that it would be more feasible for a given name prone to grounding, due to unique terms that appear across multiple texts discussing this person.
To answer this question, we compiled a list of the 100 most frequent male and female names in the U.S.,³ to which we added the first names of the most discussed people in the media (Section 2).
Table 4: Top 10 names with the most negative sentiment for their "is a" endings on average, for each model. Bold entries mark given names that appear frequently in the media. Bottom: mean and std of average negative scores.
Figure 1: t-SNE projection of BERT vectors of the GPT2-large "is a" endings for Helen, Ruth, and Hillary.
Using the template "[NAME] is a" we generated 50 endings of 150 tokens for each name, with each of the generator LMs (Table 1). For each pair of same-gender given names,⁴ we trained a binary SVM classifier to predict the given name from the TF-IDF representation of the endings, excluding the name. Finally, we computed the average of pairwise F1 scores as a single score per given name. Table 3 displays the top 10 names with the most distinguishable "is a" endings. Bold entries mark given names of media entities, which are most prominent in the GPT2 models, trained on web text. Apart from U.S. politicians, Virginia (name of a state) and Irma (a widely discussed hurricane) are also predictable, presumably due to their other senses. Figure 1 illustrates the ease of distinguishing texts discussing Hillary from others (GPT2-large). We masked the name ("[MASK] is a..."), computed the BERT vectors, and projected them to 2D using t-SNE (Maaten and Hinton, 2008). Similar results were observed for texts generated by other GPT2 models, for different names (e.g., Donald, Bernie), and with other input representations (TF-IDF).
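The aggregation step of this experiment (one score per given name, averaged over its pairwise F1 scores) can be sketched as follows. The sketch assumes the pairwise classifier is supplied as a callback; the paper's TF-IDF and SVM training is not reproduced here, and the names `mask_name`, `per_name_scores`, `score_pair`, and the toy F1 values are hypothetical.

```python
from itertools import combinations
from statistics import mean

def mask_name(text, name, mask="[MASK]"):
    """Hide the given name so a classifier cannot simply read it off the text."""
    return text.replace(name, mask)

def per_name_scores(names, score_pair):
    """Average the pairwise F1 scores involving each name.

    score_pair(a, b) is assumed to train and evaluate a binary classifier
    on the endings of names a and b and return its F1 score.
    """
    pairwise = {frozenset(pair): score_pair(*pair) for pair in combinations(names, 2)}
    return {
        name: mean(pairwise[frozenset((name, other))] for other in names if other != name)
        for name in names
    }

# Hypothetical pairwise F1 values, invented for illustration only.
TOY_F1 = {
    frozenset(("Hillary", "Helen")): 0.9,
    frozenset(("Hillary", "Ruth")): 0.8,
    frozenset(("Helen", "Ruth")): 0.5,
}

scores = per_name_scores(
    ["Hillary", "Helen", "Ruth"],
    lambda a, b: TOY_F1[frozenset((a, b))],
)
most_predictable = max(scores, key=scores.get)
```

In this toy setup the name whose endings are easiest to tell apart from every other name receives the highest average, which is the quantity ranked in Table 3.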
4 Sentiment Analysis
Following Prabhakaran et al. (2019), we can expect endings (§3) discussing specific named entities to be associated with sentiment more consistently than those discussing hypothetical people. We predict sentiment using the AllenNLP sentiment analyzer (Gardner et al., 2018) trained on the Stanford Sentiment Treebank (Socher et al., 2013). Table 4 displays the top 10 most negative given names for each LM, where per-name score is the average of negative sentiment scores for their endings. Again, many of the top names are given names of people discussed in the media, mainly U.S. politicians, and more so in the GPT2 models.⁵ We found the variation among the most positive scores to be low. We conjecture that LMs typically default to generating neutral texts about hypothetical people.
⁴ To avoid confounding gender bias.
Figure 2: Sample SQuAD name swap template, with examples from two different models of how certain names will dramatically affect the answer accuracy.
C: [NAME1] has been arguing for shorter prison sentences for certain offenses, something [NAME2] is strongly against.
Q: Who is more likely to be considered tough on crime?
A: [NAME2]
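The per-name negativity score used for Table 4 (the average negative sentiment over a name's endings, ranked across names, with an overall mean and std) can be sketched as below. This is a minimal illustration that assumes the sentiment classifier's negative-class scores are already computed; the function name, the toy scores, and the choice of population standard deviation are assumptions, not details from the paper.

```python
from statistics import mean, pstdev

def rank_by_negativity(neg_scores, k=10):
    """Rank names by their average negative-sentiment score.

    neg_scores maps each name to the negative-class scores of its endings.
    Returns the top-k (name, average) pairs and the (mean, std) over all names.
    """
    per_name = {name: mean(scores) for name, scores in neg_scores.items()}
    ranked = sorted(per_name.items(), key=lambda item: item[1], reverse=True)
    averages = list(per_name.values())
    # Population std is an assumption; the source only says "std".
    return ranked[:k], (mean(averages), pstdev(averages))

# Hypothetical scores, invented for illustration only.
toy_scores = {
    "Donald": [0.9, 0.7],
    "Helen": [0.2, 0.4],
    "Ruth": [0.5, 0.5],
}

top, (overall_mean, overall_std) = rank_by_negativity(toy_scores, k=2)
```

Comparing each name's average with the overall mean and std gives a rough sense of how far a grounded name departs from the typical, near-neutral endings.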
5 Effect On Downstream Tasks
As pre-trained LMs are now used as a starting point for a vast array of NLP tasks (Raffel et al., 2019), there are important concerns about unintended consequences in such downstream models. To study an aspect of this, we construct a set of probes where different given names can be tried, ideally without affecting the model output. We construct 26 templates, exemplified in Figure 2, for models trained on SQuAD (Rajpurkar et al., 2016) or (slightly tweaked) Winogrande (Sakaguchi et al., 2019). We populate the templates with pairs of same-gender given names sampled from the list in Section 2. We evaluate the expanded templates on a set of LMs fine-tuned for either SQuAD or Winogrande (with optional pre-fine-tuning on RACE; Lai et al., 2017; Sun et al., 2018). To measure the name effect, we calculate how often the outcome changes when the order of the names is flipped (flips). Table 5 and Table 6 present the top names contributing to the name swap fragility and the overall LM scores. SQuAD models exhibit a significant effect for all LMs, from weak to strong. Conversely, Winogrande models are mostly insulated from this effect. We speculate that the nature of the Winogrande training set, which contains many examples of names used in a generic fashion, has helped remove the inherent artifacts associated with names.
We also note that extra pre-fine-tuning on RACE, although not helping noticeably with the original task, seems to increase robustness for name swaps.
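The flip measure described in this section can be sketched as follows. The sketch assumes one reading of "flip": the template slot chosen by the model (1 for [NAME1], 2 for [NAME2]) changes when the two names trade places. The function names and the deliberately biased toy model are hypothetical; a real probe would call a fine-tuned SQuAD or Winogrande model instead.

```python
from itertools import combinations

TEMPLATE = (
    "[NAME1] has been arguing for shorter prison sentences for certain offenses, "
    "something [NAME2] is strongly against."
)

def fill(template, name1, name2):
    """Instantiate a probe template with a pair of given names."""
    return template.replace("[NAME1]", name1).replace("[NAME2]", name2)

def flip_rate(answer_slot, template, names):
    """Share of name pairs for which swapping the names changes the chosen slot.

    answer_slot(context, name1, name2) is assumed to return 1 or 2.
    """
    pairs = list(combinations(names, 2))
    flips = 0
    for a, b in pairs:
        original = answer_slot(fill(template, a, b), a, b)
        swapped = answer_slot(fill(template, b, a), b, a)
        flips += original != swapped
    return flips / len(pairs)

def toy_model(context, name1, name2):
    """Hypothetical biased model: always picks 'Donald' if present, else slot 2."""
    if name1 == "Donald":
        return 1
    return 2

rate = flip_rate(toy_model, TEMPLATE, ["Donald", "John", "James"])
```

A name-robust model would give a rate of zero here; the toy model flips on every pair containing the grounded name, which is the kind of fragility Tables 5 and 6 report.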
6 Related Work
Social Bias. There is ample evidence that word embeddings encode gender and racial bias (Bolukbasi et al., 2016; Caliskan et al., 2017; Manzini et al., 2019; Gonen and Goldberg, 2019), in particular in the representations of given names (Romanov et al., 2019). Bias can propagate to downstream tasks such as coreference resolution (Webster et al., 2018; Rudinger et al., 2018), natural language inference (Rudinger et al., 2017), machine translation (Stanovsky et al., 2019), and sentiment analysis (Díaz et al., 2018). In open-ended natural language generation, prompts with mentions of different demographic groups (e.g., "The gay person was") generate stereotypical texts (Sheng et al., 2019).
Prabhakaran et al. (2019) showed that bias reflected in the language describing named entities is encoded into their representations, in particular associating politicians with toxicity. The potential effect on downstream applications is demonstrated with the sensitivity of sentiment and toxicity systems to name perturbation, which can be mitigated by name perturbation during training.
Reporting Bias. People rarely state the obvious (Grice et al., 1975), thus uncommon events are reported disproportionately, and their frequency in corpora does not directly reflect real-world frequency (Gordon and Van Durme, 2013; Sorower et al., 2011). A special case of reporting bias concerns named entities: not all Donalds are discussed with equal probability. Web corpora in particular likely suffer from media bias, making some entities more visible than others (coverage bias; D'Alessio and Allen, 2006), sometimes due to "newsworthiness" (structural bias; van Dalen, 2012).
7 Ethical Considerations
This paper explores biases in pre-trained language models with respect to given names of people and the named entities that share them. As such, the ethical considerations pertaining to this work are manifold. We discuss two types of ethical considerations: (1) the limitations of this work, and (2) the implications of this work's findings.
The methodology in this work has a number of limitations that should be considered in understanding the scope of our conclusions. First, the pre-trained LMs we evaluate here are English LMs; we cannot assume these results will extend to pre-trained LMs in different languages. Second, the lists of names we use to analyze these models are not broadly representative of English-speaking populations. The list of most common given names in the U.S. over-represents stereotypically white and Western names. The list of most frequently named people in the news media as well as A&E's (subjective) list of most influential people of the millennium are both male-skewed, owing to many sources of gender bias, both historical and contemporary. For our last-name prediction experiment, we are forced to filter out named entities whose given names don't precede the surname, which is a cultural assumption that precludes naming conventions from many languages, like Chinese and Korean. This work also uses statistical resources that treat gender as a binary construct, which is reflected in some of our experimental designs. This is a reductive view of gender and we hope future work may better address this limitation, as in the work of Cao and Daumé III (2019). Finally, there are many important types of biases pertaining to given names that we do not focus on in this paper, including name biases on the basis of perceived race or gender (cf. Bertrand and Mullainathan, 2004; Moss-Racusin et al., 2012). While our experiments shed light on artifacts of certain common U.S. given names, an equally important question is how LMs treat very uncommon names, effects which would disproportionately impact members of minority groups.
What this work does do, however, is shed light on a particular behavior of pre-trained LMs which has potential ethical implications. Pre-trained LMs do not treat given names as interchangeable or anonymous; this has implications not only for the quality and accuracy of systems that employ these LMs, but also for the fairness of those systems. Furthermore, as we observed with GPT2-XL's free-form production of a white supremacist's name conditioned only on a common given name (Richard), further inquiry into the source of training data of these models is warranted.
Firth said "you shall know a word by the company it keeps" (Firth, 1957), but the company of given names often relates to specific named entities with this name. We showed this is prevalent in pre-trained LMs, especially ones trained on web text, and demonstrated the effect on downstream tasks.
Although we focused primarily on high-frequency names, it would be interesting to study such artifacts for non-Euro-American names, which may not have dedicated tokens in the LM vocabulary and are likely to encode different types of biases.
C Sentiment Analysis
Table 10 shows the most negative "is a" ending generated by GPT2-small for some of the people with the most negative average sentiment.
Table 9: For each LM, the percentage of named entities (people) such that the LM greedily generates their last name conditioned on (a prompt ending with) their given name. Two sets of named entities are used: (1) frequently mentioned people in the U.S. news, and (2) prominent people from history. The same four prompt contexts (Minimal, News, History, Informal) are used as described in Table 2 of the main paper.
Bernie is a progressive who would advocate for a $15 minimum wage as president.638 SHARES Facebook Twitter Google Whatsapp Pinterest Print Mail Flipboard A prominent think tank official conceded that Donald Trump is not a bigot after he asked people at the Trump rally to "go back to their country" before the audience voted on Donald Trump. Video: He was speaking at a rally in Springfield, Illinois on Thursday. At the rally, Trump has referred to immigrants as "rapists" and compared Muslim Americans to those from "disaster-ridden countries." Trump has referred to Mexican immigrants as rapists, drug dealers, and criminals. Video: The official at the conservative Manhattan Institute said, "Donald Trump has earned the right
Donald is a "dangerous ideologue" and has advocated "leaving the EU" calling for a "popular uprising" that would return him to office. Donald Trump and other Republicans plan to rename the Supreme Court, claiming it should be abolished, since it has usurped the authority of the lower federal courts in deciding national security issues. However, in the first month of the Trump administration, the federal courts have already replaced four of the nine judges on the current nine-member court with Hillary Clinton appointees, and the judge appointed by Barack Obama has prevented a deportation injunction granted by a federal district court against a pro-immigration defendant from taking effect. Much of Trump's court-reforming rhetoric has involved his arguments that the liberal judiciary has
Hillary is a most reckless candidate. She shouldn't have the guts to mention, let alone say, that Russia is working with Donald Trump. Don't the people know better? She's one of the most irresponsible politicians in this country." Hillary's blatant corruption has been reported for years. It would not be the first time for a politician to praise Vladimir Putin for allegedly manipulating or exploiting his people. Also See: Hillary's Weapon of Choice: Russian Covered Up Murder of DNC Staffer Seth Rich and WikiLeaks Shredded Seth Rich's Contact Info Wanting to put the blame for Hillary's campaign missteps on Putin's alleged fascism, Wasserman Schultz, along with most of her staff, have repeatedly championed Obama's stated fears of a potential
Table 10: The ending with the most negative sentiment generated by GPT2-small for some of the people with the most negative average sentiment.
Figure 3 shows 6 (out of 26) example name swap probing templates, along with the most affected given names for each model.
Media: public.tableau.com/views/2018Top100/1 Top100. Name frequency source: 1990 U.S. Census statistics.
www.ssa.gov/oact/babynames/decades/century.html. Following this list we treat gender as