Language Evolution in the Digital Age

James Callan has called me out not once but twice on his way to suggesting a topic for the Christopher Lydon radio show “Open Source.”

The Open Source topic is very broad, but its general premise seems to be this: What about “The Dictionary?” What about paper dictionaries? What about digital dictionaries? Read the show’s warm-up page for more detail.

I was going to post this as a comment there, but it’s rather long so I’ll just trackback it. [Trackback didn't work, so I just linked to this in a comment.] Some of this comes from responses I’ve given to reporters in email. Very rarely they have used a line or two of it, but if it seems familiar, that’s probably why.

My web site, Double-Tongued Word Wrester, isn’t really a user-generated dictionary. Though I do accept submissions via email from people, they are not common and ultimately all of the cites are recorded by me and all of the entries are crafted by me (and all the errors are mine). If it deserves to be in the list with Urban Dictionary or Wiktionary at all, it’s strictly because it has been generated solely in the digital realm and because it is an ongoing work. Another similarity is that it’s been a sandbox where I’ve been able to break some rules of traditional lexicography, usually because of time constraints that compel me to practice it in a quick and dirty form. However, elsewhere I am a practicing lexicographer, working for an established dictionary-maker, and much of what I do on my web site informs or feeds into my professional work.

It’s not difficult to dispel the concept of “The Dictionary.” It doesn’t exist. We English-speakers have no official language canon, we have no government language academy, we have no Royal Lexicon. There is no one “Webster’s” either, since that term’s trademark is null. Anyone can make a dictionary and call it “Webster’s Dictionary of Dood-speak” or whatever they like, just as anyone can make a dictionary and call it “Roget’s Thesaurus of Klingon” or “Roget’s Webster’s Dictionary and Thesaurus of Twaddle.”

Also, lexicography is not a profession that requires certification so much as it is a trade that requires training: anyone can make a dictionary and anyone can be a dictionary editor.

Usually, speakers use “The Dictionary” to mean “the English language as I know it.” Sometimes it’s used as a cop-out: a person can make ridiculous claims about the importance of a word by claiming it is or is not in “The Dictionary,” of course defining “The Dictionary” as a certain lexical work that proves their point.

In far too many cases, people believe that all dictionaries are more or less the same, so “The Dictionary” makes sense to them. But all dictionaries are not the same and anyone serious about language—including copydesk editors and writers of all sorts—would be wise to use several. Sites like OneLook make it easy to consult several reputable dictionaries at once for free.

A better term to use might be “The Dictionaries,” meaning the mainstream American English-language dictionaries designed for everyday use by students, families, and businesspeople. This excludes some of the niche dictionaries I have edited myself, but it includes most of the words most commonly looked for and includes all of the best-selling general use dictionaries of our time.

So are lexicographers simply trying to keep up with the descriptive power of search engines?

I’m not sure what this means. I’m guessing it supposes that people who need a word defined will simply turn to a search engine to find a definition. If that’s the case, then I think we’re not there yet. Folks are still awful at crafting search queries. They misspell them. They phrase them as a question (“what does gleeking mean?”) instead of using special queries (“define:gleek”). Neither of those, by the way, returns a good answer in Google. And many of the best free dictionaries are closed to indexing or spidering, so even if they were searching correctly, the answer still might not appear.

If the semantic web ever becomes real, then we might have a machine-driven automatic dictionary on our hands. But it will be subject to the same irregularities—text errors, spam, technical faults—as the rest of the Internet.

Does the prevalence of new words signify the downfall of dictionaries, or merely that they have been supplanted by new authorities?

That’s a false dichotomy, I’m afraid. Neither is true. Dictionaries are more used than ever and are nowhere near a downfall. It’s just that they’re now often digital as well as print. There are also few new authorities; mostly there are merely transformations of old authorities. (Even Wiktionary includes many thousands of definitions from an out-of-copyright edition of an early unabridged dictionary.)

Regarding the prevalence of new words: Even though dictionaries are sometimes perceived as overly liberal in their inclusion of new words, they are, in fact, highly conservative forces—by necessity, otherwise they would be drowned in the flood of new words.

[It's important to understand that the core English common to all Anglophone countries changes very little and very slowly. We modern English-speakers—or at least our journalists—pay a great deal of attention to new words, but they are just foxfire on the banks of the great river of English that has run its course for centuries.]

The pace of print dictionary change perhaps more accurately mirrors the true pace of language change rather than the perceived pace of language change.

Because of the surfeit of words and the limited editorial manpower and time, all print dictionaries have a policy for inclusion. It depends on the dictionary, but generally a word must be shown to have existed for a reasonable period of time in a number of different contexts.

For example, the Historical Dictionary of American Slang, for which I am project editor, has a policy that a slang word must have been decently attested in print for ten years before it can be recorded. There’s no way we’d have room for the millions of definable terms that were used, say, for a single year among one group of surfers in San Jose or for other short periods among other equally small groups that might generate lots of rarely spread slang.

Oral language is ephemeral. Millions of new oral coinages appear every year, nearly all quickly die, and of the relative few that don’t, very few have currency for more than a couple of years, and even fewer make it to paper. Having a word appear in print is a guarantee that the word has some kind of staying power and is on its way to becoming a permanent part of the English language. It also increases the chance that a reasonable number of people will look up the word, thereby increasing the overall usefulness of the dictionary. Even if we could record all the nonce coinages in the world and even if the long tail principle applied, there’s little utility and no profit in crafting an entire entry for every word ever uttered.

It is true that an advantage of online dictionaries is that they can be more up-to-date. Because digital space is theoretically infinite, online dictionaries can include any word they like, including words that might never appear in a print dictionary or might not appear until years later, since most dictionaries have long update cycles of anywhere from a year to decades (as in the case of the Oxford English Dictionary).

However, it isn’t necessarily a good thing that online dictionaries can be more up-to-date: they are often merely recording those words that won’t last and will never spread.

Is Wiktionary, ever-changing but organized, the answer?

As you phrased it before, why haven’t “The Dictionaries” crumbled before the onslaught of user-generated content?

I think it’s first because all dictionaries are alterable. The differences are who is doing the alterations, how often those alterations occur, and whether those alterations appear in print.

Second and more importantly, I think we have a sufficient understanding of how large-group editing works to be able to say that user-editable dictionaries like Wiktionary or Urban Dictionary are less reliable than those that are edited by a much smaller group of specially trained people who create “The Dictionaries.”

Lexicographers bring to dictionary-making an understanding of language that the lay-person simply does not have, and cannot have, without specialized training, study, or education. This doesn’t require a university education. Many of the best dictionary editors have been autodidacts, but they were autodidacts with discipline and, perhaps most importantly, with mentors who could guide them. Also, the work requires a willingness to deal with the more tedious parts of dictionary-making, like proving a word actually exists, then proving what it means, and then trying to convey that in simple, short language. And it means writing entries for such unspectacular terms as “at” or “do,” which the editors of the user-editable dictionaries seem loath to do or to do well.

From a purely lexicographical standpoint, Wiktionary seems to have an awful work plan. (My information for this comes from lurking on the Wiktionary-L email list, to which I have been subscribed since September 2005.) For one thing, the wiki editing structure means that the ratio of lexical work to administrative work is out of whack. The politics seem often to take precedence over building the entries.

Despite that unwieldy administrative overhead, they’ve made fundamental lexicographical mistakes that experienced lexicographers usually would not. For example, in the beginning Wiktionary capitalized all the headwords, a practice all reputable English-language dictionaries abandoned quite some time ago. It’s a basic rule of English: if a word is not a proper noun, it should not be capitalized. (You can still see many headwords inappropriately capitalized.)

I would also call it a serious mistake to include all the languages together in a single entry by default. It’s confusing, misleading, and often inaccurate: the level of expertise for the languages in question is often not equal and the translations are imperfect.

In any case, many of the entries aren’t entries at all. They’re glosses, where definitions of a term amount to a single word or two.

[I know the response to these comments will be, "Well, it's a wiki. You can fix it yourself." No, I cannot, unless someone invents a wormhole that will stretch my daytime hours tenfold, and also not unless the Wiktionary folks welcome a benevolent dictator.]

The Wiktionary contributors also seem to have missed that fundamental rule, “Know when you’ve reached the point of good-enough and know when you’ve reached the point of diminishing returns.” Considering the vastness of the task they are undertaking, it seems like too many people are touching each entry. It’s inefficient. If they ever hope to get anywhere near something like a first edition, editing should be even more like a consensus-based assembly line, only with points of no return and hard deadlines. It should be less like decorating a car for newlyweds and more like building one.

I think the failure of the wiki editing model is demonstrated by the need of some large wikis to sometimes lock an entry that is subject to vandalism or partisan bickering. Closing the entry so that it cannot be changed by just anyone ensures that it retains reasonable accuracy without falling victim to mob rule or the tyranny of democracy.

P.J. O’Rourke once put it this way: if we were truly democratic, “Every meal would be pizza. Every pair of pants…would be stone-washed denim. Celebrity diet and exercise books would be the only thing on the shelves at the library. And—since women are a majority of the population—we’d all be married to Mel Gibson.”

Dictionaries work much on the same principle: if we left it all to the public, mostly what we’d have would be sexual, scatological, and nonce terms and a slew of racist comments about our neighbors. The entries for boring but needed terms, such as prepositions or helping verbs, would languish.

So I’m for the locking of Wiktionary entries, but it would be much better if there were a strict editing tree in the first place for all entries and a precise work plan and timeline.

Urban Dictionary demonstrates another reason such user-editable dictionaries are less accurate: the question of what is appropriate. Aaron Peckham at Urban Dictionary may have more to say on this, but in numerous interviews he’s addressed the question of the racist, prejudiced, and irrelevant entries that many of his site visitors post. I’m not referring to curse words, racial slurs, or salty language—I work in slang, after all, and deal with all the four-letter words you can think of (plus many you can’t). I refer to entries in which someone takes a fellow seventh-grader’s name and turns it into an entry for “stupid person.” It just isn’t valuable. Nor are ones that are really just racist comments disguised as a new entry. There have been tens of thousands of instances of such things on UD.

There are also many examples of a group of people trying to spread a word they like. They all post it to UD, thinking that others will pick it up or else will be fooled into believing it’s more popular than it is. That’s also not valuable. It misrepresents the true importance of the word. I think that’s why Aaron has “editors” who approve entries. They’re better than the free-for-all that it was before, but they still admit a large number of entries of absolutely no value to anyone but the original poster.

That all said, UD and Wiktionary have important roles. Everyone owns the language we speak. If we’ve learned anything from Billy Wilder’s movie Ball of Fire, it’s that making dictionaries should not be the exclusive domain of a bunch of pointy-headed know-it-alls. People love words and language and they appreciate clever talk and repartee and witty lines and great public speakers and funny word histories.

Sites like UD demonstrate that even junior high and high school kids (and many older kids, some in their thirties and forties, no doubt) love to play with language as much as the rest of us—and this is the very same group which is often criticized as being careless about language by the school marms and the mossbacks and the self-appointed hall monitors and crossing guards of grammar, the squares and the bores. Every entry contributed to an online dictionary is a love note to the English language.

Related to all this is the increased effort of dictionary-makers to get public input. OED launched a program with the BBC to solicit public help in finding early uses of a bunch of words. (Of course, the Oxford English Dictionary has since the beginning relied upon volunteers to submit citations for new words.)

Collins in the UK has public discussion forums where visitors can nominate new words or discuss existing words.

Merriam-Webster launched Open Dictionary, a place where users can submit words (and they already have many thousands).

I’m not sure how much that user input will be taken into account by the editors. Much of what has already been submitted to the MW and Collins forums is rubbish. Words have been submitted that are already in a dictionary, that are provably nonce (that is, they have no currency at all beyond a single use), that are “stunt” or factitious terms (those created by campaigns merely to try to get a word in a dictionary, or those created by marketing efforts), that are supposedly “funny” but aren’t, and words have been submitted by people who clearly care most about becoming famous for having coined a word.

Unlike those submissions, Wiktionary is a good-faith effort made by people who genuinely want to record language.

While such public input—in both the user-generated dictionaries and the traditional publishers’ discussion forums—could be the most revolutionary change in dictionary-making since descriptivist dictionaries began to be more common than prescriptivist dictionaries, I doubt it all matters much. All of this user-generated content is so far less useful than the increasing use of corpora as a lexicographical tool, a practice made possible and widespread by cheap desktop computers.

Despite my carping and complaints, my final thought is this: the more the merrier. Bring on the words! Bring on the new language! I frequently visit the user-generated dictionaries to catch a whiff of new words entering pop culture that I might not otherwise encounter. I’m not 14 anymore and it’s hard to keep up, no matter how many LiveJournal blogs I read. Certainly no savvy lexicographer ignores Urban Dictionary. Its visitors might seem like a million chimps trying to hammer out Shakespeare, but among the useless words are gems that do indeed deserve to be on the permanent record. As for Wiktionary, its goals are admirable and it needs only time and persistence to prove its worth.

Posted June 19, 2006
