Programmable Dictionaries 25 Sep, 2017

I don’t use a time tracking app, but I wouldn’t be surprised if one revealed that I open Dictionary almost as often as Notes or Mail. While I’m still actively building my English vocabulary, I’ve developed a habit of making a note whenever I encounter an unfamiliar word, hear an obscure idiom, or stumble upon a peculiar phrasal verb. I then turn these notes into flashcards in my spaced repetition software. While any extra effort you put into making your flashcards generally pays off, it’s still a tedious task involving quite a bit of copy-pasting between Dictionary and Studies. A minor hindrance to a pragmatic learner, but not one my indolent programmer’s brain could tolerate. Thus, another automation spree began.

A typical flashcard in my English vocabulary stack looks something like this: the front has a word (or a short phrase), part of speech info, IPA transcription, and one or more examples of using that word in a sentence. The back holds a dictionary page for that word and possible Russian translations. For reverse studying, I often put the first or the last letter of the word on the back of the card to make studying synonyms easier. Ever since I discovered that Studies allows importing cards as formatted CSV, I thought generating flashcards from my Notes would be a simple project for one weekend night. Spoiler alert: I was wrong.
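As a rough illustration of the card layout described above (not the code behind this post), here is a minimal Python sketch that turns one note into a two-column CSV row. The column order (front, back), the field names, and the idea of joining fields with newlines are all assumptions; the import format Studies actually expects may differ.

```python
import csv
import io


def card_row(word, pos, ipa, examples, definition, translations):
    """Build a [front, back] pair for one flashcard (hypothetical layout)."""
    # Front: word with part of speech, transcription, then usage examples.
    front = "\n".join([f"{word} ({pos})", ipa, *examples])
    # Back: the dictionary definition followed by the translations.
    back = "\n".join([definition, ", ".join(translations)])
    return [front, back]


def write_cards(cards, out):
    """Write cards (a list of dicts) to a file-like object as CSV."""
    writer = csv.writer(out)
    for card in cards:
        writer.writerow(card_row(**card))


if __name__ == "__main__":
    buffer = io.StringIO(newline="")
    write_cards(
        [
            {
                "word": "lollapalooza",
                "pos": "noun",
                "ipa": "ˌläləpəˈlo͞ozə",
                "examples": ["it’s a lollapalooza, just like your other books."],
                "definition": "a person or thing that is particularly impressive or attractive",
                "translations": ["<Russian translation goes here>"],  # placeholder
            }
        ],
        buffer,
    )
    print(buffer.getvalue())
```

The `csv` module quotes fields that contain newlines or commas, so multi-line fronts and backs survive a round trip through the file.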

Figure: a screenshot of a flashcard open in the Studies app. My *hitherto* manual process of making these flashcards is now finally automated.

Misled by the omnipresence of Look Up functionality across Apple’s software, I naively assumed that Dictionary Services on macOS would expose a comprehensive API allowing me to grab some structured information about any given word. Unfortunately, that wasn’t the case. The existing API only provides a plain text definition, e.g.:

lollapalooza |ˌläləpəˈlo͞ozə | (also lalapalooza or lollapaloosa) ▶noun North American informal a person or thing that is particularly impressive or attractive: it’s a lollapalooza, just like your other books. ORIGIN late 19th century: of fanciful formation.

While I could try to extract something from this format using regular expressions or a custom parser, it seemed like handling all of the possible edge cases would be even less fun than continuing to make flashcards by hand forever. I began to research other options, and finally comprehended the current predicament of open digital dictionaries.
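To make the regular-expression route concrete, here is a minimal Python sketch that pulls a few fields out of the sample definition above. The pattern and the field names are illustrative assumptions tuned to this one entry; words with several parts of speech, numbered senses, or no ORIGIN section would each need more rules, which is exactly the edge-case problem.

```python
import re

SAMPLE = (
    "lollapalooza |ˌläləpəˈlo͞ozə | (also lalapalooza or lollapaloosa) "
    "▶noun North American informal a person or thing that is particularly "
    "impressive or attractive: it’s a lollapalooza, just like your other books. "
    "ORIGIN late 19th century: of fanciful formation."
)

ENTRY = re.compile(
    r"^(?P<headword>[^|]+?)\s*"                # text before the first pipe
    r"\|\s*(?P<pronunciation>[^|]+?)\s*\|"     # transcription between the pipes
    r".*?▶(?P<pos>\w+)"                        # first part-of-speech marker
    r"(?P<body>.*?)"                           # labels, definition, examples (unsplit)
    r"(?:\s*ORIGIN\s+(?P<origin>.*))?$",       # optional etymology at the end
    re.DOTALL,
)


def parse_entry(text):
    """Return a dict of fields, or None if the text does not fit the pattern."""
    match = ENTRY.match(text)
    if match is None:
        return None
    return {key: (value or "").strip() for key, value in match.groupdict().items()}


if __name__ == "__main__":
    for key, value in parse_entry(SAMPLE).items():
        print(f"{key}: {value}")
```

Note that `body` still mixes the regional label, the register label, the definition, and the example sentence in one string; nothing in the plain text marks where one ends and the next begins.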

I wanted my program to work offline and not depend on one of the numerous web APIs recommended by the StackOverflow crowd. APIs tend to shut down or introduce regressions, and I get very frustrated when things that used to work suddenly don’t. I imagined there would be a Ruby gem (or a Node package, Python egg, whatever) that offers exactly what I needed, including ready-made dictionaries for common languages. But, aside from several API wrappers and dictd clients, my search came up short.

Then came the discovery that all free English dictionaries are either blatantly incomplete or just plain text. Of course, there’s Wiktionary, an amazing community effort that incorporates more articles than Oxford and Merriam-Webster combined. Yet, under closer examination, it reveals an apparent lack of structure. Wiktionary seems optimized for human access, not machine processing. Hopefully, this improves with time, but I couldn’t figure out how to structure their data in a way that would serve my use case.

After spending several hours researching different offline options and not finding anything, I had to admit my failure and pick a web API. I chose the Oxford Dictionaries API as a source of definitions, usage examples, and other info. For Russian translations, I added an English-Russian dictionary in dictd format found in some dark corner of SourceForge. Oxford’s API is free to use for up to 3,000 requests per month, which is at least ten times more than I typically need. It is well designed, too. Yay?
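For the offline half of that setup, a dictd dictionary is simple enough to read without a server. The sketch below is an illustration, not the code used for this post: it assumes an uncompressed `.dict` file (not `.dict.dz`) and the usual tab-separated `.index` layout of headword, offset, and length, with both numbers written in dictd’s base64 digit encoding. The function names are hypothetical.

```python
import io

# dictd writes offsets and lengths as base64 "digits", most significant first.
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_b64_number(text):
    """Decode a dictd base64 number such as 'c' (28) or 'BA' (64)."""
    value = 0
    for char in text:
        value = value * 64 + B64_ALPHABET.index(char)
    return value


def parse_index(lines):
    """Map each lowercased headword to its (offset, length) in the .dict file."""
    index = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        headword, offset, length = parts[:3]
        index[headword.lower()] = (decode_b64_number(offset), decode_b64_number(length))
    return index


def lookup(word, index, dict_file):
    """Return the article for word from a binary file object, or None."""
    entry = index.get(word.lower())
    if entry is None:
        return None
    offset, length = entry
    dict_file.seek(offset)
    return dict_file.read(length).decode("utf-8")


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real .dict/.index pair.
    data = "dictionary\n  словарь\nword\n  слово\n".encode("utf-8")
    index = parse_index(["dictionary\tA\tc\n", "word\tc\tS\n"])
    print(lookup("word", index, io.BytesIO(data)))
```

With real files, the same functions would take `open("name.index", encoding="utf-8")` and `open("name.dict", "rb")`; the file names here are placeholders.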

Nay. Instead of cheering for the “Power of the Web,” I was disappointed. Dictionary as a Service feels like a particularly bad case of digital capitalism. Just think about it: both Oxford and Merriam-Webster estimate their corpuses to contain about 500,000 articles (though fewer than a third are in general use). Oxford’s API offers a £200 plan for 500,000 API requests per month. One might fairly argue that hosting an API service costs a lot of money, but that’s exactly my grumble with it! I don’t want the API. Your entire useful dataset fits in RAM! I console myself by saying that my use case falls under the Advanced User Whims category, and therefore it’s only logical that a third-party vendor is serving such a user. And (lucky me) it’s free for personal use! The invisible hand at work.
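The numbers in this paragraph can be turned into a quick back-of-the-envelope calculation. The article count and the request quotas come from the text; the average article size of 2 KiB and the one-request-per-article model are assumptions made purely for illustration.

```python
import math

ARTICLES = 500_000                  # corpus size quoted above
FREE_REQUESTS_PER_MONTH = 3_000     # free tier quoted above
PAID_REQUESTS_PER_MONTH = 500_000   # the £200 plan quoted above
ASSUMED_BYTES_PER_ARTICLE = 2 * 1024  # assumption: 2 KiB of structured text per article


def months_to_fetch(articles, requests_per_month):
    """Months needed to request every article once, one article per request."""
    return math.ceil(articles / requests_per_month)


def corpus_size_mib(articles, bytes_per_article):
    """Estimated size of the whole corpus in MiB."""
    return articles * bytes_per_article / (1024 * 1024)


if __name__ == "__main__":
    print("Free tier, months:", months_to_fetch(ARTICLES, FREE_REQUESTS_PER_MONTH))
    print("Paid plan, months:", months_to_fetch(ARTICLES, PAID_REQUESTS_PER_MONTH))
    print("Corpus size, MiB:", corpus_size_mib(ARTICLES, ASSUMED_BYTES_PER_ARTICLE))
```

Under these assumptions, the whole corpus comes to a little under 1 GiB, one month of the paid plan covers exactly one request per article, and the free tier would need 167 months to do the same.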

Sarcasm aside, I can’t get over the fact that a thing as essential and foundational as a dictionary is being reduced to “a service.” A dictionary is a bold attempt to structure the lexical chaos that is a living language. While it does have to be regularly updated to remain in touch with reality, the majority of the corpus remains unchanged for a long time. Vladimir Dahl spent 53 years working on the Explanatory Dictionary of the Living Great Russian Language, and allegedly was still dictating new words to his daughter while lying on his deathbed in 1872. First released in 1863, his dictionary outlived its creator by a hundred years, surviving the language reforms imposed by the Soviets and not receiving any edits since 1909.

Don’t get me wrong though: maintaining a dictionary is a hard job that should be done by experts, and it’d be unreasonable to expect any professional lexicographer to do it for free. Based on the release notes regularly published by the Oxford English Dictionary, each of the quarterly updates touches about a thousand articles, newly added ones included. These numbers seem like a scale of work that could be covered by a government grant allotted to a Department of Lexicography at any major university, and then the whole nation (or the entire world) could benefit from a free dictionary. Language is a major part of national identity, so it seems only fair that the government and the citizens invest in documenting their cultural heritage. Any private enterprises serving niche audiences can still have their piece of the pie as long as there is a free alternative for everyone else.

Given enough time, we might arrive at a similar result through community efforts like Wiktionary, and I’ll be equally happy if that happens. Having grown up in Russia, where Ozhegov, Ushakov, and the aforementioned Dahl have practically established dictionary-making as a one-man gig, I see it as an almost laughable paradox that in today’s weird free-to-play economy, dictionaries (of all things) are not owned by the public. Steve Jobs understood the power of dictionaries and, starting with NeXTSTEP, each computer he sold came with a copy of the Oxford English Dictionary included. It was a great advance at the time, but today I want to see free, programmable, up-to-date dictionaries becoming available for everyone, regardless of their platform of choice. Dreams, oh dreams.
