Over the holidays, I took upon myself a challenge. I’ve been fiddling around creating a Mobipocket format dictionary consisting of Spanish words and their English translations. I wanted to be able to set it as my primary dictionary on the Kindle and then use it for on-demand word translation as I gut through trying to read documents in Spanish. A few months back, I couldn’t find any unencrypted ones for sale although now apparently some do exist.
This seemed straightforward enough, so I did a little Ruby scripting (getting a crash course in Unicode characters in the process). I found these lists of Spanish word frequencies and wrote a script to parse them into one word per line. I then wrote a script to read words from STDIN, check whether each one already existed in a map of translations, look up any missing word from various online sources and add it, and then save the map as a YAML file. It took most of a week, including false starts and do-overs, to run the whole list against online translation tools. From there, I created another script to take the YAML file and rewrite it as a (roughly) alphabetically sorted, tab-delimited text file. With that done, I used these already available tools to take that file and create files suitable for Mobipocket Creator.
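For anyone curious what that pipeline looks like, here is a minimal Ruby sketch of the two middle steps: filling in a YAML-backed map of translations and then writing it out as sorted, tab-delimited text. This is not the original code. The method names are made up for illustration, and the block passed to `add_missing` stands in for whatever online lookup you have available. The sort is a plain lowercase codepoint sort, which puts accented letters after "z", so it is only roughly alphabetical.

```ruby
require "yaml"

# Load the saved word => translation map, or start empty if there is no file yet.
# (Hypothetical helper; the file layout is an assumption.)
def load_translations(path)
  File.exist?(path) ? (YAML.load_file(path) || {}) : {}
end

# Add any words not already in the map. The block stands in for the online
# lookup (an assumption); it should return a translation string or nil.
def add_missing(translations, words)
  words.each do |raw|
    word = raw.strip
    next if word.empty? || translations.key?(word)

    translation = yield(word)
    translations[word] = translation if translation
  end
  translations
end

# Render the map as tab-delimited lines, sorted by lowercase codepoint order.
def to_tab_delimited(translations)
  translations.sort_by { |word, _| word.downcase }
              .map { |word, translation| "#{word}\t#{translation}" }
              .join("\n")
end
```

In use, you would pass `STDIN.each_line` as the word list, save the result with `File.write(path, translations.to_yaml)`, and hand the tab-delimited output to the Mobipocket conversion tools.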
The upshot is that this Kindle-formatted Spanish-to-English translation dictionary is available now to download, for free. [Update 2012/05/11: I’d recommend you use Marc Sturm’s version as of today.] To use it, place it on your Kindle by copying it over USB or emailing it to your device. Go to “Home->Menu->Settings”, then “Menu->Change Primary Dictionary.” From that point, moving the cursor over a word will work the way the built-in dictionary lookup does, but showing English translations of Spanish words instead of definitions.
I offer this to the world, for free, no strings attached. In fact, because of the Creative Commons license on it (described below) you are free to take the files and do whatever you want with them as long as you comply with its terms. Be aware of the following caveats with this dictionary:
1: This is a machine-generated translation from various online sources. There is no guarantee of correctness for any given term. I did find and scrub some bogus racist translations that had been put into some online repositories, and there may be other erroneous or malicious submissions that have ended up in this dictionary. I warrant nothing and can pretty much say there are some translations or source words that can offend delicate sensibilities. Over time I might try to find ways to improve this file, continue to fill out the dictionary word list and maybe even improve the translations that are already there. Keep watching this blog for future revisions.
2: There are still formatting issues for the dictionary popup lookup. While you will see your term first in the list, it will not stop at the following term. Any feedback on how to engineer the source files to make this work correctly can be sent to firstname.lastname@example.org or left as a comment on this post.
3: This book is offered with a Creative Commons license: BY-NC-SA. For the required attribution, please provide a link to http://www.evilgeniuschronicles.org
4: This is a word-by-word dictionary, so you won’t get a translation for idiomatic phrases. That’s a downside. An upside is that because this was done word by word from frequency lists, conjugated Spanish verbs get their own entries and each gets translated individually.
I’m glad to get any feedback on this dictionary, particularly on point #2. If anyone can describe how to reformat the HTML input files to make the dictionary popups not run together, I’d be highly appreciative. Beyond that, roll and have fun with it and let me know how it works for you. If someone can point me to directions on how to turn these source files into the equivalent version for the Nook, I’d be happy to publish that as well, although I’ll need volunteers to help me test it.
For me, I’m off to take another crack at Don Quixote.
[Update 2011/12/01] The long-asked-for source files have been committed to this repository at GitHub. I don’t blame you if you can’t make heads or tails of that as it stands; I certainly couldn’t. At my next available opportunity, I will document the process and try to improve the scripts. No timetable is promised nor implied, mileage may vary, secure your own mask before helping others.
Also, if you want to say thank you for this I ask for nothing other than clicking the Amazon ad in the upper right corner of this blog before you make a purchase sometime. It costs you nothing and kicks a few affiliate percentage points back my way. Thank you.
[Update 2012/05/11] Marc Sturm is the first person I know of to make a modification to this dictionary. He figured out how to make it work with the newer Kindles and has published his version. If you have a Kindle 3 or greater, I’d recommend using his version.