On our LCHH episode yesterday, I made a dumpster fire of it with a syntactic mistake that blew everything up. With a little more work, I was able to get to a first pass of my original vision. Also seeking a little redemption from #ServiceNow #CodingFail
Note: What I am discussing here is in no way my original work. This design was created by user Mohib on the DIY Bookscanner Forum. His original plans are here. These are my notes based on my experience building his design. In addition, his software recommendations are for iOS and Windows and I have exactly the opposite platforms – Android and OS X. I have my own set of recommendations as well as some scripts I wrote. All of this will be presented below. Mohib’s plans are sufficient to build a scanner but what they lack is an Instructable style “Step 1 – do this” flow. I will attempt to put a little of that in.
There is no tldr, you knucklehead! This is a build post, so it is all or nothing.
I have long been interested in the work at DIY Bookscanner. I was very interested in the version of the scanner that could be cut from a sheet of plywood with a CNC and used two DSLR cameras with remote triggers. Ultimately, that was a bit expensive, bulky and beyond my meager abilities to create. I’ve been in the market for some sort of nondestructive book scanner for some time. I have dozens if not hundreds of books that I would gladly trade in paper for a digital copy.
When I first learned of this simplified version that could be built entirely out of PVC pipe for the structure and used a single smartphone as the camera, my ears perked up Scooby Doo style. I saw the video and got very interested. It seemed within buildable reach and had a decent throughput. I have a lot of pages of books to scan, so the more pages per minute the less time I spend.
I decided to take the plunge and actually attempt building this. To that end, I basically bought most of the things straight out of Mohib’s recommendations from Amazon. Almost everything I didn’t buy originally, I bought later. Pretty much everything marked “optional” in his plans I found essential. I will mention them all as we get to them.
I also was able to get almost everything in the plans locally at a neighborhood True Value. I hoped to spend as little time in big box stores as possible. In fact the local employees took an interest in the project and I had many conversations with them about it. This is where all those times they spent 10 minutes finding me the right 17 cent screw got repaid, as I ultimately dropped around $100 in materials and tools there working on the scanner.
The other big sourcing issue was the plastic platen that sits on the book pages. I tried to source it locally at a glass place, but it was prohibitively expensive and they were also unwilling to sell me 3/8″ plexiglass. I found this awesome site, Tap Plastics, where I could very easily order exactly what I wanted with a simple web form. Because I knew this had the longest lead time, the very first thing I did was order this.
This link will start you at acrylic. Choose the following options:
15 1/4” Length
and then down below pick 6 holes if you want to buy exactly what is in the plans. It would be possible to in fact do 4 or 2 holes, depending on the flexibility you desire from your platen. In practice, I’ve been using mine for two weeks and have yet to move the handle to different holes. It costs $2 per hole drilled, so I gambled $8 on future flexibility. When you order from Tap Plastics and include holes, you need to send them a diagram of where to drill. I extracted that one page of CAD drawing from the plans and sent that to them and what I got back was perfect.
With the platen glass ordered, I went to my local hardware store (Walker’s True Value in Conway SC) carrying a printout of the plans with the goal of getting every item I could on the manifest. Ideally I wanted to be able to build this without walking into a Lowes or Home Depot. I failed slightly, but not by much.
One of the parts in question was the barrel nut. These are those little things you find in Ikea style flat pack furniture, cylinders with a slot for screwdrivers on the ends and a threaded hole through the center. The plans call for two with 1/4-20 threading and a length of 1 1/2″, which are hard to come by in the US. The plans cite a supplier in the UK but shipping is prohibitive for these things. I wanted to avoid that. Luckily, True Value had the same thing but in 3/4″ length. I decided to experiment with this and only buy the UK version if absolutely necessary. Spoiler Alert – the 3/4″ version worked.
I was able to get both the 3/4” PVC pipe and connectors for the handle and the 1 1/4” PVC pipe for the structure. What I could not get were the caps and sanitary T connector for 1 1/4” pipe. Although they had bins for them, they were out of stock. I ultimately did get these items at a Lowes which then had the complication of being available only in white plastic where all the other 1 1/4” PVC pipe and connectors I had were black. I bought every single bolt, nut, washer, threaded rod, steel cable etc on the manifest. There are still a few bolts that I never used and to this day don’t understand where they are supposed to go.
I also went through the plans and ordered most of the things from Amazon, such as the tripod ball head, the focus rail and the star knobs. Where I think Mohib is crazy is in not using affiliate links to Amazon because he’s just leaving 4% of all those purchases on the table. I’ll list out all the stuff here with my affiliate link because I am not crazy. If Mohib ever lets me know of an associate ID for him, then I will swap them out for his because he deserves the money more than me. However, someone does deserve it.
I originally skipped buying the LED lamp listed as optional. That was a mistake and after my first trial run I went back and bought it. I also originally tried using a nonskid mat from Dollar General under the books. That was a mistake. User dpc on the DIY Bookscanner forum pointed me to the sticky Siconi mat and it is night and day versus every other option I tried. So, I recommend just buying everything on the following list if you really want to build this.
I’m now going to list the steps I took. This will save you the trouble of reverse engineering from the CAD drawings. In at least one case, my original cuts were wrong because I misread the diagrams. You are welcome.
I won’t lie to you, the platen handle is by far the most challenging part of this whole thing. The steel cable holding it together is the worst bit of that so once you make it past that, it’s all downhill.
Step 1 – take the 3/4” PVC and cut one 13” piece and two 7” pieces. I just used a hacksaw because I had one handy. These will end up in connectors so if the cut is anywhere close to straight, the connectors will be pretty forgiving.
Step 2 – In the two 45 degree 3/4” connectors, drill a hole 0.5” from the end through both sides. The size of this hole is the same as your barrel nuts – 10 mm if you get the European version or 1/4” in my case. The barrel nuts will slide into these and I needed to buy one of the file drill bits and run it around the inside of the hole to get mine to fit eventually. You want the fit tight but they do need to be able to slide in and out.
Step 3 – Connect the 90 degree connectors to both ends of the 13” piece. Put both of the 7” pieces into those connectors. Attach the 45 degree connectors to those, with the drilled ends away from the pipe.
Now is the worst part of the project, getting the steel cable in. It is pretty essential, as this is the piece that gives the platen handle stability (it gets the most wear and tear) and building it this way allows for modularity in a way that trying to affix permanently with some kind of serious cement or adhesive would not. Still, this is tough.
Step 4 – Measure for the steel cable. I tried a few things before I hit on what worked. I had a spool of thin gauge wire of the type you use to hang picture frames. Whatever you use to measure, it needs to not stretch so steel wire seemed like a good bet. Put the barrel nuts into the holes. I ran two lengths of the wire through, enough that I could have a turn around each nut and enough sticking out of each pipe that I could pull it all tight. Pull it a few times because your pipes will be pulled tight into the connectors by this process and you don’t want to measure having a lot of slack. I used a Sharpie to mark the outermost part of the wire while I had it as tight as I could manage pulling with both hands while kneeling on the platen handle.
Step 5 – Pull out the wire and lay it alongside the bike brake cable to transfer that measurement. Swage the connectors down so that when you make the loop the mark is also at the outermost part. I actually put the barrel nut inside my loop before I swaged it down to ensure it would fit. Per directions of the hardware store people, I didn’t bother with a swaging tool. Once I had the fitting placed where I wanted it, I set the whole thing down on waste wood and hit it with a hammer a few times. It worked like a charm.
Step 6 – Put your swaged cable through the handle of the platen. Slide the first barrel nut through the holes and the loop. This is the easy one. Getting the second one in is challenging. I achieved it by using the same wire from my measurement to pull the other end tight. I did this by myself and found it extremely challenging to pull it hard enough to be able to slide the second barrel nut in. I did it with a combination of pulling with the wire and levering with a screwdriver while having the barrel nut through one side of the holes. I recommend doing this with a second person, which might save you the 30 minutes or so it took me. On the good news side, at this point all the hardest work is done.
Step 7 – Using the rubber washer and a steel one and a 1/4” – 20 bolt, connect the platen handle to the platen. I won’t lie, I found having a completed platen with handle in my hand extremely satisfying. I did a few camera tests with just that to test it out and it made me very happy.
Step 1 – Cut a 5” piece of 1 1/2” PVC pipe and a 13” piece. Because you will have bought it in 36” pieces, I recommend also cutting a few other lengths. I have every 3” from 7″ to 13” for the upright piece of my assembly for reasons I will discuss later. While you have it out and are cutting anyway, I suggest also cutting 7” and 10” pieces.
Step 2 – Cut pieces from the threaded rods into 9 1/4” and 19 1/2”. I tried a few techniques but what worked the best was just using a hacksaw and filing the end until I could screw one of the star knobs onto the end. After I completed this project, via a Facebook ad I saw this product which exists to fix screwed up threaded ends of things. I haven’t used it personally but if I had seen this first I totally would have bought it.
Step 3 – Drill a 1/4” hole in the center of the 1 1/2” PVC caps. Threaded rod will go through these. From the photos in the diagrams, it looks like the cap that connects to the clamp has extra holes and extra bolts in it. I did not drill these holes and had a few bolts left over from the original manifest. I am not sure what to do with these, so I ended up skipping.
Step 4 – Assemble the camera arm. On each of the threaded rod pieces, you will screw in two nuts and add a washer about 3/4” in from the end. This may vary wildly so experiment as you need. You will be sticking the balance of the rod through a cap and into an accessory. The long piece will be screwed into the clamp end and the short piece will be screwed into the rack mount. Twist the rod to screw it in to the accessories and then tighten down the nuts until the whole thing is tight. You won’t want these flapping about so make sure to get everything solid and tight.
Step 5 – Place the long piece of PVC over the long piece of threaded rod and slide down into the cap that holds the clamp. Over the top of this, place the sanitary T joint. Repeat the process with the short piece of PVC and the rack mount. At this point, you have the assembly mostly together with two pieces of threaded rod sticking out of the T joint.
Step 6 – For each of the threaded rods, place the 1 7/8” washer over the rod and slide down to the T joint. Screw the star connector into the threaded rod and tighten down. The first time you do this, you’ll be picking up slack just like with the platen handle. Be prepared to turn a little bit over a long period as some settling will occur.
Step 7 – Screw the ball joint adapter into the rack mount. When that is in, screw the smart phone adapter into the ball joint adapter. At this point, you have the thing basically assembled. Clamp to a table or whatever will be your scanning surface and adjust the top piece so that the phone camera will be trained down on the book.
The original plans call for a 13” piece of PVC for the upright of the camera arm. I found when I started experimenting that for smaller books, that height places the camera pretty far away. My goal is to get as many pixels as possible devoted to the book. When I am scanning small paperbacks or digest sized magazines, I use a shorter length of PVC to get the camera closer. If using a phone with an optical zoom this would be less of an issue but I want as much as possible of the field of view to be filled with book. This is the thing of the whole project most subject to fiddling. Try some sizes and see what works for you. It is not hard to change them out. However particularly when going to really short ones, this may mean that you have a lot of threaded rod to get that star knob twisted over. This part can be tedious, but it is possible.
At one point I got overly aggressive on getting the phone close and got it so close that the platen was bumping the phone. Having the phone moved in the middle of a scan is, if not the worst-case scenario, a really bad scenario. I had to seek a compromise of bringing the camera close enough to maximize pixels while high enough to avoid bumping.
A note on the color of the materials: careful readers will note I mentioned getting white PVC parts for the camera arm while noting those parts are black in the photo of my camera arm. I found early on that the white pieces showed up in reflection on the platen, so I spray painted them black.
Mohib lists his LED lamp as optional in the original plans but I found in my early tests that I wasn’t getting enough light on the book and it was very subject to shadows. I ended up buying the lamp and a switch and wiring it up. The plans call for attaching it to the camera arm with a bungee cord, but I could never get it at a good angle. I ended up setting it on a stack of books, which worked well enough. However, some of the light was shining directly into the camera lens so I further hacked this by masking off part of the top with a bit of cardboard. It’s kind of ridiculous, but it ultimately works. This part will almost certainly require fiddling on your part.
Mohib recommends some iOS apps. Here are what I am using as of today on Android. If you find better recommendations please let me know as nothing is perfect.
Bubble Level on Android. I use this to get the original placement as flat as I can. The closer to level the phone is, the less skewing to deal with in post-processing.
The camera app is pretty crucial and needs a few features to be viable. You must be able to control the ISO value rather than have it assigned automatically. You want the lowest one your phone’s camera is capable of, because the lowest ISO applies the least amplification to the sensor signal and so gives the cleanest, least noisy image. You want to have maximum control over focus. You need to have some control over the naming. There is a world of difference between assigning a prefix and having your scan photos be PREFIX0001 to PREFIX0100 versus picking them out of some timestamped mess or arbitrary numbers.
I chose Camera FV on Android. I gave the free version a try to verify it would meet minimum standards and then paid for the upgrade to unlock the features I wanted. There are many alternatives so if there is one anyone finds suitable, please let me know so I can try it as well.
After scanning the book the files will need to be transferred to a computer. I find plugging via USB and using Android File Transfer to be painful so I installed Dropsync. In the free version, this allows one folder on the phone to be synced with one folder in Dropbox. I set up my Camera FV folder to be a special folder unique to the app then synced that to a “DIY Bookscanner Inbox” in my Dropbox account. This runs once every 5 minutes so it is always trailing somewhat. While I am doing the actual scanning, some fraction of the photos are syncing. While it is not real time, within 15 or 20 minutes it will catch up so I try to make sure I’m never in a hurry while I do this. Eventually I will batch books together so that many books are being scanned and synced over time. While the last ones trickle in, I can work on the earliest ones.
If there were a version that did the same functionality with the same ease of use but used only the local network, I’d be quite happy with that. Again, let me know if you are aware of such an Android app.
For preprocessing, I use ScanTailor on OS X which seems best of breed. I ended up building ScanTailor Advanced from source using these directions. If building it is beyond you, here is a download of a version I built on High Sierra, offered of course with no warranty of any kind.
For OCR, I installed Tesseract via Homebrew.
For text normalizing, I installed uni2ascii via Homebrew.
For PDF conversion, I installed imagemagick via Homebrew which includes the “convert” command.
At this point, you have the whole frame put together. You have the Android apps installed.
Although I will discuss file naming later, I set the counter and file prefix before aligning the camera because there is no point knocking this out of alignment while entering the text.
Take whatever phone you are using for your camera and place it in the phone holder. I level with the Bubble Level app as well as I can, then place the book down on the Siconi sticky pad and push hard. However hard you think is maximally reasonable, push 15% harder than that. I turn to a nice page with full text to the margins for ease of evaluating framing and set the platen down on it. I rotate the phone to make it as square as possible. If your camera has framing lines, you want the lines of type lined with those. Use the macro adjuster to move the phone up or down if you need to get more or less of it in frame (less is seldom my problem, usually I crank it all the way down.) After lining everything back up, I return to Bubble level one more time just to verify I didn’t knock it way out of level with the adjustments.
Camera FV only allows 4 characters of prefix. I choose three for any given book (I started with “AAA” and increment every time) and use the fourth for the side. I reset the counter to 1 and set the prefix first to AAAR or whatever while I scan the right-hand (odd-numbered) pages, starting with the cover and working all the way to the inside back cover. Even if a page is blank, it needs to be scanned or else things will not line up later when collated. After scanning the R pages, I flip the book upside down and repeat the process with AAAL, resetting the counter to 1 again. These will be renumbered and collated in post-processing scripts. You could scan the left pages from front to back; it would just need adjusting the scripts.
If you are ambidextrous, you might scan both sides right side up. If you are left handed you may choose to do this whole process upside down from what I describe so you can work the platen with your left hand. The process of scanning is one of lifting the platen, turning the page, dropping it down and hitting the bluetooth trigger button. Configuring your camera app so the camera is snapped as quickly as possible helps. Mine takes too long and adding a second per photo adds at least 5 minutes to a 300 page book. I’ve tried different autofocus options to speed this up. Thus far I have had limited success.
This part is the practiced skill aspect that improves over time. The end goal is a rhythm of turning pages, dropping platen and capturing pictures as quickly as possible with a minimum of physical adjustment of the platen or book. I still scan at 30% the speed of Mohib in his demo video. If I never approach his speed, that will be a big blow to the viability of this project. I too want to scan ~22 pages a minute in order to get as many books processed and out of my space as possible per hour spent.
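To put numbers on that, here is a small Python sketch of the arithmetic. The 22 pages a minute, the 30% figure and the one extra second per photo come from the text above; the 300 page book is just an example size, and the function names are made up for illustration.

```python
def scan_minutes(pages: int, pages_per_minute: float) -> float:
    """Minutes needed to capture `pages` page images at a steady rate."""
    return pages / pages_per_minute


def shutter_overhead_minutes(pages: int, extra_seconds_per_photo: float) -> float:
    """Extra minutes added when every photo takes a little longer to capture."""
    return pages * extra_seconds_per_photo / 60


TARGET_RATE = 22.0                 # pages per minute, the demo video figure
CURRENT_RATE = 0.30 * TARGET_RATE  # "30% the speed", about 6.6 pages per minute
BOOK_PAGES = 300                   # example book size, not a measurement

if __name__ == "__main__":
    print(f"at target rate:  {scan_minutes(BOOK_PAGES, TARGET_RATE):.1f} min")
    print(f"at current rate: {scan_minutes(BOOK_PAGES, CURRENT_RATE):.1f} min")
    print(f"1 s slower shutter adds {shutter_overhead_minutes(BOOK_PAGES, 1):.1f} min")
```

At those assumed rates the same book goes from roughly a quarter of an hour to about three quarters of an hour, which is why the shutter delay and the page-turning rhythm matter so much.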
The following uses fake example names and values, but you can figure out how to map it to what you have. The shell scripts I use are available in my GitHub directory for the project. If you want to submit a change, feel free to fork and send a pull request. If you don’t know what that means, just download the zip file and use them that way.
At this point, the book has been scanned. You will eventually have in your Dropbox account, or however you move your files, a set of files named AAAL0001 – AAAL0156 and AAAR0001 – AAAR0156. I create folders called AAAL and AAAR and move the files into them. If the numbers of files don’t match, you have a problem. If they don’t match and files are missing out of the middle, it is pretty straightforward. Look at the ones on either side of the missing file, figure out the page and take the picture again. It may have different alignment and definitely a different name so keep track of what these are. I tend to use a prefix like AAAX to make it obvious that these are fill-in scans that need to be renamed.
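The count check and the hunt for gaps can be automated. This is a minimal Python sketch, not one of the scripts from my repository. It assumes names shaped like AAAL0001.jpeg (a 4 character prefix, a zero-padded counter and an extension); the function names are hypothetical.

```python
import re

# Assumed naming: 4-character prefix, zero-padded counter, then an extension.
NAME_RE = re.compile(r"^(?P<prefix>.{4})(?P<counter>\d+)\.[A-Za-z]+$")


def counters(filenames, prefix):
    """Collect the counter values of all files that carry the given prefix."""
    found = set()
    for name in filenames:
        match = NAME_RE.match(name)
        if match and match.group("prefix") == prefix:
            found.add(int(match.group("counter")))
    return found


def missing_counters(filenames, prefix):
    """Counter values absent between 1 and the highest counter seen."""
    found = counters(filenames, prefix)
    if not found:
        return []
    return [n for n in range(1, max(found) + 1) if n not in found]


def check_book(left_names, right_names, book="AAA"):
    """Summarize both folders: gaps on each side and whether the counts agree."""
    left, right = book + "L", book + "R"
    return {
        "left_missing": missing_counters(left_names, left),
        "right_missing": missing_counters(right_names, right),
        "counts_match": len(counters(left_names, left)) == len(counters(right_names, right)),
    }
```

To run it against real folders, something like `[p.name for p in pathlib.Path("AAAL").iterdir()]` would supply the name lists. A scan missing from the very end of a folder leaves no gap, so only the count comparison will catch that case.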
Harder is if the files are contiguous yet mismatch in numbers. This means one of the two folders has either too many or too few pictures. If pages stuck together, you will need to find the missing or extra page. I look every 10 files to see if the page number has incremented by 20. If not, the problem is in that range. If a page was scanned twice, I delete one. If it was missing, I capture the missing page and name it such that it will sort correctly, something like AAAL0101a.jpeg. If files have been added or removed, I use my renumber.sh script to normalize back to sequential numbers with nothing missing.
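For anyone who would rather not read shell, here is the renumbering idea as a Python sketch. It is not the renumber.sh from the repository and makes no claim to match it line for line. It assumes every file in the folder belongs to one side of one book, and that sorting by name without the extension gives the right page order (so AAAL0101a lands directly after AAAL0101). Fill-in scans with a prefix like AAAX would need renaming by hand first.

```python
from pathlib import Path


def renumber_plan(filenames, prefix, start=1, width=4):
    """Map each existing name to a gap-free sequential name, keeping sort order."""
    ordered = sorted(filenames, key=lambda name: Path(name).stem)
    plan = []
    for offset, old in enumerate(ordered):
        new = f"{prefix}{start + offset:0{width}d}{Path(old).suffix}"
        plan.append((old, new))
    return plan


def apply_plan(folder, plan):
    """Rename in two passes so no file overwrites one that has not moved yet."""
    folder = Path(folder)
    staged = []
    for old, new in plan:
        temporary = folder / (new + ".renumber-tmp")
        (folder / old).rename(temporary)
        staged.append((temporary, folder / new))
    for temporary, final in staged:
        temporary.rename(final)
```

Calling `renumber_plan` on its own and printing the result is a cheap dry run before letting `apply_plan` touch anything.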
Now it is time to process with ScanTailor. I will do the left and right separately because they will need different orientation correction. The goal is to have ScanTailor do as much as possible automatically. There is a wide variety of options possible, and I would recommend the ScanTailor forum for more information. I could spend as much time discussing ScanTailor configuration as the whole rest of this long post.
I will use ScanTailor to process from the original AAAL and AAAR directories to ones named AAA-stL and AAA-stR. I still need to keep them separate at this point. They will get merged soon. As stated, my goal is to configure and allow ScanTailor to do its work with default auto settings. This has yet to happen 100% for me on any book. I spot check and fix individual pages if needed and hope like hell I don’t need to.
I also create folders AAA-pdfL and AAA-pdfR. This is to allow me to potentially set up different ScanTailor configs for the OCR text versus the PDF of the captured pages. Although I like the flexibility and initially I envisioned an OCR target that strips out the page numbers while the PDF target preserved them, in practice I have used the same ones and just copy the files from -st directories to -pdf directories. These will come into play soon.
After all this pre-processing, now I run my processing script named process.sh. This does the following things:
- Creates folders for AAA-st-renumbered and AAA-pdf-renumbered, AAA-ocr, AAA-pdf and AAA-final
- Copies and renumbers the files from the AAA-st and AAA-pdf directories to the -renumbered equivalents. At this point the L and R are removed and the files are collated such that paging through file 1 to the end shows the book in order.
- Runs tesseract on each page. For AAA0001.tif , it will create AAA0001.txt
- After all files have been OCR’d, it will cat the files into one text file in AAA-final
- It will run uni2ascii on the text file to clean up weird artifacts in the text file
- It will use the convert function on the files in AAA-pdf-renumbered to create one full PDF file of all those graphics. This will typically be between 50 and 150 times larger than the plain text file.
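The collating and command-building steps in that list can be sketched in Python. This is an illustration of the idea, not a copy of process.sh. It rests on two assumptions drawn from how I scan: the cover is the first right-hand image, and the left-hand pass runs back to front because the book is flipped upside down. Set `reverse_left=False` if you scan the left pages front to back. The function names are hypothetical, and uni2ascii is left out because its flags depend on the cleanup you want.

```python
from pathlib import Path


def collate(right_pages, left_pages, reverse_left=True):
    """Interleave right- and left-hand scans into reading order."""
    if len(right_pages) != len(left_pages):
        raise ValueError("left and right page counts differ")
    # Assumption: the left pass was captured back to front.
    lefts = list(reversed(left_pages)) if reverse_left else list(left_pages)
    ordered = []
    for right, left in zip(right_pages, lefts):
        ordered.extend([right, left])
    return ordered


def renumbered_names(ordered, book, suffix=".tif", width=4):
    """Pair each collated file with its new side-free name (AAA0001.tif, ...)."""
    return [
        (old, f"{book}{number:0{width}d}{suffix}")
        for number, old in enumerate(ordered, start=1)
    ]


def tesseract_command(image):
    """`tesseract IMAGE OUTPUTBASE` writes OUTPUTBASE.txt."""
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix(""))]


def pdf_command(images, output_pdf):
    """ImageMagick's convert takes the page images followed by the output file."""
    return ["convert", *[str(image) for image in images], str(output_pdf)]
```

Each command list can be handed to `subprocess.run(command, check=True)`. Keeping the commands as plain lists makes it easy to print them first and see what would run before anything is processed.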
At this point, I have an OCR version of the text suitable for an eInk reading device as well as a graphical version suitable for tablets or desktops. If one were desiring a permanent high quality text file (for Project Gutenberg for example) then the OCR version could be proofread against the PDF.
All of this seems like a long way to go to get a digital copy of a paper book and it is. Unless one values their time at a very low hourly rate it is often cheaper to purchase the book than actually scan it. For readily available books that are reasonably priced, I will just rebuy them. For books (often older non-bestselling books) with no current electronic version available for purchase, this is the best option. I want to archive my collection of 20 years of science fiction magazines before I transfer them to either some archivist or more likely, my curbside recycling. I want the stories and no longer have any desire to house the paper.
For many years I have said I’d gladly trade a number of paper books in my house for a digital version. Now, with a little time and manpower that is possible. I’m flat delighted.
Physical and design improvements or alternate software recommendations (and pull requests on GitHub improving my scripts) are happily accepted. Questions regarding my sanity in pursuing this project will be read bemusedly once then largely ignored. Comments berating me for any legal, moral or ethical implications of scanning my own property for my own use are wildly non-interesting to me and treated as such.
If you build this yourself, please let me know with pictures and your own stats for throughput. Good luck and be careful out there.
In this episode, I play a song from the Joe Perry Project; I discuss my 14th anniversary as a podcaster, why podcasting was inevitable and why most of history is accidental; I talk about why your 40th is the last realistic birthday to be a big deal and the aches of getting older; I discuss the terrible American eating habits; I close with a bit of a status update on my DIY Bookscanner project.
Here is the direct MP3 download for the Evil Genius Chronicles podcast, September 4 2018
Links mentioned in this episode:
- Support this show on Patreon
- Buy the Joe Perry Project song “East Coast, West Coast”
- Buy Eat to Live at Amazon
- Mohib’s thread on his DIY Bookscanner
- Auphonic podcast production tool is so good!
- Dog Days of Podcasting
New sponsor: Audible.com! Go to audiblepodcast.com/egc for your free trial and free download!
Was our sponsor: (still) use coupon code EGC to save 10% on domain registrations
- Theme song provided by the Gentle Readers
- My Google+
You can subscribe to this podcast feed via RSS. To sponsor the show, contact BackBeat Media. Don’t forget, you can fly your EGC flag by buying the stuff package. This show as a whole is Creative Commons licensed Attribution-NonCommercial 3.0 Unported. Bandwidth for this episode is provided by Cachefly.