I recently posted about finally finding a workflow to properly convert a cache of old Doc Savage books to ePub files.
In the last installment, I used Calibre to convert the original BBeB format (Sony) files into rich text files. Then I used Nisus Writer Pro to clean them up, and finally copied the text into Apple Pages for the final tweaks.
This was workable, but I learned a few things, and improved my processes:
First – OCR Conversion issues
I'm not sure if this was caused by the original conversion from print or by the translation into the BBeB format, but instead of proper CR/LFs the text was littered with soft newline characters. Perhaps in the late 1990s, when the books were originally ripped, this was state of the art.
This caused some really weird things when Calibre created an ePub: odd text display and faulty chapter breaks.
Sadly, I figured this out after completing 20 or so conversions.
Fixing the newlines into proper paragraph breaks
Fortunately, there is a really kickass utility called WordService, freeware from DEVONtechnologies (the DEVONthink people), that adds a bunch of really killer services. Installing it adds a slew of options to the “Services” submenu under the application-name menu (sorry, Mac only).
The “Reformat” option goes through and replaces the newlines with CR/LF as is proper. Boom. And it works in all programs that allow you to create or edit text.
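For anyone not on a Mac, the same idea can be sketched in a few lines of Python. This is not how WordService does it, just my rough equivalent. The big assumption is which characters count as "soft newlines": here I guess at the Unicode line separator (U+2028) and the vertical tab, so check your own files and adjust. The function name `normalize_newlines` is made up for this example.

```python
# Assumption: these are the "soft" line-break characters found in the text.
# Adjust this tuple after inspecting your own files.
SOFT_BREAKS = ("\u2028", "\x0b")


def normalize_newlines(text: str, newline: str = "\r\n") -> str:
    """Replace soft breaks and mixed line endings with one consistent ending."""
    # Collapse CRLF and bare CR to LF first, so every break is one character.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Turn each assumed soft-break character into a real line break.
    for ch in SOFT_BREAKS:
        text = text.replace(ch, "\n")
    # Finally, write out the line ending the caller asked for.
    return text.replace("\n", newline)
```

Pass `newline="\n"` if you would rather have plain line feeds than CR/LF.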
Multiple line breaks
I'm not sure if this was an artifact of the original conversion, but there was a blank line between every pair of paragraphs. Perhaps the original conversion to BBeB didn’t have any formatting options to set line spacing.
Whatever, it made the amount of text per page really sparse.
WordService to the rescue again. It has an option to remove multiple line feeds.
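Here is a rough Python sketch of the same cleanup, again my own stand-in rather than what WordService actually runs. It assumes a "blank line" is a line that is empty or holds only spaces and tabs, and the function name `collapse_blank_lines` is hypothetical.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Collapse runs of blank lines so a single line break separates paragraphs."""
    # Normalise line endings so the pattern only has to deal with "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A newline followed by one or more blank (or whitespace-only) lines
    # is replaced by a single newline.
    return re.sub(r"\n(?:[ \t]*\n)+", "\n", text)
```

The result uses plain line feeds; a paragraph break is then just one newline, which is what the ePub conversion wanted in my case.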
In about 3 minutes, I was able to take a munged rich text file, fix all the newlines, remove the multiple spaces, and check each chapter.
Towards the end of the 181 books I fixed, I was getting quite efficient.
The original conversion
Speaking of a “labor of love”, whoever originally took the printed pulps, cut the bindings, scanned them, fixed the OCR fluff, created proper books with chapters, and formatted them for BBeB was dedicated.
I am astounded by how few typos or fluff items I had to fix. They even captured some commonly used (but authentic) word choices. It makes the 181 passes through my workflow seem trivial. I really wish I knew who did that original work.
Three weekends, probably 30 hours total, 181 books, about 15 redone as I improved my workflow. I now have a very clean collection of all 181 original Doc Savage stories.
If anybody would like a copy, drop me an email and I will make a link available.