I recently posted about finally finding a workflow to properly convert a cache of old Doc Savage books to ePub files.
In the last installment, I used Calibre to convert the original BBeB format (Sony) files into rich text files. Then I was using Nisus Writer Pro to clean them up, and then finally copy the text to Apple Pages for the final tweaks.
This was workable, but I learned a few things, and improved my processes:
When I got my first eBook “Reader,” a Sony PRS-700, in 2008, I went stumbling around for books.
One of the things I stumbled on was an archive of the 180+ Doc Savage books, in .lrf format, for the Sony Reader series. (Yes, I am aware that they were not “legal” to download, so if you are offended you can move away now.)
The .lrf format worked great on the old Sony reader. Table of contents, chapters, and all navigation worked.
I thought I had written about this before, but apparently, searching my archives, I haven’t. Today’s the day I guess.
It is no secret that I have a voracious appetite for reading. It started young, when I was in High School, and was heavily Science Fiction oriented. It was escape from some reality, and I doubled down.
My introduction to Doc Savage came much earlier than that, though. In grade school, my dad gave me one of the paperback reprints for Christmas. I read it, but since I hadn’t developed a passion for the printed word at that point, I really just read it and put it down.
Fast forward until I got my first e-reader. I was googling around looking for things that were free (i.e. in the public domain) to load up on it, and I found a link to the 162 Doc Savage novels. Not sure where I found it, but I grabbed it, and loaded them up (later, I learnt that they were not in the public domain, but copyrighted, and owned by Conde Nast publishing. However they just sit on the rights and don’t make them available for purchase. Boo.)
I whipped through them quickly, enjoying the tales immensely. They were quick reads, they were written to attract the attention of a 15 year old boy, and unlike the comics and superhero stories, there was nothing magical.
I started this on a whim in 2009 more as a way to just post some top of mind items, and whatever tickled my fancy.
11 years later and more than 1,000 posts, and I have yet to go viral even once. No matter, this is for me. If anyone else gets any value from it, then so be it.
As a voracious reader who primarily focuses on Science Fiction, I do branch out. One genre that I enjoy is the detective thriller. This penchant can be traced to my love of the Doc Savage stories of my youth, and has jumped into some more or less serious threads of what I read. From the flippant Stephanie Plum novels (a guilty, fun pleasure) to the work by J.A. Jance, I have enjoyed many a cliffhanger.
However, lately, I have become hooked on the Lew Archer novels by Ross Macdonald (pen name of Kenneth Millar). Set in Southern California, and beginning shortly after the war, they stretch for 20ish years, and are lively depictions of the changes that the boom years brought to that part of California.
The principal character, Lew Archer, is a private investigator, a lone gun, whose marriage failed, and who hung out his shingle after being an LA cop. Unlike the friction you find between many TV private eyes and the police, you get the impression that the local constabulary appreciate and respect Archer.
The stories often start with a missing person, or someone desperate for help, and the first person narrative draws you in, and holds your attention for 250 or so pages, almost always with a surprising twist at the end that keeps you guessing.
One thing that I enjoy about these stories is that they don’t telegraph the antagonist. You often are truly surprised in the outcome, or that the obvious villain isn’t the culprit, yet, the obvious villain is rarely unbloodied at the end.
One of the books that I just finished, The Wycherly Woman, was a classic example, where you thought you had it figured out, and then WHAM, it was a total surprise at the end. Additionally, this one was set in the San Francisco Bay Area, and having grown up there, it was a pleasant read about places I know well.
Millar’s writing style is crisp, his vocabulary is deep, and he does a fantastic job of engaging the reader in these page turners.
You could get wrapped up in a far worse series of novels. I have been through 10 or 11 of these, and have thoroughly enjoyed each one of them. Highly recommended.
I have a Kindle, and I enjoy it. I haven’t always had a Kindle; I started as a Sony reader fan, then became an iPad user, but I succumbed to inevitability and bought a Kindle.
I like it. I do prefer an eInk reader to a tablet, and today, you have to work really hard to live in this space and not use a Kindle.
I buy lots of books. Most are just throw-away pulpy fiction that I enjoy reading. Like the Doc Savage series (modern), or The Destroyer series. Mostly they are a couple of bucks, I enjoy them and delete them from my Kindle.
I have also borrowed a couple books via the Prime lending library. I wish I had something to say about that, but really, it is trivial to borrow, read, and “return“. Very uneventful.
Now I am struggling with joining Kindle Unlimited. Looking at the books included, much of the pulpy fiction is there. So it would probably save me a few bucks (but not much, as I rarely spend more than $10 a month on those throw-aways).
But the convenience of Unlimited is tempting. Grab a book or 5, and try them. If they suck, you aren’t out any money.
The ethical qualm is how little of that $10 goes to authors. You have to read some percentage of the book for them to get any money, and the fee paid to them is low. Why should I care?
Good question. Unlike the average Slashdot user, I don’t subscribe to the idea that the near zero marginal cost of an e-book means I should pay pennies for it. I know how much effort it is to write, edit, and package even an e-book. I believe that the written words are the value, not the paper, ink and distribution costs.
Herein lies the problem. Kindle Unlimited appears to be a bad deal for authors. They are pressured to participate, but, like Spotify, the amount of subscriber or advertiser money that trickles to them is minuscule.
I prefer to spend the few bucks, have more of that go to the authors, and hopefully, they will continue to write things I want to read.
So, while Kindle Unlimited seems awesome, and a great deal, I will continue buying books, as I believe that will help the authors make a living, and thus not have to go back to a day job to put food on the table.
Coda
Yes, I still use Spotify. However, I have bought many albums based on things I have found there. I find that if I really enjoy (read: replay songs) an artist, I will buy their album(s) to help support them.
I have admitted to being a fan of the Doc Savage stories in the past. Fun, targeted at teenaged boys, and quick reading, they are the classic adventure stories. I read all 181 original stories, as well as all the modern additions.
I have been searching for a similar series of stories, and am currently tasting the Matt Drake series by David Leadbeater. Not really in the Doc Savage mold, but more of an Indiana Jones on steroids, searching for relics, and fighting with organized mobsters (governments or whatever) to save them from evil plans.
Good action, somewhat believable plots (as long as the idea of Norse gods being real and 500M years old isn’t too far fetched).
Not sure I will stick with the whole series, but it has started well.
As I have mentioned many times, I have been a long time satisfied user of my reader and ebooks. Certainly better than hauling around a lot of dead trees when I travel.
All good. I have been building a collection for more than 5 years now, from a variety of sources, many commercial, but also many of the free sources (Project Gutenberg) as well as some other sources for out of print books that are ahem less than legit.
Most of the commercial options are DRM encumbered, so that I can’t peek inside with impunity. But all the others are open books, so to speak, mostly ePub format. There are some great tools to work with.
Sigil – a WYSIWYG ePub Editor
Sigil is free, open source, and pretty solid. It will help you put together a book, and fix minor errors.
It is a good place to start to figure out the ePub format.
ePubs are pretty straightforward HTML with some special attributes. You can do just about anything that you can put on a web page (within reason, no JavaScript or animations).
But you can tweak up the look and feel of the book with stylesheets, inserted graphical elements, and all the other tricks that you can use with web pages.
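To see that web-page structure for yourself, remember that an ePub file is a ZIP archive holding the HTML pages and stylesheets. Here is a minimal Python sketch that lists them using only the standard library. The function name list_epub_parts and the file name in the usage note are placeholders of mine, not part of any tool mentioned here.

```python
import zipfile


def list_epub_parts(source):
    """Return (content pages, stylesheets) found inside an ePub archive.

    `source` may be a path to an .epub file or a binary file-like object.
    """
    with zipfile.ZipFile(source) as book:
        names = book.namelist()
    # Content documents are ordinary (X)HTML files; styling lives in CSS files.
    pages = sorted(n for n in names if n.lower().endswith((".xhtml", ".html", ".htm")))
    styles = sorted(n for n in names if n.lower().endswith(".css"))
    return pages, styles
```

Calling list_epub_parts("book.epub") on an unprotected book shows roughly the same file list that Sigil presents in its book browser.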
Calibre – An open source library manager
Of course, your reader probably comes with software to manage its files, but you will find that it is pretty limited. Perhaps you have some old files in one of the dead or dying formats (.lit, .lrf, BBeB, etc.). Additionally, there are a lot of eBooks in plain text format or Microsoft Word format.
It is helpful to be able to shift formats, and to clean up some of the glitches.
Enter Calibre. An open source, multi platform (Mac, Windows, Linux) environment for managing your library. It groks all the standard formats, and converts between them seamlessly. It is extensible with plugins, and it can help you clean up books as well as transcode them. Additionally, it connects with several sources to get covers, meta data, and other tangibles to improve the user experience.
It can be used to take HTML files or word processing files (RTF or .DOCX) and turn them into eBooks in any format.
Being a powerful package, to get the most out of it, you really need to understand what it is doing, and how to optimize the settings. By default it does an OK job, but as in many cases, garbage in equals garbage out.
Some issues
Why is this a problem? Well, it is because a lot of the free or community books are poorly formatted to begin with. Also, some sources in general suck. Often, I will find an out of print book that was scanned and OCR’d. Often this is turned into an MS Word file. Until recently, you needed to save that file as an HTML file and run it through Calibre.
Calibre uses some pretty heavy stylesheets that mostly look OK. If you are ambitious, you can customize them easily, provided you know what you are doing. Of course, not every reader can handle all stylesheet features, so it can be a trial and error process.
Of course, there are some things that really foul up any book. Anything output by Microsoft Word uses a class structure that is insane. If you see class=”msonormalxx”, you know that you are going to have an ugly book.
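If you want to strip those Word classes in bulk rather than by hand, a regular expression gets most of the way. This is a rough Python sketch, not a full HTML cleaner. The name strip_word_classes is my own, and it only removes class attributes whose single value starts with "Mso", quoted or unquoted.

```python
import re

# Matches a class attribute whose value starts with "Mso" (case-insensitive),
# e.g. class=MsoNormal or class="MsoNormal1". Multi-class values are left alone.
MSO_CLASS = re.compile(r'\s+class\s*=\s*(["\']?)Mso\w*\1', re.IGNORECASE)


def strip_word_classes(html):
    """Remove Word-generated Mso* class attributes from an HTML string."""
    return MSO_CLASS.sub("", html)
```

For example, strip_word_classes('<p class="MsoNormal">Hi</p>') leaves a plain paragraph tag. Inline style attributes that Word adds are not touched and would need a similar pass.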
RTF files are not much better. They typically have far fewer funky classes tossed in, but the conversion does glitch in some spectacular ways.
ePub versus other formats
I have a pretty large collection of books in the Microsoft ebook format (.lit) and the old Sony reader format (.lrf) that I convert to read. Both these formats can be problematic.
The Sony format leads to ePubs with some really whacky XHTML coding in them. Really ugly to try to clean up. Additionally, they have odd chapter breaks, and pretty non-functional tables of contents.
Fortunately, it isn’t too difficult to clean them up, but it is time consuming. You need a few tools.
An HTML stripper. There are several options, but I use a simple app for my Mac called HTML Stripper, a reasonably priced utility. There are some free ones, but I like to support small vendors, and $15 is a good price for this tool.
The HTML stripper will give you good plain text. You will need to reformat that into clean HTML. Fortunately, Markdown is a fabulous way to do this. I use Mou for the Mac (free, but do donate to them), and MarkdownPad on my PC. Again free, but the pro version has some nice extensions, so it might be worth spending the $15 to buy it (I have).
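If you would rather script the stripping step, here is a minimal sketch of a stripper in Python, using only the standard library's html.parser. It is a stand-in for illustration, not how the HTML Stripper app works. The names TextExtractor and strip_html are my own, and the list of block-level tags is an assumption you may need to extend.

```python
from html.parser import HTMLParser

# Tags treated as paragraph boundaries (an assumed, incomplete list).
BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "h4", "li", "blockquote"}


class TextExtractor(HTMLParser):
    """Collects text content, starting a new paragraph at each block tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._current = []

    def flush(self):
        # Join the buffered text and collapse runs of whitespace.
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self._current.append(data)


def strip_html(html):
    """Return the text of an HTML fragment, one blank line between paragraphs."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.paragraphs)
```

Note that this keeps the contents of script and style elements as text, so it suits chapter bodies better than whole web pages.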
The clean up workflow
First I extract the raw HTML. I do this chapter by chapter. It is best to create an ePub with one source file per chapter. That makes for clean chapter breaks, and a well functioning table of contents.
Then I run it through my HTML stripper. That gives me a clean text file. It will likely have odd numbers of breaks in paragraphs, and some other interesting things. Fortunately that doesn’t matter.
I then import that text into my Markdown editor. Add a chapter title as an h1 heading, and then you have a nice complete chapter to drop back into the ePub. (Every Markdown editor I have tried has a “copy to HTML” function. Works great.)
Lastly, I build a new epub using Sigil. Add meta data, a cover, and construct a table of contents, and you have a nice book.
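The text-to-chapter step can also be scripted. The sketch below is a simplified stand-in for the Markdown editor, not a Markdown converter: it wraps each blank-line-separated paragraph in a p element and puts the chapter title in an h1. The function name chapter_to_xhtml and the bare-bones XHTML template are my own assumptions, and Sigil may want further tidying of the result.

```python
import re
from html import escape

# A bare-bones XHTML page; real books usually also link a stylesheet here.
XHTML_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{title}</title></head>
<body>
<h1>{title}</h1>
{body}
</body>
</html>
"""


def chapter_to_xhtml(title, text):
    """Turn stripped plain text into one XHTML chapter file."""
    # Paragraphs are separated by one or more blank lines; stray line
    # breaks and repeated spaces inside a paragraph are collapsed.
    blocks = (" ".join(block.split()) for block in re.split(r"\n\s*\n", text))
    body = "\n".join(f"<p>{escape(block)}</p>" for block in blocks if block)
    return XHTML_TEMPLATE.format(title=escape(title), body=body)
```

Because extra blank lines and odd wrapping are collapsed, the irregular breaks left by the stripper really do not matter.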
But what if you want to read it on your Kindle?
Of course, the Amazon Kindle doesn’t support the ePub format. So you need to convert it into either an .azw3 or a .mobi format file.
Calibre to the rescue again. Trivial, and the defaults are pretty good for conversion.
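Calibre also ships a command line tool, ebook-convert, which takes an input file and an output file and picks the formats from the extensions. Here is a small Python sketch that wraps it with default settings. The helper names conversion_command and convert are my own, and it assumes ebook-convert is on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_command(source, target_ext="azw3"):
    """Build the ebook-convert command for one book.

    The output file sits next to the source, with the new extension.
    """
    source = Path(source)
    target = source.with_suffix("." + target_ext.lstrip("."))
    return ["ebook-convert", str(source), str(target)]


def convert(source, target_ext="azw3"):
    """Run the conversion using Calibre's default settings."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is Calibre installed?")
    subprocess.run(conversion_command(source, target_ext), check=True)
```

Looping convert over a folder of .epub files is an easy way to prepare a whole batch for the Kindle.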
And naturally, you use Calibre to transfer or manage your library on the Kindle (this is only for files you didn’t buy from Amazon). Works like a charm.
Coda
I got into cleaning up ebooks because of my collection of old Doc Savage books. Circa 2008 I found a repository of them in Sony format (I had a PRS 700 reader then), and the 181 original Doc Savage stories were a joy to read.
But they convert poorly into ePub. When I lost my PRS 700 and replaced it with the PRS 600, the support for .lrf files was removed. My only option was to convert them. Calibre converted them, but it did a lousy job.
The last few days, I have been using the workflow above to clean some of these books. It takes me about 35 minutes to create a crisp, clean, and standards compliant ePub from a completely ugly converted ePub.
A labor of love.
Having a new Kindle is giving me the motivation to fix some of my titles.
I am a gadget person. I have always loved tech, and have often been on the leading edge of trends and an early adopter.
One category that I dove into head first was the e-Reader trend. I first stumbled across them in 2006, when Sony launched the PRS 500. I didn’t jump then, but I had my eye on them.
At the time, I was traveling the better part of 50% of the time. Being a lifelong reader, and a SciFi junky, I was always hitting the used book stores and carrying 10 pounds of books with me on my 2 week international trips. A definite burden.
Of course, the idea of an electronic book with a large number of books stored on it was a dream.
The first touchscreen reader, the Sony PRS 700
When Sony launched their second generation reader, the first with a “touch screen,” I pounced. I bought one of the first PRS 700s, and loved it. I bought lots of books, and even found a fair number of public domain free books (the Doc Savage series was a good, quick read).
I probably put 500K miles of traveling with that reader, a constant companion. I probably had 500 books on it at any one time. It allowed me to have a wide selection of titles, including my favorite Science Fiction, some contemporary fiction, some technical references, and some classics. My tastes range widely.
Then one day in 2010, somebody decided they wanted it more than me. So I found myself without a reader.
In the interim, Amazon launched the Kindle line of readers, and a pretty wide selection of ebooks. The first Kindles were toy like, and pretty cheesy feeling (I had many friends with them). However all my books were in ePub format (the “standard” ebook format), whereas the Kindle used a proprietary format, based on the common “Mobi” format.
So I really didn’t consider the Kindle a suitable replacement.
Off to Best Buy I went, and I came home with the successor to the PRS 700, the PRS 600. Still touch screen, and my library transferred over smoothly. One of the nice things about the Sony readers is that they allow expansion of the onboard storage with the Sony Memory Stick Pro and SD cards.
The PRS 600 was a bit of a disappointment. The eInk display was fine, but the resistive touch screen made it full of glare. It also lacked the built in LED light to read after dark, something that I did enjoy on the PRS 700.
I used the PRS 600 for a long time, until I picked up my iPad in 2011. It was a far better reading experience, and since all my library was ePub, it was trivial to use it.
The PRS 600 did have one other weakness. The battery sucked. It never gave me the expected lifetime for reading. I probably needed to charge it after 12 hours of reading. And it died early. By the end of the first year, the battery stopped holding a charge.
Fortunately, it wasn’t hard to find a replacement online, and it was easy to swap in. But like the original battery, its life wasn’t great out of the box. Whether Sony had to compromise on the battery capacity, or whether there was some constant draw, it was a bummer to have the battery expire as quickly as it did.
Fortunately, with the arrival of the iPad, it was pretty much relegated to a drawer.
Enter the tablet for reading
In 2011, for my birthday, I splurged and bought an iPad. While it didn’t have an e-ink display, it did have a great display, and I had no trouble reading on it. All my library moved easily, and I had tons of storage space.
Of course, the iPad lasts for 12 hours of reading easily, so long plane flights are not a problem.
But the display wasn’t as satisfying as the e-ink display. That and the constant distraction of email notifications, facebook, or even a quick hand of solitaire.
The iPad still is in my stable, but I have augmented it with a first generation Google Nexus 7 tablet. Excellent display on a 7″ tablet, and good book reader applications. As well, a really good integration with the Google Play store books. I have bought many books from there, so it was really convenient.
But its battery sucks really bad. I can get about 4 – 5 hours of reading before it shuts itself down. Ok if you can charge it every night, and don’t count on it for a long flight of reading. But that is a pretty big limitation.
Back to a Reader
As my travel schedule is going to ramp up this year, I know that I am going to want a reader for my books. I remain a voracious reader when I travel, so it is an easy choice.
There are still a few options out there. Sony still has a full line. Kobo is a smaller, open option. And naturally the Kindle.
A lot of players have come and gone. Barnes and Noble’s Nook line, while still available, is becoming a weak player.
So, I started looking into the Kindle. I still have a huge library of ePubs, but that is less of a detriment than it used to be. The Calibre package makes it child’s play to convert to different formats.
As I mentioned in my last post, the Amazon store has a great experience, and a large selection of books. And since I have been buying dead tree books from them for 14 years or so, they have a pretty good idea of my tastes.
I started slowly, with the Kindle app on my Nexus and my iPad. A couple of free books to start with, and I think I can live with their ecosystem. My Kindle Paperwhite should arrive any day now. I expect it to have a great display, with a backlight, and a seamless ecosystem.
Next up: a detailed review of the Kindle Paperwhite
Once I get it, I plan on doing a thorough review. I will get it setup, connect it with my Calibre library, and try it in a variety of scenarios.
I have wholly embraced the eBook revolution. As a long time traveler, and SciFi aficionado, I have assembled a large collection of books that I continue to read to mark them off my to do list.
Being a fan of science fiction, I have been forced to acquire some of my books by extra-legal means. Since many of the classic tomes of the golden era of SciFi are out of print, and have no official ebook release to buy, I turn to the internet.
With few exceptions, these books are scanned and OCR’d from print, and then stuffed into a file to read. Lots of early Heinlein, and many obscure authors, exist only this way.
The problem: OCR still sucks. Even the best algorithms barf a lot on text, and thus there are spots of garbage in many of these books.
I sometimes make it a personal mitzvah to clean up a book.
Classic example was the “To the Stars” trilogy, by Harry Harrison (his real name, not a nom de plume). It was a rather poor scan and conversion to an RTF file. It was a painful process to fix, but totally worth it, because it made the book completely readable.
However, if your book is in ePub or PDF format, you have fewer options.
Sigil, a pretty awesome open source ePub editor
The program I go to is Sigil. Provided there is no DRM, you can open and inspect the book, and fix small things. If you are savvy, you can also dive into the CSS stylesheet and alter fonts, indents, and other text properties (but be warned, some readers ignore much of the CSS codes and classes – I’m looking at you Sony Reader).
Sigil allows you to look at the text as it renders, at a split screen with the code below the rendered text, or at just the pure code. You can fix a lot of errors and glitches by searching and editing the code, then saving back to the original file.
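One way to find candidates for that search-and-fix pass is to flag words that mix letters and digits, a common sign of OCR confusion (a 1 for an l, a 0 for an o). This Python sketch is only a rough heuristic of my own, not a feature of Sigil. The name suspicious_tokens is made up, and it will also flag legitimate strings such as "2nd" or model numbers.

```python
import re

# A word containing at least one ASCII letter and at least one digit.
SUSPECT = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


def suspicious_tokens(text):
    """Return the sorted, de-duplicated words that mix letters and digits."""
    return sorted(set(SUSPECT.findall(text)))
```

Feed it the plain text of a chapter, then search for each flagged word in the code view and fix it by eye.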
A future series of posts will go into depth on how to better structure the ebook.
Another good program, and one that is widely used, is Calibre. A library and file manipulation program, it is open source and extensible. It makes it easy to convert from one format to another (Kindle to ePub, or LRF to ePub, and many other options).
A nice touch is that in Calibre you can better set up the ISBN and the cover images, and get data on the book from public databases. I used Calibre to convert a collection of Doc Savage stories from the .lrf format (the original Sony Reader format) to ePub, and to add good cover pictures.
In fact, most of the ebook files I look at in Sigil have signs of being converted/cleaned by Calibre, even some commercial books.
Doing this work, you find some things like:
Files which came from Microsoft Word – littered with the “class=msonormal” attribute. Ugh. I don’t usually curse too much about Microsoft Office, but what it outputs for HTML that is converted into an ebook is a crime against humanity.
Most ebooks, even commercial, professionally edited and assembled ones, have horrible structure. No proper links to the chapters, and no proper tables of contents. Commercial books are much more likely to get this right, but it is a disaster on the community sourced works. I am working up a process to fix that.
There are some truly shitty OCR engines out there. Even high priced, high performance engines have trouble, and the second tier is atrocious. Someone once grumbled on Slashdot about why there weren’t any good (free) open source OCR engines, and the answer is that it is friggin hard, and it often becomes a lifetime’s work to tune and improve the algorithm, so the good ones are not in a hurry to be given away.
I rarely make a mission to fix an ebook, but when I do, I want to leave something that is a better experience to read.
(For the record, if there is a place to buy a book, I will always buy it, but much of what I read is esoteric, or out of print, so I am forced into alternatives. )