Building a self-hosted "Instapaper"

[Image: tweet from @pr1001 to @cataspanglish, “It’s the sc ...”]

(updated 21/05/12)

I love Instapaper and (to a lesser extent, now that it is Pocket) Read it Later. They are “simple” apps that do what they say on the tin: they save stuff you want to read on the Internets for, well, later, format the text so it is easy to read, and give you offline access to it. Great for the school run on the U-Bahn or the smallest room, the places where I get most of my reading done. What I don’t love is that my reading list is stored on a server I have no control over, where it can be data-mined, sold off or cross-referenced to build a picture of me. These apps also change constantly and demand upgrades to operating systems and hardware.

So I’ve realised that I want the functionality, but I also want the information stored somewhere I have access to, not in the nebulous third-party cloud. In fact, I’m moving as much of my stuff as possible to places I control, as attacks and takedowns become more common and as services are bought, sold or shut down, taking my data with them or simply deleting it. Pete Ashton created his own “version” of Tumblr for similar reasons, but I suspect doing an “Instapaper” would be a bit more complex. I guess I’m looking for something along the lines of diaspora* or status.net.

Has anyone got any ideas on how it could be done?

So I’ve got some answers via Twitter and Skype:

@jonhickman chirped in on Twitter –

my half asleep pre coffee mind is thinking a bookmarklet that runs a script that scrapes text and dumps it somewhere…

the easiest ways to do that use services you’re trying to avoid but you could do it a harder way 🙂

in the meantime I suggest read later lists in PinBoard – at least they have a clear commercial relationship with their users

I think they have ifttt channels so could you use that to email them? Then d/l when in wifi. I’ll have a look later
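To make the “harder way” a bit more concrete: assuming a bookmarklet sends the current page’s HTML and URL to something you run yourself, the receiving end only has to strip the markup and dump the text into a folder you control. This is a minimal sketch using only Python’s standard library; `TextExtractor`, `extract_text` and `dump_article` are hypothetical names, and it does no Readability-style guessing about which part of the page is the actual article.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects the visible text of a page, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. arrive decoded
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a script/style element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the page's text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def dump_article(url: str, html: str, folder: Path) -> Path:
    """Write the URL plus extracted text to <folder>/<slug>.txt and return that path."""
    # Turn the URL into a safe file name, e.g. http://example.com/a -> http-example-com-a
    slug = re.sub(r"[^a-z0-9]+", "-", url.lower()).strip("-")[:80] or "page"
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{slug}.txt"
    path.write_text(f"{url}\n\n{extract_text(html)}\n", encoding="utf-8")
    return path
```

Pointing `folder` at a synced directory would keep the archive on your own disk while still letting it follow you between machines.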

@codehead skyped in –

How ’bout saving a page from Pocket to Dropbox using IFTTT? Ultimately, the page would end up on your HD.

you wouldn’t bypass the cloud, but you’d end up with complete control over the actual archive.

Check http://www.zeldman.com/2011/02/11/readability-2-0-is-disruptive-two-ways/

Interestingly enough, Readability has an API

http://www.readability.com/publishers/api

So technically it’s possible to leave the heavy lifting to Readability itself

(however cloudy it might be)

Oh, looks like the API is for-pay, too

http://www.readability.com/about/terms#view-apiGuidelines

BUT pages sticking to Readability’s guidelines should be really easy to scrape:

http://www.readability.com/publishers/guidelines/

In short, we could build an extension for this, but we may be subject to the concerns outlined in Zeldman’s article.
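The point about guideline-friendly pages being easy to scrape suggests that, for some sites, you may not need Readability’s heuristics at all. Here is a minimal sketch, assuming (my assumption, not a summary of what Readability’s guidelines actually say) that such a page wraps its main text in an HTML5 `<article>` element or an element carrying the hAtom `entry-content` class. `MainContentExtractor` and `main_content` are hypothetical names; only the standard library is used, and pages without either hint simply return `None`.

```python
from html.parser import HTMLParser
from typing import Optional


class MainContentExtractor(HTMLParser):
    """Collects text inside the first <article> or class="entry-content" element."""

    def __init__(self) -> None:
        super().__init__()
        self.container = None  # tag name of the matched container, e.g. "article" or "div"
        self.depth = 0         # how many open tags of that name we are currently inside
        self.done = False      # True once the first container has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.container is None:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "article" or "entry-content" in classes:
                self.container, self.depth = tag, 1
        elif tag == self.container:
            self.depth += 1  # a nested element with the same tag name

    def handle_endtag(self, tag):
        if self.container is not None and not self.done and tag == self.container:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.container is not None and not self.done and data.strip():
            self.chunks.append(data.strip())


def main_content(html: str) -> Optional[str]:
    """Return the container's text joined by single spaces, or None if no container was found."""
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks) if parser.container else None
```

Counting only tags with the container’s own name keeps the sketch tolerant of unclosed `<p>` tags, though it will still pick up any script text that sits inside the article.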

… and of course @pr1001 brought up the subject of algorithms in the image above.

Any more for any more?

@stef sez: trivial using open-source Readability, e.g. https://github.com/basis-technology-corp/Java-readability and a little database for state – read/unread/fave/etc. Heroku app in JRuby perhaps, little API in Sinatra. Done.
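The “little database for state” really can be little. This sketch uses Python’s standard-library `sqlite3` module rather than the JRuby/Sinatra stack suggested, so treat it as an illustration of the idea, not of that setup. The table, column and function names are hypothetical, and the `body` column assumes the readable text has already been extracted by an open-source Readability port like the one linked.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id       INTEGER PRIMARY KEY,
    url      TEXT UNIQUE NOT NULL,
    title    TEXT,
    body     TEXT,  -- readable text, already extracted
    status   TEXT NOT NULL DEFAULT 'unread' CHECK (status IN ('unread', 'read')),
    fave     INTEGER NOT NULL DEFAULT 0,
    added_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_db(path: str = "readinglist.db") -> sqlite3.Connection:
    """Open (or create) the reading-list database file."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save(conn: sqlite3.Connection, url: str, title: str, body: str) -> None:
    """Add an article; saving the same URL again leaves the existing row untouched."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO articles (url, title, body) VALUES (?, ?, ?)",
            (url, title, body),
        )


def mark_read(conn: sqlite3.Connection, url: str) -> None:
    with conn:
        conn.execute("UPDATE articles SET status = 'read' WHERE url = ?", (url,))


def toggle_fave(conn: sqlite3.Connection, url: str) -> None:
    with conn:
        conn.execute("UPDATE articles SET fave = 1 - fave WHERE url = ?", (url,))


def unread(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Return (url, title) pairs for unread articles, oldest first."""
    return conn.execute(
        "SELECT url, title FROM articles WHERE status = 'unread' ORDER BY id"
    ).fetchall()
```

A small HTTP API (the Sinatra part) would then just map a handful of routes onto these functions.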

So it seems like it can be done – thanks for all the help, peeps!