Capturing Web content from Firefox to Org

Emacs is a powerful tool, but other programs, such as Firefox, are better suited for Web browsing. So the question is how to transfer pieces of Web content from Firefox to Org mode. Org mode already provides means for communicating with external applications: org-protocol.el is a general mechanism for importing information into Org mode via emacsclient. Its setup is not trivial, though, and I hadn’t bothered to configure it until I came across org-protocol-capture-html. Its screenshot of captured content converted to Org markup was irresistible, so I decided to give capturing Web content another try.

That attempt reminded me that I hadn’t ranted about software setup and bugs here for quite a long time. I’m not going to fix that now; it suffices to say that using a relatively simple function shouldn’t require advanced technical knowledge and/or several hours of googling and experimenting. I really can’t imagine how a non-advanced user could get this thing running without losing patience early in the process. So here is a summary of how I got it working.

Emacs part

I assume you can already use Org mode and emacsclient.

Plain text capture

Add org-protocol to the org-modules variable (e.g. with M-x customize-variable RET org-modules).

Define an entry for capturing Web content in the org-capture-templates variable, e.g.:

(require 'org-capture) ; make sure org-capture-templates is defined
(add-to-list 'org-capture-templates
             '("w" "Web site" entry (file "~/org/notes.org")
               "* %?\n%c\n%:initial"))

Of course, this is just an example. Look at the org-protocol.el documentation for another example and for an explanation of what %:initial means.

If you’d like to use a letter other than w for the template, you can, but you must replace it in the Firefox bookmarklets and helpers below. See also the org-protocol-default-template-key variable.
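To make the template escapes a bit less mysterious, here is a rough JavaScript model of how, as far as I understand org-protocol.el, the capture subprotocol splits old-style data of the form KEY/URL/TITLE/BODY and what each part becomes in the template. This is a sketch under assumptions, not the real Emacs Lisp code: the function name parseCaptureData is made up, the rule that a first field of at most two characters is the template key is my reading of org-protocol.el, and the claim that the Org link lands at the head of the kill ring (hence %c) may differ between Org versions.

```javascript
// Rough model (NOT the real org-protocol.el code) of how old-style capture
// data "KEY/URL/TITLE/BODY" is split and decoded on the Emacs side.
// parseCaptureData is a hypothetical name; defaultKey stands in for
// org-protocol-default-template-key.
function parseCaptureData(data, defaultKey) {
  // Each field was encoded with encodeURIComponent, so "/" only appears
  // as a separator and splitting on it is safe.
  const parts = data.split("/").map(decodeURIComponent);

  // Assumption: a first field of at most two characters is the template
  // key; otherwise the key was omitted and the default key is used.
  const template = parts[0].length <= 2 ? parts.shift() : defaultKey;

  const url = parts[0] || "";
  const title = parts[1] || "";
  const body = parts[2] || "";

  // Org link built from URL and title, falling back to the URL when the
  // title is blank (bracket escaping omitted in this sketch).
  const orgLink = "[[" + url + "][" + (title.trim() ? title : url) + "]]";

  return {
    template,              // capture template key, e.g. "w"
    link: url,             // %:link
    description: title,    // %:description
    initial: body,         // %:initial, the selected text
    killRingHead: orgLink, // %c (assumption: org-protocol kills this link)
  };
}
```

So, if this reading is right, the example template above expands %c to an Org link like [[URL][TITLE]] and %:initial to whatever text was selected in Firefox.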

Capture with HTML conversion

First, configure plain text capture as described above. Then fetch org-protocol-capture-html.el from its home page and put it into your site-lisp directory (or any other directory in your load-path). Add the following lines to your ~/.emacs or another Emacs initialization file:

(require 'org-protocol)
(require 'org-protocol-capture-html)

Note that org-protocol must already be loaded by the time org-protocol-capture-html is loaded; otherwise, the corresponding subprotocol won’t be registered.

Firefox part

The easy way

Install Org-capture for Firefox. It allows capturing content without registering an org-protocol: handler in Firefox. However, org-protocol-capture-html and other custom captures won’t work this way.

The advanced way

The following sets up a universal capturing mechanism via an org-protocol: handler in Firefox. It works independently of the Org-capture Firefox extension mentioned above, whether or not that extension is installed.

Register an org-protocol: handler as described in the MozillaZine Knowledge Base (replace foo with org-protocol). One important thing they forgot to emphasize is that you must use a real link to invoke the application dialog; typing org-protocol:something into the address bar doesn’t work. For your convenience, I provide an org-protocol link here. In the Firefox dialog, select something like /usr/bin/emacsclient as the application handling org-protocol.
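If you’d rather keep the setting in a file than create it by hand in about:config, the MozillaZine recipe boils down to a single boolean preference, which can also be put into user.js in your Firefox profile directory. This is a config fragment, not a standalone script, and it only makes Firefox treat org-protocol: as an external protocol; you still have to click a real org-protocol link and pick emacsclient in the dialog.

```javascript
// user.js in the Firefox profile directory (read when Firefox starts).
// Don't handle org-protocol: URLs inside Firefox; hand them to an external
// application instead. Firefox asks which one on the first real link click.
user_pref("network.protocol-handler.expose.org-protocol", false);
```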

Then define your capturing bookmarklets. If you don’t have the Bookmarks Toolbar enabled, enable it by right-clicking a Firefox toolbar and selecting Bookmarks Toolbar. Then create a new bookmark in the Bookmarks Toolbar section and insert the following code as its URL:

javascript:location.href='org-protocol:/capture:/w/'+encodeURIComponent(location.href)+'/'+encodeURIComponent(document.title)+'/'+encodeURIComponent(window.getSelection())

This is for plain text capture. If you want HTML capture, define another toolbar bookmark and use the code from the org-protocol-capture-html home page (it’s also available in the introductory comments in org-protocol-capture-html.el) as its URL. Just make sure that:

  • The bookmark URL starts with javascript:.
  • Pandoc is installed.
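Besides using its own subprotocol, the HTML bookmarklet has to send the selection as HTML rather than as plain text, so that Pandoc has markup to convert. The sketch below shows one way to obtain that HTML with standard DOM calls (Selection.getRangeAt, Range.cloneContents, Element.innerHTML). It is not the code from the org-protocol-capture-html home page, and the function name selectionToHtml is hypothetical; its result would then be URL-encoded like the other fields.

```javascript
// Hypothetical helper: serialize the current selection as HTML.
// In a bookmarklet it would be called as
//   selectionToHtml(window.getSelection(), document)
function selectionToHtml(sel, doc) {
  if (!sel || sel.rangeCount === 0) {
    return ""; // nothing selected
  }
  // Copy every selected range into a detached <div> ...
  const container = doc.createElement("div");
  for (let i = 0; i < sel.rangeCount; i++) {
    container.appendChild(sel.getRangeAt(i).cloneContents());
  }
  // ... and read its markup back as a string.
  return container.innerHTML;
}
```

Passing the Selection and Document in as arguments, instead of reading the globals, keeps the function usable outside a live page, e.g. with stand-in objects.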

Now you can (optionally) select part of an HTML page and press one of the newly created bookmark buttons in the Bookmarks Toolbar. If everything is set up correctly, the selected part of the page (or just the page URL and title, if nothing is selected) should appear in your Emacs capture buffer.

Getting rid of Bookmarks Toolbar

If you don’t use the Bookmarks Toolbar in Firefox, you probably don’t want to waste screen space on it just for Org capture bookmarks. The remedy is easy: open Firefox’s Customize mode and drag the Bookmarks Toolbar items to another place. Alternatively, you can use the Custom Buttons Firefox extension.

Notes

Some Web pages can’t be captured, and I don’t know why. I have more important things to do than playing with Org and Firefox any further.

Another useful Org-related Firefox extension is Copy as Org-mode. It doesn’t capture content via org-protocol, but it can copy some objects, such as page or link URLs, to the clipboard in Org format, ready to be yanked in Emacs. This is what I used to insert links into this article! With this nice helper it’s much easier than doing all the copying, pasting and editing by hand.

3 thoughts on “Capturing Web content from Firefox to Org”

  1. Hey, I noticed your blog post. Thanks for writing about org-protocol-capture-html. I’m sorry that it’s complicated to set up, but I’m afraid that’s just the way org-protocol is. Since everything has to go through a URL and MIME-protocol handlers, it can only be simplified so much.

    However, I tried to write the instructions as clearly and succinctly as I could. As far as I know, if you follow them as they are written, it will work. So I feel a bit disappointed that you felt like you had to re-explain the steps on your blog. I also see that you have said to register a protocol handler with Firefox, which I don’t think is the best way to do it, since that only works with Firefox. Instead, if you register a MIME handler with the system, any program can call org-protocol.

    Anyway, if you feel like the directions on the org-protocol-capture-html readme are not sufficient or not clear enough, I’d be glad for some feedback on how to improve them. My goal was to fix the problem of people having to rewrite the instructions over and over in many different places. Please feel free to open an issue on the tracker and we can work together to improve them. :)

    • I wasn’t able to make org-protocol-capture-html work by following just your instructions; I spent a lot of time searching for some of the steps, and the Firefox part was especially tricky. So I’ve written this post, partly to have the process documented for myself. I’m glad to know that you’d like to add the missing pieces to your documentation, so I opened a new issue about it: https://github.com/alphapapa/org-protocol-capture-html/issues/9

  2. Pingback: Aprendiendo GNU Emacs y org-mode (VIII) – Quijote Libre
