26 July 2011

Opening specific sections of a pdf from a browser


There is a lot of information out there on this topic, and even (non-free) programs that deal with it.


== This is my problem
I have some PDFs that I need to open from a web page, jumping straight to a specific section.

There are many ways to open a PDF; for more information, see this document:
http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_open_parameters.pdf

Opening at a specific page is easy, but if the document is still evolving you want to link to a section instead, because the page numbers will probably change.
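
As a rough illustration of the difference, the sketch below builds both kinds of link using the "page" and "nameddest" open parameters described in the Adobe document above. The base URL and the destination name my_section are placeholders, not values from a real file, and the function names are made up for this example.

```shell
#!/bin/sh
# Placeholder base URL, for illustration only.
base="http://path/file.pdf"

# Link by page number: stops pointing at the right place when pages shift.
page_link() {
    printf '%s#page=%s\n' "$1" "$2"
}

# Link by named destination: keeps working as long as the name exists.
dest_link() {
    printf '%s#nameddest=%s\n' "$1" "$2"
}

page_link "$base" 12
dest_link "$base" "my_section"
```

Whether the viewer honours these parameters depends on the PDF plugin or reader the browser uses.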


== Looking for a solution
The question is: how do I extract/collect/list the named destinations from the PDF file? A named destination is the technical name for the named target that those "bookmarks" inside the document point to.

Again, there are tools for this, some of them quite expensive. But I work on GNU/Linux, and I know there is nothing I cannot do myself with this OS, as it gives me full control over everything (blah blah ... I won't bore you any more).


== Solution
I tried several simple approaches, and the solution was ... in the source, the PDF file itself. You can open it with "less" and see the structure.
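
This only works when the parts of the PDF you are interested in are stored uncompressed. A quick way to check, sketched here assuming GNU grep (the function name count_goto is made up for this example), is to count the lines that contain a plain-text /GoTo action:

```shell
#!/bin/sh
# count_goto FILE
# Print the number of lines in FILE that contain a literal "/GoTo".
# -a treats the binary PDF as text, -c prints the count of matching lines.
count_goto() {
    grep -a -c '/GoTo' "$1"
}

# Example (file.pdf is a placeholder):
#   count_goto file.pdf
```

If it prints 0, the entries are probably inside compressed streams and a plain text search will not find them.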

In my case, I can extract the named destinations with this command.


$ grep -E --text -o "/GoTo/.*|/Title.*" file.pdf

Example output:

/Title (Configure .....)
/GoTo/D (_OPENTOPIC_TOC_PROCESSING_d0e79544) >>
/Title (Setting Up .....)
/GoTo/D (_OPENTOPIC_TOC_PROCESSING_d0e79474) >>
/Title (View .....)
/GoTo/D (_OPENTOPIC_TOC_PROCESSING_d0e81295) >>
/Title (Running .....)
/GoTo/D (_OPENTOPIC_TOC_PROCESSING_d0e81225) >>
/Title (Index)
/GoTo/D (ID_INDEX_00-0F-EA-40-0D-4D) >>

So to go to the Index section, you just need to open http://path/file.pdf#ID_INDEX_00-0F-EA-40-0D-4D
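
To avoid building each link by hand, the same search can be turned into a list of titles and URLs. This is only a sketch: it assumes GNU grep and awk, that each /Title line is followed by its /GoTo/D line as in the output above, and that titles and names contain no parentheses. The function name list_dest_links and the base URL are made up for this example.

```shell
#!/bin/sh
# list_dest_links FILE BASE_URL
# Print one "title<TAB>url" line per /Title + /GoTo/D pair found in FILE.
list_dest_links() {
    grep -a -o -E '/GoTo/D \([^)]*\)|/Title \([^)]*\)' "$1" |
    awk -v base="$2" '
        /^\/Title/ {
            # Remember the most recent title, stripped of "/Title (" and ")".
            title = $0
            sub(/^\/Title \(/, "", title)
            sub(/\)$/, "", title)
            next
        }
        /^\/GoTo/ {
            # Strip "/GoTo/D (" and ")" to get the destination name.
            dest = $0
            sub(/^\/GoTo\/D \(/, "", dest)
            sub(/\)$/, "", dest)
            printf "%s\t%s#%s\n", title, base, dest
            title = ""
        }
    '
}

# Example (file.pdf and the URL are placeholders):
#   list_dest_links file.pdf "http://path/file.pdf"
```

The output can then be pasted into a web page or processed further to generate the links.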

