iOS SDK PDF Limitations


Introduction:


This article surveys the PDF limitations faced while developing iOS apps: search, zoom, text wrapping (reflow), text selection, text extraction (especially Arabic), and the licensing of PDF libraries (if any).

Notes:

  • We performed our research on both Arabic and English PDFs.
  • I will reference an Arabic/English PDF file called “Al3oshr Al a7’er”; this file can be found here.
  • The conclusion can be found at the end of this article.


Features & Limitations:

  1. View/Zoom/Scroll: I was able to load and view every PDF I tested correctly, with zooming and scrolling. I could do this using any of the following:
    1. Using UIWebView
    2. Using the Core Graphics iOS APIs and libraries built on top of them, like this one.
    3. VFR PDF Reader
    4. QLPreviewController
  2. Text Extraction (and Selection/Copy and Paste): I was able to correctly select and copy/paste text from PDF pages that contain only English letters. But for some PDFs, like “Al3oshr”, that contain Arabic (mixed with English) letters, text extraction failed: the text I got back was in a strange encoding. I tried the following:
    1. JavaScript
    2. Using the pasteboard
    3. I also found this code but did not try it, as it does not mention text encoding.
    4. I found a very helpful presentation at:
      The presentation describes the cause of the problem, which is related to how PDF stores Arabic text. Text is stored as glyphs (drawn characters). Each glyph has an encoding specific to that PDF: a number, stored in the Font object, that should map to Unicode. The process is further complicated because spaces are not stored (a PDF draws one or more glyphs at an x,y position).
      My main trials were on the “Al3oshr” file, which returned strange characters (upon copy and paste).
      I also tried an Arabic PDF I created using OpenOffice, and it was read correctly.
      I also tried this PDF and I could read the Arabic correctly.
      I also tried this one and I could not read the Arabic.
  3. Text Search: If we were able to extract text correctly, searching could be done smoothly (see “Text Extraction”).
    1. I tried the commercial library FastPDFKit and failed to search with it in the “Al3oshr” file. I also contacted their support (about the Al3oshr file) and they replied “searching and extraction of Arabic text is supported, as long as certain requirements are met – it depends on the encoding and how the text is stored in the page stream. Hopefully, the document you submitted will help us improve the library.”
    2. Also, PDFKitten does not support non-Latin characters, and it mentions that the PDF specs are huge…
    3. This post contains a lot of links that describe most of the problems.
  4. Text Re-flow (“Text Re-flow means that texts get automatically adjusted to fit the screen regardless of zoom level selected”):
    1. An initial search suggests that PDF wrapping is not possible; I learned that from this thread. You may test the apps they mentioned. I tested Stanza and it does not do any wrapping for PDFs, though it does so with EPUB.
    2. On the other hand, there are other apps that say they provide a text reflow feature (even on Android), like:
      1. ezPDF Reader
      2. GoodReader for iPhone
      3. Even Android has Text Reflow apps.
    3. In general, it is hard to perform reflow, as said in this answer, because it depends on how text is stored in the PDF (as stated by Adobe itself).
  5. Text Highlighting: Quite hard, as stated in this thread.
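
To make the glyph-encoding problem from the Text Extraction item more concrete, here is a minimal Swift sketch of what an extractor has to do. It is not a real PDF parser and not the method used by any library named above: the `PlacedGlyph` type, the `extractText` function, the `toUnicode` table and the `spaceThreshold` value are all hypothetical names and numbers chosen for illustration. It also assumes a single left-to-right line; Arabic would additionally need right-to-left ordering and shaping, which the sketch leaves out.

```swift
// Simplified model of one glyph as a PDF page places it.
// All names are hypothetical; a real extractor would read these values
// from the page content stream and the font objects.
struct PlacedGlyph {
    let code: UInt16   // font-specific glyph code, not a Unicode value
    let x: Double      // horizontal position where the glyph is drawn
    let width: Double  // horizontal advance of the glyph
}

// Rebuilds one left-to-right line of text from positioned glyphs.
// `toUnicode` stands in for the font's code-to-Unicode table.
// Codes missing from the table become U+FFFD, which is roughly the
// "strange characters" symptom when a PDF has no usable mapping.
func extractText(from glyphs: [PlacedGlyph],
                 toUnicode: [UInt16: String],
                 spaceThreshold: Double = 2.0) -> String {
    let ordered = glyphs.sorted { $0.x < $1.x }
    var result = ""
    var previousEnd: Double? = nil
    for glyph in ordered {
        // Spaces are not stored, so infer one from a wide horizontal gap.
        if let end = previousEnd, glyph.x - end > spaceThreshold {
            result += " "
        }
        result += toUnicode[glyph.code] ?? "\u{FFFD}"
        previousEnd = glyph.x + glyph.width
    }
    return result
}

// Usage: five glyphs with a gap between the third and the fourth.
let table: [UInt16: String] = [1: "P", 2: "D", 3: "F", 4: "o", 5: "k"]
let glyphs = [
    PlacedGlyph(code: 1, x: 0, width: 6),
    PlacedGlyph(code: 2, x: 6, width: 6),
    PlacedGlyph(code: 3, x: 12, width: 6),
    PlacedGlyph(code: 4, x: 24, width: 6),
    PlacedGlyph(code: 5, x: 30, width: 6),
]
let readable = extractText(from: glyphs, toUnicode: table)
let unreadable = extractText(from: glyphs, toUnicode: [:])
```

With the table present, `readable` is "PDF ok". With an empty table, every glyph falls back to the replacement character, which mirrors the case where copy and paste returns unreadable text even though the page renders correctly.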


Conclusion:

  1. View/Zoom/Scroll: Can be easily implemented for all PDFs
  2. Text Extraction/Selection/Copy & Paste: Can be easily implemented for pure English PDFs, but is hard for PDFs that contain non-Unicode Arabic, like the “Al3oshr” PDF.
  3. Text Search: Same as Text Extraction
  4. Text Reflow (Wrapping): Implementation is hard and not feasible for all files. A workaround is to use landscape orientation when running on iPhone/iPod Touch. iPad will not need reflow.
  5. Text Highlighting: Implementation is hard.