November 7, 2019

Why We Need an Open, Universal Standard for Editing Documents

James Mawson interviews Dr. Tamir Hassan

If you work any kind of desk job, you might spend way too long fiddling with documents.

I mean, sure, the simple stuff is mostly easy...

...but by the time you’re moving data in and out of tables, preparing charts and images in one program to put in another, and making all those little tweaks and corrections?

That’s a massive time sink.

One guy who thinks this takes way too long is a guy who has spent far longer on documents than any of us. He’s Dr. Tamir Hassan, who’s been neck deep in document engineering across universities, government and the private sector for nearly two decades.

He’s now spearheading the Editable PDF Initiative, a one-man mission to simplify and improve the lives of everyone who works with documents.

I caught up with him to learn more.

Q: This isn’t exactly your first rodeo, is it?

Your 2010 doctoral thesis was on extracting data from PDF, you worked on document technology in academia and then at HP Labs, and you’re active in the PDF Association.

What specifically got you so interested in this stuff? What were you doing before that to lead you here?

A: Yes, that’s right. I have always been fascinated by the interplay of technology and design, and what makes documents so special is that their visual appearance is a function of both rules and design decisions, which need to be applied correctly in order to communicate effectively. This is what got me into this rather niche area in the first place.

Since getting into computers in my early teens, I have found the tools used to author, design and edit documents (word processors, desktop publishers and other software for charts, graphics and so on) to be inefficient and tedious to use, particularly as the content becomes longer and more complex.

Fast forward 25 years or so, and the situation hasn’t really changed at all. We now have PDF as the de facto standard for final form documents, but despite dozens of updates in the meantime, programs such as Word and InDesign still essentially function in the same way.

Q: How far have you gotten with Editable PDF?

How would you describe the stage you are at now?

A: The project is at an initial proof-of-concept stage and a paper was published at last year’s DocEng conference.

Q: How are you supporting this work?

A: As the work is unfunded, it has remained a side project up to now, but I am hoping it will gain traction in the future.

I am also a co-founder of a start-up in a related area, Recognito AI, which specialises in locating and extracting data from documents, particularly from tables, by detecting their visual structure. There’s a lot of overlap between these two areas (the visual structure is also a prerequisite for editing PDFs) and I can see Recognito’s work supporting that of Editable PDF once things get off the ground.

Q: What is the main problem that Editable PDF solves?

A: Whenever you share a document with somebody else, be it a letter, a report, a poster or a newsletter, you have to make a choice.

If it’s in its final state, you can send it as a PDF, which gives you a cast-iron guarantee that it will look and print flawlessly.

Otherwise, you have to use the application’s native format, such as Word or InDesign, which isn’t as robust; the technical differences between platforms and program versions might cause the text to reflow, changing the layout.

“Editable PDF is a new file format that gives you the best of both worlds.

As it is based on PDF, it is just as robust.

Yet it allows the document to be edited across a variety of applications.”

Q: So how does it work?

A: Well, if you’ve been using computers for long enough, you might remember the days when a word processor was just that; you could write text with it, and nothing more. Illustrations were drawn using a vector graphics program and it was all put together using DTP software.

But with the advent of GUIs, word processors started getting graphics functions and desktop publishers started getting better text editing tools. The overlap between the various applications has been very significant for decades.

Despite this, each program uses a different internal model and you can’t just open a Word document in InDesign to make fine typographical adjustments, as such adjustments are not even supported in Word’s model.

So, the first part of Editable PDF is to standardize these models; instead of a Word text box, an InDesign text frame and a Scribus text frame, you have a standard Editable PDF text frame which is fully specified and documented, just like PDF is today.

The second key component of Editable PDF is its robust typography.

As I mentioned above, technical issues between platforms can cause text to reflow differently, leading to major changes in the layout. PDF, on the other hand, stores the position of each character and each line break explicitly, making it possible to average out any differences in character width across the whole line when editing the text on a different platform.
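
The averaging idea can be illustrated with a minimal Python sketch. It is an illustration only, not code from the Editable PDF project: the function name fit_line and its parameters are hypothetical, and it assumes the simplest possible strategy, in which every character on the line absorbs an equal share of the width difference so that the line still ends where the file says it should.

```python
def fit_line(start_x, stored_width, local_advances):
    """Place characters so the line keeps the width recorded in the file.

    start_x        -- x position of the first character, as stored
    stored_width   -- width of the line on the original platform
    local_advances -- advance width of each character in the local font
    """
    if not local_advances:
        return [], start_x
    # Total difference between the stored line and the locally measured one.
    difference = stored_width - sum(local_advances)
    # Spread it evenly: every character absorbs the same small share.
    share = difference / len(local_advances)
    positions = []
    x = start_x
    for advance in local_advances:
        positions.append(x)
        x += advance + share
    return positions, x


# The local font is slightly wider than the original (42.4 vs 42.0 units),
# so each of the four characters is tightened by the same small amount.
positions, end_x = fit_line(72.0, 42.0, [10.2, 11.9, 9.1, 11.2])
```

Because the line break itself is stored, the adjustment stays local to one line and the text below it does not reflow.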

Furthermore, if a font is completely missing, PDF has a mechanism to synthesize a replacement font with exactly the same metrics, which enables non-destructive editing. This means that even documents using licensed fonts, such as most corporate fonts, can be shared with collaborators outside of the organization; they will simply see a different font, but everything will still fit the same way as before.
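
A rough Python sketch of what "the same metrics" means in practice follows. The interview does not describe how the replacement font is actually synthesized, so this assumes a deliberately simple approach: each character of a fallback font is scaled horizontally until its width matches the width recorded for the original font. The function names and the sample widths (in thousandths of an em) are made up for illustration.

```python
def substitute_scales(original_widths, fallback_widths):
    """Return a horizontal scale factor per character.

    Applying the factor to the fallback font's character makes it exactly
    as wide as the character in the original (missing) font.
    """
    scales = {}
    for ch, width in original_widths.items():
        fallback = fallback_widths.get(ch)
        if not fallback:
            raise ValueError(f"fallback font has no usable width for {ch!r}")
        scales[ch] = width / fallback
    return scales


def line_width(text, widths, scales=None):
    """Sum the character widths of text, optionally scaled per character."""
    total = 0.0
    for ch in text:
        factor = scales[ch] if scales is not None else 1.0
        total += widths[ch] * factor
    return total


# Hypothetical widths: the licensed font is missing, the fallback is not.
original = {"a": 500, "b": 560, "i": 250}
fallback = {"a": 550, "b": 500, "i": 300}
scales = substitute_scales(original, fallback)
```

With these factors, line_width("abi", fallback, scales) matches line_width("abi", original) up to floating-point rounding, which is why the text still fits the same way even though the letterforms differ.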

Q: How would Editable PDF relate to the PDFs we use now?

Would it be right to consider this an extension of the PDF standard? A variation?

How well could or should existing software read these Editable PDF documents?

A: PDF already has several sub-formats, the closest being PDF/UA, which includes structure for accessibility purposes. But the road to becoming an official standard recognized by ISO is a long one and we’re right at the beginning.

In the meantime, variation might indeed be a better term; Editable PDFs are valid PDFs that contain additional information that can be used or ignored by the programs used to open them.

In practice, this means that an Editable PDF can be opened (and, for example, printed) by all existing PDF applications and libraries, and the installed base of PDF viewers such as Acrobat and Foxit is huge. This currently gives Editable PDF a major advantage over Web-based printable document formats, although it's not difficult to imagine how both worlds might converge in the near future.
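
The "used or ignored" principle can be sketched in a few lines of Python. This is a toy model under loud assumptions: a page is represented as a plain dictionary rather than a real parsed PDF, and the "EditableModel" key and its contents are invented for this example, since the interview does not say how Editable PDF stores its additional information.

```python
# Toy stand-in for a parsed PDF page. "Contents" holds ordinary page drawing
# instructions; "EditableModel" is a hypothetical key invented for this sketch.
page = {
    "Contents": "BT /F1 12 Tf 72 720 Td (Hello) Tj ET",
    "EditableModel": {"frames": [{"type": "text", "text": "Hello"}]},
}

# Keys that an existing, unmodified viewer is assumed to understand.
KNOWN_TO_PLAIN_VIEWER = {"Contents"}


def plain_view(page):
    """What an existing viewer works with: unknown keys are skipped."""
    return {k: v for k, v in page.items() if k in KNOWN_TO_PLAIN_VIEWER}


def editable_frames(page):
    """What an editor aware of the extra information could read as well."""
    return page.get("EditableModel", {}).get("frames", [])
```

A viewer calling plain_view(page) still gets everything it needs to display or print the page, while editable_frames(page) returns an empty list for an ordinary PDF that carries no extra information.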

Q: You’re currently looking for contributors. Who are you most keen to see get involved? Is this open only to individuals, or can organisations take part too?

A: Absolutely! The project is open to anyone, individual or organisation, large or small, who shares its goals and sees the potential of a visually robust document format that can be edited in an application-agnostic, standardized way.

With the right tools built around an open ecosystem, the potential business impact could be huge, and I would therefore particularly like to see the involvement of not only open source contributors, but also commercial software vendors who see the benefits of developing products for an open market.

Unfortunately, the barrier to entry for document editors is huge and, in fact, only Google Docs has been able to make a significant mark on the landscape in the last decade or so. But whereas they have addressed the online collaboration problem, they have ignored all the other limitations that the current products and file formats have.

Given their reputation as a technology innovator, I am disappointed that they haven't done more in this space. They certainly have enough clout to really make a difference.

Q: One of the big barriers to the wider use of Linux as a desktop operating system is that so many businesses feel they need Microsoft Word.

So it intrigued me to hear a dude who lives and breathes documents show up and describe Word as a vortex of time and productivity - that it’s time for something much better.

To what extent should businesses really be that happy with Word, LibreOffice, Google Docs and so on?

Is there anything these applications do relatively well? What are they really bad at?

A: Well, if you're doing something very simple, like writing a letter, there are few complaints that can be made about a word processor such as Word. But as soon as you start working on something more complicated, perhaps adding some images or charts or adding longer passages of structured text, things start to get unnecessarily difficult.

Perhaps you would like to edit the text in Word, but make fine adjustments to the layout in InDesign? It's very difficult to switch between different programs today.

Editable PDF, a universally editable format, will let you do exactly that.

But perhaps Editable PDF’s biggest benefit is its portability. As it's based on PDF, your document will display exactly the same way if you send it to someone else. But, unlike a normal PDF, you will be able to edit it. This is not possible today.

“We’re talking all the time about self-driving cars and intelligent robots.

But we've ignored the most banal of issues: the documents that practically all businesses are running on.

A truly portable, editable document format is still not here yet.”

Q: Businesses and other organisations might feel they have a bit invested in existing document standards: design and layout work, templates, macros and automation, systems and processes, and so on.

What might it mean for these sunk costs to transition to a new standard?

A: Editable PDF is a document format that can be used to represent pretty much any 2D content from any application, just like PDF today. Once it catches on, vendors of existing apps will start to support it and little will need to be changed from the user’s side in terms of workflows.

Certainly the cost involved in training users every time a new version of Word or a similar program is released is much higher.

Once a critical mass is using Editable PDF, users will then be able to simplify many workflows due to its universal nature.

Q: What have been your goals for Editable PDF in 2019?

A: My main goal for 2019 has been to spread the word and start drafting a model for the format, as well as developing the underlying framework beyond the initial prototype.

Once this is ready, I am hoping to get valuable feedback from the community, as this is a collaborative project after all, where all participants should benefit mutually.

Up to now, the response has been modest outside of academic circles, and the response to my talk at Libre Graphics in Saarbrücken gave me the impression that open source projects such as Scribus would be unwilling to adapt their internal model to be in line with Editable PDF. I see this as being very short-sighted, as having their own non-standard file format is the main factor hampering the uptake of such programs in the first place.

Q: For anyone who's interested in your work, what's the best way they can follow along with it?

A: For the time being, the website is the best way to be kept up to date, and any news will be posted there.

I am also planning a blog and mailing list in due course.

Of course, if anyone is interested in collaboration or even just a chat on the topic, please just drop me an email.

Wow! Thank you so very much for that, Dr. Hassan.

If you have any thoughts or questions, please feel welcome to leave a comment below.