emacs-orgmode

Re: [O] minimal testing setup for pdf export?


From: Tim Cross
Subject: Re: [O] minimal testing setup for pdf export?
Date: Sat, 31 Aug 2019 11:46:38 +1000
User-agent: mu4e 1.3.4; emacs 27.0.50

To be a little more precise, Org mode PDF documents created using the
default LaTeX classes are not going to meet minimal accessibility
standards. The extent to which they can be accessed using accessibility
software will depend largely on the structure of the underlying PDF. I
have not investigated other workflows for generating PDFs from org (for
example, going to some other intermediate format, like HTML or markdown,
and then using a different tool to generate the final PDF). Likewise, I
don't know if some of the TeX PDF generators are better than others
(this is partly why it gets complicated: there are multiple workflows to
generate PDFs from LaTeX). There are people working on additional LaTeX
packages to address this accessibility requirement. However, they are
either only at experimental status, require significant configuration
and setup, or require the author to manually add additional data, making
them inappropriate for org-mode.

With PDFs, the level of accessibility does depend a lot on the structure
of the underlying document. Even PDFs without full tagging can be
reasonably accessible if the structure of the data in the PDF is
straightforward, i.e. without lots of tables, multi-column layouts,
footnotes, internal references etc. If the data flow in the document is
reasonably 'linear', then it isn't too bad. If the document has lots of
embedded PostScript or any image-like data, that will not be accessible
and will not be tagged adequately. Likewise, I've found stuff generated
in math mode is typically inaccessible.
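
As a rough way to see whether a given PDF carries any structure tagging
at all, here is a minimal Python sketch. It relies on the fact that a
tagged PDF's catalog normally contains a /StructTreeRoot entry and a
/MarkInfo dictionary with /Marked true. The function name tagging_hints
is made up for illustration, and the check is only a heuristic: it scans
the raw bytes, so it can miss markers stored inside compressed object
streams, and finding the markers says nothing about how good the tagging
is.

```python
import re


def tagging_hints(pdf_bytes: bytes) -> dict:
    """Report whether common tagged-PDF markers appear in the raw bytes.

    Heuristic only: markers inside compressed object streams are not
    visible to a plain byte search, so a False result is not conclusive.
    """
    return {
        # The structure tree root is where the tag hierarchy hangs off.
        "struct_tree_root": b"/StructTreeRoot" in pdf_bytes,
        # /MarkInfo << /Marked true >> declares the file as tagged.
        "marked": re.search(rb"/Marked\s+true", pdf_bytes) is not None,
    }
```

To try it on an exported file, read the file in binary mode and pass the
bytes in, e.g. tagging_hints(open("export.pdf", "rb").read()), where
export.pdf is a placeholder name.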

Text extraction tools like pdftotext are able to extract the text.
However, because they cannot determine the structure with any accuracy,
the output tends to be somewhat jumbled and to contain a bit of
'garbage'. Again, how good/bad this is depends on the underlying PDF
structure.
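
For anyone who wants to try this on their own exports, here is a minimal
Python sketch that runs pdftotext and gives a crude estimate of how much
'garbage' came out. It assumes the poppler pdftotext binary is on the
PATH. The helper names and the definition of 'garbage' (visible
characters that are neither alphanumeric nor common punctuation) are my
own assumptions for illustration, not an established measure.

```python
import subprocess

# Punctuation treated as ordinary text (an arbitrary choice for this sketch).
COMMON_PUNCT = set(".,;:!?'\"()[]-/")


def pdftotext_command(pdf_path: str, keep_layout: bool = False) -> list:
    """Build the pdftotext command line; '-' sends the text to stdout."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout tries to preserve the physical layout of the page.
        cmd.append("-layout")
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path: str, keep_layout: bool = False) -> str:
    """Run pdftotext and return its output as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def garbage_ratio(text: str) -> float:
    """Fraction of visible characters that look like extraction debris."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return 0.0
    odd = sum(1 for c in visible if not (c.isalnum() or c in COMMON_PUNCT))
    return odd / len(visible)
```

Comparing garbage_ratio(extract_text(path)) with and without
keep_layout=True on the same file gives a quick, if rough, feel for how
much the underlying PDF structure affects the result.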

You might find the following links useful:

https://www.tug.org/twg/accessibility/

https://tug.org/pipermail/accessibility/2016q4/000005.html

You might be able to improve the accessibility of PDFs generated from
LaTeX by adding some of the (mostly experimental) additional packages to
your Org mode setup. However, this will probably have some unfortunate
side effects or corner cases (which is why I don't just recommend adding
these packages as defaults in org itself).

There has been an item on my todo list to experiment with this stuff for
a long time, but I just never seem to get that far down the list. If you
do find some configurations which help make the PDFs more accessible, I
would be happy to try adding them to my setup for further testing. We
may find some additional packages which improve the situation and don't
have an unacceptable impact on general org performance and stability. I'm
confident that if we can demonstrate this, having these additions added
to org-mode defaults would be possible.

Tim

Jude DaShiell <address@hidden> writes:

> Okay, orgmode pdf files will be inaccessible for the foreseeable future.
>  Has anyone had any luck extracting text however mangled from one of
> these with pdftotext or similar tools?  If that's not possible that will
> be another good thing to know.
>
> On Sat, 31 Aug 2019, Tim Cross wrote:
>
>> Date: Fri, 30 Aug 2019 19:31:32
>> From: Tim Cross <address@hidden>
>> To: address@hidden
>> Cc: Nick Dokos <address@hidden>
>> Subject: Re: [O] minimal testing setup for pdf export?
>>
>>
>> I think the main thing which needs to be in the PDF is structure
>> 'tagging'. Unfortunately, making truly accessible PDFs is the one area
>> I've found where the TeX suite is weak. I was tracking some discussions
>> about this on the various TeX and LaTeX lists and it seems that adding
>> the information needed to create accessible PDFs requires a major
>> redesign of TeX internals.
>>
>> It has been a while since I looked at this, but I do believe there are
>> some add-on LaTeX packages which can help a bit, but creating PDFs which
>> meet minimal accessibility requirement tests is currently not possible.
>>
>> IIRC the speech-disabling feature is not part of the PDF spec. This is
>> something added by Adobe (along with other DRM support). There is no
>> 'switch' so to speak in plain PDF documents as the PDF spec predates
>> considerations like TTS or even accessibility.
>>
>>
>> Jude DaShiell <address@hidden> writes:
>>
>> > most of the books sold on google play books are speech-disabled by
>> > publishers.  The adobe accessibility site has speech-enabled
>> > accessibility examples.  I think it's a matter of a single control that
>> > is either enabled or disabled.  Oh, the IRS has speech-enabled pdf tax
>> > forms anyone can download.  I nearly forgot about that one.  The 1099R
>> > form is a short one so it ought to be pretty quick to find the setting
>> > in one of those forms.
>> >
>> > On Fri, 30 Aug 2019, Nick Dokos wrote:
>> >
>> >> Date: Fri, 30 Aug 2019 16:07:49
>> >> From: Nick Dokos <address@hidden>
>> >> To: address@hidden
>> >> Subject: Re: [O] minimal testing setup for pdf export?
>> >>
>> >> Jude DaShiell <address@hidden> writes:
>> >>
>> >> > It would be helpful if when pdf get exported from orgmode they have
>> >> > speech enabled by default.
>> >> >
>> >>
>> >> Not sure that org mode can do anything about it, since it's LaTeX that
>> >> produces the PDF. That said, I'm not sure what needs to be done: what's
>> >> the difference between a speech-enabled PDF and a non-speech-enabled one?
>> >>
>> >>
>>
>>
>>


--
Tim Cross


