emacs-orgmode

[O] *markup*, /markup/ and _markup_ true semantics [Was: Re: Ox-html: Re


From: Garreau, Alexandre
Subject: [O] *markup*, /markup/ and _markup_ true semantics [Was: Re: Ox-html: Replace <b> with <strong> and <i> with <em>]
Date: Fri, 26 Oct 2018 07:24:00 +0200
User-agent: Gnus (5.13), GNU Emacs 25.1.1 (i686-pc-linux-gnu, GTK+ Version 3.22.11) of 2017-09-15, modified by Debian

Sorry, I just came across this (to me) interesting thread, which I
shouldn’t have let go:

On 2018-10-25 at 08:00, Tim Cross wrote:
> Kaushal Modi <address@hidden> writes:
>> […]
>> - b and i are not deprecated
>> - b and strong are both valid but their use depends on the writer's
>> context (but Org mode has just one mark for either "*")
>> - i and em are both valid but their use depends on the writer's
>> context (but Org mode has just one mark for either "/").
>>
>> […]
>> 
>> From "em" docs[em], in the NOTE section there:
>>> The em element isn’t a generic "italics" element. Sometimes, text
>>> is intended to stand out from the rest of the paragraph, as if it
>>> was in a different mood or voice. For this, the i element is more
>>> appropriate.
>>
>> […]
>>
>> I guess there's no need to change what "*" and "/" do right now in
>> ox-html, as there doesn't seem "one right way" to do things here.
>>
>> And folks strongly wanting to use <strong> and <em> for bold and
>> italic can customize org-html-text-markup-alist.
>>
>> HTML experts, please chime in.
>
> I'll start by stating I'm definitely not an HTML expert.

I don’t know exactly what an expert is, and I’m certainly not a
professional, but I have spent some time figuring out the semantic
meaning of various HTML specs.

More specifically, I have a big interest in semantics and typography,
and I spent a lot of time, on my now deleted, recreated, then lost
GitHub account and by mail, trying to convince people to switch to more
semantic markup (oh, and to use complex CSS selectors rather than
classes, and to stop using <div> and <span> altogether) and better
typography (such as curly quotes, single quotes inside quotes, and many
things specific to French).

> The problem with b and i is that they specify how rather than what and
> don't always make sense for all possible media types. For example, what
> does 'bold' or 'italic' mean for a screen reader?

As far as I remember, italic often means pronounced with a different
pitch.  Bold probably means pronounced differently too, but I don’t
remember how.  I need to recheck with Orca and Firefox add-ons (I’ll do
that for a later mail).  This might vary across screen readers, so I
might have to find some friend who has a Windows computer with NVDA, or
with JAWS or some other non-free program, to either ask or check.

I believe the most correct handling for screen readers would be, for
<i>, to use the language given by the lang or xml:lang attribute of the
tag if there is one, and otherwise a slower speed and a slightly higher
pitch; and, for <b>, the exact same higher pitch as for caps, without
changing speed, plus adding it to an easily reachable “keyword list”,
just as for <dfn>.

FYI: italic, bold, and underline were all invented in typography as
special ways of *purposely* making text harder to read.  Both the
intent and the result are that the reader, taking more time to read
something in italic, for instance, will memorize it better and have
more time to think about it, which increases the importance of that
something.

In the following, “from far” means when you look at the document as a
whole and are not focused on reading a particular part of it.  It
doesn’t mean that you are far away and can still read it, as is the
case for uppercase.

Italic is the best and most readable way, as it is only seen when
reading, close to the text, but not “from far”, and it doesn’t break
structure, flow, or “typographic grey” (“gris typographique”; I’m not
aware of the English term).  It is hence commonly used for emphasis
(the best usage: if it ever gets long, it gets hard to read, but that
reflects the fact that the original was hard to grasp, hear or say in
the first place), for the titles of artistic works (such as books: a
conventional usage, but still okay, as these are mostly short anyway),
and for quotations (a discouraged usage, as they can get long and thus
unreadable, and quotation marks already cover this; italic is *never*
to be used *along* with them, as it is terribly redundant and almost no
serious professional printer does that).

Bold is sometimes harder to read and sometimes, if not too bold,
easier; however, it’s really easy to “notice” bold text when looking
from afar.  It is therefore normally recommended *exclusively* for text
structures whose *role* is to purposely cut the text into parts, that
is: *outlines*.  However, in an attempt at pseudo-backward compatibility
and “but look, everybody was okay from the beginning”, the W3C found
another usage for bold besides outlines: keywords.  These are *meant*
to be seen from afar, are usually small (one word), and yet wouldn’t
alter the text structure, and might not be candidates for <dfn> (though
most of the time they should be).

Underline should, theoretically, be banned everywhere.  It is an
especially simple and awful way of making text unreadable: it cuts the
legs of letters with descenders (making them as hard to read as italic)
*and* is easy to spot from afar, yet you notice the underline without
grasping the word easily and quickly from afar, as you would with bold.
IIRC it was invented for typewriters because italic wasn’t available,
for which it is the poorest substitute ever.  It is also used in
handwritten text, as few people nowadays actually try to write italics
or bold by hand, and the others are often unable to do so.  Most of the
time I saw it used in handwriting to annotate and highlight text.
Conventions have developed around this: in typewritten as well as
handwritten text, you normally use it *only* for the titles of artistic
works (instead of italic), and for blue hyperlinks.  It is sad that it
has developed into such an important convention, but it is done, clear,
and well established.

The W3C meaning of “added text” seems somewhat artificial to me, as
using underline for “added changes” is no more conventional than any
other typographic convention would be.  However, it necessarily has to
be *one of these*, as underline is commonly used to highlight and
annotate text (though the <mark> tag is there for that in HTML).

> I do believe we should move away from b/i to strong/em as I think these
> are the correct semantic tags to use and are generally what is
> preferred. This means they are also likely to already have appropriate
> 'styling' in many 'canned' styles and valid consistent interpretations
> for different media types. 

This is unsemantic (it gives Org markup a presentational rather than
semantic role, so I strongly oppose it) and could break true
accessibility.  I’d say that ideally we should have more markup, to be
compatible with HTML, which recently, with XHTML 1, XHTML 2 and HTML5,
has become one of the richest and most clearly defined markup languages
available.  However, as Org, like Markdown and reStructuredText, tries
to achieve some compatibility with classical plain-text markup, such as
that used in email, and from the semantics I have observed, I’d suggest
the following:
– tag “*” with <em>, and maybe find cases where <b> might be appropriate
  (for keywords, typically).  An interesting experiment would be, for
  some given languages (English, to begin with), to detect whether an
  article (“the”, “a”, “an”…) is part of the markup, in which case it
  is not a keyword (hence <em>), or whether it *precedes* the markup, in
  which case it is more probably (but not necessarily) a keyword;
– tag “/” with <cite>, as this matches the most accurate and common
  meaning of “/”.  “_” might be appropriate for this as well, but may be
  redundant (so a safe custom variable, potentially usable as
  buffer-local, would do better).  However, there are some cases where
  “/” would be more appropriate as <i> (I’d say the vast majority of
  these occurrences are words from foreign languages; the others are
  most often incorrect and abusive usage of “/”);
– tag “_” as either <cite>, if the relevant variable has the right
  value, or <ins>, *only* if near “+” markup.  Otherwise, as Org only
  uses “[]” for hyperlinks, I don’t know.
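
To make the article experiment for “*” a bit more concrete, here is a
minimal sketch in Python.  It is only an illustration of the heuristic,
not anything that exists in Org or ox-html: the names ARTICLES,
BOLD_SPAN, guess_tag and convert are made up for this example, the regex
is a rough approximation of Org's emphasis rules, and the fallback to
<em> when no article is nearby is an assumption.

```python
import re

# Hypothetical word list for English only; other languages need their own.
ARTICLES = {"the", "a", "an"}

# Rough approximation of an Org "*...*" span: the content may not start
# or end with whitespace or "*", and may not span lines.
BOLD_SPAN = re.compile(r"\*([^\s*](?:[^*\n]*[^\s*])?)\*")


def guess_tag(text: str, match: re.Match) -> str:
    """Return "em" or "b" for one *...* span, using the article heuristic."""
    inner_words = match.group(1).lower().split()
    if inner_words and inner_words[0] in ARTICLES:
        # The article is part of the markup: not a keyword, plain emphasis.
        return "em"
    preceding_words = text[: match.start()].lower().split()
    if preceding_words and preceding_words[-1] in ARTICLES:
        # The article directly precedes the markup: more probably a keyword.
        return "b"
    # No article nearby: fall back to emphasis (an assumption of this sketch).
    return "em"


def convert(text: str) -> str:
    """Replace each *...* span in text with the guessed HTML element."""
    def repl(match: re.Match) -> str:
        tag = guess_tag(text, match)
        return f"<{tag}>{match.group(1)}</{tag}>"

    return BOLD_SPAN.sub(repl, text)


print(convert("This is *the point* here."))
print(convert("We call it a *monad* in Haskell."))
```

The first call wraps “the point” in <em>, the second wraps “monad” in
<b>.  As said above, this is a guess and not a rule: an article before
the markup makes a keyword more probable, not certain.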

Note that, indeed, <strong> has no real use.  If it were up to me, it
would be banned.  Maybe its most accurate usage would be for uppercase,
urgent emphasized text: *URGENT: READ THIS NOW OR YOU WILL DIE* (you
might use <strong>, if you absolutely want to, for uppercase emphasized
text, or for emphasized text containing “urgent:” or “important:” and
their localized versions (format-level linguistic imperialism, blah
blah: note that, for that very reason, this would work as is for French,
but I and many other people would, funnily, feel more reassured,
respected or whatever if we were blessed with being in a list whose car
is "fr")).
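
For what it is worth, the <strong> rule above could be sketched as
follows, again in Python and again purely as an illustration.  The names
URGENT_MARKERS and is_strong_candidate are hypothetical, and the marker
list only covers the two English words mentioned above.

```python
# Hypothetical marker list; localized versions would be added per language.
URGENT_MARKERS = ("urgent:", "important:")


def is_strong_candidate(span: str) -> bool:
    """Return True if an emphasized span is all uppercase or marked urgent."""
    stripped = span.strip()
    # str.isupper() is True only if there is at least one cased character
    # and every cased character is uppercase.
    if stripped.isupper():
        return True
    lowered = stripped.lower()
    return any(marker in lowered for marker in URGENT_MARKERS)


print(is_strong_candidate("URGENT: READ THIS NOW OR YOU WILL DIE"))
print(is_strong_candidate("really"))
```

Everything that does not pass this test would stay <em> (or <b> for
keywords), so <strong> remains the rare exception.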

> I don't think this is something that is urgent, but it is the
> direction we should go. The only real reason for sooner rather than
> later is that we can probably simplify some of the exporters and
> ensure any new exporters are correct and won't need to be change
> retrospectively.

This has to be semantic work, carried over to *all* semantic backends.
As there are “accessibility” workarounds for almost all formats (even
PDF, which is understandable as it became important and widely used,
although it was normally meant only for printing, hence display, not
semantics (but you know, these days you can put JavaScript in
these…)), this may mean “every backend”.


