Text Handling in libipuz
The ipuz spec states that HTML is accepted for a set of puzzle fields. It doesn’t specify which HTML tags are valid and instead leaves that up to the client to implement. It does, however, suggest that entities be used to encode special characters.
To make this more useful to GLib-based applications, libipuz makes a best-effort attempt at parsing HTML-encoded strings and converting them to PangoMarkup. It has the following semantics:
- All API calls that accept or output text expect valid UTF-8.
- Some API calls specify that they accept or output marked-up strings (such as ipuz_puzzle_set_title()). For these, the text passed in should be valid PangoMarkup or plain text.
- When loading from an .ipuz file, HTML text is converted to PangoMarkup. Common tags (such as <span>, <b> or <i>) are preserved. All other HTML tags are silently discarded.
- Wherever appropriate for PangoMarkup, entities are converted to Unicode characters. <br> tags are converted to newlines.
- We use GMarkup to parse the text. Consequently, unbalanced tags will be rejected. For instance, <br> must be followed by a </br> or must be self-closed (e.g. <br />).
- If GMarkup can’t parse a string, then the string will be escaped and passed through verbatim. This is rarely the right behavior.
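
The last two rules can be illustrated with a small standalone sketch. It is not the libipuz implementation and does not use GMarkup. The helpers tags_balanced(), escape_markup() and markup_or_escaped() are hypothetical names invented for this example. The balance check handles only simple tags, and the escaping covers the five characters that GLib's g_markup_escape_text() is known to replace. The sketch models only the accept-or-escape decision and leaves out the conversion of tags and entities.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every opening tag in `s` is either
 * self-closed or followed by a matching close tag, 0 otherwise.
 * Simple tags only: no comments, CDATA, or '>' inside attribute values. */
static int tags_balanced(const char *s)
{
    char stack[32][32];   /* names of the tags that are currently open */
    int depth = 0;

    while ((s = strchr(s, '<')) != NULL) {
        const char *end = strchr(s, '>');
        if (end == NULL)
            return 0;                       /* stray '<' with no '>' */

        int closing = (s[1] == '/');
        int self_closed = (end[-1] == '/');
        const char *name = s + 1 + closing;
        size_t len = strcspn(name, " \t/>");
        if (len == 0 || len >= sizeof stack[0])
            return 0;                       /* empty or overlong tag name */

        if (closing) {
            /* A close tag must match the most recently opened tag. */
            if (depth == 0 || strlen(stack[depth - 1]) != len ||
                strncmp(stack[depth - 1], name, len) != 0)
                return 0;
            depth--;
        } else if (!self_closed) {
            if (depth == 32)
                return 0;                   /* nesting too deep for this sketch */
            memcpy(stack[depth], name, len);
            stack[depth][len] = '\0';
            depth++;
        }
        s = end + 1;
    }
    return depth == 0;                      /* anything left open is unbalanced */
}

/* Hypothetical helper: returns a newly allocated copy of `s` with the
 * markup-significant characters replaced by entities. Caller frees. */
static char *escape_markup(const char *s)
{
    /* Worst case: every byte expands to six bytes ("&quot;" or "&apos;"). */
    char *out = malloc(strlen(s) * 6 + 1);
    char *p = out;
    if (out == NULL)
        return NULL;

    for (; *s != '\0'; s++) {
        const char *rep = NULL;
        switch (*s) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);
            p += n;
        } else {
            *p++ = *s;
        }
    }
    *p = '\0';
    return out;
}

/* Hypothetical helper: keep balanced text as-is, escape everything else. */
static char *markup_or_escaped(const char *s)
{
    if (tags_balanced(s)) {
        char *copy = malloc(strlen(s) + 1);
        if (copy != NULL)
            strcpy(copy, s);
        return copy;
    }
    return escape_markup(s);
}
```

Under these assumptions, markup_or_escaped("Line one<br />Line two") returns the input unchanged, while markup_or_escaped("Line one<br>Line two") returns "Line one&lt;br&gt;Line two", so the tag would be displayed as literal text. This is why unbalanced HTML in a puzzle file rarely renders as intended.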