Struct
IpuzCharset
Description
struct IpuzCharset {
/* No available fields */
}
An opaque, immutable data structure that stores an ordered count of Unicode characters.
Charsets are surprisingly versatile. Fundamentally, they hold a unique mapping from a gunichar to a guint. They can be used to keep track of the number of Unicode characters in a puzzle or to represent a set of valid characters.
They are constructed from an IpuzCharsetBuilder, or a simple one can be created via ipuz_charset_deserialize(). A common case is a list of characters, each with a count of one, used to filter text, such as an alphabet.
Charsets are designed to be used in performance-critical areas. As a result, tradeoffs have been made to keep them as fast as possible.
Examples:
An example of creating a charset from an existing string:
g_autoptr (IpuzCharset) charset = NULL;
charset = ipuz_charset_deserialize ("ABCDEEE");
// Show that charset contains three 'E's
g_assert_cmpint (ipuz_charset_get_char_count (charset, g_utf8_get_char ("E")),
==,
3);
A second example of creating an alphabet filter:
IpuzCharsetBuilder *builder;
g_autoptr (IpuzCharset) charset = NULL;
g_autoptr (GString) filtered = NULL;
builder = ipuz_charset_builder_new_for_language ("en");
// builder is consumed with this call
charset = ipuz_charset_builder_build (builder);
g_assert_true (ipuz_charset_check_text (charset, "ENGLISH"));
g_assert_false (ipuz_charset_check_text (charset, "ESPAÑOL!"));
// Filter string to only include English alphabet characters
filtered = g_string_new (NULL);
// The string literal is read-only, so walk it through a const pointer
for (const gchar *p = "ESPAÑOL!"; p[0]; p = g_utf8_next_char (p))
  {
    gunichar c = g_utf8_get_char (p);
    if (ipuz_charset_get_char_count (charset, c) > 0)
      g_string_append_unichar (filtered, c);
  }
// Make sure characters are filtered out
g_assert_cmpstr (filtered->str, ==, "ESPAOL");
Iteration
To iterate through a charset, one can do:
for (guint i = 0; i < ipuz_charset_get_n_chars (charset); i++)
  {
    IpuzCharsetValue value;
    ipuz_charset_get_value (charset, i, &value);
    // do something with value
  }
Limitations
Like the rest of libipuz, the charset operates on Unicode code points rather than grapheme clusters. This means that glyphs composed of multiple code points can’t be stored in a charset.
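To make the distinction concrete, here is a minimal standalone sketch in plain C that counts the code points in a UTF-8 string. The helper count_code_points() is hypothetical and not part of libipuz; it assumes valid, NUL-terminated UTF-8 input. In GLib code, g_utf8_strlen() serves the same purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libipuz function): count the Unicode code
 * points in a NUL-terminated string, assuming it is valid UTF-8.
 * Every byte that is not a continuation byte (bit pattern 10xxxxxx)
 * starts a new code point. */
size_t
count_code_points (const char *s)
{
  size_t n = 0;

  assert (s != NULL);
  for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++)
    {
      if ((*p & 0xC0) != 0x80)
        n++;
    }
  return n;
}
```

With this helper, the precomposed "É" (U+00C9, bytes "\xC3\x89") counts as one code point, while "E" followed by the combining acute accent U+0301 (bytes "E\xCC\x81") counts as two, even though both render as the same glyph. Only the precomposed form corresponds to a single gunichar.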
Instance methods
ipuz_charset_check_text
Checks whether all the characters in text are contained within self. This can be used to quickly ascertain whether a string is valid for use within a puzzle.
ipuz_charset_get_n_chars
Returns the number of distinct characters stored in self. This is a constant-time operation.
ipuz_charset_get_value
Finds the value stored in self at the given index. On success, TRUE will be returned and value will be filled in with both the character and its count.
ipuz_charset_serialize
Concatenates all the unique characters stored in self in the order in which they would be returned by ipuz_charset_get_char_index().