Struct

IpuzCharset

Description

struct IpuzCharset {
  /* No available fields */
}

An opaque, immutable data structure that stores an ordered count of Unicode characters.

Charsets are surprisingly versatile. Fundamentally, they are a unique mapping from a gunichar to a guint. They can be used to keep track of how many times each Unicode character appears in a puzzle, or to represent a set of valid characters.
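That mapping can be pictured with a small model. The sketch below is not libipuz's implementation. Purely for illustration, it assumes that entries are stored in an array sorted by code point, so that looking up a count is a binary search. The names SketchValue, sketch_get_char_index() and sketch_get_char_count() are hypothetical, and the plain C types uint32_t and unsigned stand in for gunichar and guint.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry: a code point and its count. */
typedef struct
{
  uint32_t c;      /* Unicode code point (gunichar in GLib) */
  unsigned count;  /* number of occurrences (guint in GLib) */
} SketchValue;

/* Binary search over values[], which this sketch assumes is sorted by
 * code point. Returns the index of c, or -1 if c is not present. */
static long
sketch_get_char_index (const SketchValue *values, size_t n_values, uint32_t c)
{
  size_t lo = 0;
  size_t hi = n_values;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (values[mid].c == c)
        return (long) mid;
      if (values[mid].c < c)
        lo = mid + 1;
      else
        hi = mid;
    }
  return -1;
}

/* Returns the count of c, or 0 if c is not present. */
static unsigned
sketch_get_char_count (const SketchValue *values, size_t n_values, uint32_t c)
{
  long i = sketch_get_char_index (values, n_values, c);

  return i < 0 ? 0 : values[i].count;
}
```

In this model, a charset holding "ABCDEEE" is the five entries A:1, B:1, C:1, D:1 and E:3, and looking up a character that is absent yields a count of 0, mirroring the documented behaviour of ipuz_charset_get_char_count().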

They are constructed from an IpuzCharsetBuilder, or a simple one can be created with ipuz_charset_deserialize(). A common case is a list of characters, each with a count of one, used to filter text, such as an alphabet.

Charsets have been designed for use in performance-critical code. As a result, tradeoffs have been made to keep them as fast as possible.

Examples:

An example of creating a charset from an existing string:

g_autoptr (IpuzCharset) charset = NULL;

charset = ipuz_charset_deserialize ("ABCDEEE");

// Show that charset contains three 'E's
g_assert_cmpint (ipuz_charset_get_char_count (charset, g_utf8_get_char ("E")),
                 ==,
                 3);

A second example of creating an alphabet filter:

IpuzCharsetBuilder *builder;
g_autoptr (IpuzCharset) charset = NULL;
g_autoptr (GString) filtered = NULL;

builder = ipuz_charset_builder_new_for_language ("en");

// builder is consumed with this call
charset = ipuz_charset_builder_build (builder);

g_assert_true (ipuz_charset_check_text (charset, "ENGLISH"));
g_assert_false (ipuz_charset_check_text (charset, "ESPAÑOL!"));

// Keep only the characters that belong to the English alphabet
filtered = g_string_new (NULL);
for (const gchar *p = "ESPAÑOL!"; *p != '\0'; p = g_utf8_next_char (p))
  {
    gunichar c = g_utf8_get_char (p);
    if (ipuz_charset_get_char_count (charset, c) > 0)
      g_string_append_unichar (filtered, c);
  }

// 'Ñ' and '!' have been filtered out
g_assert_cmpstr (filtered->str, ==, "ESPAOL");

Iteration

To iterate through a charset, one can do:

for (guint i = 0; i < ipuz_charset_get_n_chars (charset); i++)
  {
    IpuzCharsetValue value;

    ipuz_charset_get_value (charset, i, &value);
    // value now holds the character at index i and its count
  }

Limitations

Like the rest of libipuz, the charset operates on Unicode characters (code points) rather than grapheme clusters. This means that a glyph made up of multiple code points cannot be stored in a charset as a single entry.
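The difference is easy to see by counting code points. The helper below, count_code_points(), is hypothetical and not part of libipuz; it assumes valid UTF-8 input and counts every byte that does not start with the bit pattern 10 (a continuation byte).

```c
#include <stddef.h>

/* Hypothetical helper: counts Unicode code points in a valid UTF-8
 * string by counting every byte that is not a continuation byte
 * (continuation bytes have the bit pattern 10xxxxxx). */
static size_t
count_code_points (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;
  return n;
}
```

The precomposed "é" (U+00E9, bytes "\xC3\xA9") gives 1, while "e" followed by U+0301 COMBINING ACUTE ACCENT ("e\xCC\x81") renders the same way but gives 2. A code-point-based charset therefore sees the second form as two separate characters, 'e' and U+0301.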

Functions

ipuz_charset_deserialize

Creates a new character set by deserializing from a string.
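As the "ABCDEEE" example shows, deserializing tallies how often each character occurs. The following is a minimal sketch of that idea, not libipuz's code. It assumes valid UTF-8 input and, for illustration only, keeps entries sorted by code point; SketchValue, SketchCharset, utf8_next() and sketch_charset_deserialize() are all hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: a code point and how often it occurred. */
typedef struct
{
  uint32_t c;
  unsigned count;
} SketchValue;

/* Hypothetical charset: entries kept sorted by code point (an assumption). */
typedef struct
{
  SketchValue *values;
  size_t n_values;
} SketchCharset;

/* Decodes one code point from well-formed UTF-8 and advances *p.
 * No validation is done; invalid input gives meaningless results. */
static uint32_t
utf8_next (const char **p)
{
  const unsigned char *s = (const unsigned char *) *p;
  uint32_t c;
  int extra;

  if (s[0] < 0x80)      { c = s[0];        extra = 0; }
  else if (s[0] < 0xE0) { c = s[0] & 0x1F; extra = 1; }
  else if (s[0] < 0xF0) { c = s[0] & 0x0F; extra = 2; }
  else                  { c = s[0] & 0x07; extra = 3; }

  for (int i = 1; i <= extra; i++)
    c = (c << 6) | (s[i] & 0x3F);

  *p += extra + 1;
  return c;
}

/* Adds one occurrence of c, inserting a new entry in sorted position
 * if c has not been seen before. */
static void
sketch_charset_add (SketchCharset *set, uint32_t c)
{
  size_t lo = 0;
  size_t hi = set->n_values;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (set->values[mid].c < c)
        lo = mid + 1;
      else
        hi = mid;
    }

  if (lo < set->n_values && set->values[lo].c == c)
    {
      set->values[lo].count++;
      return;
    }

  SketchValue *grown = realloc (set->values, (set->n_values + 1) * sizeof *grown);
  if (grown == NULL)
    abort ();
  set->values = grown;
  memmove (&set->values[lo + 1], &set->values[lo],
           (set->n_values - lo) * sizeof *set->values);
  set->values[lo] = (SketchValue) { c, 1 };
  set->n_values++;
}

/* Builds a charset by tallying every code point in text.
 * The caller releases the result with free (set.values). */
SketchCharset
sketch_charset_deserialize (const char *text)
{
  SketchCharset set = { NULL, 0 };
  const char *p = text;

  while (*p != '\0')
    sketch_charset_add (&set, utf8_next (&p));
  return set;
}
```

Inserting in sorted position makes building cost more per character, but it keeps every later lookup a binary search; that split between a slower build step and fast reads is one plausible reading of the tradeoffs mentioned above, not a description of libipuz internals.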

Instance methods

ipuz_charset_check_text

Checks whether all the characters in text are contained within self. This can be used to quickly ascertain whether a string is valid for use within a puzzle.

ipuz_charset_equal

Returns TRUE if charset1 and charset2 have exactly the same contents.

ipuz_charset_get_char_count

Returns the count of c. If c is not in self, then 0 is returned.

ipuz_charset_get_char_index

Returns the index of c.

ipuz_charset_get_n_chars

Returns the number of distinct characters stored in self. This is a constant-time operation.

ipuz_charset_get_total_count

Returns the cumulative count of all the characters stored in self.

ipuz_charset_get_value

Looks up the entry of self at the given index. On success, TRUE is returned and value is filled in with both the character and its count.

ipuz_charset_ref

Refs the character set.

ipuz_charset_serialize

Concatenates all the unique characters stored in self in the order in which they would be returned by ipuz_charset_get_char_index().

ipuz_charset_unref

Unrefs a charset, which will be freed when the reference count becomes 0.
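Several of these methods reduce to simple loops over the stored entries. The sketch below models check_text, get_total_count and equal on a hypothetical SketchCharset whose entries are assumed to be sorted by code point. It illustrates the documented behaviour only and is not the libipuz implementation; every name in it is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
  uint32_t c;      /* Unicode code point */
  unsigned count;  /* number of occurrences */
} SketchValue;

/* Hypothetical charset: entries assumed sorted by code point. */
typedef struct
{
  const SketchValue *values;
  size_t n_values;
} SketchCharset;

/* Decodes one code point from well-formed UTF-8 and advances *p. */
static uint32_t
utf8_next (const char **p)
{
  const unsigned char *s = (const unsigned char *) *p;
  uint32_t c;
  int extra;

  if (s[0] < 0x80)      { c = s[0];        extra = 0; }
  else if (s[0] < 0xE0) { c = s[0] & 0x1F; extra = 1; }
  else if (s[0] < 0xF0) { c = s[0] & 0x0F; extra = 2; }
  else                  { c = s[0] & 0x07; extra = 3; }

  for (int i = 1; i <= extra; i++)
    c = (c << 6) | (s[i] & 0x3F);

  *p += extra + 1;
  return c;
}

/* Binary search for c among the sorted entries. */
static bool
sketch_contains (const SketchCharset *set, uint32_t c)
{
  size_t lo = 0;
  size_t hi = set->n_values;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (set->values[mid].c == c)
        return true;
      if (set->values[mid].c < c)
        lo = mid + 1;
      else
        hi = mid;
    }
  return false;
}

/* Like check_text: true only if every code point of text is in set. */
static bool
sketch_check_text (const SketchCharset *set, const char *text)
{
  const char *p = text;

  while (*p != '\0')
    if (!sketch_contains (set, utf8_next (&p)))
      return false;
  return true;
}

/* Like get_total_count: the sum of all per-character counts. */
static unsigned long
sketch_get_total_count (const SketchCharset *set)
{
  unsigned long total = 0;

  for (size_t i = 0; i < set->n_values; i++)
    total += set->values[i].count;
  return total;
}

/* Like equal: same characters with the same counts. Both sets use the
 * same canonical (sorted) order, so one element-wise pass suffices. */
static bool
sketch_equal (const SketchCharset *a, const SketchCharset *b)
{
  if (a->n_values != b->n_values)
    return false;
  for (size_t i = 0; i < a->n_values; i++)
    if (a->values[i].c != b->values[i].c
        || a->values[i].count != b->values[i].count)
      return false;
  return true;
}
```

Keeping every set in one canonical order is what lets the equality check in this sketch be a single pass; whether libipuz relies on the same property is not stated in this reference.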