NeoMutt  2022-04-29-249-gaae397
Teaching an old dog new tricks
DOXYGEN
UTF-7 Manipulation

Convert strings between UTF-7 and UTF-8.

Modified UTF-7 is described in RFC 3501 section 5.1.3. Regular UTF-7 is described in RFC 2152.

In modified UTF-7:

  • Printable ASCII characters 0x20-0x25 and 0x27-0x7e represent themselves.
  • "&" (0x26) is represented by the two-octet sequence "&-".
  • Other values are converted to the UTF-16 representation of the code point, which is then encoded using a modified version of BASE64.
  • BASE64 mode is enabled by "&" and disabled by "-".
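
The rules above can be sketched as a small encoder. This is an illustrative sketch only, not NeoMutt's utf8_to_utf7(): the function name mutf7_encode() and the table name MB64 are hypothetical, the input is assumed to be already-formed UTF-16 code units, and the modified BASE64 alphabet (with "," in place of "/" and no "=" padding) is taken from RFC 3501.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modified BASE64 alphabet from RFC 3501: "," replaces "/" */
static const char MB64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Hypothetical helper (not NeoMutt API): encode `n` UTF-16 code units as
 * modified UTF-7. Returns the number of bytes written, excluding the NUL,
 * or -1 if `out` may be too small. The size check is conservative: it
 * reserves worst-case room on every step, so it can reject a buffer that
 * would just have fitted. */
int mutf7_encode(const uint16_t *in, size_t n, char *out, size_t outlen)
{
  assert(out != NULL);
  assert((in != NULL) || (n == 0));

  size_t o = 0;
  uint32_t bits = 0; /* bit accumulator */
  int nbits = 0;     /* pending bits in the accumulator (0, 2 or 4) */
  int b64 = 0;       /* non-zero while in BASE64 mode */

  if (outlen == 0)
    return -1;

  for (size_t i = 0; i < n; i++)
  {
    uint16_t u = in[i];

    /* up to 4 bytes this step, plus 3 for the final flush, "-" and NUL */
    if ((o + 7) > outlen)
      return -1;

    if ((u >= 0x20) && (u <= 0x7e))
    {
      if (b64)
      {
        /* leave BASE64 mode: pad leftover bits with zeros, then "-" */
        if (nbits > 0)
          out[o++] = MB64[(bits << (6 - nbits)) & 0x3f];
        out[o++] = '-';
        b64 = 0;
        bits = 0;
        nbits = 0;
      }
      out[o++] = (char) u;
      if (u == '&')
        out[o++] = '-'; /* "&" is written as "&-" */
    }
    else
    {
      if (!b64)
      {
        out[o++] = '&'; /* enter BASE64 mode */
        b64 = 1;
      }
      bits = (bits << 16) | u;
      nbits += 16;
      while (nbits >= 6)
      {
        nbits -= 6;
        out[o++] = MB64[(bits >> nbits) & 0x3f];
      }
    }
  }

  if (b64)
  {
    if (nbits > 0)
      out[o++] = MB64[(bits << (6 - nbits)) & 0x3f];
    out[o++] = '-';
  }
  out[o] = '\0';
  return (int) o;
}
```

With this sketch, the two code units U+53F0 U+5317 encode as "&U,BTFw-", and the three characters "A&B" encode as "A&-B". Code points above U+FFFF would have to be split into surrogate pairs by the caller before being passed in.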

Note that UTF-16:

  • Represents U+0000-U+D7FF and U+E000-U+FFFF directly as the binary 2-byte value.
  • Reserves U+D800-U+DFFF for surrogates (so they aren't valid characters on their own).
  • Values above U+FFFF need to be encoded using a surrogate pair of two 16-bit values:
    • Subtract 0x10000 from the code point, leaving a 20-bit value.
    • Take the top 10 bits and add 0xd800 to get the first (high) surrogate.
    • Take the bottom 10 bits and add 0xdc00 to get the second (low) surrogate.
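
The surrogate-pair arithmetic above can be written out as a short helper. This is a minimal sketch for illustration; the name utf16_surrogates() is hypothetical and does not appear in NeoMutt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not NeoMutt API): split a code point above U+FFFF
 * into a UTF-16 surrogate pair. Returns false, leaving the outputs
 * untouched, if `cp` is outside U+10000-U+10FFFF. */
bool utf16_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
  if ((cp < 0x10000) || (cp > 0x10FFFF))
    return false;

  uint32_t v = cp - 0x10000;                 /* 20-bit value */
  *high = (uint16_t) (0xD800 + (v >> 10));   /* top 10 bits */
  *low = (uint16_t) (0xDC00 + (v & 0x3FF));  /* bottom 10 bits */
  return true;
}
```

For example, U+1F600 gives 0xF600 after the subtraction, so the high surrogate is 0xD83D and the low surrogate is 0xDE00.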

Data

Data        Description
B64Chars    Characters of the Base64 encoding
Index64u    Lookup table for Base64 encoding/decoding

Functions

Function             Description
imap_utf_decode()    Decode email from UTF-8 to local charset
imap_utf_encode()    Encode email from local charset to UTF-8
utf7_to_utf8()       Convert data from RFC2060's UTF-7 to UTF-8
utf8_to_utf7()       Convert data from UTF-8 to RFC2060's UTF-7