NeoMutt  2020-04-24
Teaching an old dog new tricks
DOXYGEN
charset.h File Reference

Conversion between different character encodings.

#include <iconv.h>
#include <stdbool.h>
#include <stdio.h>
#include <wchar.h>

Data Structures

struct  FgetConv
 Cursor for converting a file's encoding.

struct  FgetConvNot
 A dummy converter.

struct  MimeNames
 MIME name lookup entry.
 

Macros

#define MUTT_ICONV_HOOK_FROM   1
 apply charset-hooks to fromcode
 
#define mutt_ch_is_utf8(str)   mutt_ch_chscmp(str, "utf-8")
 
#define mutt_ch_is_us_ascii(str)   mutt_ch_chscmp(str, "us-ascii")
 

Enumerations

enum  LookupType { MUTT_LOOKUP_CHARSET, MUTT_LOOKUP_ICONV }
 Types of character set lookups.
 

Functions

void mutt_ch_canonical_charset (char *buf, size_t buflen, const char *name)
 Canonicalise the charset of a string.

const char * mutt_ch_charset_lookup (const char *chs)
 Look for a replacement character set.

int mutt_ch_check (const char *s, size_t slen, const char *from, const char *to)
 Check whether a string can be converted between encodings.

bool mutt_ch_check_charset (const char *cs, bool strict)
 Does iconv understand a character set?

char * mutt_ch_choose (const char *fromcode, const char *charsets, const char *u, size_t ulen, char **d, size_t *dlen)
 Figure the best charset to encode a string.

bool mutt_ch_chscmp (const char *cs1, const char *cs2)
 Are the names of two character sets equivalent?

int mutt_ch_convert_nonmime_string (char **ps)
 Try to convert a string using a list of character sets.

int mutt_ch_convert_string (char **ps, const char *from, const char *to, int flags)
 Convert a string between encodings.

int mutt_ch_fgetconv (struct FgetConv *fc)
 Convert a file's character set.

void mutt_ch_fgetconv_close (struct FgetConv **fc)
 Close an fgetconv handle.

struct FgetConv * mutt_ch_fgetconv_open (FILE *fp, const char *from, const char *to, int flags)
 Prepare a file for charset conversion.

char * mutt_ch_fgetconvs (char *buf, size_t buflen, struct FgetConv *fc)
 Convert a file's charset into a string buffer.

char * mutt_ch_get_default_charset (void)
 Get the default character set.

char * mutt_ch_get_langinfo_charset (void)
 Get the user's choice of character set.

size_t mutt_ch_iconv (iconv_t cd, const char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft, const char **inrepls, const char *outrepl, int *iconverrno)
 Change the encoding of a string.

const char * mutt_ch_iconv_lookup (const char *chs)
 Look for a replacement character set.

iconv_t mutt_ch_iconv_open (const char *tocode, const char *fromcode, int flags)
 Set up iconv for conversions.

bool mutt_ch_lookup_add (enum LookupType type, const char *pat, const char *replace, struct Buffer *err)
 Add a new character set lookup.

void mutt_ch_lookup_remove (void)
 Remove all the character set lookups.

void mutt_ch_set_charset (const char *charset)
 Update the records for a new character set.
 

Variables

char * C_AssumedCharset
 Config: If a message is missing a character set, assume this character set.

char * C_Charset
 Config: Default character set for displaying text on screen.

bool CharsetIsUtf8
 Is the user's current character set utf-8?

wchar_t ReplacementChar
 When a Unicode character can't be displayed, use this instead.

const struct MimeNames PreferredMimeNames []
 Lookup table of preferred charsets.
 

Detailed Description

Conversion between different character encodings.

Authors
  • Thomas Roessler

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Definition in file charset.h.

Macro Definition Documentation

◆ MUTT_ICONV_HOOK_FROM

#define MUTT_ICONV_HOOK_FROM   1

apply charset-hooks to fromcode

Definition at line 81 of file charset.h.

◆ mutt_ch_is_utf8

#define mutt_ch_is_utf8 (   str)    mutt_ch_chscmp(str, "utf-8")

Definition at line 106 of file charset.h.

◆ mutt_ch_is_us_ascii

#define mutt_ch_is_us_ascii (   str)    mutt_ch_chscmp(str, "us-ascii")

Definition at line 107 of file charset.h.

Enumeration Type Documentation

◆ LookupType

enum LookupType

Types of character set lookups.

Enumerator
MUTT_LOOKUP_CHARSET 

Alias for another character set.

MUTT_LOOKUP_ICONV 

Character set conversion.

Definition at line 75 of file charset.h.

76 {
77  MUTT_LOOKUP_CHARSET,
78  MUTT_LOOKUP_ICONV,
79 };

Function Documentation

◆ mutt_ch_canonical_charset()

void mutt_ch_canonical_charset ( char *  buf,
size_t  buflen,
const char *  name 
)

Canonicalise the charset of a string.

Parameters
  buf     Buffer for canonical character set name
  buflen  Length of buffer
  name    Name to be canonicalised

This first ties off any charset extension such as "//TRANSLIT", canonicalises the charset, and then re-adds the extension.

Definition at line 344 of file charset.c.

345 {
346  if (!buf || !name)
347  return;
348 
349  char in[1024], scratch[1024];
350 
351  mutt_str_strfcpy(in, name, sizeof(in));
352  char *ext = strchr(in, '/');
353  if (ext)
354  *ext++ = '\0';
355 
356  if ((mutt_str_strcasecmp(in, "utf-8") == 0) ||
357  (mutt_str_strcasecmp(in, "utf8") == 0))
358  {
359  mutt_str_strfcpy(buf, "utf-8", buflen);
360  goto out;
361  }
362 
363  /* catch some common iso-8859-something misspellings */
364  size_t plen;
365  if ((plen = mutt_str_startswith(in, "8859", CASE_IGNORE)) && (in[plen] != '-'))
366  snprintf(scratch, sizeof(scratch), "iso-8859-%s", in + plen);
367  else if ((plen = mutt_str_startswith(in, "8859-", CASE_IGNORE)))
368  snprintf(scratch, sizeof(scratch), "iso-8859-%s", in + plen);
369  else if ((plen = mutt_str_startswith(in, "iso8859", CASE_IGNORE)) && (in[plen] != '-'))
370  snprintf(scratch, sizeof(scratch), "iso_8859-%s", in + plen);
371  else if ((plen = mutt_str_startswith(in, "iso8859-", CASE_IGNORE)))
372  snprintf(scratch, sizeof(scratch), "iso_8859-%s", in + plen);
373  else
374  mutt_str_strfcpy(scratch, in, sizeof(scratch));
375 
376  for (size_t i = 0; PreferredMimeNames[i].key; i++)
377  {
378  if (mutt_str_strcasecmp(scratch, PreferredMimeNames[i].key) == 0)
379  {
380  mutt_str_strfcpy(buf, PreferredMimeNames[i].pref, buflen);
381  goto out;
382  }
383  }
384 
385  mutt_str_strfcpy(buf, scratch, buflen);
386 
387  /* for cosmetics' sake, transform to lowercase. */
388  for (char *p = buf; *p; p++)
389  *p = tolower(*p);
390 
391 out:
392  if (ext && *ext)
393  {
394  mutt_str_strcat(buf, buflen, "/");
395  mutt_str_strcat(buf, buflen, ext);
396  }
397 }
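
The tie-off and re-attach step can be illustrated in isolation. The following is a minimal sketch, not NeoMutt code: `sketch_canonical()` is a hypothetical helper that only lowercases the base name, whereas the real function also fixes common misspellings and consults PreferredMimeNames.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of NeoMutt: lowercases a charset name while
 * preserving any "/..." extension such as "//TRANSLIT". */
static void sketch_canonical(char *buf, size_t buflen, const char *name)
{
  if (!buf || (buflen == 0) || !name)
    return;

  char in[256];
  snprintf(in, sizeof(in), "%s", name);

  /* Tie off the extension at the first '/' */
  char *ext = strchr(in, '/');
  if (ext)
    *ext++ = '\0';

  /* Normalise the base name (assumption: lowercasing only, no lookup table) */
  for (char *p = in; *p; p++)
    *p = (char) tolower((unsigned char) *p);

  /* Re-attach the extension, if there is one */
  if (ext && *ext)
    snprintf(buf, buflen, "%s/%s", in, ext);
  else
    snprintf(buf, buflen, "%s", in);
}
```

Because the split happens at the first '/', an input of "UTF-8//TRANSLIT" leaves "/TRANSLIT" in the extension, and re-adding it with a single '/' restores the original "//TRANSLIT" suffix.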

◆ mutt_ch_charset_lookup()

const char* mutt_ch_charset_lookup ( const char *  chs)

Look for a replacement character set.

Parameters
  chs  Character set to lookup
Return values
  ptr   Replacement character set (if a 'charset-hook' matches)
  NULL  No matching hook

Look through all the 'charset-hook's. If one matches return the replacement character set.

Definition at line 531 of file charset.c.

532 {
533  return lookup_charset(MUTT_LOOKUP_CHARSET, chs);
534 }

◆ mutt_ch_check()

int mutt_ch_check ( const char *  s,
size_t  slen,
const char *  from,
const char *  to 
)

Check whether a string can be converted between encodings.

Parameters
  [in]  s     String to check
  [in]  slen  Length of the string to check
  [in]  from  Current character set
  [in]  to    Target character set
Return values
  0   Success
  -1  Error in iconv_open()
  >0  Errno as set by iconv()

Definition at line 710 of file charset.c.

711 {
712  if (!s || !from || !to)
713  return -1;
714 
715  int rc = 0;
716  iconv_t cd = mutt_ch_iconv_open(to, from, 0);
717  if (cd == (iconv_t) -1)
718  return -1;
719 
720  size_t outlen = MB_LEN_MAX * slen;
721  char *out = mutt_mem_malloc(outlen + 1);
722  char *saved_out = out;
723 
724  const size_t convlen =
725  iconv(cd, (ICONV_CONST char **) &s, &slen, &out, (size_t *) &outlen);
726  if (convlen == -1)
727  rc = errno;
728 
729  FREE(&saved_out);
730  iconv_close(cd);
731  return rc;
732 }
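
The same three-way return convention can be tried outside NeoMutt with plain iconv. This is a hedged sketch: `sketch_check()` is a hypothetical stand-in that calls iconv_open() directly, so no hooks or canonicalisation are applied, and it assumes an iconv() whose input parameter is `char **` (as in glibc).

```c
#include <errno.h>
#include <iconv.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for mutt_ch_check(): returns 0 on success, -1 if the
 * conversion cannot be set up, or the errno reported by iconv(). */
static int sketch_check(const char *s, size_t slen, const char *from, const char *to)
{
  if (!s || !from || !to)
    return -1;

  iconv_t cd = iconv_open(to, from);
  if (cd == (iconv_t) -1)
    return -1;

  /* Worst case: every input byte expands to MB_LEN_MAX output bytes */
  size_t outlen = MB_LEN_MAX * slen;
  char *out = malloc(outlen + 1);
  if (!out)
  {
    iconv_close(cd);
    return -1;
  }

  char *in_ptr = (char *) s; /* iconv() wants char **, the input is not modified */
  char *out_ptr = out;
  int rc = 0;
  if (iconv(cd, &in_ptr, &slen, &out_ptr, &outlen) == (size_t) -1)
    rc = errno;

  free(out);
  iconv_close(cd);
  return rc;
}
```

The converted bytes are discarded; only the success or failure of the conversion matters to the caller.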

◆ mutt_ch_check_charset()

bool mutt_ch_check_charset ( const char *  cs,
bool  strict 
)

Does iconv understand a character set?

Parameters
  cs      Character set to check
  strict  Check strictly by using iconv
Return values
  true   Character set is valid
  false  Character set is not valid

If strict is false, then finding a matching character set in PreferredMimeNames will be enough. If strict is true, or the charset is not in PreferredMimeNames, then mutt_ch_iconv_open() will be tried.

Definition at line 811 of file charset.c.

812 {
813  if (!cs)
814  return false;
815 
816  if (mutt_ch_is_utf8(cs))
817  return true;
818 
819  if (!strict)
820  {
821  for (int i = 0; PreferredMimeNames[i].key; i++)
822  {
823  if ((mutt_str_strcasecmp(PreferredMimeNames[i].key, cs) == 0) ||
824  (mutt_str_strcasecmp(PreferredMimeNames[i].pref, cs) == 0))
825  {
826  return true;
827  }
828  }
829  }
830 
831  iconv_t cd = mutt_ch_iconv_open(cs, cs, 0);
832  if (cd != (iconv_t)(-1))
833  {
834  iconv_close(cd);
835  return true;
836  }
837 
838  return false;
839 }
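
The strict path boils down to asking iconv whether it can open a conversion from the charset to itself. A minimal sketch of that probe, using iconv_open() directly rather than mutt_ch_iconv_open() (`sketch_iconv_knows()` is a hypothetical name):

```c
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: true if the system's iconv accepts this charset name */
static bool sketch_iconv_knows(const char *cs)
{
  if (!cs)
    return false;

  /* Converting a charset to itself is enough to test the name */
  iconv_t cd = iconv_open(cs, cs);
  if (cd == (iconv_t) -1)
    return false;

  iconv_close(cd);
  return true;
}
```

Which names are accepted depends on the iconv implementation installed on the system.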

◆ mutt_ch_choose()

char* mutt_ch_choose ( const char *  fromcode,
const char *  charsets,
const char *  u,
size_t  ulen,
char **  d,
size_t *  dlen 
)

Figure the best charset to encode a string.

Parameters
  [in]   fromcode  Original charset of the string
  [in]   charsets  Colon-separated list of potential charsets to use
  [in]   u         String to encode
  [in]   ulen      Length of the string to encode
  [out]  d         If not NULL, point it to the converted string
  [out]  dlen      If not NULL, point it to the length of the d string
Return values
  ptr   Best performing charset
  NULL  None could be found

Definition at line 1029 of file charset.c.

1031 {
1032  if (!fromcode)
1033  return NULL;
1034 
1035  char *e = NULL, *tocode = NULL;
1036  size_t elen = 0, bestn = 0;
1037  const char *q = NULL;
1038 
1039  for (const char *p = charsets; p; p = q ? q + 1 : 0)
1040  {
1041  q = strchr(p, ':');
1042 
1043  size_t n = q ? q - p : strlen(p);
1044  if (n == 0)
1045  continue;
1046 
1047  char *t = mutt_mem_malloc(n + 1);
1048  memcpy(t, p, n);
1049  t[n] = '\0';
1050 
1051  char *s = mutt_str_substr_dup(u, u + ulen);
1052  const int rc = d ? mutt_ch_convert_string(&s, fromcode, t, 0) :
1053  mutt_ch_check(s, ulen, fromcode, t);
1054  if (rc)
1055  {
1056  FREE(&t);
1057  FREE(&s);
1058  continue;
1059  }
1060  size_t slen = mutt_str_strlen(s);
1061 
1062  if (!tocode || (n < bestn))
1063  {
1064  bestn = n;
1065  FREE(&tocode);
1066  tocode = t;
1067  if (d)
1068  {
1069  FREE(&e);
1070  e = s;
1071  }
1072  else
1073  FREE(&s);
1074  elen = slen;
1075  }
1076  else
1077  {
1078  FREE(&t);
1079  FREE(&s);
1080  }
1081  }
1082  if (tocode)
1083  {
1084  if (d)
1085  *d = e;
1086  if (dlen)
1087  *dlen = elen;
1088 
1089  char canonical_buf[1024];
1090  mutt_ch_canonical_charset(canonical_buf, sizeof(canonical_buf), tocode);
1091  mutt_str_replace(&tocode, canonical_buf);
1092  }
1093  return tocode;
1094 }
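
The walk over the colon-separated list can be shown on its own. The sketch below is a simplification under stated assumptions: it returns the first entry accepted by a caller-supplied predicate instead of scoring candidates, and `charset_ok_fn`, `sketch_first_acceptable()` and `sketch_is_utf()` are hypothetical names that do not exist in NeoMutt.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical predicate type: true if the candidate charset is acceptable */
typedef bool (*charset_ok_fn)(const char *charset);

/* Returns a newly allocated copy of the first acceptable entry in a
 * colon-separated list, or NULL if none qualifies. The caller frees it. */
static char *sketch_first_acceptable(const char *charsets, charset_ok_fn ok)
{
  const char *q = NULL;
  for (const char *p = charsets; p; p = q ? q + 1 : NULL)
  {
    q = strchr(p, ':');
    size_t n = q ? (size_t) (q - p) : strlen(p);
    if (n == 0)
      continue; /* skip empty entries such as the gap in "a::b" */

    char *t = malloc(n + 1);
    if (!t)
      return NULL;
    memcpy(t, p, n);
    t[n] = '\0';

    if (ok(t))
      return t;
    free(t);
  }
  return NULL;
}

/* Demonstration predicate only: accepts names that start with "utf" */
static bool sketch_is_utf(const char *charset)
{
  return strncmp(charset, "utf", 3) == 0;
}
```

In mutt_ch_choose() the role of the predicate is played by an actual conversion attempt with mutt_ch_convert_string() or mutt_ch_check().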

◆ mutt_ch_chscmp()

bool mutt_ch_chscmp ( const char *  cs1,
const char *  cs2 
)

Are the names of two character sets equivalent?

Parameters
  cs1  First character set
  cs2  Second character set
Return values
  true   Names are equivalent
  false  Names differ

Charsets may have extensions that mutt_ch_canonical_charset() leaves intact. We expect 'cs2' to originate from NeoMutt code, not user input (i.e. 'cs2' does not have any extension), so we simply check whether the shorter string is a prefix of the longer.

Definition at line 411 of file charset.c.

412 {
413  if (!cs1 || !cs2)
414  return false;
415 
416  char buf[256];
417 
418  mutt_ch_canonical_charset(buf, sizeof(buf), cs1);
419 
420  int len1 = mutt_str_strlen(buf);
421  int len2 = mutt_str_strlen(cs2);
422 
423  return mutt_str_strncasecmp(((len1 > len2) ? buf : cs2),
424  ((len1 > len2) ? cs2 : buf), MIN(len1, len2)) == 0;
425 }
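
The prefix rule is easy to try without the rest of NeoMutt. This sketch assumes POSIX strncasecmp() and skips the canonicalisation step; `sketch_prefix_equal()` is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h> /* strncasecmp() is POSIX, not ISO C */

/* Hypothetical helper: true if the shorter name is a case-insensitive
 * prefix of the longer one. */
static bool sketch_prefix_equal(const char *cs1, const char *cs2)
{
  if (!cs1 || !cs2)
    return false;

  size_t len1 = strlen(cs1);
  size_t len2 = strlen(cs2);
  size_t n = (len1 < len2) ? len1 : len2;

  return strncasecmp(cs1, cs2, n) == 0;
}
```

With this rule "utf-8//TRANSLIT" matches "utf-8", but so does a truncated name such as "utf", which is why the second argument is expected to be a trusted constant.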

◆ mutt_ch_convert_nonmime_string()

int mutt_ch_convert_nonmime_string ( char **  ps)

Try to convert a string using a list of character sets.

Parameters
  [in,out]  ps  String to be converted
Return values
  0   Success
  -1  Error

Work through C_AssumedCharset looking for a character set conversion that works. Failing that, try mutt_ch_get_default_charset().

Definition at line 300 of file charset.c.

301 {
302  if (!ps)
303  return -1;
304 
305  const char *c1 = NULL;
306 
307  for (const char *c = C_AssumedCharset; c; c = c1 ? c1 + 1 : 0)
308  {
309  char *u = *ps;
310  size_t ulen = mutt_str_strlen(*ps);
311 
312  if (!u || !*u)
313  return 0;
314 
315  c1 = strchr(c, ':');
316  size_t n = c1 ? c1 - c : mutt_str_strlen(c);
317  if (n == 0)
318  return 0;
319  char *fromcode = mutt_mem_malloc(n + 1);
320  mutt_str_strfcpy(fromcode, c, n + 1);
321  char *s = mutt_str_substr_dup(u, u + ulen);
322  int m = mutt_ch_convert_string(&s, fromcode, C_Charset, 0);
323  FREE(&fromcode);
324  FREE(&s);
325  if (m == 0)
326  {
327  return 0;
328  }
329  }
332  return -1;
333 }

◆ mutt_ch_convert_string()

int mutt_ch_convert_string ( char **  ps,
const char *  from,
const char *  to,
int  flags 
)

Convert a string between encodings.

Parameters
  [in,out]  ps     String to convert
  [in]      from   Current character set
  [in]      to     Target character set
  [in]      flags  Flags, e.g. MUTT_ICONV_HOOK_FROM
Return values
  0      Success
  -1     Invalid arguments or failure to open an iconv channel
  errno  Failure in iconv conversion

Parameter flags is given as-is to mutt_ch_iconv_open(). See there for its meaning and usage policy.

Definition at line 747 of file charset.c.

748 {
749  if (!ps)
750  return -1;
751 
752  char *s = *ps;
753 
754  if (!s || !*s)
755  return 0;
756 
757  if (!to || !from)
758  return -1;
759 
760  const char *repls[] = { "\357\277\275", "?", 0 };
761  int rc = 0;
762 
763  iconv_t cd = mutt_ch_iconv_open(to, from, flags);
764  if (cd == (iconv_t) -1)
765  return -1;
766 
767  size_t len;
768  const char *ib = NULL;
769  char *buf = NULL, *ob = NULL;
770  size_t ibl, obl;
771  const char **inrepls = NULL;
772  const char *outrepl = NULL;
773 
774  if (mutt_ch_is_utf8(to))
775  outrepl = "\357\277\275";
776  else if (mutt_ch_is_utf8(from))
777  inrepls = repls;
778  else
779  outrepl = "?";
780 
781  len = strlen(s);
782  ib = s;
783  ibl = len + 1;
784  obl = MB_LEN_MAX * ibl;
785  buf = mutt_mem_malloc(obl + 1);
786  ob = buf;
787 
788  mutt_ch_iconv(cd, &ib, &ibl, &ob, &obl, inrepls, outrepl, &rc);
789  iconv_close(cd);
790 
791  *ob = '\0';
792 
793  FREE(ps);
794  *ps = buf;
795 
796  mutt_str_adjust(ps);
797  return rc;
798 }
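
The allocate, convert and terminate sequence can be reproduced with plain iconv. This is a minimal sketch under stated assumptions: `sketch_convert()` is a hypothetical function that returns a new string instead of replacing one in place, gives up on the first error rather than substituting replacement characters, and assumes the glibc-style `char **` input parameter.

```c
#include <iconv.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated converted copy of s,
 * or NULL on any failure. The caller frees the result. */
static char *sketch_convert(const char *s, const char *from, const char *to)
{
  if (!s || !from || !to)
    return NULL;

  iconv_t cd = iconv_open(to, from);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t ibl = strlen(s);
  size_t obl = MB_LEN_MAX * ibl; /* generous upper bound for the output */
  char *buf = malloc(obl + 1);
  if (!buf)
  {
    iconv_close(cd);
    return NULL;
  }

  char *ib = (char *) s; /* the input is not modified */
  char *ob = buf;
  size_t rc = iconv(cd, &ib, &ibl, &ob, &obl);
  iconv_close(cd);

  if (rc == (size_t) -1)
  {
    free(buf);
    return NULL;
  }

  *ob = '\0'; /* iconv() does not terminate the output */
  return buf;
}
```

For example, converting the ISO-8859-1 bytes "caf\xe9" to UTF-8 should yield the five bytes "caf\xc3\xa9".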

◆ mutt_ch_fgetconv()

int mutt_ch_fgetconv ( struct FgetConv *  fc)

Convert a file's character set.

Parameters
  fc  FgetConv handle
Return values
  num  Next character in the converted file
  EOF  Error

A file is read into a buffer and its character set is converted. Each call to this function will return one converted character. The buffer is refilled automatically when empty.

Definition at line 901 of file charset.c.

902 {
903  if (!fc)
904  return EOF;
905  if (fc->cd == (iconv_t) -1)
906  return fgetc(fc->fp);
907  if (!fc->p)
908  return EOF;
909  if (fc->p < fc->ob)
910  return (unsigned char) *(fc->p)++;
911 
912  /* Try to convert some more */
913  fc->p = fc->bufo;
914  fc->ob = fc->bufo;
915  if (fc->ibl)
916  {
917  size_t obl = sizeof(fc->bufo);
918  iconv(fc->cd, (ICONV_CONST char **) &fc->ib, &fc->ibl, &fc->ob, &obl);
919  if (fc->p < fc->ob)
920  return (unsigned char) *(fc->p)++;
921  }
922 
923  /* If we trusted iconv a bit more, we would at this point
924  * ask why it had stopped converting ... */
925 
926  /* Try to read some more */
927  if ((fc->ibl == sizeof(fc->bufi)) ||
928  (fc->ibl && (fc->ib + fc->ibl < fc->bufi + sizeof(fc->bufi))))
929  {
930  fc->p = 0;
931  return EOF;
932  }
933  if (fc->ibl)
934  memcpy(fc->bufi, fc->ib, fc->ibl);
935  fc->ib = fc->bufi;
936  fc->ibl += fread(fc->ib + fc->ibl, 1, sizeof(fc->bufi) - fc->ibl, fc->fp);
937 
938  /* Try harder this time to convert some */
939  if (fc->ibl)
940  {
941  size_t obl = sizeof(fc->bufo);
942  mutt_ch_iconv(fc->cd, (const char **) &fc->ib, &fc->ibl, &fc->ob, &obl,
943  fc->inrepls, 0, NULL);
944  if (fc->p < fc->ob)
945  return (unsigned char) *(fc->p)++;
946  }
947 
948  /* Either the file has finished or one of the buffers is too small */
949  fc->p = 0;
950  return EOF;
951 }

◆ mutt_ch_fgetconv_close()

void mutt_ch_fgetconv_close ( struct FgetConv **  fc)

Close an fgetconv handle.

Parameters
  [out]  fc  fgetconv handle

Definition at line 881 of file charset.c.

882 {
883  if (!fc || !*fc)
884  return;
885 
886  if ((*fc)->cd != (iconv_t) -1)
887  iconv_close((*fc)->cd);
888  FREE(fc);
889 }

◆ mutt_ch_fgetconv_open()

struct FgetConv* mutt_ch_fgetconv_open ( FILE *  fp,
const char *  from,
const char *  to,
int  flags 
)

Prepare a file for charset conversion.

Parameters
  fp     FILE ptr to prepare
  from   Current character set
  to     Destination character set
  flags  Flags, e.g. MUTT_ICONV_HOOK_FROM
Return values
  ptr  fgetconv handle

Parameter flags is given as-is to mutt_ch_iconv_open().

Definition at line 851 of file charset.c.

852 {
853  struct FgetConv *fc = NULL;
854  iconv_t cd = (iconv_t) -1;
855 
856  if (from && to)
857  cd = mutt_ch_iconv_open(to, from, flags);
858 
859  if (cd != (iconv_t) -1)
860  {
861  static const char *repls[] = { "\357\277\275", "?", 0 };
862 
863  fc = mutt_mem_malloc(sizeof(struct FgetConv));
864  fc->p = fc->bufo;
865  fc->ob = fc->bufo;
866  fc->ib = fc->bufi;
867  fc->ibl = 0;
868  fc->inrepls = mutt_ch_is_utf8(to) ? repls : repls + 1;
869  }
870  else
871  fc = mutt_mem_malloc(sizeof(struct FgetConvNot));
872  fc->fp = fp;
873  fc->cd = cd;
874  return fc;
875 }

◆ mutt_ch_fgetconvs()

char* mutt_ch_fgetconvs ( char *  buf,
size_t  buflen,
struct FgetConv *  fc 
)

Convert a file's charset into a string buffer.

Parameters
  buf     Buffer for result
  buflen  Length of buffer
  fc      FgetConv handle
Return values
  ptr   Success, result buffer
  NULL  Error

Read a file into a buffer, converting the character set as it goes.

Definition at line 963 of file charset.c.

964 {
965  if (!buf)
966  return NULL;
967 
968  size_t r;
969  for (r = 0; (r + 1) < buflen;)
970  {
971  const int c = mutt_ch_fgetconv(fc);
972  if (c == EOF)
973  break;
974  buf[r++] = (char) c;
975  if (c == '\n')
976  break;
977  }
978  buf[r] = '\0';
979 
980  if (r > 0)
981  return buf;
982 
983  return NULL;
984 }
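
The loop has the same shape as fgets(): stop at a newline, at end of input, or when the buffer is full. The sketch below swaps mutt_ch_fgetconv() for plain fgetc() so that it runs standalone; `sketch_read_line()` is a hypothetical name, and the extra buflen check is an addition not present in the listing.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to one line into buf, always NUL-terminated.
 * Returns buf if at least one character was read, otherwise NULL. */
static char *sketch_read_line(char *buf, size_t buflen, FILE *fp)
{
  if (!buf || (buflen == 0) || !fp)
    return NULL;

  size_t r = 0;
  while ((r + 1) < buflen) /* keep one byte for the terminator */
  {
    const int c = fgetc(fp);
    if (c == EOF)
      break;
    buf[r++] = (char) c;
    if (c == '\n')
      break; /* the newline is kept, as with fgets() */
  }
  buf[r] = '\0';

  return (r > 0) ? buf : NULL;
}
```

A caller would typically loop until NULL is returned, treating that as end of file or error.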

◆ mutt_ch_get_default_charset()

char* mutt_ch_get_default_charset ( void  )

Get the default character set.

Return values
  ptr  Name of the default character set

Warning
  This returns a pointer to a static buffer. Do not free it.

Definition at line 433 of file charset.c.

434 {
435  static char fcharset[128];
436  const char *c = C_AssumedCharset;
437  const char *c1 = NULL;
438 
439  if (c)
440  {
441  c1 = strchr(c, ':');
442  mutt_str_strfcpy(fcharset, c, c1 ? (c1 - c + 1) : sizeof(fcharset));
443  return fcharset;
444  }
445  return strcpy(fcharset, "us-ascii");
446 }
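
Taking the first entry of a colon-separated list into a static buffer can be sketched as follows. `sketch_first_charset()` is a hypothetical function that takes the list as a parameter instead of reading C_AssumedCharset, and it uses strcspn() rather than mutt_str_strfcpy().

```c
#include <string.h>

/* Hypothetical helper: returns the first entry of a colon-separated list,
 * or "us-ascii" if the list is NULL. The result lives in a static buffer,
 * so it must not be freed and is overwritten by the next call. */
static const char *sketch_first_charset(const char *list)
{
  static char first[128];

  if (!list)
    return strcpy(first, "us-ascii");

  /* Length of the text before the first ':' (or the whole string) */
  size_t n = strcspn(list, ":");
  if (n >= sizeof(first))
    n = sizeof(first) - 1;

  memcpy(first, list, n);
  first[n] = '\0';
  return first;
}
```

Because the buffer is shared, a caller that needs two results at once has to copy the first before asking for the second.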

◆ mutt_ch_get_langinfo_charset()

char* mutt_ch_get_langinfo_charset ( void  )

Get the user's choice of character set.

Return values
  ptr  Charset string

Get the canonical character set used by the user's locale. The caller must free the returned string.

Definition at line 455 of file charset.c.

456 {
457  char buf[1024] = { 0 };
458 
459  mutt_ch_canonical_charset(buf, sizeof(buf), nl_langinfo(CODESET));
460 
461  if (buf[0] != '\0')
462  return mutt_str_strdup(buf);
463 
464  return mutt_str_strdup("iso-8859-1");
465 }
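
Querying the locale's codeset with nl_langinfo() can be tried in isolation. This is a rough sketch that assumes a POSIX system; `sketch_locale_charset()` is a hypothetical function that skips canonicalisation and calls setlocale() itself, which NeoMutt would have done elsewhere.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of the locale's
 * codeset name, falling back to "iso-8859-1". The caller frees it. */
static char *sketch_locale_charset(void)
{
  setlocale(LC_CTYPE, ""); /* adopt the locale from the environment */

  const char *cs = nl_langinfo(CODESET);
  const char *chosen = (cs && *cs) ? cs : "iso-8859-1";

  char *copy = malloc(strlen(chosen) + 1);
  if (copy)
    strcpy(copy, chosen);
  return copy;
}
```

The exact string returned varies by system and locale, so callers should not compare it against a fixed spelling without canonicalising it first.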

◆ mutt_ch_iconv()

size_t mutt_ch_iconv ( iconv_t  cd,
const char **  inbuf,
size_t *  inbytesleft,
char **  outbuf,
size_t *  outbytesleft,
const char **  inrepls,
const char *  outrepl,
int *  iconverrno 
)

Change the encoding of a string.

Parameters
  [in]      cd            Iconv conversion descriptor
  [in,out]  inbuf         Buffer to convert
  [in,out]  inbytesleft   Length of buffer to convert
  [in,out]  outbuf        Buffer for the result
  [in,out]  outbytesleft  Length of result buffer
  [in]      inrepls       Input replacement characters
  [in]      outrepl       Output replacement characters
  [out]     iconverrno    Errno if iconv() fails, 0 if it succeeds
Return values
  num  Characters converted

Like iconv, but keeps going even when the input is invalid. If you're supplying inrepls, the source charset should be stateless; if you're supplying an outrepl, the target charset should be.

Definition at line 611 of file charset.c.

614 {
615  size_t rc = 0;
616  const char *ib = *inbuf;
617  size_t ibl = *inbytesleft;
618  char *ob = *outbuf;
619  size_t obl = *outbytesleft;
620 
621  while (true)
622  {
623  errno = 0;
624  const size_t ret1 = iconv(cd, (ICONV_CONST char **) &ib, &ibl, &ob, &obl);
625  if (ret1 != (size_t) -1)
626  rc += ret1;
627  if (iconverrno)
628  *iconverrno = errno;
629 
630  if (ibl && obl && (errno == EILSEQ))
631  {
632  if (inrepls)
633  {
634  /* Try replacing the input */
635  const char **t = NULL;
636  for (t = inrepls; *t; t++)
637  {
638  const char *ib1 = *t;
639  size_t ibl1 = strlen(*t);
640  char *ob1 = ob;
641  size_t obl1 = obl;
642  iconv(cd, (ICONV_CONST char **) &ib1, &ibl1, &ob1, &obl1);
643  if (ibl1 == 0)
644  {
645  ib++;
646  ibl--;
647  ob = ob1;
648  obl = obl1;
649  rc++;
650  break;
651  }
652  }
653  if (*t)
654  continue;
655  }
656  /* Replace the output */
657  if (!outrepl)
658  outrepl = "?";
659  iconv(cd, NULL, NULL, &ob, &obl);
660  if (obl)
661  {
662  int n = strlen(outrepl);
663  if (n > obl)
664  {
665  outrepl = "?";
666  n = 1;
667  }
668  memcpy(ob, outrepl, n);
669  ib++;
670  ibl--;
671  ob += n;
672  obl -= n;
673  rc++;
674  iconv(cd, NULL, NULL, NULL, NULL); /* for good measure */
675  continue;
676  }
677  }
678  *inbuf = ib;
679  *inbytesleft = ibl;
680  *outbuf = ob;
681  *outbytesleft = obl;
682  return rc;
683  }
684 }
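
The "keep going" behaviour can be reduced to a small loop around iconv(). This is a simplified sketch, not the function above: `sketch_lossy_convert()` is a hypothetical name, it only implements the output-replacement path with a fixed '?', it does not reset the conversion state, and it assumes the glibc-style `char **` input parameter.

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: converts in to out, writing '?' for each input byte
 * that iconv() rejects with EILSEQ. Returns the number of bytes written, or
 * (size_t) -1 if the arguments are bad or the conversion cannot be opened. */
static size_t sketch_lossy_convert(const char *from, const char *to,
                                   const char *in, char *out, size_t outlen)
{
  if (!from || !to || !in || !out || (outlen == 0))
    return (size_t) -1;

  iconv_t cd = iconv_open(to, from);
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  char *ib = (char *) in; /* the input is not modified */
  size_t ibl = strlen(in);
  char *ob = out;
  size_t obl = outlen - 1; /* leave room for the terminator */

  while (ibl > 0)
  {
    errno = 0;
    const size_t rc = iconv(cd, &ib, &ibl, &ob, &obl);
    if (rc != (size_t) -1)
      break; /* everything was converted */
    if ((errno != EILSEQ) || (ibl == 0) || (obl == 0))
      break; /* out of space or truncated input: give up */

    /* Skip the offending byte and emit a replacement instead */
    ib++;
    ibl--;
    *ob++ = '?';
    obl--;
  }

  *ob = '\0';
  iconv_close(cd);
  return (size_t) (ob - out);
}
```

Skipping a single byte at a time means that one broken multibyte sequence can produce several '?' characters in the output.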

◆ mutt_ch_iconv_lookup()

const char* mutt_ch_iconv_lookup ( const char *  chs)

Look for a replacement character set.

Parameters
  chs  Character set to lookup
Return values
  ptr   Replacement character set (if an 'iconv-hook' matches)
  NULL  No matching hook

Look through all the 'iconv-hook's. If one matches return the replacement character set.

Definition at line 695 of file charset.c.

696 {
697  return lookup_charset(MUTT_LOOKUP_ICONV, chs);
698 }

◆ mutt_ch_iconv_open()

iconv_t mutt_ch_iconv_open ( const char *  tocode,
const char *  fromcode,
int  flags 
)

Set up iconv for conversions.

Parameters
tocodeCurrent character set
fromcodeTarget character set
flagsFlags, e.g. MUTT_ICONV_HOOK_FROM
Return values
ptriconv handle for the conversion

Like iconv_open(), but with extra handling of the charset names: it canonicalises both names, optionally applies charset-hooks to fromcode and recanonicalises the result, and finally applies iconv-hooks to both names.

With flags=0, charset-hooks are skipped; with MUTT_ICONV_HOOK_FROM, they are applied to fromcode. Use flags=0 when fromcode can be trusted, for example a constant or a value supplied by the user. Use MUTT_ICONV_HOOK_FROM only when fromcode is unreliable, for example when it is taken from a possibly wrong MIME label on an incoming message. Misusing MUTT_ICONV_HOOK_FROM leads to unwanted interactions in some setups.

Note
By design, charset-hooks are never applied to tocode.
Despite its name, MUTT_ICONV_HOOK_FROM controls only charset-hooks; it has no effect on iconv-hooks, which are always applied.
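
The order of operations described above can be sketched in standalone C. This is an illustration under stated assumptions, not the NeoMutt code: `SKETCH_HOOK_FROM`, `sketch_charset_hook()`, `sketch_iconv_hook()` and `sketch_iconv_open()` are hypothetical names, the two hooks are hard-coded instead of being regex lookups, and the canonicalisation steps are left out.

```c
#include <iconv.h>
#include <stddef.h>
#include <strings.h>

#define SKETCH_HOOK_FROM 1 /* hypothetical stand-in for MUTT_ICONV_HOOK_FROM */

/* Hypothetical charset-hook: repair a label that arrived mislabelled. */
static const char *sketch_charset_hook(const char *name)
{
  return (strcasecmp(name, "x-sketch-bogus") == 0) ? "us-ascii" : NULL;
}

/* Hypothetical iconv-hook: map a MIME name to the spelling iconv prefers. */
static const char *sketch_iconv_hook(const char *name)
{
  return (strcasecmp(name, "us-ascii") == 0) ? "ASCII" : NULL;
}

/* Open a conversion from fromcode to tocode.
 * Returns (iconv_t) -1 on failure, like iconv_open(). */
iconv_t sketch_iconv_open(const char *tocode, const char *fromcode, int flags)
{
  const char *tmp = NULL;

  /* charset-hooks: applied to fromcode only, and only on request */
  if (flags & SKETCH_HOOK_FROM)
  {
    tmp = sketch_charset_hook(fromcode);
    if (tmp)
      fromcode = tmp;
  }

  /* iconv-hooks: always applied, to both names */
  tmp = sketch_iconv_hook(tocode);
  if (tmp)
    tocode = tmp;
  tmp = sketch_iconv_hook(fromcode);
  if (tmp)
    fromcode = tmp;

  return iconv_open(tocode, fromcode);
}
```

In this sketch, an untrusted label such as "x-sketch-bogus" only becomes convertible when the caller passes SKETCH_HOOK_FROM; with flags=0 the name reaches iconv_open() unchanged and the call fails.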

Definition at line 558 of file charset.c.

559 {
560  char tocode1[128];
561  char fromcode1[128];
562  const char *tocode2 = NULL, *fromcode2 = NULL;
563  const char *tmp = NULL;
564 
565  iconv_t cd;
566 
567  /* transform to MIME preferred charset names */
568  mutt_ch_canonical_charset(tocode1, sizeof(tocode1), tocode);
569  mutt_ch_canonical_charset(fromcode1, sizeof(fromcode1), fromcode);
570 
571  /* maybe apply charset-hooks and recanonicalise fromcode,
572  * but only when caller asked us to sanitize a potentially wrong
573  * charset name incoming from the wild exterior. */
574  if (flags & MUTT_ICONV_HOOK_FROM)
575  {
576  tmp = mutt_ch_charset_lookup(fromcode1);
577  if (tmp)
578  mutt_ch_canonical_charset(fromcode1, sizeof(fromcode1), tmp);
579  }
580 
581  /* always apply iconv-hooks to suit system's iconv tastes */
582  tocode2 = mutt_ch_iconv_lookup(tocode1);
583  tocode2 = tocode2 ? tocode2 : tocode1;
584  fromcode2 = mutt_ch_iconv_lookup(fromcode1);
585  fromcode2 = fromcode2 ? fromcode2 : fromcode1;
586 
587  /* call system iconv with names it appreciates */
588  cd = iconv_open(tocode2, fromcode2);
589  if (cd != (iconv_t) -1)
590  return cd;
591 
592  return (iconv_t) -1;
593 }
const char * mutt_ch_iconv_lookup(const char *chs)
Look for a replacement character set.
Definition: charset.c:695
void mutt_ch_canonical_charset(char *buf, size_t buflen, const char *name)
Canonicalise the charset of a string.
Definition: charset.c:344
iconv_t cd
Definition: charset.h:44
const char * mutt_ch_charset_lookup(const char *chs)
Look for a replacement character set.
Definition: charset.c:531
#define MUTT_ICONV_HOOK_FROM
apply charset-hooks to fromcode
Definition: charset.h:81

◆ mutt_ch_lookup_add()

bool mutt_ch_lookup_add ( enum LookupType  type,
const char *  pat,
const char *  replace,
struct Buffer *  err 
)

Add a new character set lookup.

Parameters
type      Type of character set, e.g. MUTT_LOOKUP_CHARSET
pat       Pattern to match
replace   Replacement string
err       Buffer for error message
Return values
true    Lookup added to list
false   pat or replace was NULL, or the regex string was invalid

Add a regex for a character set and a replacement name.
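
The compile-and-report step can be tried in isolation with plain POSIX calls. The sketch below is not the NeoMutt code: `compile_pattern_sketch()` is a hypothetical helper, a plain character array stands in for struct Buffer, and the use of REG_EXTENDED is an assumption of this example.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Compile pat case-insensitively.
 * On failure, write a readable message into errbuf (when one is provided)
 * and return false. On success the caller owns *rx and must release it
 * with regfree(). */
bool compile_pattern_sketch(regex_t *rx, const char *pat, char *errbuf, size_t errlen)
{
  if (!rx || !pat)
    return false;

  /* Assumption: extended syntax; REG_ICASE mirrors the call in the listing */
  int rc = regcomp(rx, pat, REG_EXTENDED | REG_ICASE);
  if (rc != 0)
  {
    if (errbuf && (errlen > 0))
      regerror(rc, rx, errbuf, errlen);
    return false;
  }

  return true;
}
```

Unlike the listing, this helper checks the error buffer before writing to it, so a caller that does not want a message can pass NULL.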

Definition at line 478 of file charset.c.

480 {
481  if (!pat || !replace)
482  return false;
483 
484  regex_t *rx = mutt_mem_malloc(sizeof(regex_t));
485  int rc = REG_COMP(rx, pat, REG_ICASE);
486  if (rc != 0)
487  {
488  regerror(rc, rx, err->data, err->dsize);
489  FREE(&rx);
490  return false;
491  }
492 
493  struct Lookup *l = lookup_new();
494  l->type = type;
495  l->replacement = mutt_str_strdup(replace);
496  l->regex.pattern = mutt_str_strdup(pat);
497  l->regex.regex = rx;
498  l->regex.pat_not = false;
499 
500  TAILQ_INSERT_TAIL(&Lookups, l, entries);
501 
502  return true;
503 }
static struct Lookup * lookup_new(void)
Create a new Lookup.
Definition: charset.c:240
regex_t * regex
compiled expression
Definition: regex3.h:91
bool pat_not
do not match
Definition: regex3.h:92
char * replacement
Alternative charset to use.
Definition: charset.c:74
struct Regex regex
Regular expression.
Definition: charset.c:73
Lookup
Regex to String lookup table.
Definition: charset.c:70
size_t dsize
Length of data.
Definition: buffer.h:37
#define REG_COMP(preg, regex, cflags)
Compile a regular expression.
Definition: regex3.h:53
void * mutt_mem_malloc(size_t size)
Allocate memory on the heap.
Definition: memory.c:90
char * data
Pointer to data.
Definition: buffer.h:35
static struct LookupList Lookups
Definition: charset.c:79
#define TAILQ_INSERT_TAIL(head, elm, field)
Definition: queue.h:802
enum LookupType type
Lookup type.
Definition: charset.c:72
char * mutt_str_strdup(const char *str)
Copy a string, safely.
Definition: string.c:380
#define FREE(x)
Definition: memory.h:40
char * pattern
printable version
Definition: regex3.h:90

◆ mutt_ch_lookup_remove()

void mutt_ch_lookup_remove ( void  )

Remove all the character set lookups.

Empty the list of replacement character set names.
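
Emptying a tail queue while freeing its entries can be sketched as follows. This is an illustration only: `EntrySketch`, `list_add_sketch()` and `list_clear_sketch()` are hypothetical names, and the loop repeatedly takes the first entry rather than using TAILQ_FOREACH_SAFE, because not every system's <sys/queue.h> provides that macro.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical list entry holding one heap-allocated string. */
struct EntrySketch
{
  char *replacement;
  TAILQ_ENTRY(EntrySketch) entries;
};
TAILQ_HEAD(EntryListSketch, EntrySketch);

/* Append a copy of replacement. Returns 0 on success, -1 on failure. */
int list_add_sketch(struct EntryListSketch *head, const char *replacement)
{
  if (!head || !replacement)
    return -1;

  struct EntrySketch *e = calloc(1, sizeof(*e));
  if (!e)
    return -1;

  size_t len = strlen(replacement) + 1;
  e->replacement = malloc(len);
  if (!e->replacement)
  {
    free(e);
    return -1;
  }
  memcpy(e->replacement, replacement, len);

  TAILQ_INSERT_TAIL(head, e, entries);
  return 0;
}

/* Unlink and free every entry. Returns the number of entries freed. */
size_t list_clear_sketch(struct EntryListSketch *head)
{
  size_t count = 0;
  struct EntrySketch *e = NULL;

  /* Each entry is unlinked before it is freed, so the head stays valid */
  while ((e = TAILQ_FIRST(head)))
  {
    TAILQ_REMOVE(head, e, entries);
    free(e->replacement);
    free(e);
    count++;
  }

  return count;
}
```

After list_clear_sketch() returns, the head is an empty but still usable queue, so new entries can be added without calling TAILQ_INIT() again.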

Definition at line 510 of file charset.c.

511 {
512  struct Lookup *l = NULL;
513  struct Lookup *tmp = NULL;
514 
515  TAILQ_FOREACH_SAFE(l, &Lookups, entries, tmp)
516  {
517  TAILQ_REMOVE(&Lookups, l, entries);
518  lookup_free(&l);
519  }
520 }
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)
Definition: queue.h:728
Lookup
Regex to String lookup table.
Definition: charset.c:70
static void lookup_free(struct Lookup **ptr)
Free a Lookup.
Definition: charset.c:249
#define TAILQ_REMOVE(head, elm, field)
Definition: queue.h:834
static struct LookupList Lookups
Definition: charset.c:79

◆ mutt_ch_set_charset()

void mutt_ch_set_charset ( const char *  charset)

Update the records for a new character set.

Parameters
charsetNew character set

Check if this character set is utf-8 and pick a suitable replacement character for unprintable characters.

Note
This calls bind_textdomain_codeset() which will affect future message translations.
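
The choice of replacement character can be shown without any global state. The sketch below is an assumption-laden illustration, not the NeoMutt code: `CharsetStateSketch` and `charset_state_sketch()` are hypothetical names, the result is returned in a struct instead of being stored in CharsetIsUtf8 and ReplacementChar, and a plain case-insensitive comparison stands in for mutt_ch_is_utf8().

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>
#include <wchar.h>

/* Hypothetical value type bundling the two settings the function updates. */
struct CharsetStateSketch
{
  bool is_utf8;
  wchar_t replacement;
};

/* Derive the settings from an already canonicalised charset name. */
struct CharsetStateSketch charset_state_sketch(const char *canonical_name)
{
  struct CharsetStateSketch st = { false, L'?' };

  /* Assumption: a simple case-insensitive name comparison is sufficient */
  if (canonical_name && (strcasecmp(canonical_name, "utf-8") == 0))
  {
    st.is_utf8 = true;
    st.replacement = 0xfffd; /* U+FFFD REPLACEMENT CHARACTER */
  }

  return st;
}
```

A non-UTF-8 display charset may not be able to represent U+FFFD, which is why the fallback is the ASCII question mark.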

Definition at line 996 of file charset.c.

997 {
998  char buf[256];
999 
1000  mutt_ch_canonical_charset(buf, sizeof(buf), charset);
1001 
1002  if (mutt_ch_is_utf8(buf))
1003  {
1004  CharsetIsUtf8 = true;
1005  ReplacementChar = 0xfffd; /* replacement character */
1006  }
1007  else
1008  {
1009  CharsetIsUtf8 = false;
1010  ReplacementChar = '?';
1011  }
1012 
1013 #if defined(HAVE_BIND_TEXTDOMAIN_CODESET) && defined(ENABLE_NLS)
1014  bind_textdomain_codeset(PACKAGE, buf);
1015 #endif
1016 }
#define mutt_ch_is_utf8(str)
Definition: charset.h:106
wchar_t ReplacementChar
When a Unicode character can't be displayed, use this instead.
Definition: charset.c:58
void mutt_ch_canonical_charset(char *buf, size_t buflen, const char *name)
Canonicalise the charset of a string.
Definition: charset.c:344
bool CharsetIsUtf8
Is the user's current character set utf-8?
Definition: charset.c:63

Variable Documentation

◆ C_AssumedCharset

char* C_AssumedCharset

Config: If a message is missing a character set, assume this character set.

Definition at line 52 of file charset.c.

◆ C_Charset

char* C_Charset

Config: Default character set for displaying text on screen.

Definition at line 53 of file charset.c.

◆ CharsetIsUtf8

bool CharsetIsUtf8

Is the user's current character set utf-8?

Definition at line 63 of file charset.c.

◆ ReplacementChar

wchar_t ReplacementChar

When a Unicode character can't be displayed, use this instead.

Definition at line 58 of file charset.c.

◆ PreferredMimeNames

const struct MimeNames PreferredMimeNames[]

Lookup table of preferred charsets.

The following list was created manually from the data at http://www.isi.edu/in-notes/iana/assignments/character-sets (last update: 2000-09-07).

Note
It includes only the subset of character sets for which a preferred MIME name is given.

Definition at line 92 of file charset.c.