NeoMutt  2019-11-11
Teaching an old dog new tricks
DOXYGEN
charset.h File Reference

Conversion between different character encodings.

#include <iconv.h>
#include <stdbool.h>
#include <stdio.h>
#include <wchar.h>

Data Structures

struct  FgetConv
 Cursor for converting a file's encoding.

struct  FgetConvNot
 A dummy converter.

struct  MimeNames
 MIME name lookup entry.
 

Macros

#define MUTT_ICONV_HOOK_FROM   1
 Apply charset-hooks to fromcode.
 
#define mutt_ch_is_utf8(str)   mutt_ch_chscmp(str, "utf-8")
 
#define mutt_ch_is_us_ascii(str)   mutt_ch_chscmp(str, "us-ascii")
 

Enumerations

enum  LookupType { MUTT_LOOKUP_CHARSET, MUTT_LOOKUP_ICONV }
 Types of character set lookups.
 

Functions

void mutt_ch_canonical_charset (char *buf, size_t buflen, const char *name)
 Canonicalise the charset of a string.

const char * mutt_ch_charset_lookup (const char *chs)
 Look for a replacement character set.

int mutt_ch_check (const char *s, size_t slen, const char *from, const char *to)
 Check whether a string can be converted between encodings.

bool mutt_ch_check_charset (const char *cs, bool strict)
 Does iconv understand a character set?

char * mutt_ch_choose (const char *fromcode, const char *charsets, const char *u, size_t ulen, char **d, size_t *dlen)
 Figure the best charset to encode a string.

bool mutt_ch_chscmp (const char *cs1, const char *cs2)
 Are the names of two character sets equivalent?

int mutt_ch_convert_nonmime_string (char **ps)
 Try to convert a string using a list of character sets.

int mutt_ch_convert_string (char **ps, const char *from, const char *to, int flags)
 Convert a string between encodings.

int mutt_ch_fgetconv (struct FgetConv *fc)
 Convert a file's character set.

void mutt_ch_fgetconv_close (struct FgetConv **fc)
 Close an fgetconv handle.

struct FgetConv * mutt_ch_fgetconv_open (FILE *fp, const char *from, const char *to, int flags)
 Prepare a file for charset conversion.

char * mutt_ch_fgetconvs (char *buf, size_t buflen, struct FgetConv *fc)
 Convert a file's charset into a string buffer.

char * mutt_ch_get_default_charset (void)
 Get the default character set.

char * mutt_ch_get_langinfo_charset (void)
 Get the user's choice of character set.

size_t mutt_ch_iconv (iconv_t cd, const char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft, const char **inrepls, const char *outrepl, int *iconverrno)
 Change the encoding of a string.

const char * mutt_ch_iconv_lookup (const char *chs)
 Look for a replacement character set.

iconv_t mutt_ch_iconv_open (const char *tocode, const char *fromcode, int flags)
 Set up iconv for conversions.

bool mutt_ch_lookup_add (enum LookupType type, const char *pat, const char *replace, struct Buffer *err)
 Add a new character set lookup.

void mutt_ch_lookup_remove (void)
 Remove all the character set lookups.

void mutt_ch_set_charset (const char *charset)
 Update the records for a new character set.
 

Variables

char * C_AssumedCharset
 Config: If a message is missing a character set, assume this character set.

char * C_Charset
 Config: Default character set for displaying text on screen.

bool CharsetIsUtf8
 Is the user's current character set utf-8?

wchar_t ReplacementChar
 When a Unicode character can't be displayed, use this instead.

const struct MimeNames PreferredMimeNames []
 Lookup table of preferred charsets.
 

Detailed Description

Conversion between different character encodings.

Authors
  • Thomas Roessler

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Definition in file charset.h.

Macro Definition Documentation

◆ MUTT_ICONV_HOOK_FROM

#define MUTT_ICONV_HOOK_FROM   1

Apply charset-hooks to fromcode.

Definition at line 81 of file charset.h.

◆ mutt_ch_is_utf8

#define mutt_ch_is_utf8(str)   mutt_ch_chscmp(str, "utf-8")

Definition at line 106 of file charset.h.

◆ mutt_ch_is_us_ascii

#define mutt_ch_is_us_ascii(str)   mutt_ch_chscmp(str, "us-ascii")

Definition at line 107 of file charset.h.

Enumeration Type Documentation

◆ LookupType

enum LookupType

Types of character set lookups.

Enumerator
MUTT_LOOKUP_CHARSET 

Alias for another character set.

MUTT_LOOKUP_ICONV 

Character set conversion.

Definition at line 75 of file charset.h.

76 {
77  MUTT_LOOKUP_CHARSET, ///< Alias for another character set
78  MUTT_LOOKUP_ICONV, ///< Character set conversion
79 };

Function Documentation

◆ mutt_ch_canonical_charset()

void mutt_ch_canonical_charset ( char *  buf,
size_t  buflen,
const char *  name 
)

Canonicalise the charset of a string.

Parameters
buf     Buffer for canonical character set name
buflen  Length of buffer
name    Name to be canonicalised

This first ties off any charset extension such as "//TRANSLIT", canonicalises the charset, and then re-adds the extension.

Definition at line 345 of file charset.c.

346 {
347  if (!buf || !name)
348  return;
349 
350  char in[1024], scratch[1024];
351 
352  mutt_str_strfcpy(in, name, sizeof(in));
353  char *ext = strchr(in, '/');
354  if (ext)
355  *ext++ = '\0';
356 
357  if ((mutt_str_strcasecmp(in, "utf-8") == 0) ||
358  (mutt_str_strcasecmp(in, "utf8") == 0))
359  {
360  mutt_str_strfcpy(buf, "utf-8", buflen);
361  goto out;
362  }
363 
364  /* catch some common iso-8859-something misspellings */
365  size_t plen;
366  if ((plen = mutt_str_startswith(in, "8859", CASE_IGNORE)) && (in[plen] != '-'))
367  snprintf(scratch, sizeof(scratch), "iso-8859-%s", in + plen);
368  else if ((plen = mutt_str_startswith(in, "8859-", CASE_IGNORE)))
369  snprintf(scratch, sizeof(scratch), "iso-8859-%s", in + plen);
370  else if ((plen = mutt_str_startswith(in, "iso8859", CASE_IGNORE)) && (in[plen] != '-'))
371  snprintf(scratch, sizeof(scratch), "iso_8859-%s", in + plen);
372  else if ((plen = mutt_str_startswith(in, "iso8859-", CASE_IGNORE)))
373  snprintf(scratch, sizeof(scratch), "iso_8859-%s", in + plen);
374  else
375  mutt_str_strfcpy(scratch, in, sizeof(scratch));
376 
377  for (size_t i = 0; PreferredMimeNames[i].key; i++)
378  {
379  if (mutt_str_strcasecmp(scratch, PreferredMimeNames[i].key) == 0)
380  {
381  mutt_str_strfcpy(buf, PreferredMimeNames[i].pref, buflen);
382  goto out;
383  }
384  }
385 
386  mutt_str_strfcpy(buf, scratch, buflen);
387 
388  /* for cosmetics' sake, transform to lowercase. */
389  for (char *p = buf; *p; p++)
390  *p = tolower(*p);
391 
392 out:
393  if (ext && *ext)
394  {
395  mutt_str_strcat(buf, buflen, "/");
396  mutt_str_strcat(buf, buflen, ext);
397  }
398 }
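
The tie-off and re-add step described for mutt_ch_canonical_charset() can be sketched in isolation. This is a minimal illustration in standard C, not NeoMutt's code: canon_sketch() is a hypothetical name, and the canonicalisation is reduced to lower-casing plus the "utf8" spelling fix, so the PreferredMimeNames lookup and the iso-8859 corrections are left out.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of NeoMutt): canonicalise a charset name
 * while preserving any "/..." extension such as "//TRANSLIT". */
static void canon_sketch(char *buf, size_t buflen, const char *name)
{
  if (!buf || !name || (buflen == 0))
    return;

  char in[256];
  snprintf(in, sizeof(in), "%s", name);

  /* Tie off the extension at the first '/' */
  char *ext = strchr(in, '/');
  if (ext)
    *ext++ = '\0';

  /* Simplified canonicalisation: lower-case, and spell "utf8" as "utf-8" */
  for (char *p = in; *p; p++)
    *p = (char) tolower((unsigned char) *p);
  const char *canon = (strcmp(in, "utf8") == 0) ? "utf-8" : in;

  /* Re-add the extension, if there was one */
  if (ext && *ext)
    snprintf(buf, buflen, "%s/%s", canon, ext);
  else
    snprintf(buf, buflen, "%s", canon);
}
```

Because the name is split at the first '/', the saved extension for "UTF8//TRANSLIT" is "/TRANSLIT", and joining with a single '/' restores the original "//TRANSLIT" suffix.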

◆ mutt_ch_charset_lookup()

const char* mutt_ch_charset_lookup ( const char *  chs)

Look for a replacement character set.

Parameters
chs   Character set to look up
Return values
ptr   Replacement character set (if a 'charset-hook' matches)
NULL  No matching hook

Look through all the 'charset-hook's. If one matches return the replacement character set.

Definition at line 532 of file charset.c.

533 {
534  return lookup_charset(MUTT_LOOKUP_CHARSET, chs);
535 }

◆ mutt_ch_check()

int mutt_ch_check ( const char *  s,
size_t  slen,
const char *  from,
const char *  to 
)

Check whether a string can be converted between encodings.

Parameters
[in]  s     String to check
[in]  slen  Length of the string to check
[in]  from  Current character set
[in]  to    Target character set
Return values
0   Success
-1  Error in iconv_open()
>0  Errno as set by iconv()

Definition at line 711 of file charset.c.

712 {
713  if (!s || !from || !to)
714  return -1;
715 
716  int rc = 0;
717  iconv_t cd = mutt_ch_iconv_open(to, from, 0);
718  if (cd == (iconv_t) -1)
719  return -1;
720 
721  size_t outlen = MB_LEN_MAX * slen;
722  char *out = mutt_mem_malloc(outlen + 1);
723  char *saved_out = out;
724 
725  const size_t convlen =
726  iconv(cd, (ICONV_CONST char **) &s, &slen, &out, (size_t *) &outlen);
727  if (convlen == -1)
728  rc = errno;
729 
730  FREE(&saved_out);
731  iconv_close(cd);
732  return rc;
733 }
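
The same check can be written against plain iconv, which may help when reading the return values. This is a sketch under stated assumptions: check_convertible() is a hypothetical name, and it calls iconv_open() directly, so none of NeoMutt's canonicalisation or hooks apply.

```c
#include <errno.h>
#include <iconv.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: 0 if s converts cleanly, -1 if the conversion
 * cannot be set up, otherwise the errno reported by iconv(). */
static int check_convertible(const char *s, size_t slen, const char *from, const char *to)
{
  if (!s || !from || !to)
    return -1;

  iconv_t cd = iconv_open(to, from);
  if (cd == (iconv_t) -1)
    return -1;

  /* Worst case: every input byte expands to MB_LEN_MAX output bytes */
  size_t outlen = (MB_LEN_MAX * slen) + 1;
  char *out = malloc(outlen);
  if (!out)
  {
    iconv_close(cd);
    return -1;
  }

  char *in = (char *) s; /* iconv() reads the input but does not modify it */
  char *op = out;
  int rc = 0;
  if (iconv(cd, &in, &slen, &op, &outlen) == (size_t) -1)
    rc = errno; /* save errno before any other library call */

  free(out);
  iconv_close(cd);
  return rc;
}
```

The converted text is thrown away; only the success or failure of the conversion matters here.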

◆ mutt_ch_check_charset()

bool mutt_ch_check_charset ( const char *  cs,
bool  strict 
)

Does iconv understand a character set?

Parameters
cs      Character set to check
strict  Check strictly by using iconv
Return values
true  Character set is valid

If strict is false, then finding a matching character set in PreferredMimeNames will be enough. If strict is true, or the charset is not in PreferredMimeNames, then iconv_open() will be tried.

Definition at line 812 of file charset.c.

813 {
814  if (!cs)
815  return false;
816 
817  if (mutt_ch_is_utf8(cs))
818  return true;
819 
820  if (!strict)
821  {
822  for (int i = 0; PreferredMimeNames[i].key; i++)
823  {
824  if ((mutt_str_strcasecmp(PreferredMimeNames[i].key, cs) == 0) ||
825  (mutt_str_strcasecmp(PreferredMimeNames[i].pref, cs) == 0))
826  {
827  return true;
828  }
829  }
830  }
831 
832  iconv_t cd = mutt_ch_iconv_open(cs, cs, 0);
833  if (cd != (iconv_t)(-1))
834  {
835  iconv_close(cd);
836  return true;
837  }
838 
839  return false;
840 }
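
The strict and non-strict paths can be illustrated with a small stand-alone sketch. The names charset_is_known() and KnownNames are hypothetical, and the table entries are illustrative only; they stand in for PreferredMimeNames.

```c
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical stand-in for PreferredMimeNames (illustrative entries only) */
static const char *const KnownNames[] = { "us-ascii", "iso-8859-1", "utf-8", NULL };

static bool charset_is_known(const char *cs, bool strict)
{
  if (!cs)
    return false;

  /* Non-strict: a table hit is good enough */
  if (!strict)
  {
    for (size_t i = 0; KnownNames[i]; i++)
    {
      if (strcasecmp(KnownNames[i], cs) == 0)
        return true;
    }
  }

  /* Strict, or not in the table: ask iconv whether it can open the charset */
  iconv_t cd = iconv_open(cs, cs);
  if (cd == (iconv_t) -1)
    return false;

  iconv_close(cd);
  return true;
}
```

Opening a conversion from a charset to itself is a cheap way to ask whether the local iconv recognises the name at all, without converting any data.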

◆ mutt_ch_choose()

char* mutt_ch_choose ( const char *  fromcode,
const char *  charsets,
const char *  u,
size_t  ulen,
char **  d,
size_t *  dlen 
)

Figure the best charset to encode a string.

Parameters
[in]   fromcode  Original charset of the string
[in]   charsets  Colon-separated list of potential charsets to use
[in]   u         String to encode
[in]   ulen      Length of the string to encode
[out]  d         If not NULL, point it to the converted string
[out]  dlen      If not NULL, point it to the length of the d string
Return values
ptr   Best performing charset
NULL  None could be found

Definition at line 1030 of file charset.c.

1032 {
1033  if (!fromcode)
1034  return NULL;
1035 
1036  char *e = NULL, *tocode = NULL;
1037  size_t elen = 0, bestn = 0;
1038  const char *q = NULL;
1039 
1040  for (const char *p = charsets; p; p = q ? q + 1 : 0)
1041  {
1042  q = strchr(p, ':');
1043 
1044  size_t n = q ? q - p : strlen(p);
1045  if (n == 0)
1046  continue;
1047 
1048  char *t = mutt_mem_malloc(n + 1);
1049  memcpy(t, p, n);
1050  t[n] = '\0';
1051 
1052  char *s = mutt_str_substr_dup(u, u + ulen);
1053  const int rc = d ? mutt_ch_convert_string(&s, fromcode, t, 0) :
1054  mutt_ch_check(s, ulen, fromcode, t);
1055  if (rc)
1056  {
1057  FREE(&t);
1058  FREE(&s);
1059  continue;
1060  }
1061  size_t slen = mutt_str_strlen(s);
1062 
1063  if (!tocode || (n < bestn))
1064  {
1065  bestn = n;
1066  FREE(&tocode);
1067  tocode = t;
1068  if (d)
1069  {
1070  FREE(&e);
1071  e = s;
1072  }
1073  else
1074  FREE(&s);
1075  elen = slen;
1076  }
1077  else
1078  {
1079  FREE(&t);
1080  FREE(&s);
1081  }
1082  }
1083  if (tocode)
1084  {
1085  if (d)
1086  *d = e;
1087  if (dlen)
1088  *dlen = elen;
1089 
1090  char canonical_buf[1024];
1091  mutt_ch_canonical_charset(canonical_buf, sizeof(canonical_buf), tocode);
1092  mutt_str_replace(&tocode, canonical_buf);
1093  }
1094  return tocode;
1095 }
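
The walk over the colon-separated charsets list is the part of this function that is easiest to get wrong, so here is a minimal stand-alone sketch of just that step. next_charset() is a hypothetical helper, not a NeoMutt function; like the loop above, it skips empty entries.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: copy the next non-empty entry of a colon-separated
 * list into buf and advance *cursor. Returns false when the list is done. */
static bool next_charset(const char **cursor, char *buf, size_t buflen)
{
  if (!cursor || !buf || (buflen == 0))
    return false;

  const char *p = *cursor;
  while (p)
  {
    const char *q = strchr(p, ':');
    size_t n = q ? (size_t) (q - p) : strlen(p);
    const char *next = q ? q + 1 : NULL;

    if (n > 0)
    {
      if (n >= buflen)
        n = buflen - 1; /* truncate over-long names */
      memcpy(buf, p, n);
      buf[n] = '\0';
      *cursor = next;
      return true;
    }
    p = next; /* empty entry, e.g. "a::b" */
  }

  *cursor = NULL;
  return false;
}
```

A caller would initialise a cursor to the start of the list and call next_charset() in a loop, trying each candidate in turn.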

◆ mutt_ch_chscmp()

bool mutt_ch_chscmp ( const char *  cs1,
const char *  cs2 
)

Are the names of two character sets equivalent?

Parameters
cs1  First character set
cs2  Second character set
Return values
true   Names are equivalent
false  Names differ

Charsets may have extensions that mutt_ch_canonical_charset() leaves intact. We expect 'cs2' to originate from NeoMutt code, not user input (i.e. 'cs2' does not have any extension), so we simply check whether the shorter string is a prefix of the longer.

Definition at line 412 of file charset.c.

413 {
414  if (!cs1 || !cs2)
415  return false;
416 
417  char buf[256];
418 
419  mutt_ch_canonical_charset(buf, sizeof(buf), cs1);
420 
421  int len1 = mutt_str_strlen(buf);
422  int len2 = mutt_str_strlen(cs2);
423 
424  return mutt_str_strncasecmp(((len1 > len2) ? buf : cs2),
425  ((len1 > len2) ? cs2 : buf), MIN(len1, len2)) == 0;
426 }
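
The prefix comparison on its own looks like the sketch below. prefix_equiv() is a hypothetical name, and the canonicalisation of cs1 is left out, so this only shows why an extension such as "//TRANSLIT" does not prevent a match.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: true if the shorter name is a case-insensitive
 * prefix of the longer one. */
static bool prefix_equiv(const char *a, const char *b)
{
  if (!a || !b)
    return false;

  size_t la = strlen(a);
  size_t lb = strlen(b);
  size_t n = (la < lb) ? la : lb;

  return strncasecmp(a, b, n) == 0;
}
```

One consequence of this design is that any prefix matches, so "us" compares equal to "us-ascii". That is acceptable only because the second argument is expected to be a complete, trusted name.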

◆ mutt_ch_convert_nonmime_string()

int mutt_ch_convert_nonmime_string ( char **  ps)

Try to convert a string using a list of character sets.

Parameters
[in,out]  ps  String to be converted
Return values
0   Success
-1  Error

Work through C_AssumedCharset looking for a character set conversion that works. Failing that, try mutt_ch_get_default_charset().

Definition at line 301 of file charset.c.

302 {
303  if (!ps)
304  return -1;
305 
306  const char *c1 = NULL;
307 
308  for (const char *c = C_AssumedCharset; c; c = c1 ? c1 + 1 : 0)
309  {
310  char *u = *ps;
311  size_t ulen = mutt_str_strlen(*ps);
312 
313  if (!u || !*u)
314  return 0;
315 
316  c1 = strchr(c, ':');
317  size_t n = c1 ? c1 - c : mutt_str_strlen(c);
318  if (n == 0)
319  return 0;
320  char *fromcode = mutt_mem_malloc(n + 1);
321  mutt_str_strfcpy(fromcode, c, n + 1);
322  char *s = mutt_str_substr_dup(u, u + ulen);
323  int m = mutt_ch_convert_string(&s, fromcode, C_Charset, 0);
324  FREE(&fromcode);
325  FREE(&s);
326  if (m == 0)
327  {
328  return 0;
329  }
330  }
331  mutt_ch_convert_string(ps, (const char *) mutt_ch_get_default_charset(),
332  C_Charset, MUTT_ICONV_HOOK_FROM);
333  return -1;
334 }

◆ mutt_ch_convert_string()

int mutt_ch_convert_string ( char **  ps,
const char *  from,
const char *  to,
int  flags 
)

Convert a string between encodings.

Parameters
[in,out]  ps     String to convert
[in]      from   Current character set
[in]      to     Target character set
[in]      flags  Flags, e.g. MUTT_ICONV_HOOK_FROM
Return values
0      Success
-1     Invalid arguments or failure to open an iconv channel
errno  Failure in iconv conversion

Parameter flags is given as-is to mutt_ch_iconv_open(). See there for its meaning and usage policy.

Definition at line 748 of file charset.c.

749 {
750  if (!ps)
751  return -1;
752 
753  char *s = *ps;
754 
755  if (!s || !*s)
756  return 0;
757 
758  if (!to || !from)
759  return -1;
760 
761  const char *repls[] = { "\357\277\275", "?", 0 };
762  int rc = 0;
763 
764  iconv_t cd = mutt_ch_iconv_open(to, from, flags);
765  if (cd == (iconv_t) -1)
766  return -1;
767 
768  size_t len;
769  const char *ib = NULL;
770  char *buf = NULL, *ob = NULL;
771  size_t ibl, obl;
772  const char **inrepls = NULL;
773  const char *outrepl = NULL;
774 
775  if (mutt_ch_is_utf8(to))
776  outrepl = "\357\277\275";
777  else if (mutt_ch_is_utf8(from))
778  inrepls = repls;
779  else
780  outrepl = "?";
781 
782  len = strlen(s);
783  ib = s;
784  ibl = len + 1;
785  obl = MB_LEN_MAX * ibl;
786  buf = mutt_mem_malloc(obl + 1);
787  ob = buf;
788 
789  mutt_ch_iconv(cd, &ib, &ibl, &ob, &obl, inrepls, outrepl, &rc);
790  iconv_close(cd);
791 
792  *ob = '\0';
793 
794  FREE(ps);
795  *ps = buf;
796 
797  mutt_str_adjust(ps);
798  return rc;
799 }

◆ mutt_ch_fgetconv()

int mutt_ch_fgetconv ( struct FgetConv *  fc)

Convert a file's character set.

Parameters
fc  FgetConv handle
Return values
num  Next character in the converted file
EOF  Error

A file is read into a buffer and its character set is converted. Each call to this function will return one converted character. The buffer is refilled automatically when empty.

Definition at line 902 of file charset.c.

903 {
904  if (!fc)
905  return EOF;
906  if (fc->cd == (iconv_t) -1)
907  return fgetc(fc->fp);
908  if (!fc->p)
909  return EOF;
910  if (fc->p < fc->ob)
911  return (unsigned char) *(fc->p)++;
912 
913  /* Try to convert some more */
914  fc->p = fc->bufo;
915  fc->ob = fc->bufo;
916  if (fc->ibl)
917  {
918  size_t obl = sizeof(fc->bufo);
919  iconv(fc->cd, (ICONV_CONST char **) &fc->ib, &fc->ibl, &fc->ob, &obl);
920  if (fc->p < fc->ob)
921  return (unsigned char) *(fc->p)++;
922  }
923 
924  /* If we trusted iconv a bit more, we would at this point
925  * ask why it had stopped converting ... */
926 
927  /* Try to read some more */
928  if ((fc->ibl == sizeof(fc->bufi)) ||
929  (fc->ibl && (fc->ib + fc->ibl < fc->bufi + sizeof(fc->bufi))))
930  {
931  fc->p = 0;
932  return EOF;
933  }
934  if (fc->ibl)
935  memcpy(fc->bufi, fc->ib, fc->ibl);
936  fc->ib = fc->bufi;
937  fc->ibl += fread(fc->ib + fc->ibl, 1, sizeof(fc->bufi) - fc->ibl, fc->fp);
938 
939  /* Try harder this time to convert some */
940  if (fc->ibl)
941  {
942  size_t obl = sizeof(fc->bufo);
943  mutt_ch_iconv(fc->cd, (const char **) &fc->ib, &fc->ibl, &fc->ob, &obl,
944  fc->inrepls, 0, NULL);
945  if (fc->p < fc->ob)
946  return (unsigned char) *(fc->p)++;
947  }
948 
949  /* Either the file has finished or one of the buffers is too small */
950  fc->p = 0;
951  return EOF;
952 }
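
The "one character per call, refill when empty" pattern can be shown without the iconv step. The sketch below is an assumption-laden simplification: struct ByteCursor and cursor_getc() are hypothetical names, the buffer is deliberately tiny, and no charset conversion takes place.

```c
#include <stdio.h>

/* Hypothetical cursor: a file plus a small read buffer */
struct ByteCursor
{
  FILE *fp;
  char buf[8];
  size_t pos; /* next unread byte in buf */
  size_t len; /* number of valid bytes in buf */
};

/* Return the next byte, refilling the buffer from the file when it runs dry */
static int cursor_getc(struct ByteCursor *bc)
{
  if (!bc || !bc->fp)
    return EOF;

  if (bc->pos == bc->len)
  {
    bc->len = fread(bc->buf, 1, sizeof(bc->buf), bc->fp);
    bc->pos = 0;
    if (bc->len == 0)
      return EOF; /* end of file or read error */
  }

  /* Cast so that bytes above 0x7f are not mistaken for EOF */
  return (unsigned char) bc->buf[bc->pos++];
}
```

The cast to unsigned char mirrors the listing above: it keeps a valid byte from being returned as a negative value that a caller could confuse with EOF.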

◆ mutt_ch_fgetconv_close()

void mutt_ch_fgetconv_close ( struct FgetConv **  fc)

Close an fgetconv handle.

Parameters
[out]  fc  fgetconv handle

Definition at line 882 of file charset.c.

883 {
884  if (!fc || !*fc)
885  return;
886 
887  if ((*fc)->cd != (iconv_t) -1)
888  iconv_close((*fc)->cd);
889  FREE(fc);
890 }

◆ mutt_ch_fgetconv_open()

struct FgetConv* mutt_ch_fgetconv_open ( FILE *  fp,
const char *  from,
const char *  to,
int  flags 
)

Prepare a file for charset conversion.

Parameters
fp     FILE ptr to prepare
from   Current character set
to     Destination character set
flags  Flags, e.g. MUTT_ICONV_HOOK_FROM
Return values
ptr  fgetconv handle

Parameter flags is given as-is to mutt_ch_iconv_open().

Definition at line 852 of file charset.c.

853 {
854  struct FgetConv *fc = NULL;
855  iconv_t cd = (iconv_t) -1;
856 
857  if (from && to)
858  cd = mutt_ch_iconv_open(to, from, flags);
859 
860  if (cd != (iconv_t) -1)
861  {
862  static const char *repls[] = { "\357\277\275", "?", 0 };
863 
864  fc = mutt_mem_malloc(sizeof(struct FgetConv));
865  fc->p = fc->bufo;
866  fc->ob = fc->bufo;
867  fc->ib = fc->bufi;
868  fc->ibl = 0;
869  fc->inrepls = mutt_ch_is_utf8(to) ? repls : repls + 1;
870  }
871  else
872  fc = mutt_mem_malloc(sizeof(struct FgetConvNot));
873  fc->fp = fp;
874  fc->cd = cd;
875  return fc;
876 }

◆ mutt_ch_fgetconvs()

char* mutt_ch_fgetconvs ( char *  buf,
size_t  buflen,
struct FgetConv *  fc 
)

Convert a file's charset into a string buffer.

Parameters
buf     Buffer for result
buflen  Length of buffer
fc      FgetConv handle
Return values
ptr   Success, result buffer
NULL  Error

Read a file into a buffer, converting the character set as it goes.

Definition at line 964 of file charset.c.

965 {
966  if (!buf)
967  return NULL;
968 
969  size_t r;
970  for (r = 0; (r + 1) < buflen;)
971  {
972  const int c = mutt_ch_fgetconv(fc);
973  if (c == EOF)
974  break;
975  buf[r++] = (char) c;
976  if (c == '\n')
977  break;
978  }
979  buf[r] = '\0';
980 
981  if (r > 0)
982  return buf;
983 
984  return NULL;
985 }

◆ mutt_ch_get_default_charset()

char* mutt_ch_get_default_charset ( void  )

Get the default character set.

Return values
ptr  Name of the default character set
Warning
This returns a pointer to a static buffer. Do not free it.

Definition at line 434 of file charset.c.

435 {
436  static char fcharset[128];
437  const char *c = C_AssumedCharset;
438  const char *c1 = NULL;
439 
440  if (c)
441  {
442  c1 = strchr(c, ':');
443  mutt_str_strfcpy(fcharset, c, c1 ? (c1 - c + 1) : sizeof(fcharset));
444  return fcharset;
445  }
446  return strcpy(fcharset, "us-ascii");
447 }

◆ mutt_ch_get_langinfo_charset()

char* mutt_ch_get_langinfo_charset ( void  )

Get the user's choice of character set.

Return values
ptr  Charset string

Get the canonical character set used by the user's locale. The caller must free the returned string.

Definition at line 456 of file charset.c.

457 {
458  char buf[1024] = { 0 };
459 
460  mutt_ch_canonical_charset(buf, sizeof(buf), nl_langinfo(CODESET));
461 
462  if (buf[0] != '\0')
463  return mutt_str_strdup(buf);
464 
465  return mutt_str_strdup("iso-8859-1");
466 }
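
The ownership rule (the caller frees the result) is easier to see in a stand-alone version. This sketch assumes a POSIX system for nl_langinfo(); locale_charset_dup() is a hypothetical name, and the canonicalisation step is omitted.

```c
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap-allocated copy of the locale's codeset name.
 * The caller owns the result and must free() it. */
static char *locale_charset_dup(void)
{
  const char *cs = nl_langinfo(CODESET);
  if (!cs || (*cs == '\0'))
    cs = "iso-8859-1"; /* same fallback as the listing above */

  size_t len = strlen(cs) + 1;
  char *copy = malloc(len);
  if (copy)
    memcpy(copy, cs, len);
  return copy;
}
```

nl_langinfo() reports on the current locale, so a program that never calls setlocale(LC_CTYPE, "") will see the codeset of the default "C" locale rather than the user's environment.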

◆ mutt_ch_iconv()

size_t mutt_ch_iconv ( iconv_t  cd,
const char **  inbuf,
size_t *  inbytesleft,
char **  outbuf,
size_t *  outbytesleft,
const char **  inrepls,
const char *  outrepl,
int *  iconverrno 
)

Change the encoding of a string.

Parameters
[in]      cd            Iconv conversion descriptor
[in,out]  inbuf         Buffer to convert
[in,out]  inbytesleft   Length of buffer to convert
[in,out]  outbuf        Buffer for the result
[in,out]  outbytesleft  Length of result buffer
[in]      inrepls       Input replacement characters
[in]      outrepl       Output replacement characters
[out]     iconverrno    Errno if iconv() fails, 0 if it succeeds
Return values
num  Characters converted

Like iconv(), but keeps going even when the input is invalid. If you're supplying inrepls, the source charset should be stateless; if you're supplying an outrepl, the target charset should be.

Definition at line 612 of file charset.c.

615 {
616  size_t rc = 0;
617  const char *ib = *inbuf;
618  size_t ibl = *inbytesleft;
619  char *ob = *outbuf;
620  size_t obl = *outbytesleft;
621 
622  while (true)
623  {
624  errno = 0;
625  const size_t ret1 = iconv(cd, (ICONV_CONST char **) &ib, &ibl, &ob, &obl);
626  if (ret1 != (size_t) -1)
627  rc += ret1;
628  if (iconverrno)
629  *iconverrno = errno;
630 
631  if (ibl && obl && (errno == EILSEQ))
632  {
633  if (inrepls)
634  {
635  /* Try replacing the input */
636  const char **t = NULL;
637  for (t = inrepls; *t; t++)
638  {
639  const char *ib1 = *t;
640  size_t ibl1 = strlen(*t);
641  char *ob1 = ob;
642  size_t obl1 = obl;
643  iconv(cd, (ICONV_CONST char **) &ib1, &ibl1, &ob1, &obl1);
644  if (ibl1 == 0)
645  {
646  ib++;
647  ibl--;
648  ob = ob1;
649  obl = obl1;
650  rc++;
651  break;
652  }
653  }
654  if (*t)
655  continue;
656  }
657  /* Replace the output */
658  if (!outrepl)
659  outrepl = "?";
660  iconv(cd, NULL, NULL, &ob, &obl);
661  if (obl)
662  {
663  int n = strlen(outrepl);
664  if (n > obl)
665  {
666  outrepl = "?";
667  n = 1;
668  }
669  memcpy(ob, outrepl, n);
670  ib++;
671  ibl--;
672  ob += n;
673  obl -= n;
674  rc++;
675  iconv(cd, NULL, NULL, NULL, NULL); /* for good measure */
676  continue;
677  }
678  }
679  *inbuf = ib;
680  *inbytesleft = ibl;
681  *outbuf = ob;
682  *outbytesleft = obl;
683  return rc;
684  }
685 }
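
The "keep going on invalid input" idea can be reduced to a short loop around iconv(). This is a simplified sketch, not the function above: convert_lossy() is a hypothetical name, it always substitutes a single '?' in the output, and it does not try input replacements or reset the conversion state.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert in[0..inlen) into out, writing '?' for each
 * invalid input byte. Returns the number of bytes written, or (size_t) -1
 * if the conversion cannot be opened. */
static size_t convert_lossy(const char *from, const char *to, const char *in,
                            size_t inlen, char *out, size_t outlen)
{
  iconv_t cd = iconv_open(to, from);
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  char *ib = (char *) in; /* iconv() does not modify the input bytes */
  char *ob = out;
  size_t ibl = inlen;
  size_t obl = outlen;

  while (ibl > 0)
  {
    errno = 0;
    if (iconv(cd, &ib, &ibl, &ob, &obl) != (size_t) -1)
      break; /* everything was converted */
    if ((errno != EILSEQ) || (obl == 0))
      break; /* truncated input or no room left in the output */

    *ob++ = '?'; /* stand in for the offending byte */
    obl--;
    ib++; /* skip it and carry on */
    ibl--;
  }

  iconv_close(cd);
  return outlen - obl;
}
```

Skipping exactly one byte after EILSEQ is the same recovery step the listing above uses; the stateless-charset advice in the description exists because writing a literal replacement behind iconv's back is only safe when the encoding has no shift state.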

◆ mutt_ch_iconv_lookup()

const char* mutt_ch_iconv_lookup ( const char *  chs)

Look for a replacement character set.

Parameters
chs   Character set to look up
Return values
ptr   Replacement character set (if an 'iconv-hook' matches)
NULL  No matching hook

Look through all the 'iconv-hook's. If one matches return the replacement character set.

Definition at line 696 of file charset.c.

697 {
698  return lookup_charset(MUTT_LOOKUP_ICONV, chs);
699 }

◆ mutt_ch_iconv_open()

iconv_t mutt_ch_iconv_open ( const char *  tocode,
const char *  fromcode,
int  flags 
)

Set up iconv for conversions.

Parameters
tocodeCurrent character set
fromcodeTarget character set
flagsFlags, e.g. MUTT_ICONV_HOOK_FROM
Return values
ptriconv handle for the conversion

Like iconv_open(), but canonicalises both charset names, optionally applies charset-hooks to fromcode and recanonicalises it, and finally applies iconv-hooks to both names. With flags=0 the charset-hooks are skipped; MUTT_ICONV_HOOK_FROM applies them to fromcode. Use flags=0 when fromcode can be trusted, e.g. a constant or a value supplied by the user. Use MUTT_ICONV_HOOK_FROM only when fromcode is unreliable, e.g. taken from a possibly wrong incoming MIME label. Misusing MUTT_ICONV_HOOK_FROM leads to unwanted interactions in some setups.

Note
By design, charset-hooks are never applied to tocode.
Despite its name, MUTT_ICONV_HOOK_FROM controls charset-hooks only; it has no effect on iconv-hooks.
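
The effect of the flag can be illustrated with a reduced sketch built on plain iconv_open(). Everything here is an assumption made for illustration: `HOOK_FROM` stands in for MUTT_ICONV_HOOK_FROM, `charset_hook()` is a hypothetical one-rule hook, and the canonicalisation and iconv-hook steps of the real function are left out.

```c
#include <iconv.h>
#include <stddef.h>
#include <strings.h>

#define HOOK_FROM 1 /* hypothetical stand-in for MUTT_ICONV_HOOK_FROM */

/* Hypothetical charset-hook: treat the label "x-unknown" as US-ASCII */
const char *charset_hook(const char *name)
{
  return (strcasecmp(name, "x-unknown") == 0) ? "US-ASCII" : NULL;
}

/* Open a converter; fromcode is only second-guessed when the caller asks */
iconv_t open_converter(const char *tocode, const char *fromcode, int flags)
{
  if (flags & HOOK_FROM)
  {
    const char *fixed = charset_hook(fromcode);
    if (fixed)
      fromcode = fixed;
  }
  return iconv_open(tocode, fromcode); /* (iconv_t) -1 on failure */
}
```

With `HOOK_FROM` set, the unreliable label is repaired before iconv sees it; with flags=0 the same label is passed through unchanged, so a trusted fromcode is never rewritten.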

Definition at line 559 of file charset.c.

560 {
561  char tocode1[128];
562  char fromcode1[128];
563  const char *tocode2 = NULL, *fromcode2 = NULL;
564  const char *tmp = NULL;
565 
566  iconv_t cd;
567 
568  /* transform to MIME preferred charset names */
569  mutt_ch_canonical_charset(tocode1, sizeof(tocode1), tocode);
570  mutt_ch_canonical_charset(fromcode1, sizeof(fromcode1), fromcode);
571 
572  /* maybe apply charset-hooks and recanonicalise fromcode,
573  * but only when caller asked us to sanitize a potentially wrong
574  * charset name incoming from the wild exterior. */
575  if (flags & MUTT_ICONV_HOOK_FROM)
576  {
577  tmp = mutt_ch_charset_lookup(fromcode1);
578  if (tmp)
579  mutt_ch_canonical_charset(fromcode1, sizeof(fromcode1), tmp);
580  }
581 
582  /* always apply iconv-hooks to suit system's iconv tastes */
583  tocode2 = mutt_ch_iconv_lookup(tocode1);
584  tocode2 = tocode2 ? tocode2 : tocode1;
585  fromcode2 = mutt_ch_iconv_lookup(fromcode1);
586  fromcode2 = fromcode2 ? fromcode2 : fromcode1;
587 
588  /* call system iconv with names it appreciates */
589  cd = iconv_open(tocode2, fromcode2);
590  if (cd != (iconv_t) -1)
591  return cd;
592 
593  return (iconv_t) -1;
594 }
const char * mutt_ch_iconv_lookup(const char *chs)
Look for a replacement character set.
Definition: charset.c:696
void mutt_ch_canonical_charset(char *buf, size_t buflen, const char *name)
Canonicalise the charset of a string.
Definition: charset.c:345
iconv_t cd
Definition: charset.h:44
const char * mutt_ch_charset_lookup(const char *chs)
Look for a replacement character set.
Definition: charset.c:532
#define MUTT_ICONV_HOOK_FROM
apply charset-hooks to fromcode
Definition: charset.h:81

◆ mutt_ch_lookup_add()

bool mutt_ch_lookup_add ( enum LookupType  type,
const char *  pat,
const char *  replace,
struct Buffer *  err 
)

Add a new character set lookup.

Parameters
type  Type of character set, e.g. MUTT_LOOKUP_CHARSET
pat  Pattern to match
replace  Replacement string
err  Buffer for error message
Return values
true  Lookup added to list
false  pat or replace was NULL, or the regex string was invalid

Add a regex for a character set and a replacement name.
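
The compile-and-report step can be sketched with the POSIX regex API alone. This is a hedged illustration, not the NeoMutt code: `compile_charset_pattern()` is a hypothetical helper, a plain character array replaces struct Buffer, and REG_EXTENDED is a choice made for this sketch.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Compile pat case-insensitively; on failure write a message into err */
bool compile_charset_pattern(regex_t *rx, const char *pat, char *err, size_t errlen)
{
  if (!rx || !pat)
    return false;

  const int rc = regcomp(rx, pat, REG_ICASE | REG_EXTENDED);
  if (rc != 0)
  {
    /* regerror() turns the error code into readable text */
    if (err && (errlen > 0))
      regerror(rc, rx, err, errlen);
    return false;
  }
  return true;
}
```

On success the caller owns the compiled expression and must release it with regfree() once the lookup entry is discarded.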

Definition at line 479 of file charset.c.

481 {
482  if (!pat || !replace)
483  return false;
484 
485  regex_t *rx = mutt_mem_malloc(sizeof(regex_t));
486  int rc = REG_COMP(rx, pat, REG_ICASE);
487  if (rc != 0)
488  {
489  regerror(rc, rx, err->data, err->dsize);
490  FREE(&rx);
491  return false;
492  }
493 
494  struct Lookup *l = lookup_new();
495  l->type = type;
496  l->replacement = mutt_str_strdup(replace);
497  l->regex.pattern = mutt_str_strdup(pat);
498  l->regex.regex = rx;
499  l->regex.pat_not = false;
500 
501  TAILQ_INSERT_TAIL(&Lookups, l, entries);
502 
503  return true;
504 }
regex_t * regex
compiled expression
Definition: regex3.h:60
bool pat_not
do not match
Definition: regex3.h:61
char * replacement
Alternative charset to use.
Definition: charset.c:75
struct Regex regex
Regular expression.
Definition: charset.c:74
Lookup
Regex to String lookup table.
Definition: charset.c:71
size_t dsize
Length of data.
Definition: buffer.h:37
#define REG_COMP(preg, regex, cflags)
Compile a regular expression.
Definition: regex3.h:52
void * mutt_mem_malloc(size_t size)
Allocate memory on the heap.
Definition: memory.c:90
char * data
Pointer to data.
Definition: buffer.h:35
static struct LookupList Lookups
Definition: charset.c:80
#define TAILQ_INSERT_TAIL(head, elm, field)
Definition: queue.h:803
enum LookupType type
Lookup type.
Definition: charset.c:73
struct Lookup * lookup_new(void)
Create a new Lookup.
Definition: charset.c:241
char * mutt_str_strdup(const char *str)
Copy a string, safely.
Definition: string.c:380
#define FREE(x)
Definition: memory.h:40
char * pattern
printable version
Definition: regex3.h:59

◆ mutt_ch_lookup_remove()

void mutt_ch_lookup_remove ( void  )

Remove all the character set lookups.

Empty the list of replacement character set names.
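
Emptying a tail queue while walking it requires saving each successor before the current element is freed. The sketch below shows that pattern with the BSD `<sys/queue.h>` macros, assuming that header is available on the platform; `struct Node`, `node_list_add()` and `node_list_clear()` are hypothetical names, and the saved-successor loop is written out by hand rather than with TAILQ_FOREACH_SAFE.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical list node; a real lookup entry carries more fields */
struct Node
{
  int value;
  TAILQ_ENTRY(Node) entries;
};
TAILQ_HEAD(NodeList, Node);

/* Append a node holding value; returns false if allocation fails */
bool node_list_add(struct NodeList *head, int value)
{
  struct Node *n = calloc(1, sizeof(*n));
  if (!n)
    return false;
  n->value = value;
  TAILQ_INSERT_TAIL(head, n, entries);
  return true;
}

/* Remove and free every node, reading the successor before each free */
void node_list_clear(struct NodeList *head)
{
  struct Node *n = TAILQ_FIRST(head);
  while (n)
  {
    struct Node *next = TAILQ_NEXT(n, entries);
    TAILQ_REMOVE(head, n, entries);
    free(n);
    n = next;
  }
}
```

After node_list_clear() the head is empty but still initialised, so new entries can be appended straight away.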

Definition at line 511 of file charset.c.

512 {
513  struct Lookup *l = NULL;
514  struct Lookup *tmp = NULL;
515 
516  TAILQ_FOREACH_SAFE(l, &Lookups, entries, tmp)
517  {
518  TAILQ_REMOVE(&Lookups, l, entries);
519  lookup_free(&l);
520  }
521 }
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)
Definition: queue.h:729
Lookup
Regex to String lookup table.
Definition: charset.c:71
void lookup_free(struct Lookup **ptr)
Free a Lookup.
Definition: charset.c:250
#define TAILQ_REMOVE(head, elm, field)
Definition: queue.h:821
static struct LookupList Lookups
Definition: charset.c:80

◆ mutt_ch_set_charset()

void mutt_ch_set_charset ( const char *  charset)

Update the records for a new character set.

Parameters
charset  New character set

Check if this character set is utf-8 and pick a suitable replacement character for unprintable characters.

Note
This calls bind_textdomain_codeset() which will affect future message translations.
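
The choice of replacement character can be reduced to a small sketch. `pick_replacement_char()` is a hypothetical helper, and unlike the real function it does not canonicalise the name first, so it only recognises the literal spelling "utf-8" in any letter case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>
#include <wchar.h>

/* Hypothetical helper: choose the placeholder for undisplayable characters */
wchar_t pick_replacement_char(const char *charset)
{
  const bool is_utf8 = charset && (strcasecmp(charset, "utf-8") == 0);

  /* U+FFFD REPLACEMENT CHARACTER exists only in Unicode; otherwise use '?' */
  return is_utf8 ? (wchar_t) 0xfffd : L'?';
}
```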

Definition at line 997 of file charset.c.

998 {
999  char buf[256];
1000 
1001  mutt_ch_canonical_charset(buf, sizeof(buf), charset);
1002 
1003  if (mutt_ch_is_utf8(buf))
1004  {
1005  CharsetIsUtf8 = true;
1006  ReplacementChar = 0xfffd; /* replacement character */
1007  }
1008  else
1009  {
1010  CharsetIsUtf8 = false;
1011  ReplacementChar = '?';
1012  }
1013 
1014 #if defined(HAVE_BIND_TEXTDOMAIN_CODESET) && defined(ENABLE_NLS)
1015  bind_textdomain_codeset(PACKAGE, buf);
1016 #endif
1017 }
#define mutt_ch_is_utf8(str)
Definition: charset.h:106
wchar_t ReplacementChar
When a Unicode character can't be displayed, use this instead.
Definition: charset.c:59
void mutt_ch_canonical_charset(char *buf, size_t buflen, const char *name)
Canonicalise the charset of a string.
Definition: charset.c:345
bool CharsetIsUtf8
Is the user's current character set utf-8?
Definition: charset.c:64

Variable Documentation

◆ C_AssumedCharset

char* C_AssumedCharset

Config: If a message is missing a character set, assume this character set.

Definition at line 53 of file charset.c.

◆ C_Charset

char* C_Charset

Config: Default character set for displaying text on screen.

Definition at line 54 of file charset.c.

◆ CharsetIsUtf8

bool CharsetIsUtf8

Is the user's current character set utf-8?

Definition at line 64 of file charset.c.

◆ ReplacementChar

wchar_t ReplacementChar

When a Unicode character can't be displayed, use this instead.

Definition at line 59 of file charset.c.

◆ PreferredMimeNames

const struct MimeNames PreferredMimeNames[]

Lookup table of preferred charsets.

The following list was created manually from the data at http://www.isi.edu/in-notes/iana/assignments/character-sets (last update: 2000-09-07).

Note
It includes only the subset of character sets for which a preferred MIME name is given.
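
A table of this kind maps each alias to its preferred MIME name. The sketch below shows the idea with a handful of well-known IANA aliases; it is not the actual table, and `struct NamePair`, `names[]` and `preferred_mime_name()` are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical entry type: an alias and its preferred MIME name */
struct NamePair
{
  const char *key;
  const char *pref;
};

/* A few IANA aliases only; the real table is much longer */
static const struct NamePair names[] = {
  { "ansi_x3.4-1968", "us-ascii" },
  { "iso-ir-6", "us-ascii" },
  { "iso_8859-1:1987", "iso-8859-1" },
  { "latin1", "iso-8859-1" },
  { NULL, NULL }, /* terminator */
};

/* Return the preferred name for an alias, or the name itself if unknown */
const char *preferred_mime_name(const char *name)
{
  if (!name)
    return NULL;

  for (size_t i = 0; names[i].key; i++)
  {
    if (strcasecmp(name, names[i].key) == 0)
      return names[i].pref;
  }
  return name;
}
```

Returning the input unchanged for unknown names keeps the lookup safe to apply to every charset label, whether or not the table knows it.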

Definition at line 93 of file charset.c.