Unicode and e-mail

Many e-mail clients now offer some support for Unicode in message bodies. Most do not send in Unicode by default, but as time passes, more systems are likely to be set up with fonts capable of displaying the full range of Unicode characters (or at least the set likely to be of interest to the user).

Unicode support for e-mail subject lines and e-mail addresses is more problematic, because several separate standards are needed to retrofit the handling of non-ASCII data onto the originally ASCII-only e-mail protocol:
* RFC 2047 provides support for encoding non-ASCII values such as real names and subject lines in e-mail headers
* RFC 3490 provides support for encoding non-ASCII domain names

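However, mailbox names (the part of the e-mail address before the '@' sign) are limited to a subset of printable ASCII characters by RFC 2822 and its successor, RFC 5322. Non-ASCII mailbox names are only possible between servers that support the later SMTPUTF8 extension (RFC 6531).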
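
As a rough illustration of the two encoding mechanisms listed above, the following Python sketch uses only the standard library: `email.header.Header` to produce an RFC 2047 encoded-word, and the built-in `idna` codec (which follows the RFC 3490 rules) to convert a domain name to its ASCII form. The subject text and the domain `münchen.example` are made-up sample values, not taken from any real message.

```python
from email.header import Header, decode_header

# Hypothetical sample values, chosen only for illustration.
subject_text = "Grüße aus München"
domain = "münchen.example"

# RFC 2047: wrap the non-ASCII subject in an ASCII-only encoded-word.
encoded_subject = Header(subject_text, charset="utf-8").encode()

# Decode it again, roughly as a receiving client would.
decoded_subject = "".join(
    part.decode(charset or "ascii") if isinstance(part, bytes) else part
    for part, charset in decode_header(encoded_subject)
)

# RFC 3490 (IDNA): convert each non-ASCII label to its "xn--" ASCII form.
ascii_domain = domain.encode("idna")
unicode_domain = ascii_domain.decode("idna")

print(encoded_subject)   # an ASCII string of the form =?utf-8?...?=
print(ascii_domain)
```

Both results contain only ASCII characters, so they can pass through software that predates either standard.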

Unicode support in message bodies

HTML e-mail can use HTML entities to use characters from anywhere in Unicode even if the HTML source text for the e-mail is in a legacy encoding. For details of this see Unicode and HTML. The rest of this article will deal with e-mail messages where the actual raw text (whether HTML markup or plain text) is in an encoding that covers the whole of Unicode.

As with all encodings apart from US-ASCII, when using Unicode text in e-mail, MIME must be used to specify that a Unicode transformation format is being used for the text. To use Unicode in e-mail headers, the Unicode text has to be encoded using a MIME "Encoded-Word" with a Unicode encoding as the charset.
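
A minimal sketch of how the MIME labelling looks in practice, assuming Python's standard `email.message.EmailMessage` class. The addresses and the Greek body text are placeholders; the point is only that the body's charset is declared in the `Content-Type` header.

```python
from email.message import EmailMessage

body = "Καλημέρα κόσμε"  # sample non-ASCII text (Greek)

msg = EmailMessage()
msg["From"] = "sender@example.org"     # placeholder address
msg["To"] = "recipient@example.org"    # placeholder address
msg["Subject"] = "Unicode test"

# set_content() adds the MIME-Version and Content-Type headers,
# labelling the body text as UTF-8.
msg.set_content(body, charset="utf-8")

print(msg["MIME-Version"])
print(msg["Content-Type"])
```

A receiving client reads the `charset` parameter to decide how to turn the body bytes back into characters.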

UTF-7, although sometimes considered deprecated, has an advantage over other Unicode encodings in that it does not require a transfer encoding to fit within the seven-bit limits of many legacy Internet mail servers. UTF-8 and UTF-16, on the other hand, must be transfer-encoded in base64 or quoted-printable to allow safe transmission across seven-bit mail servers (i.e., those that do not advertise 8BITMIME).
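
The difference can be checked with a short Python sketch using the standard `base64` and `quopri` modules. The sample text and the helper `is_seven_bit` are illustrative assumptions; the sketch only tests whether each byte sequence stays within the seven-bit range.

```python
import base64
import quopri

text = "Grüße, 日本語"  # sample text mixing Latin and Japanese

def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte fits in seven bits (0x00-0x7F)."""
    return all(byte < 0x80 for byte in data)

utf7_bytes = text.encode("utf-7")
utf8_bytes = text.encode("utf-8")

# UTF-8 needs a transfer encoding before crossing a seven-bit server.
utf8_base64 = base64.b64encode(utf8_bytes)
utf8_qp = quopri.encodestring(utf8_bytes)

print("UTF-7 raw is 7-bit:", is_seven_bit(utf7_bytes))
print("UTF-8 raw is 7-bit:", is_seven_bit(utf8_bytes))
print("UTF-8 + base64 is 7-bit:", is_seven_bit(utf8_base64))
print("UTF-8 + quoted-printable is 7-bit:", is_seven_bit(utf8_qp))
```

Quoted-printable tends to stay readable for mostly-ASCII text, while base64 is usually more compact for text that is mostly non-ASCII.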

Unicode in various mail clients

Evolution

View > Character Encoding > Unicode
Tools > Settings > Mail Preferences and Composer Preferences > set the default Character Encoding to Unicode

Mozilla Thunderbird

View > Character Encoding > Unicode
Tools > Options… > Fonts > Outgoing Mail / Incoming Mail (change to Unicode)

For Mac: Preferences > Display > Formatting > Fonts… > Character Encoding (bottom of the window).

MS Outlook

Outlook supports sending mail in UTF-7 and UTF-8 but does not do so by default. When replying, Outlook uses the same encoding as the message it is replying to. All Unicode characters can be entered in the edit box, but ones not available in the selected encoding will be silently replaced (usually with a question mark: "?") when sending the message.
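
The kind of silent substitution described above can be reproduced in Python with the `errors="replace"` option of `str.encode`. This is only an analogy, not Outlook's actual implementation; the function names `encode_lossy` and `unencodable` are hypothetical helpers introduced for this sketch.

```python
def encode_lossy(text: str, charset: str) -> bytes:
    """Encode text, replacing characters the charset lacks with '?'."""
    return text.encode(charset, errors="replace")

def unencodable(text: str, charset: str) -> list:
    """List the characters that the charset cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

sample = "naïve ☺"
print(encode_lossy(sample, "iso-8859-1"))
print(unencodable(sample, "iso-8859-1"))
```

Checking for unencodable characters before sending gives a client the chance to warn the user or switch to UTF-8 instead of losing text.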

Lotus Notes

Notes can also send Unicode:

# From the menu, select File -> Preferences -> User Preferences.
# Under Basics -> Additional Options, tick Enable UNICODE Display.
# Click Mail, then Internet.
# Under "Multilingual Internet mail," choose an option.

Scribe/InScribe

Scribe displays Unicode with its default settings. You can override the charset specified in the headers by right-clicking on the body and using the "Change Charset" menu to select a new charset. You can also configure preferred charsets for 8-bit text and us-ascii in the receive options.

When sending, a suitable legacy charset (8-bit, e.g. ISO-8859-?? or Windows-???) is chosen automatically; however, if the message has a complicated script or a mixture of scripts, UTF-8 is used by default. You can set a preferred legacy charset in the sending options panel to override the default charset choice. Characters not available in the current font will be substituted from another font installed on the system (if available).
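
One plausible way to implement this kind of automatic selection is to try a list of legacy charsets in order and fall back to UTF-8 when none of them can represent the whole message. The sketch below is an assumption about the general technique, not Scribe's actual algorithm; `choose_charset` and its default candidate list are hypothetical.

```python
def choose_charset(text, preferred=("us-ascii", "iso-8859-1", "windows-1252")):
    """Return the first preferred charset that can encode all of text,
    falling back to UTF-8 if none can."""
    for charset in preferred:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset lacks at least one character
        return charset
    return "utf-8"

for sample in ("hello", "café", "€5", "café 日本語"):
    print(repr(sample), "->", choose_charset(sample))
```

Putting a user's preferred legacy charset first in the candidate list would correspond to the override described above.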

See also

* Comparison of e-mail clients
* List of Unicode fonts
* Free software Unicode fonts

External links

* [http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id= SIL's freeware fonts, editors and documentation]


Wikimedia Foundation. 2010.
