A guide to Unicode in SMS

 

Unicode is a character encoding standard that lets you send characters from virtually any language.

If your message contains Unicode characters, it will be restricted to just 70 characters, not the standard 160.

Most people use Unicode by mistake, typically by copying messages from Word or Sheets.

The comma and apostrophe are the most common culprits. The standard and Unicode versions look almost identical. See below.

What is Unicode?

Unicode was devised as a single encoding standard to solve the problem of hundreds of different character sets having evolved over time.

Before Unicode, encoding different characters was a complete mess, with no standard that worked across all languages and platforms.

If you wanted to display text consistently, you had to understand all the different encoding schemes that were in play. You had to deploy unwieldy programmes just to interpret all the different encodings, and the results were often inconsistent.

The Unicode standard solved these issues and has been hugely successful.

Unicode allows all the characters used in every language to be encoded under one consistent character set.

It also encodes symbols beyond ordinary letters, which is what makes it possible to send emojis in text messages.

The number of emojis that you can send increases as the Unicode Consortium expands the range.

A selection of Unicode characters

The Unicode Consortium

The Unicode Consortium was established to maintain and publish the Unicode Standard.

It’s a US-based not-for-profit whose stated aim is to

…enable people around the world to use computers in any language, by providing freely-available specifications and data to form the foundation for software internationalization in all major operating systems, search engines, applications, and the World Wide Web.

 

How do you send an SMS using Unicode?

With the SMS API from The SMS Works, you don’t need to take any additional steps to identify non-standard characters, that is, characters that are not part of the GSM character set.

If your message contains non-standard characters or emojis, we will detect them automatically and encode the message as Unicode.

No further action is needed and delivery reports will be generated in the same way.

How many characters are used to send a text in Unicode?

If a text is sent using the standard GSM character set, then 160 characters is the maximum text length. If we identify that the message contains characters outside that set, it has to be sent as Unicode and the number of available characters is reduced to just 70.

If you’re planning to use emojis or other non-standard characters, you’ll need to be aware that your text will probably use more than one text credit, so it could cost double what you were expecting.
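To see how these limits translate into message parts, here’s a rough Python sketch. It is an illustration only, not The SMS Works’ own code: the names `is_gsm` and `count_segments` are made up for this example, and the limits of 153 and 67 characters for each part of a longer message are the usual SMS concatenation figures rather than anything stated in this guide. It also ignores edge cases, such as a two-unit character falling exactly on a part boundary, so treat the result as an estimate.

```python
import math

# GSM 7-bit default alphabet (the form feed character is left out for simplicity).
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension characters are sent as an escape plus a character, so each counts as two.
GSM_EXTENDED = set("^{}\\[~]|€")


def is_gsm(message):
    """Return True if every character fits the GSM character set."""
    return all(ch in GSM_BASIC or ch in GSM_EXTENDED for ch in message)


def count_segments(message):
    """Return (encoding, units used, estimated number of message parts)."""
    if is_gsm(message):
        encoding = "GSM-7"
        units = sum(2 if ch in GSM_EXTENDED else 1 for ch in message)
        single_limit, multi_limit = 160, 153
    else:
        encoding = "UCS-2"
        # Count 16-bit code units, so most emojis count as two.
        units = len(message.encode("utf-16-be")) // 2
        single_limit, multi_limit = 70, 67
    if units <= single_limit:
        parts = 1
    else:
        parts = math.ceil(units / multi_limit)
    return encoding, units, parts
```

For example, a 75 character message written with a plain apostrophe fits in one part, but the same message with a single curly apostrophe is treated as Unicode and estimated at two parts.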

How can I tell if my message contains non-standard characters before sending it?

We’ve developed a handy Unicode character detector.

Simply type or paste your message into the window and any Unicode characters will be highlighted.

You’ll then be able to delete the offending characters and test again to double check.

Unicode character detector tool
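If you’d rather run a similar check in your own code before sending, the idea can be sketched in a few lines of Python. This is a hypothetical example and not the code behind our detector: `find_non_gsm` and `highlight` are names invented here, and the character set is the standard GSM 7-bit alphabet plus its extension characters (minus the form feed).

```python
import unicodedata

# GSM 7-bit default alphabet plus extension characters (assumed set for this sketch).
GSM_CHARS = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
    "^{}\\[~]|€"
)


def find_non_gsm(message):
    """List (position, character, Unicode name) for each non-GSM character."""
    return [
        (index, ch, unicodedata.name(ch, "UNNAMED"))
        for index, ch in enumerate(message)
        if ch not in GSM_CHARS
    ]


def highlight(message):
    """Wrap each non-GSM character in square brackets so it stands out."""
    return "".join(ch if ch in GSM_CHARS else f"[{ch}]" for ch in message)
```

Running `find_non_gsm` on a message copied from a word processor will typically point straight at the curly apostrophes and quotation marks.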

Do not copy and paste your text message from Word, Google Docs or any other word processor.

It’s tempting to use a standard word processing package to compose your texts, then simply copy and paste them into your application for sending.

This is always a bad idea.

When copied from a word processor, the apostrophe and comma are often curly typographic characters rather than their plain GSM equivalents, and the difference is barely noticeable.

A typographic apostrophe curves to the left, whereas the GSM character set apostrophe is a straight, vertical mark. It’s a subtle difference.

Here’s a blown-up version of both so you can easily see the difference.

Comma and apostrophe in Unicode

Unicode replacement tool

To help customers avoid using more SMS credits than intended, we have a Unicode character replacement tool.

When this is activated in the customer’s account, the most common Unicode characters are replaced with their GSM equivalents.

The tool is free to all customers and can help significantly reduce unintended overspend.

Our SMS analytics platform also helps identify how many texts are being sent with Unicode, so that action can be taken to remove Unicode from future texts.
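The kind of substitution a replacement tool performs can be sketched in Python as follows. The mapping below is an assumption made for illustration, covering the characters word processors most often introduce; it is not the actual list used by our tool, and `to_gsm_friendly` is a made-up name.

```python
# Hypothetical mapping of common typographic characters to GSM equivalents.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (curly apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}

# str.translate expects a table keyed by code point.
_TABLE = {ord(source): target for source, target in REPLACEMENTS.items()}


def to_gsm_friendly(message):
    """Swap common typographic characters for their plain GSM equivalents."""
    return message.translate(_TABLE)
```

Characters that have no sensible GSM equivalent, such as emojis, are left untouched by this sketch, so a message containing them would still be sent as Unicode.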

Related articles

Sending emojis in texts: Is this a good idea? What do people think of emojis and do they cost more to send?

Unicode character detector: Eliminate costly Unicode characters from your texts with this simple tool.

What’s the maximum length of a text message? Just how long can a text be?

Henry Cazalet, Managing Director
Co-founder and Director of The SMS Works, a low-cost and powerful SMS API for developers.