A guide to Unicode in SMS

What is Unicode?

Unicode was devised as a single character encoding standard to replace the hundreds of different code sets that had evolved over time.

Before Unicode, character encoding was a complete mess, with no standard that worked across all languages and platforms.

If you wanted to display text consistently, you had to understand every encoding that was in play. You had to deploy unwieldy programs just to interpret them all, and the results were often inconsistent.

The Unicode standard solved these issues and has been hugely successful.

Unicode allows the characters of virtually every written language to be encoded under one consistent character set.

It also covers symbols beyond letters and numbers, which is what makes it possible to send emojis in text messages.

The number of emojis that you can send increases as the Unicode Consortium expands the range.
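
To make the idea of one consistent character set more concrete, here's a minimal Python sketch that looks up the Unicode code point and official name of a few characters. It uses only Python's standard unicodedata module, and the describe helper is a made-up name for illustration.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]

# A plain letter, an accented letter, a currency sign and an emoji
# all get a code point from the same character set.
for row in describe("aé€😀"):
    print(row)
```

The letter "a" comes back as U+0061 and the grinning face emoji as U+1F600, both drawn from the same numbering scheme.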

[Image: Unicode characters]

The Unicode Consortium

The Unicode Consortium was established to maintain and publish the Unicode standard.

It’s a US-based not-for-profit organisation whose stated aim is to:

...enable people around the world to use computers in any language, by providing freely-available specifications and data to form the foundation for software internationalization in all major operating systems, search engines, applications, and the World Wide Web.


How do you send an SMS using Unicode?

Using the SMS API from The SMS Works, you don’t need to take any additional steps to identify non-standard characters that are not part of the GSM character set.

If a message contains non-standard characters or emojis, we detect them automatically and encode the message as Unicode.

No further action is needed and delivery reports will be generated in the same way.
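
If you're curious how this kind of detection can work, here's a rough Python sketch. It is only an illustration and not the code that runs behind the API. It assumes the GSM character set means the GSM 03.38 default alphabet plus its extension table, and the names GSM_BASIC, GSM_EXTENDED and needs_unicode are invented for this example.

```python
# GSM 03.38 default alphabet (the ESC control character is left out).
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: still GSM, but each of these takes two characters' worth of space.
GSM_EXTENDED = "^{}\\[~]|€\f"

GSM_CHARS = set(GSM_BASIC + GSM_EXTENDED)

def needs_unicode(message):
    """True if any character falls outside the GSM character set."""
    return any(ch not in GSM_CHARS for ch in message)

print(needs_unicode("Your order has shipped."))   # plain GSM text
print(needs_unicode("Your order has shipped 😀"))  # emoji forces Unicode
```

A single character outside the set is enough to switch the whole message to Unicode.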

How many characters can you send in a Unicode text?

If a text is sent using the standard GSM character set, a single message can hold up to 160 characters. If we identify that the message contains characters outside that set, the whole message is encoded as Unicode and the number of available characters is reduced to just 70.

If you’re planning to use emojis or other non-standard characters, you’ll need to be aware that any message longer than 70 characters will use more than one text credit, so it could cost double what you were expecting.
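
The arithmetic can be sketched in Python as follows. The 160 and 70 limits come from the paragraph above. The per-part limits of 153 and 67 for longer messages are the usual figures for concatenated SMS rather than something stated here, so treat them as an assumption and check your own account for exact billing. The function names are hypothetical, and the sketch ignores the rare case where a two-unit character straddles a part boundary.

```python
import math

# GSM extension characters take two characters' worth of space each.
GSM_EXTENSION = set("^{}\\[~]|€\f")

def count_units(message, use_unicode):
    """Count how much of the character allowance a message uses."""
    if use_unicode:
        # Unicode SMS counts 16-bit units, so most emojis count as two.
        return len(message.encode("utf-16-be")) // 2
    return sum(2 if ch in GSM_EXTENSION else 1 for ch in message)

def credits_needed(message, use_unicode):
    """Estimate the number of text credits (message parts) required."""
    # Assumed limits: (single message, each part of a longer message).
    single, per_part = (70, 67) if use_unicode else (160, 153)
    units = count_units(message, use_unicode)
    if units <= single:
        return 1
    return math.ceil(units / per_part)

text = "a" * 100
print(credits_needed(text, use_unicode=False))            # GSM: fits in one
print(credits_needed(text + "\u2019", use_unicode=True))  # one curly apostrophe
```

In this example a 100-character message fits in one credit as GSM text, but adding a single curly apostrophe pushes it over the 70-character Unicode limit and into two.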

How can I tell if my message contains non-standard characters before sending it?

We’ve developed a handy Unicode character detector.

Simply type or copy and paste your message into the window and any characters outside the GSM character set will be highlighted.

You’ll then be able to delete the offending characters and test again to double check.
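
If you'd rather run a similar check in your own code, here's a small self-contained Python sketch. It is not the code behind the detector, just an approximation that assumes the GSM 03.38 default alphabet and extension table. The highlight and offenders helpers are hypothetical names.

```python
import unicodedata

# GSM 03.38 default alphabet plus extension table (assumed definition
# of "standard" characters for this sketch).
GSM_CHARS = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
    "^{}\\[~]|€\f"
)

def highlight(message):
    """Wrap every non-GSM character in square brackets."""
    return "".join(ch if ch in GSM_CHARS else f"[{ch}]" for ch in message)

def offenders(message):
    """List (position, character, Unicode name) for each non-GSM character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(message)
        if ch not in GSM_CHARS
    ]

draft = "It\u2019s ready to collect"
print(highlight(draft))
print(offenders(draft))
```

Here the curly apostrophe at position 2 is the only character flagged, and it is reported by name as RIGHT SINGLE QUOTATION MARK.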

Unicode character detector

Do not copy and paste your text message from Word, Google Docs or any other word processing system.

It’s tempting to use a standard word processing package to compose your texts, then simply copy and paste them into your application for sending.

This is always a bad idea.

Word processors replace plain apostrophes and quotation marks with curly rich text versions, and the difference is barely noticeable once they have been copied.

A rich text apostrophe has a curve to the left, whereas a GSM character set apostrophe is straight and vertical. It's a subtle difference.

Here’s a blown-up version of both so you can easily see the difference.

[Image: apostrophe in Unicode]
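
If you can't avoid pasting from a word processor, one possible workaround is to swap the usual culprits for their plain equivalents before sending. The Python sketch below shows the idea. The list of replacements is our own pick of common word processor substitutions and is not exhaustive, and to_plain is a made-up helper name.

```python
# Common word processor characters and plain GSM-friendly replacements.
# This list is an assumption for illustration, not a complete one.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (curly apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
}
TABLE = str.maketrans(REPLACEMENTS)

def to_plain(message):
    """Replace curly punctuation with straight equivalents."""
    return message.translate(TABLE)

print(to_plain("It\u2019s \u201cready\u201d\u2026"))
```

Emojis and accented letters outside the GSM set are left untouched, so it's still worth checking the result before you send.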

