
Concatenated SMS Messages and Character Counts

August 24, 2013

Figuring out maximum character counts for standard SMS messages is really quite simple. However, the maximum character counts for concatenated SMS messages are a bit more complicated. Throw character encodings into the mix, and everything can become very muddled.

Encodings

Languages that use a Latin-based alphabet (such as English, Spanish, French, etc.) usually use phones supporting the GSM character encoding. The GSM character encoding uses 7 bits to represent each character (the same width as ASCII, although the character set differs). This contrasts with languages that use non-Latin-based alphabets (such as Chinese, Arabic, Sinhala, Mongolian, etc.), which usually use phones supporting Unicode. The specific character encoding used by these phones is usually UTF-16 or UCS-2. UCS-2 uses 16 bits for every character. UTF-16 uses 16 bits for most characters, but 32 bits (a surrogate pair) for characters outside the Basic Multilingual Plane, such as many emoji. For the sake of simplicity, I will refer to the Latin-based alphabet and non-Latin-based alphabet languages in this post as “GSM” and “Unicode” languages respectively.

Standard SMS Messages

Standard SMS messages have a maximum payload of 140 bytes (1120 bits).

Since GSM phones use a 7-bit character encoding, this allows a maximum of 160 characters per standard SMS message:

1120 bits / (7 bits/character) = 160 characters

For Unicode phones, which use a 16-bit character encoding, this allows a maximum of 70 characters per standard SMS message:

1120 bits / (16 bits/character) = 70 characters
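
The same arithmetic can be written as a small Python sketch. The constant and function names are hypothetical and chosen only for illustration. Floor division drops any leftover bits that cannot hold a whole character.

```python
# Payload of a single (non-concatenated) SMS: 140 bytes.
SMS_PAYLOAD_BITS = 140 * 8  # 1120 bits


def max_chars_single_sms(bits_per_char: int) -> int:
    """Return how many whole characters fit in one standard SMS payload."""
    # Floor division: a partial character cannot be transmitted.
    return SMS_PAYLOAD_BITS // bits_per_char


print(max_chars_single_sms(7))   # GSM 7-bit: 160
print(max_chars_single_sms(16))  # 16-bit Unicode: 70
```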

Concatenated SMS Messages

Things get a little bit more complex with concatenated SMS messages. Concatenated SMS messages allow a phone to send messages longer than 160 GSM characters. The sender writes the message as normal, but without the 140-byte limit. Behind the scenes, the phone checks the message length. If the message is 140 bytes or less, the phone sends a standard SMS message. However, if the message is longer than 140 bytes, the phone automatically divides it into multiple, shorter SMS messages, which are then transmitted to the recipient separately.

The recipient’s phone takes these shorter SMS messages and recombines them into the original message. Because the individual segments of the complete message have to be recombined in this way, this is referred to as ‘concatenated SMS’. To make this delivery seamless, additional information is added to each individual concatenated SMS message. This additional information, called the user data header (UDH), provides identification and ordering information. For example, if a message is split into three segments, the UDH relates the three segments to each other and indicates the order for recombination.

The UDH takes up 6 bytes (48 bits) of the normal SMS message payload. This reduces the space for actual message data in concatenated SMS messages:

1120 bits - 48 bits = 1072 bits

As a result, each individual concatenated SMS message can contain only 1072 bits of message data. This plays an important role in determining how many individual concatenated SMS messages are sent for a given message length.
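
The 6-byte figure matches the concatenation header that uses an 8-bit reference number. It consists of a header-length byte followed by a 5-byte information element: identifier 0x00, data length 0x03, reference number, total segment count and sequence number. The sketch below builds such a header and checks the remaining bit budget. The function name is hypothetical. The sketch does not cover the 16-bit-reference variant, which takes 7 bytes.

```python
SMS_PAYLOAD_BITS = 140 * 8  # 1120 bits


def build_concat_udh(ref: int, total: int, seq: int) -> bytes:
    """Build the 6-byte UDH for one segment of a concatenated SMS (8-bit reference)."""
    if not 0 <= ref <= 255:
        raise ValueError("reference number must fit in one byte")
    if not 1 <= seq <= total <= 255:
        raise ValueError("need 1 <= seq <= total <= 255")
    return bytes([
        0x05,   # UDHL: length of the rest of the header, in bytes
        0x00,   # IEI: concatenated short message, 8-bit reference number
        0x03,   # IEDL: length of the information element data
        ref,    # reference number shared by all segments of one message
        total,  # total number of segments
        seq,    # position of this segment, starting at 1
    ])


# First of three segments; the reference number 0x2A is arbitrary.
udh = build_concat_udh(ref=0x2A, total=3, seq=1)
print(udh.hex())                        # 0500032a0301
print(SMS_PAYLOAD_BITS - len(udh) * 8)  # 1072 bits left for message data
```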

Figure: SMS payload diagram

Because GSM phones use a 7-bit character encoding, each individual concatenated SMS message can hold 153 characters:

1072 bits / (7 bits/character) = 153 characters

(Note: 153 characters * 7 bits/character = 1071 bits. The one remaining bit cannot represent a full character, so it is used as padding. This way the 7-bit character data begins on a septet boundary, at the 50th bit.)
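
The padding rule in the note works for any header length. Pad the header out to the next multiple of 7 bits, then divide the remaining space by 7. Here is a minimal sketch with hypothetical function names:

```python
SMS_PAYLOAD_BITS = 1120  # 140 bytes


def gsm_fill_bits(udh_bytes: int) -> int:
    """Padding bits needed so GSM 7-bit data starts on a septet boundary."""
    udh_bits = udh_bytes * 8
    return (7 - udh_bits % 7) % 7


def gsm_chars_per_segment(udh_bytes: int) -> int:
    """Whole 7-bit characters that fit after the UDH and its fill bits."""
    used = udh_bytes * 8 + gsm_fill_bits(udh_bytes)
    return (SMS_PAYLOAD_BITS - used) // 7


print(gsm_fill_bits(6))          # 1 (48 + 1 = 49 bits = 7 septets)
print(gsm_chars_per_segment(6))  # 153
```

With the 7-byte header variant (56 bits, already a multiple of 7), the same functions give 0 fill bits and 152 characters.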

Unicode phones use a 16-bit character encoding, so each individual concatenated SMS message can hold 67 characters:

1072 bits / (16 bits/character) = 67 characters
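
Combining the single-message and per-segment limits, a sender could split text into segments roughly as sketched below. This is a simplified sketch. It treats every Python character as one unit, so it undercounts GSM extension-table characters (such as € or {), which take two septets. It also undercounts characters outside the Basic Multilingual Plane, which take two 16-bit units. The `"ucs2"` label and the function name are hypothetical.

```python
# (single-message limit, per-segment limit) in characters
LIMITS = {
    "gsm": (160, 153),
    "ucs2": (70, 67),
}


def split_message(text: str, encoding: str) -> list[str]:
    """Split text into SMS-sized chunks, one unit per character (simplified)."""
    single_limit, segment_limit = LIMITS[encoding]
    if len(text) <= single_limit:
        return [text]  # fits in one standard SMS, no UDH needed
    # Otherwise every chunk must leave room for the UDH.
    return [text[i:i + segment_limit] for i in range(0, len(text), segment_limit)]


parts = split_message("a" * 180, "gsm")
print([len(p) for p in parts])  # [153, 27]
```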

Character Count Thresholds

The per-segment character limits produce thresholds at which an additional concatenated SMS message is required to send a longer overall message:

GSM encoding:

  • 1 standard SMS message = up to 160 characters
  • 2 concatenated SMS messages = up to 306 characters
  • 3 concatenated SMS messages = up to 459 characters
  • 4 concatenated SMS messages = up to 612 characters
  • 5 concatenated SMS messages = up to 765 characters
  • etc. (153 x number of individual concatenated SMS messages)

UTF-16 encoding:

  • 1 standard SMS message = up to 70 characters
  • 2 concatenated SMS messages = up to 134 characters
  • 3 concatenated SMS messages = up to 201 characters
  • 4 concatenated SMS messages = up to 268 characters
  • 5 concatenated SMS messages = up to 335 characters
  • etc. (67 x number of individual concatenated SMS messages)
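
Both lists follow from one rule. A message that fits in a single SMS needs no header; otherwise it needs length / per-segment limit messages, rounded up. The following minimal Python sketch computes the segment count and reproduces the thresholds above. The function names are hypothetical, and length is assumed to be measured in encoding units.

```python
# (single-message limit, per-segment limit) in characters
LIMITS = {"gsm": (160, 153), "ucs2": (70, 67)}


def segments_needed(length: int, encoding: str) -> int:
    """Number of SMS messages needed to send `length` characters."""
    single_limit, segment_limit = LIMITS[encoding]
    if length <= single_limit:
        return 1  # fits in one standard SMS, no UDH
    # Ceiling division: a partly filled segment still counts as one message.
    return (length + segment_limit - 1) // segment_limit


def threshold(segments: int, encoding: str) -> int:
    """Maximum number of characters that fit in `segments` messages."""
    single_limit, segment_limit = LIMITS[encoding]
    return single_limit if segments == 1 else segments * segment_limit


for enc in ("gsm", "ucs2"):
    print(enc, [threshold(n, enc) for n in range(1, 6)])
# gsm [160, 306, 459, 612, 765]
# ucs2 [70, 134, 201, 268, 335]
print(segments_needed(180, "gsm"))  # 2
```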

Implications

These thresholds matter for a number of reasons, including billing and programmatic interfacing with SMS gateways.

Generally, telephone companies count individual concatenated SMS messages separately, even though the recipient’s phone recombines them into a single message. This means a GSM-encoded message containing 180 characters could incur a charge for two SMS messages, even though the sender and recipient see only a single message.

When interfacing programmatically with a telephone company’s SMS gateway, there may be limits on the number of individual concatenated SMS messages that can be sent as part of a single message. For example, Clickatell’s documentation states that messages sent through their API should not contain more than 3 concatenated SMS segments. This may require limiting the number of characters a user can enter in a web application or service that sends SMS messages through such an API.
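
For example, a web form in front of such an API could derive its input limit from the segment cap. The sketch below assumes a cap of 3 segments, as in the Clickatell example, and the same simplified per-character counting. The function names are hypothetical and are not part of any gateway's API.

```python
# (single-message limit, per-segment limit) in characters
LIMITS = {"gsm": (160, 153), "ucs2": (70, 67)}


def max_input_length(max_segments: int, encoding: str) -> int:
    """Longest message, in characters, that needs at most max_segments messages."""
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    single_limit, segment_limit = LIMITS[encoding]
    if max_segments == 1:
        return single_limit
    return max_segments * segment_limit


def within_segment_cap(text: str, encoding: str, max_segments: int = 3) -> bool:
    """True if text can be sent without exceeding the segment cap."""
    return len(text) <= max_input_length(max_segments, encoding)


print(max_input_length(3, "gsm"))            # 459
print(max_input_length(3, "ucs2"))           # 201
print(within_segment_cap("a" * 460, "gsm"))  # False
```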

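While it may seem elementary, it is important to point out that an SMS message is always in one particular encoding, i.e. fully GSM or fully UTF-16. For example, a period character (“.”) takes up 7 bits in a GSM SMS message. The same character may appear in a Unicode SMS message, but there it takes up 16 bits, even though it represents the same character. Consequently, a single character outside the GSM alphabet (a Chinese character, for instance) switches the entire message to 16-bit encoding. This lowers the single-message limit from 160 to 70 characters.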
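
A single non-GSM character forces the whole message into 16-bit encoding, so a sender has to inspect the entire message before counting. The sketch below writes out the GSM 03.38 default alphabet and its extension table as Python strings, omitting the escape code itself; the function names are hypothetical. It counts extension-table characters (such as € or {) as two septets, because they are sent as an escape code plus a character code. It counts 16-bit messages in UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two.

```python
# GSM 03.38 default alphabet, in code order (escape code 0x1B omitted).
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each character costs two septets (escape + code).
GSM_EXTENDED = "\f^{}\\[~]|€"


def choose_encoding(text: str) -> str:
    """Return "gsm" if every character is GSM-encodable, else "ucs2"."""
    allowed = set(GSM_BASIC) | set(GSM_EXTENDED)
    return "gsm" if all(ch in allowed for ch in text) else "ucs2"


def encoded_length(text: str) -> int:
    """Length in septets (GSM) or UTF-16 code units (16-bit encoding)."""
    if choose_encoding(text) == "gsm":
        return sum(2 if ch in GSM_EXTENDED else 1 for ch in text)
    return len(text.encode("utf-16-le")) // 2


print(choose_encoding("Hello."), encoded_length("Hello."))            # gsm 6
print(choose_encoding("Hello. 你好"), encoded_length("Hello. 你好"))  # ucs2 9
print(encoded_length("Price: 5€"))                                    # 10
```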
