Compact Header Rev 2

The compact header encodes information about the sender and the message into a small number of bits. The header is divided into several fields, and each field is encoded into bits. The bits are then converted into UTF-8 characters for transmission. The receiver converts the UTF-8 characters back into bits and decodes the bits into the original data.

Process

Input values, whether text or numbers, are converted into chunks of bits according to their data type; each field is encoded differently because each holds a different kind of information. The bits are then converted into an array of UTF-8 characters for transmission. The receiver converts the UTF-8 characters back into bits and decodes the bits into the original data.
A document here describes how bytes combine to build a UTF-8 character. It turns out that if the first byte of a character is greater than 127, the character is a multi-byte character of 2 to 4 bytes. Because of this, the number of UTF-8 characters produced depends on the value of each byte that begins a new character.
This also means a character can be lost in the translation. For example, if the last character expects three bytes but only one byte remains, that character is lost.
Some testing will be required to determine how robust this method is.
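
The UTF-8 behaviour described above can be checked with a small sketch. The helper `utf8_sequence_length` is hypothetical (not part of the original design); it reports how many bytes a UTF-8 character occupies based on its first byte, following the UTF-8 standard, and returns 0 for byte values that can never start a character. The last lines show a truncated 3-byte character being dropped by a lenient decoder and rejected by a strict one.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Bytes in the UTF-8 character that starts with first_byte (0 = cannot start one)."""
    if first_byte <= 0x7F:
        return 1          # 0xxxxxxx: plain ASCII, one byte
    if 0xC2 <= first_byte <= 0xDF:
        return 2          # 110xxxxx: two-byte character
    if 0xE0 <= first_byte <= 0xEF:
        return 3          # 1110xxxx: three-byte character
    if 0xF0 <= first_byte <= 0xF4:
        return 4          # 11110xxx: four-byte character
    return 0              # continuation byte (0x80-0xBF) or a byte never valid in UTF-8

for b in (0x41, 0xC3, 0xE2, 0xF0, 0x80, 0xFF):
    print(hex(b), utf8_sequence_length(b))

# 0xE2 announces a 3-byte character, but only one more byte follows it.
stream = bytes([0x41, 0xE2, 0x82])
print(stream.decode("utf-8", errors="ignore"))  # prints A: the incomplete character is lost
try:
    stream.decode("utf-8")  # strict decoding refuses the truncated character
except UnicodeDecodeError as err:
    print("strict decode fails:", err.reason)
```

Because packed header bits can form any byte value, some bytes will start multi-byte characters and some cannot start a character at all, which is the kind of case the robustness testing would need to cover.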

Compact Header

field	input	bit length	example bits
version number		8	0000 0001
time stamp		20
sender type		2	00
sender id		48	0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
gridsquare		28	0000 0000 0000 0000 0000 0000 0000
name		48	0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
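
The field widths in the table can be turned into a fixed layout. The sketch below is a minimal illustration, assuming each field has already been reduced to an unsigned integer by its own encoder; `FIELD_WIDTHS`, `pack_header` and `unpack_header` are hypothetical names, and bits are kept as a string of '0'/'1' characters for readability rather than efficiency.

```python
# Field order and widths in bits, taken from the Compact Header table.
FIELD_WIDTHS = [
    ("version", 8),
    ("timestamp", 20),
    ("sender_type", 2),
    ("sender_id", 48),
    ("gridsquare", 28),
    ("name", 48),
]

def pack_header(values: dict[str, int]) -> str:
    """Concatenate each field, as a fixed-width binary string, in header order."""
    parts = []
    for name, width in FIELD_WIDTHS:
        value = values[name]
        if not 0 <= value < 2 ** width:
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        parts.append(format(value, f"0{width}b"))
    return "".join(parts)

def unpack_header(bits: str) -> dict[str, int]:
    """Slice a packed header back into its fields using the same widths."""
    values = {}
    pos = 0
    for name, width in FIELD_WIDTHS:
        values[name] = int(bits[pos:pos + width], 2)
        pos += width
    return values

example = {"version": 1, "timestamp": 0, "sender_type": 0,
           "sender_id": 0, "gridsquare": 0, "name": 0}
packed = pack_header(example)
print(len(packed))  # 154
print(packed[:8])   # 00000001
```

The fixed widths add up to 154 bits, which is 19 bytes plus 2 bits, so some padding would be needed before the stream is split into whole bytes.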

There are six fields in the compact header. The version number allows a new interpretation of the header when better methods are found. The timestamp identifies when a message was sent. The sender type determines how the sender field is decoded. The sender field holds either a call sign or a phone number. The gridsquare field holds the sender's grid square, encoded to 6-character precision. The name field, the last field, holds the sender's name.
To make decoding consistent, a terminating character, '~', is appended to each field that can vary in length.
Such fields are of indeterminate length. For example, a call sign can be 2 to 6 characters, a phone number 7 to 11 digits, and a name 2 to 20 characters.
If the name is shorter than 20 characters, the unused remainder of the field's encoded binary is filled with 1s rather than 0s, because a byte of 00000000 becomes the UTF-8 NUL character, which cannot be copied.

Version Number

1-byte number value. The version number can range from 1 to 255. The table above shows version 1, which is encoded as the single byte 0000 0001.

Time Stamp

20-bit number value. The timestamp counts three-second intervals since the beginning of the month, in UTC time. The 0th hour begins at midnight in the UTC+12 time zone (New Zealand) to ensure that the day begins at the same time for the majority of the population. A 31-day month contains 31 × 86,400 ÷ 3 = 892,800 intervals, which fits within the 1,048,576 values that 20 bits can hold.
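
A minimal sketch of the timestamp calculation follows. It assumes one reading of the rule above: the month starts at 00:00 on the 1st in UTC+12, which is 12:00 UTC on the last day of the previous month, and partial intervals are rounded down. `encode_timestamp` and `decode_timestamp` are hypothetical helpers, and the decoder assumes the receiver supplies the year and month itself (for example from its own clock), since the 20 bits do not carry them.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the month begins at 00:00 on the 1st in UTC+12 (New Zealand standard time).
EPOCH_ZONE = timezone(timedelta(hours=12))
TICK_SECONDS = 3
TIMESTAMP_BITS = 20

def encode_timestamp(moment: datetime) -> int:
    """Return the number of whole 3-second intervals since the month began in UTC+12."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(EPOCH_ZONE)
    month_start = local.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    ticks = int((local - month_start).total_seconds()) // TICK_SECONDS
    if ticks >= 2 ** TIMESTAMP_BITS:
        raise ValueError("timestamp does not fit in 20 bits")
    return ticks

def decode_timestamp(ticks: int, year: int, month: int) -> datetime:
    """Rebuild the UTC time from an interval count, given the (UTC+12) month it belongs to."""
    month_start = datetime(year, month, 1, tzinfo=EPOCH_ZONE)
    return (month_start + timedelta(seconds=ticks * TICK_SECONDS)).astimezone(timezone.utc)

# 12:00 UTC on 29 Feb 2024 is 00:00 on 1 Mar 2024 in UTC+12, so the count restarts at 0.
print(encode_timestamp(datetime(2024, 2, 29, 12, 0, 0, tzinfo=timezone.utc)))  # 0
t = encode_timestamp(datetime(2024, 3, 1, 12, 0, 3, tzinfo=timezone.utc))
print(t)                                # 28801
print(format(t, "020b"))                # the 20 bits that go into the header
print(decode_timestamp(t, 2024, 3))     # 2024-03-01 12:00:03+00:00
```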

Sender Type

2-bit number value. The sender type determines how the sender field is decoded. It indicates whether the sender is a ham call sign, a GMRS call sign, or a phone number; two bits allow four values, leaving one spare.

Sender ID

48-bit text value. The sender ID holds the sender's call sign or phone number and is decoded according to the sender type.

Gridsquare

The gridsquare encoding is accurate only to a 6-character grid square. The dxzone website has a good explanation of the gridsquare calculation. A 6-character grid square consists of two letters, two numbers and two more letters. Letters are converted into 5-bit binary values and numbers into 4-bit binary values, and the results are appended to the bitstream. This gives 10 + 8 + 10 = 28 bits.
For readability, the last two letters are converted to lowercase when decoded. For encoding, all letters are uppercase.
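
The 10 + 8 + 10 bit layout can be sketched as follows. The letter mapping A = 0 through Z = 25 and the plain 4-bit binary digits are assumptions (the text only fixes the widths), and `encode_gridsquare` / `decode_gridsquare` are hypothetical helpers. Input longer than 6 characters is truncated, since only 6-character precision is kept.

```python
LETTER_BITS = 5
DIGIT_BITS = 4
# Layout of a 6-character grid square: letter, letter, digit, digit, letter, letter.
WIDTHS = [LETTER_BITS, LETTER_BITS, DIGIT_BITS, DIGIT_BITS, LETTER_BITS, LETTER_BITS]

def encode_gridsquare(grid: str) -> str:
    """Encode the first 6 characters of a grid square as a 28-bit string."""
    grid = grid.upper()[:6]  # encoding uses uppercase; precision beyond 6 characters is dropped
    if len(grid) != 6:
        raise ValueError("grid square needs at least 6 characters")
    parts = []
    for ch, width in zip(grid, WIDTHS):
        if width == DIGIT_BITS:
            if ch not in "0123456789":
                raise ValueError(f"expected a digit, got {ch!r}")
            parts.append(format(int(ch), "04b"))
        else:
            if not "A" <= ch <= "Z":
                raise ValueError(f"expected a letter, got {ch!r}")
            parts.append(format(ord(ch) - ord("A"), "05b"))  # assumption: A=0 ... Z=25
    return "".join(parts)

def decode_gridsquare(bits: str) -> str:
    """Decode 28 bits back to a grid square, lowercasing the last two letters."""
    chars = []
    pos = 0
    for index, width in enumerate(WIDTHS):
        value = int(bits[pos:pos + width], 2)
        pos += width
        if width == DIGIT_BITS:
            chars.append(str(value))
        else:
            letter = chr(ord("A") + value)
            chars.append(letter.lower() if index >= 4 else letter)
    return "".join(chars)

bits = encode_gridsquare("FN31pr")
print(bits, len(bits))          # 28 bits
print(decode_gridsquare(bits))  # FN31pr
```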

Name

48-bit text value. The name field holds the sender's name and is decoded into a UTF-8 string.

Terminating Character

The '~' character terminates each field whose length can vary: call signs, phone numbers and names can all be of different lengths. When the decoder reaches '~', the field is complete and decoding moves to the next field. The unused remainder of each field is filled with 1s rather than 0s, because a byte of 00000000 becomes the UTF-8 NUL character, which cannot be copied.
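
A minimal sketch of a terminated, 1-filled text field follows, assuming each character is stored as its UTF-8 bytes at 8 bits per byte (the text does not fix the per-character encoding). `encode_text_field` and `decode_text_field` are hypothetical helpers. The '~' byte (0x7E) never appears inside a multi-byte UTF-8 character, so scanning byte by byte for it is safe.

```python
TERMINATOR = b"~"

def encode_text_field(text: str, width_bits: int) -> str:
    """Encode text as UTF-8 bytes, append '~', and fill the rest of the field with 1s."""
    data = text.encode("utf-8")
    if TERMINATOR in data:
        raise ValueError("text must not contain the terminator '~'")
    data += TERMINATOR
    bits = "".join(format(b, "08b") for b in data)
    if len(bits) > width_bits:
        raise ValueError(f"{text!r} needs {len(bits)} bits but the field has {width_bits}")
    return bits + "1" * (width_bits - len(bits))

def decode_text_field(bits: str) -> str:
    """Read 8-bit bytes until the '~' terminator and decode them as UTF-8."""
    data = bytearray()
    for i in range(0, len(bits) - 7, 8):
        byte = int(bits[i:i + 8], 2)
        if byte == TERMINATOR[0]:
            break
        data.append(byte)
    return data.decode("utf-8")

field = encode_text_field("AB", 48)  # "AB~" uses 24 bits; the other 24 are 1s
print(field)
print(decode_text_field(field))      # AB
```

Under this one-byte-per-ASCII-character assumption, a 48-bit field holds at most five characters plus the terminator.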