C0 and C1 control codes

The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received.

C0 codes are the range 0x00–0x1F and the default C0 set was originally defined in ISO 646 (ASCII). C1 codes are the range 0x80–0x9F and the default C1 set was originally defined in ECMA-48 (later harmonized with ISO 6429). The ISO/IEC 2022 system of specifying control and graphic characters allows other C0 and C1 sets to be available for specialized applications, but they are rarely used.

C0 controls

ASCII defined 32 control characters, plus one more, DEL (0x7F, or 01111111 in binary), which was needed to erase a paper tape by punching out all of its holes.

This large number of codes was desirable at the time, as multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals.

Only a few codes have maintained their use: BEL, ESC, and the "Format Effector" (FEₙ) characters BS, HT, LF, VT, FF, and CR. Others are unused or have acquired different meanings, such as NUL serving as the C string terminator. Some data transfer protocols, such as ANPA-1312, Kermit, and XMODEM, make extensive use of SOH, STX, ETX, EOT, ACK, NAK and SYN for purposes approximating their original definitions, and some formats and tools make use of the "Information Separators" (ISₙ), such as the Unix info format and Python's splitlines string method.
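A minimal Python sketch of the Information Separators used as delimiters, assuming a two-level layout in which RS separates records and US separates fields. The helpers join_records and split_records are hypothetical, and the sketch assumes no field itself contains RS or US. The last line shows that str.splitlines treats FS, GS and RS, but not US, as line boundaries.

```python
# Information Separators, lowest to highest level: US (IS1) < RS (IS2) < GS (IS3) < FS (IS4).
US, RS, GS, FS = "\x1f", "\x1e", "\x1d", "\x1c"


def join_records(records):
    """Encode records (lists of field strings): US between fields, RS between records.

    Assumes no field contains US or RS.
    """
    return RS.join(US.join(fields) for fields in records)


def split_records(data):
    """Inverse of join_records: split on RS first, then on US."""
    return [record.split(US) for record in data.split(RS)]


table = [["NUL", "00"], ["SOH", "01"], ["STX", "02"]]
encoded = join_records(table)
print(split_records(encoded) == table)  # True

# str.splitlines() breaks on FS, GS and RS, but leaves US inside the line.
print("a\x1cb\x1dc\x1ed\x1fe".splitlines())  # ['a', 'b', 'c', 'd\x1fe']
```

Higher levels (GS, FS) could be layered on in the same way, splitting on the highest separator first.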

The names of some codes were changed in ISO 6429:1992 (or ECMA-48:1991) to be neutral with respect to writing direction. The abbreviations used were not changed, as the standard had already specified that those would remain unchanged when the standard is translated to other languages. In this table both new and old names are shown for the renamed controls (the old name is the one matching the abbreviation).

ASCII control codes, originally defined in ANSI X3.4.
Caret notation	Decimal	Hexadecimal	Abbreviations	Symbol	Name	C escape	Description
^@	0	00	NUL	␀	Null	\0	Does nothing. The code of blank paper tape, and also used for padding to slow transmission.
^A	1	01	TC₁, SOH	␁	Start of Heading		First character of the heading of a message.
^B	2	02	TC₂, STX	␂	Start of Text		Terminates the header and starts the message text.
^C	3	03	TC₃, ETX	␃	End of Text		Ends the message text, starts a footer (up to the next TC character).
^D	4	04	TC₄, EOT	␄	End of Transmission		Ends the transmission of one or more messages. May place terminals on standby.
^E	5	05	TC₅, ENQ, WRU	␅	Enquiry		Trigger a response at the receiving end, to see if it is still present.
^F	6	06	TC₆, ACK	␆	Acknowledge		Indication of successful receipt of a message.
^G	7	07	BEL	␇	Bell, Alert	\a	Call for attention from an operator.
^H	8	08	FE₀, BS	␈	Backspace	\b	Move one position leftwards. Next character may overprint or replace the character that was there.
^I	9	09	FE₁, HT	␉	Character Tabulation, Horizontal Tabulation	\t	Move right to the next tab stop.
^J	10	0A	FE₂, LF	␊	Line Feed	\n	Move down to the same position on the next line (some devices also moved to the left column).
^K	11	0B	FE₃, VT	␋	Line Tabulation, Vertical Tabulation	\v	Move down to the next vertical tab stop.
^L	12	0C	FE₄, FF	␌	Form Feed	\f	Move down to the top of the next page.
^M	13	0D	FE₅, CR	␍	Carriage Return	\r	Move to column zero while staying on the same line.
^N	14	0E	SO, LS₁	␎	Shift Out		Switch to an alternative character set.
^O	15	0F	SI, LS₀	␏	Shift In		Return to regular character set after SO.
^P	16	10	TC₇, DC₀, DLE	␐	Data Link Escape		Cause a limited number of contiguously following characters to be interpreted in some different way.
^Q	17	11	DC₁, XON	␑	Device Control One		Turn on (DC₁ and DC₂) or off (DC₃ and DC₄) devices. Teletype used these for the paper tape reader and the paper tape punch. The reader controls, DC₁ (XON) and DC₃ (XOFF), became the de facto standard for software flow control.
^R	18	12	DC₂, TAPE	␒	Device Control Two
^S	19	13	DC₃, XOFF	␓	Device Control Three
^T	20	14	DC₄, ~~TAPE~~	␔	Device Control Four
^U	21	15	TC₈, NAK	␕	Negative Acknowledge		Negative response to a sender, such as a detected error.
^V	22	16	TC₉, SYN	␖	Synchronous Idle		Sent in synchronous transmission systems when no other character is being transmitted.
^W	23	17	TC₁₀, ETB	␗	End of Transmission Block		End of a transmission block of data when data are divided into such blocks for transmission purposes.
^X	24	18	CAN	␘	Cancel		Indicates that the data preceding it are in error or are to be disregarded.
^Y	25	19	EM	␙	End of Medium		Indicates on paper or magnetic tapes that the end of the usable portion of the tape has been reached.
^Z	26	1A	SUB	␚	Substitute		Replaces a character that was found to be invalid or in error. Should be ignored.
^[	27	1B	ESC	␛	Escape	\e	Alters the meaning of a limited number of following bytes. Nowadays this almost always introduces an ANSI escape sequence.
^\	28	1C	IS₄, FS	␜	File Separator		Can be used as delimiters to mark fields of data structures. US is the lowest level, while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it. SP (space) could be considered an even lower level.
^]	29	1D	IS₃, GS	␝	Group Separator
^^	30	1E	IS₂, RS	␞	Record Separator
^_	31	1F	IS₁, US	␟	Unit Separator
While not technically part of the C0 control character range, the following two characters can be thought of as having some characteristics of control characters.
	32	20	SP	␠	Space		Move right one character position.
^?	127	7F	DEL	␡	Delete		Should be ignored. Used to delete characters on punched tape by punching out all the holes.

Teletype labelled the key WRU for 'who are you?'
The name BELL is assigned by Unicode to the unrelated emoji character 🔔 (U+1F514). While C0 and C1 control characters were not formally named by the Unicode standard itself at the time, this collided with existing use of BELL as the name of this control character in software following the previous versions of UTS#18 (the Unicode Regular Expressions standard), e.g. in Perl. Unicode now accepts ALERT and BEL (but not BELL) as formal aliases for the control character, although the code chart still lists BELL as the ISO 6429 alias, and the corresponding control picture code point is called SYMBOL FOR BELL. Perl subsequently switched to using BELL for the emoji in version 5.18.
ISO/IEC 2022 (ECMA-35) uses the names LS0 (for SI) and LS1 (for SO) in 8-bit environments, and keeps the names SI and SO in 7-bit environments.
The first, 1963 edition of ASCII classified DLE as a device control, rather than a transmission control, and gave it the abbreviation DC0 ("device control reserved for data link escape").
The '\e' escape sequence is not part of ISO C or many other language specifications. However, it is understood by several compilers, including GCC.
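The caret notation column follows a simple rule: the printable character shown after the caret is the control code with bit 0x40 flipped, so 0x00–0x1F map onto '@' through '_' and DEL (0x7F) maps onto '?'. A small Python sketch of that rule, where caret is a hypothetical helper name:

```python
def caret(code):
    """Return the caret notation for a C0 control code (0-31) or DEL (127)."""
    if not (0 <= code <= 0x1F or code == 0x7F):
        raise ValueError(f"{code:#04x} is not a C0 control or DEL")
    # Flipping bit 6 (0x40) maps 0x00-0x1F onto '@'-'_' and 0x7F onto '?'.
    return "^" + chr(code ^ 0x40)


print([caret(c) for c in (0x00, 0x07, 0x1B, 0x1F, 0x7F)])
# ['^@', '^G', '^[', '^_', '^?']
```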

C1 controls

In 1973, ECMA-35 and ISO 2022 attempted to define a method so an 8-bit "extended ASCII" code could be converted to a corresponding 7-bit code, and vice versa. In a 7-bit environment, Shift Out (SO) would change the meaning of the 96 bytes 0x20 through 0x7F (i.e. all but the C0 control codes) to the characters that an 8-bit environment would print if it used the same code with the high bit set. This meant that the range 0x80 through 0x9F could not be printed in a 7-bit environment, so it was decided that no alternative character set could use them, and that these codes should be additional control codes, which became known as the C1 control codes. To allow a 7-bit environment to use these new controls, the two-byte sequences ESC @ through ESC _ were to be considered equivalent to the C1 codes 0x80 through 0x9F. The later ISO 8859 standards abandoned support for 7-bit codes, but preserved this range of control characters.
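The equivalence between each C1 byte and its two-byte 7-bit form is a fixed offset: the C1 code X corresponds to ESC followed by X − 0x40. A minimal Python sketch of the conversion in both directions, assuming a single-byte 8-bit stream in which 0x80–0x9F really are C1 controls (it must not be applied to UTF-8, whose continuation bytes also fall in that range); the function names are hypothetical:

```python
ESC = 0x1B


def c1_to_7bit(data: bytes) -> bytes:
    """Replace each 8-bit C1 control (0x80-0x9F) with ESC plus (byte - 0x40)."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            out += bytes((ESC, b - 0x40))  # e.g. 0x9B (CSI) -> ESC '['
        else:
            out.append(b)
    return bytes(out)


def c1_to_8bit(data: bytes) -> bytes:
    """Replace each ESC followed by 0x40-0x5F ('@' to '_') with the single C1 byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 1 < len(data) and 0x40 <= data[i + 1] <= 0x5F:
            out.append(data[i + 1] + 0x40)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


print(c1_to_7bit(b"\x9b1m"))   # b'\x1b[1m'
print(c1_to_8bit(b"\x1b[1m"))  # b'\x9b1m'
```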

The first C1 control code set to be registered for use with ISO 2022 was DIN 31626, a specialised set for bibliographic use which was registered in 1979.

The more common general-use ISO/IEC 6429 set was registered in 1983, although the ECMA-48 specification upon which it was based had first been published in 1976; JIS X 0211 (formerly JIS C 6323) is the corresponding Japanese standard. Three symbolic names defined by RFC 1345 and early drafts of ISO 10646 but not by ISO/IEC 6429 (PAD, HOP and SGC) are also used.

Except for SS2 and SS3 in EUC-JP text, and NEL in text transcoded from EBCDIC, the 8-bit forms of these codes were almost never used. CSI, DCS and OSC are used to control text terminals and terminal emulators, but almost always by using their 7-bit escape code representations. Nowadays if these codes are encountered it is far more likely they are intended to be printing characters from that position of Windows-1252 or Mac OS Roman.
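The same bytes decode very differently depending on which 8-bit code is assumed. A short Python illustration using the standard latin-1 and cp1252 codecs; the sample bytes are an arbitrary example:

```python
raw = b"\x93quoted\x94"

# As ISO 8859-1 (Latin-1), 0x93 and 0x94 are the C1 controls STS and CCH.
as_latin1 = raw.decode("latin-1")
# As Windows-1252, the same bytes are curly quotation marks.
as_cp1252 = raw.decode("cp1252")

print([hex(ord(c)) for c in (as_latin1[0], as_latin1[-1])])  # ['0x93', '0x94']
print(as_cp1252)  # “quoted”
```

A program that meets such bytes in text labelled Latin-1 will usually produce more sensible output by assuming Windows-1252 instead.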

ISO/IEC 6429 and RFC 1345 C1 control codes
ESC+	Decimal	Hex	Abbr	Name	Description
@	128	80	PAD	Padding Character	Proposed mechanism to encode non-ASCII characters. This use was removed in later drafts. It is nonetheless used by the internal-use two-byte fixed-length form of the ISO-2022-based Extended Unix Code (EUC) for left-padding single-byte characters in code sets 1 and 3, whereas NUL serves the same function for code sets 0 and 2. This is not done in the usual "packed" EUC format.
A	129	81	HOP	High Octet Preset	Proposed as a means of introducing a sequence of ISO 2022 compliant multiple byte characters with the same first byte without repeating said first byte, thus reducing length; this behaviour was never part of a standard or published implementation. Its name was nonetheless retained as an RFC 1345 standard code-point name.
B	130	82	BPH	Break Permitted Here	Follows a graphic character where a line break is permitted. Roughly equivalent to a soft hyphen except that the means for indicating a line break is not necessarily a hyphen. See also zero-width space.
C	131	83	NBH	No Break Here	Follows the graphic character that is not to be broken. See also word joiner.
D	132	84	IND	Index	Move the active position one line down, to eliminate ambiguity about the meaning of LF. Deprecated in 1988 and withdrawn in 1992 from ISO/IEC 6429 (1986 and 1991 respectively for ECMA-48).
E	133	85	NEL	Next Line	Equivalent to CR+LF. Used to mark end-of-line on some IBM mainframes.
F	134	86	SSA	Start of Selected Area	Used by block-oriented terminals. xterm supports a compatibility mode in which `ESC F` moves to the lower-left corner of the screen, since certain software assumes this behaviour.
G	135	87	ESA	End of Selected Area
H	136	88	HTS	Character Tabulation Set, Horizontal Tabulation Set	Causes a tab stop to be set at the active position.
I	137	89	HTJ	Character Tabulation With Justification, Horizontal Tabulation With Justification	Right-justify the text since the last tab against the next tab stop.
J	138	8A	VTS	Line Tabulation Set, Vertical Tabulation Set	Set a vertical tab stop.
K	139	8B	PLD	Partial Line Forward, Partial Line Down	Used to produce subscripts and superscripts in ISO/IEC 6429, e.g., in a printer. Subscripts use `PLD text PLU` while superscripts use `PLU text PLD`.
L	140	8C	PLU	Partial Line Backward, Partial Line Up
M	141	8D	RI	Reverse Line Feed, Reverse Index
N	142	8E	SS2	Single-Shift 2	Next character invokes a graphic character from the G2 or G3 graphic sets respectively.
O	143	8F	SS3	Single-Shift 3
P	144	90	DCS	Device Control String	Followed by a string of printable characters (0x20 through 0x7E) and format effectors (0x08 through 0x0D), terminated by ST (0x9C). This may be used by variable-length control sequences for text terminals and terminal emulators, such as terminfo queries.
Q	145	91	PU1	Private Use 1	Reserved for a function without standardized meaning for private use as required, subject to the prior agreement of the sender and the recipient of the data.
R	146	92	PU2	Private Use 2
S	147	93	STS	Set Transmit State
T	148	94	CCH	Cancel Character	Destructive backspace, intended to eliminate ambiguity about the meaning of BS.
U	149	95	MW	Message Waiting
V	150	96	SPA	Start of Protected Area	Used by block-oriented terminals.
W	151	97	EPA	End of Protected Area	Used by block-oriented terminals.
X	152	98	SOS	Start of String	Followed by a control string terminated by ST (0x9C) which, in contrast to those initiated by DCS, OSC, PM or APC, may contain any character except SOS or ST.
Y	153	99	SGC, SGCI	Single Graphic Character Introducer	Was used to encode a single multiple-byte character without switching out of a HOP mode, or to allow access to the entire character set from UCS-3, UCS-2 or Latin-1. Would be followed by a UCS-4 representation of a character.
Z	154	9A	SCI	Single Character Introducer	To be followed by a single printable character (0x20 through 0x7E) or format effector (0x08 through 0x0D). The intent was to provide a means of defining a control function or a graphic character that would be available regardless of which graphic or control sets were in use. Definitions of what the following byte would invoke were never implemented in an international standard.
[	155	9B	CSI	Control Sequence Introducer	Used to introduce control sequences that take parameters.
\	156	9C	ST	String Terminator	Terminates a variable-length control string initiated by DCS, SOS, OSC, PM or APC.
]	157	9D	OSC	Operating System Command	Followed by a string of printable characters (0x20 through 0x7E) and format effectors (0x08 through 0x0D), terminated by ST (0x9C). These three control codes (OSC, PM and APC) were intended to allow in-band signaling of protocol information, but are rarely used for that purpose. Some terminal emulators, including xterm, support OSC sequences for setting the window title and reconfiguring the available colour palette. They may also accept BEL as a non-standard alternative to ST for terminating an OSC sequence. APC is sometimes used to transmit Kermit commands, although this may be disabled or filtered for security reasons.
^	158	9E	PM	Privacy Message
_	159	9F	APC	Application Program Command

In early versions the range excluded SP and DEL
Not part of ISO/IEC 6429 (ECMA-48)
Not part of the first edition of ISO/IEC 6429.
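A minimal Python sketch of the OSC window-title use mentioned above, written with the 7-bit forms ESC ] (OSC) and ESC \ (ST). The parameter value 2 for "set window title" is an xterm convention rather than part of ECMA-48, and set_title and parse_osc are hypothetical helpers; the parser assumes it is handed exactly one complete sequence and accepts either ST or the non-standard BEL terminator.

```python
ESC = "\x1b"
OSC_7BIT = ESC + "]"   # 7-bit form of OSC (0x9D)
ST_7BIT = ESC + "\\"   # 7-bit form of ST (0x9C)
BEL = "\x07"


def set_title(title: str, use_bel: bool = False) -> str:
    """Build an xterm-style OSC 2 sequence that sets the window title."""
    terminator = BEL if use_bel else ST_7BIT
    return f"{OSC_7BIT}2;{title}{terminator}"


def parse_osc(seq: str):
    """Return (command, text) from one OSC sequence terminated by ST or BEL."""
    if not seq.startswith(OSC_7BIT):
        raise ValueError("not an OSC sequence")
    body = seq[len(OSC_7BIT):]
    if body.endswith(ST_7BIT):
        body = body[: -len(ST_7BIT)]
    elif body.endswith(BEL):
        body = body[:-1]
    else:
        raise ValueError("unterminated OSC sequence")
    command, _, text = body.partition(";")
    return command, text


print(repr(set_title("build log")))                     # '\x1b]2;build log\x1b\\'
print(parse_osc(set_title("build log", use_bel=True)))  # ('2', 'build log')
```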

Other control code sets

The ISO/IEC 2022 (ECMA-35) extension mechanism allowed escape sequences to change the C0 and C1 sets. The standard C0 control character set shown above is chosen with the sequence ESC ! @ and the above C1 set chosen with the sequence ESC " C.
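Both designations share one shape: ESC, an intermediate byte selecting the kind of set (0x21 "!" for C0, 0x22 '"' for C1), and a final byte identifying the particular set. A tiny Python sketch with hypothetical function names, which only builds the bytes and does not check that the final byte is registered:

```python
ESC = 0x1B


def designate_c0(final: str) -> bytes:
    """ESC ! F: designate the C0 set identified by final byte F."""
    return bytes((ESC, 0x21, ord(final)))


def designate_c1(final: str) -> bytes:
    """ESC " F: designate the C1 set identified by final byte F."""
    return bytes((ESC, 0x22, ord(final)))


print(designate_c0("@").hex(" "))  # 1b 21 40
print(designate_c1("C").hex(" "))  # 1b 22 43
```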

Several official and unofficial alternatives have been defined, but the mechanism is now largely obsolete. Most were forced to retain a good deal of compatibility with the ASCII controls for interoperability. The standard specifies that ESC, SP, DEL, and the C1 controls SS2 and SS3 cannot be changed. It also specifies that if a C0 set includes transmission control (TCₙ) codes, they must be encoded at their ASCII locations and cannot be put in a C1 set, and that any new transmission controls must be in a C1 set.

Other C0 control code sets

ANPA-1312, a text markup language used for news transmission, replaces several C0 control characters.
IPTC 7901, the newer international version of the above, has its own variations.
Videotex has a completely different set.
Teletext also defines a set similar to Videotex.
T.61/T.51 and others replaced EM and GS with SS2 and SS3 so that these functions could be used in a 7-bit environment.
Some sets replaced FS with SS2 (as in ANPA-1312).
The now-withdrawn JIS C 6225, designated JIS X 0207 in later sources, replaced FS with CEX or "Control Extension", which introduces control sequences for vertical text behaviour, superscripts and subscripts, and for transmitting custom character graphics.

Replacement C1 character sets

A specialized C1 control code set is registered for bibliographic use (including string collation), such as by MARC-8.
Various specialised C1 control code sets are registered for use by Videotex formats.
EBCDIC defines up to 29 additional control codes besides those present in ASCII. When translating EBCDIC to Unicode (or to ISO 8859), these codes are mapped to C1 control characters in a manner specified by IBM's Character Data Representation Architecture (CDRA). The New Line (NL) control does translate to the ISO/IEC 6429 NEL (although it is often swapped with LF, following the UNIX line ending convention), but the remainder of the control codes do not correspond. For example, the EBCDIC control SPS and the ECMA-48 control PLU are both used to begin a superscript or end a subscript, but are not mapped to one another. Extended-ASCII-mapped EBCDIC can therefore be regarded as having its own C1 set, although it is not registered with the ISO-IR registry for ISO/IEC 2022.
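Python's cp037 codec (EBCDIC code page 37) can be used to see the NL mapping: byte 0x15 (NL) decodes to U+0085 (NEL) and byte 0x25 (LF) decodes to U+000A. The UNIX-oriented swap described above is not done by the codec itself; decode_ebcdic_unix below is a hypothetical post-processing step that performs it.

```python
# NL and LF positions in EBCDIC code page 037.
EBCDIC_NL, EBCDIC_LF = b"\x15", b"\x25"

print(hex(ord(EBCDIC_NL.decode("cp037"))))  # 0x85 (NEL)
print(hex(ord(EBCDIC_LF.decode("cp037"))))  # 0xa (LF)

# Hypothetical UNIX-oriented variant: swap NEL and LF after decoding.
SWAP_NEL_LF = str.maketrans({"\x85": "\n", "\n": "\x85"})


def decode_ebcdic_unix(data: bytes) -> str:
    """Decode cp037 text, then swap NEL and LF so EBCDIC NL becomes a newline."""
    return data.decode("cp037").translate(SWAP_NEL_LF)


print(repr(decode_ebcdic_unix(b"\xc8\xc9\x15")))  # 'HI\n'
```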

Unicode

Unicode inherits its first 256 code points from ISO 8859-1, hence also the 65 code points described above, giving them the general category Cc (control). These are:

U+0000–U+001F (C0 controls) and U+007F (DEL) assigned to the Basic Latin block, and
U+0080–U+009F (C1 controls) assigned to the Latin-1 Supplement block.

Unicode only specifies semantics for the C0 format effectors HT, LF, VT, FF, and CR (but not BS); the C0 information separators FS, GS, RS, US (and SP); and the C1 control NEL. The rest of the codes are transparent to Unicode, and their meanings are left to higher-level protocols, with ISO/IEC 6429 suggested as a default.

Unicode includes many additional format effector characters besides these, such as marks, embeds, isolates and pops for explicit bidirectional formatting, and the zero-width joiner and non-joiner for controlling ligature use. However, these are given the general category Cf (format) rather than Cc.
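Python's unicodedata module can confirm both points: exactly 65 code points (C0, DEL and C1) have category Cc, while a sample mark (U+200E LEFT-TO-RIGHT MARK), embedding (U+202B), isolate (U+2066), pop (U+202C) and the two joiners are Cf. The choice of sample characters is illustrative only.

```python
import unicodedata

# Every code point with general category Cc.
cc = [cp for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Cc"]
print(len(cc))  # 65
print(cc == list(range(0x00, 0x20)) + [0x7F] + list(range(0x80, 0xA0)))  # True

# A mark, an embedding, an isolate, a pop, and the joiners are Cf, not Cc.
format_effectors = ["\u200e", "\u202b", "\u2066", "\u202c", "\u200d", "\u200c"]
print([unicodedata.category(ch) for ch in format_effectors])
# ['Cf', 'Cf', 'Cf', 'Cf', 'Cf', 'Cf']
```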