Primitive Type char
Character manipulation (char type, Unicode Scalar Value)
This module provides the CharExt trait, as well as its implementation for the primitive char type, in order to allow basic character manipulation.
A char actually represents a Unicode Scalar Value, as it can contain any Unicode code point except high-surrogate and low-surrogate code points.
As such, only values in the ranges [0x0,0xD7FF] and [0xE000,0x10FFFF] (inclusive) are allowed. A char can always be safely cast to a u32; however, the converse is not always true due to the above range limits and, as such, should be performed via the from_u32 function.
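The range limits above can be checked directly. The following is a minimal sketch, assuming the from_u32 function is reachable as std::char::from_u32; the helper name checked_char is hypothetical and only used for illustration.

```rust
/// Hypothetical helper (not part of the documented API): converts a raw
/// u32 into a char, yielding None for values that are not scalar values.
fn checked_char(value: u32) -> Option<char> {
    std::char::from_u32(value)
}

fn main() {
    // Casting a char to u32 is always lossless.
    assert_eq!('ß' as u32, 0xDF);

    // Values inside the two permitted ranges convert back.
    assert_eq!(checked_char(0xDF), Some('ß'));
    assert_eq!(checked_char(0x10FFFF), Some('\u{10FFFF}'));

    // Surrogates and values above 0x10FFFF are rejected.
    assert_eq!(checked_char(0xD800), None);
    assert_eq!(checked_char(0xDFFF), None);
    assert_eq!(checked_char(0x110000), None);
}
```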
Methods
impl char
fn is_digit(self, radix: u32) -> bool
Checks if a char parses as a numeric digit in the given radix.
Compared to is_numeric(), this function only recognizes the characters 0-9, a-z and A-Z.
Return value
Returns true if the character is a valid digit under radix, and false otherwise.
Panics
Panics if given a radix > 36.
Examples
let c = '1';
assert!(c.is_digit(10));
assert!('f'.is_digit(16));
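The difference from is_numeric() is easiest to see with a non-ASCII digit. The sketch below is illustrative only; the helper all_digits is a hypothetical name, not part of the API.

```rust
/// Hypothetical helper: true if `text` is non-empty and every character
/// is a digit in the given radix according to is_digit.
fn all_digits(text: &str, radix: u32) -> bool {
    !text.is_empty() && text.chars().all(|c| c.is_digit(radix))
}

fn main() {
    assert!(all_digits("2764", 10));
    assert!(all_digits("dead", 16));
    // 'd', 'e' and 'a' are not decimal digits.
    assert!(!all_digits("dead", 10));

    // U+0663 ARABIC-INDIC DIGIT THREE is numeric in the Unicode sense,
    // but is_digit only recognizes 0-9, a-z and A-Z.
    assert!('٣'.is_numeric());
    assert!(!'٣'.is_digit(10));
}
```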
fn to_digit(self, radix: u32) -> Option<u32>
Converts a character to the corresponding digit.
Return value
If the character is between '0' and '9', returns the corresponding value between 0 and 9. If it is 'a' or 'A', returns 10; if it is 'b' or 'B', 11; and so on. Returns None if the character does not refer to a digit in the given radix.
Panics
Panics if given a radix outside the range [0..36].
Examples
let c = '1';
assert_eq!(c.to_digit(10), Some(1));
assert_eq!('f'.to_digit(16), Some(15));
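As an illustration of how to_digit composes, the following sketch accumulates digits into a number. The function parse_in_radix is a hypothetical helper written for this example; it assumes the caller passes a radix that to_digit accepts (otherwise to_digit panics, as described above).

```rust
/// Hypothetical helper: parses `text` as an unsigned integer in `radix`.
/// Returns None for empty input, for any character that is not a digit
/// in that radix, or if the value does not fit in a u32.
fn parse_in_radix(text: &str, radix: u32) -> Option<u32> {
    if text.is_empty() {
        return None;
    }
    let mut value: u32 = 0;
    for c in text.chars() {
        // to_digit yields None for characters outside the radix.
        let digit = c.to_digit(radix)?;
        value = value.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(value)
}

fn main() {
    assert_eq!(parse_in_radix("ff", 16), Some(255));
    assert_eq!(parse_in_radix("1010", 2), Some(10));
    // '2' is not a binary digit.
    assert_eq!(parse_in_radix("12", 2), None);
}
```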
fn escape_unicode(self) -> EscapeUnicode
Returns an iterator that yields the hexadecimal Unicode escape of a character, as chars.
All characters are escaped with Rust syntax of the form \u{NNNN} where NNNN is the shortest hexadecimal representation of the code point.
Examples
for i in '❤'.escape_unicode() {
    println!("{}", i);
}
This prints:
\
u
{
2
7
6
4
}
Collecting into a String:
let heart: String = '❤'.escape_unicode().collect();
assert_eq!(heart, r"\u{2764}");
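To illustrate what "shortest hexadecimal representation" means in practice, here is a small sketch; the helper unicode_escape is a hypothetical name. Note that the escape is not zero-padded.

```rust
/// Hypothetical helper: collects the escape_unicode iterator into a String.
fn unicode_escape(c: char) -> String {
    c.escape_unicode().collect()
}

fn main() {
    // No zero padding: 'a' is U+0061 but is escaped with two hex digits.
    assert_eq!(unicode_escape('a'), r"\u{61}");
    assert_eq!(unicode_escape('ß'), r"\u{df}");
    // The largest scalar value needs six hex digits.
    assert_eq!(unicode_escape('\u{10FFFF}'), r"\u{10ffff}");
}
```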
fn escape_default(self) -> EscapeDefault
Returns an iterator that yields the 'default' ASCII and C++11-like literal escape of a character, as chars.
The default is chosen with a bias toward producing literals that are legal in a variety of languages, including C++11 and similar C-family languages. The exact rules are:
- Tab, CR and LF are escaped as '\t', '\r' and '\n' respectively.
- Single-quote, double-quote and backslash chars are backslash-escaped.
- Any other chars in the range [0x20,0x7e] are not escaped.
- Any other chars are given hex Unicode escapes; see escape_unicode.
Examples
for i in '"'.escape_default() {
    println!("{}", i);
}
This prints:
\
"
Collecting into a String:
let quote: String = '"'.escape_default().collect();
assert_eq!(quote, "\\\"");
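The four rules listed above can be exercised together by escaping a whole string one character at a time. This is a sketch only; escape_text is a hypothetical helper, not a method provided here.

```rust
/// Hypothetical helper: applies escape_default to every character of
/// `text` and concatenates the results.
fn escape_text(text: &str) -> String {
    text.chars().flat_map(|c| c.escape_default()).collect()
}

fn main() {
    // Tab and LF become two-character escapes.
    assert_eq!(escape_text("a\tb\n"), r"a\tb\n");
    // Quotes and backslashes are backslash-escaped.
    assert_eq!(escape_text("\"\\'"), r#"\"\\\'"#);
    // Printable ASCII is left alone.
    assert_eq!(escape_text("x ~"), "x ~");
    // Everything else falls back to the hex Unicode escape.
    assert_eq!(escape_text("❤"), r"\u{2764}");
}
```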
fn len_utf8(self) -> usize
Returns the number of bytes this character would need if encoded in UTF-8.
Examples
let n = 'ß'.len_utf8();
assert_eq!(n, 2);
fn len_utf16(self) -> usize
Returns the number of 16-bit code units this character would need if encoded in UTF-16.
Examples
let n = 'ß'.len_utf16();
assert_eq!(n, 1);
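The two length functions diverge as code points grow. The sketch below sums them over a string; utf8_len and utf16_len are hypothetical helper names introduced for the example.

```rust
/// Hypothetical helper: total UTF-8 length of `text` in bytes.
fn utf8_len(text: &str) -> usize {
    text.chars().map(|c| c.len_utf8()).sum()
}

/// Hypothetical helper: total UTF-16 length of `text` in 16-bit units.
fn utf16_len(text: &str) -> usize {
    text.chars().map(|c| c.len_utf16()).sum()
}

fn main() {
    // UTF-8 needs one to four bytes per character.
    assert_eq!('a'.len_utf8(), 1);
    assert_eq!('ß'.len_utf8(), 2);
    assert_eq!('❤'.len_utf8(), 3);
    assert_eq!('\u{1D11E}'.len_utf8(), 4);

    // UTF-16 needs a surrogate pair only above U+FFFF.
    assert_eq!('❤'.len_utf16(), 1);
    assert_eq!('\u{1D11E}'.len_utf16(), 2);

    let text = "aß❤\u{1D11E}";
    assert_eq!(utf8_len(text), text.len());
    assert_eq!(utf16_len(text), 5);
}
```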
fn encode_utf8(self, dst: &mut [u8]) -> Option<usize>
Unstable: pending decision about Iterator/Writer/Reader
Encodes this character as UTF-8 into the provided byte buffer, and then returns the number of bytes written.
If the buffer is not large enough, nothing will be written into it and a None will be returned. A buffer of length four is large enough to encode any char.
Examples
In both of these examples, 'ß' takes two bytes to encode.
#![feature(unicode)]
fn main() {
    let mut b = [0; 2];
    let result = 'ß'.encode_utf8(&mut b);
    assert_eq!(result, Some(2));
}
A buffer that's too small:
#![feature(unicode)]
fn main() {
    let mut b = [0; 1];
    let result = 'ß'.encode_utf8(&mut b);
    assert_eq!(result, None);
}
fn encode_utf16(self, dst: &mut [u16]) -> Option<usize>
Unstable: pending decision about Iterator/Writer/Reader
Encodes this character as UTF-16 into the provided u16 buffer, and then returns the number of u16s written.
If the buffer is not large enough, nothing will be written into it and a None will be returned. A buffer of length 2 is large enough to encode any char.
Examples
In both of these examples, 'ß' takes one u16 to encode.
let mut b = [0; 1];
let result = 'ß'.encode_utf16(&mut b);
assert_eq!(result, Some(1));
A buffer that's too small:
#![feature(unicode)]
fn main() {
    let mut b = [0; 0];
    let result = 'ß'.encode_utf16(&mut b);
    assert_eq!(result, None);
}
fn is_alphabetic(self) -> bool
Returns whether the specified character is considered a Unicode alphabetic code point.
fn is_xid_start(self) -> bool
Unstable: mainly needed for compiler internals
Returns whether the specified character satisfies the 'XID_Start' Unicode property.
'XID_Start' is a Unicode Derived Property specified in UAX #31, mostly similar to ID_Start but modified for closure under NFKx.
fn is_xid_continue(self) -> bool
Unstable: mainly needed for compiler internals
Returns whether the specified char satisfies the 'XID_Continue' Unicode property.
'XID_Continue' is a Unicode Derived Property specified in UAX #31, mostly similar to 'ID_Continue' but modified for closure under NFKx.
fn is_lowercase(self) -> bool
Indicates whether a character is in lowercase.
This is defined according to the terms of the Unicode Derived Core Property Lowercase.
fn is_uppercase(self) -> bool
Indicates whether a character is in uppercase.
This is defined according to the terms of the Unicode Derived Core Property Uppercase.
fn is_whitespace(self) -> bool
Indicates whether a character is whitespace.
Whitespace is defined in terms of the Unicode Property White_Space.
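Because these predicates follow Unicode properties rather than ASCII ranges, they also apply to non-ASCII characters. A brief sketch, with count_case as a hypothetical helper:

```rust
/// Hypothetical helper: counts (uppercase, lowercase) characters in `text`.
fn count_case(text: &str) -> (usize, usize) {
    let upper = text.chars().filter(|c| c.is_uppercase()).count();
    let lower = text.chars().filter(|c| c.is_lowercase()).count();
    (upper, lower)
}

fn main() {
    assert_eq!(count_case("Hello World"), (2, 8));

    // Non-ASCII letters are classified as well.
    assert!('À'.is_uppercase());
    assert!('ß'.is_lowercase());
    // Digits are neither uppercase nor lowercase.
    assert!(!'1'.is_uppercase() && !'1'.is_lowercase());

    // White_Space covers more than the ASCII space.
    assert!(' '.is_whitespace());
    assert!('\t'.is_whitespace());
    assert!('\u{A0}'.is_whitespace()); // NO-BREAK SPACE
    assert!('\u{3000}'.is_whitespace()); // IDEOGRAPHIC SPACE
    assert!(!'a'.is_whitespace());
}
```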
fn is_alphanumeric(self) -> bool
Indicates whether a character is alphanumeric.
Alphanumericness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No' and the Derived Core Property 'Alphabetic'.
fn is_control(self) -> bool
Indicates whether a character is a control code point.
Control code points are defined in terms of the Unicode General Category Cc.
fn is_numeric(self) -> bool
Indicates whether the character is numeric (Nd, Nl, or No).
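The relationship between is_alphabetic, is_numeric, is_alphanumeric and is_control can be sketched as follows; count_alphanumeric is a hypothetical helper used only for illustration.

```rust
/// Hypothetical helper: counts the alphanumeric characters in `text`.
fn count_alphanumeric(text: &str) -> usize {
    text.chars().filter(|c| c.is_alphanumeric()).count()
}

fn main() {
    // Letters and digits count; the space and '!' do not.
    assert_eq!(count_alphanumeric("héllo 42!"), 7);

    // '½' (U+00BD) is in category No: numeric, hence alphanumeric,
    // but not alphabetic.
    assert!('½'.is_numeric());
    assert!('½'.is_alphanumeric());
    assert!(!'½'.is_alphabetic());

    // 'é' is alphabetic, hence alphanumeric, but not numeric.
    assert!('é'.is_alphabetic());
    assert!(!'é'.is_numeric());

    // Category Cc includes NUL and DEL, but not printable characters.
    assert!('\u{0}'.is_control());
    assert!('\u{7f}'.is_control());
    assert!(!'a'.is_control());
}
```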
fn to_lowercase(self) -> ToLowercase
Converts a character to its lowercase equivalent.
The case-folding performed is the common or simple mapping. See to_uppercase() for references and more information.
Return value
Returns an iterator which yields the characters corresponding to the lowercase equivalent of the character. If no conversion is possible then the input character is returned.
fn to_uppercase(self) -> ToUppercase
Converts a character to its uppercase equivalent.
The case-folding performed is the common or simple mapping: it maps one Unicode codepoint to its uppercase equivalent according to the Unicode database [1]. The additional SpecialCasing.txt is not yet considered here, but the iterator returned will soon support this form of case folding.
A full reference can be found here [2].
Return value
Returns an iterator which yields the characters corresponding to the uppercase equivalent of the character. If no conversion is possible then the input character is returned.
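Since both conversions return iterators rather than a single char, a common pattern is to collect or flatten them. The sketch below uses only characters with a one-to-one mapping; to_upper_string is a hypothetical helper.

```rust
/// Hypothetical helper: uppercases `text` one character at a time by
/// flattening the iterator that to_uppercase returns for each char.
fn to_upper_string(text: &str) -> String {
    text.chars().flat_map(|c| c.to_uppercase()).collect()
}

fn main() {
    assert_eq!(to_upper_string("héllo 1"), "HÉLLO 1");

    // Collecting the iterator for a single character.
    let lower: String = 'A'.to_lowercase().collect();
    assert_eq!(lower, "a");

    // Characters without a case mapping come back unchanged.
    let same: String = '1'.to_uppercase().collect();
    assert_eq!(same, "1");
}
```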
fn width(self, is_cjk: bool) -> Option<usize>
Use the crates.io unicode-width library instead.
Returns this character's displayed width in columns, or None if it is a control character other than '\x00'.
is_cjk determines behavior for characters in the Ambiguous category: if is_cjk is true, these are 2 columns wide; otherwise, they are 1. In CJK contexts, is_cjk should be true, else it should be false. Unicode Standard Annex #11 recommends that these characters be treated as 1 column (i.e., is_cjk = false) if the context cannot be reliably determined.
Trait Implementations
impl PartialEq<char> for char
impl Eq for char
impl PartialOrd<char> for char
fn partial_cmp(&self, other: &char) -> Option<Ordering>
fn lt(&self, other: &char) -> bool
fn le(&self, other: &char) -> bool
fn ge(&self, other: &char) -> bool
fn gt(&self, other: &char) -> bool
impl Ord for char
impl Clone for char
fn clone(&self) -> char
fn clone_from(&mut self, source: &Self)
impl Default for char
impl<'a> Pattern<'a> for char
Searches for chars that are equal to a given char
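In practice, the Pattern implementation means a char can be passed directly to the searching methods on str. A short sketch follows; count_char is a hypothetical helper, while find, contains, split, trim_matches and matches are the standard str methods.

```rust
/// Hypothetical helper: counts occurrences of `needle` in `text`,
/// using the char itself as the search pattern.
fn count_char(text: &str, needle: char) -> usize {
    text.matches(needle).count()
}

fn main() {
    // find returns the byte index of the first match.
    assert_eq!("hello".find('l'), Some(2));
    assert!("hello".contains('e'));

    let parts: Vec<&str> = "a,b,c".split(',').collect();
    assert_eq!(parts, ["a", "b", "c"]);

    assert_eq!("xxhixx".trim_matches('x'), "hi");
    assert_eq!(count_char("mississippi", 's'), 4);
}
```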