String
While Move does not have a built-in type to represent strings, it does have two standard implementations for strings in the Standard Library. The std::string module defines a String type and methods for UTF-8 encoded strings, and the second module, std::ascii, provides an ASCII String type and its methods.
The Umi Network execution environment automatically converts a bytevector into a String in transaction inputs, so in many cases a String does not need to be constructed in the Transaction Block.
Strings are bytes
No matter which type of string you use, it is important to know that strings are just bytes. The wrappers provided by the string and ascii modules are just that: wrappers. They do provide safety checks and methods to work with strings, but at the end of the day, they are just vectors of bytes.
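To make the "just bytes" point concrete, here is a minimal sketch written as a unit test. It assumes the Move 2024 edition (method call syntax); the named address `book` and the module name are hypothetical.

```move
module book::strings_are_bytes {
    use std::string;

    #[test]
    fun string_wraps_bytes() {
        // b"Hi" is the byte vector [72, 105] (the ASCII codes of 'H' and 'i')
        let s = string::utf8(b"Hi");

        // bytes() borrows the vector<u8> stored inside the String
        assert!(*s.bytes() == vector[72, 105], 0);
    }
}
```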
Working with UTF-8 Strings
While there are two types of strings in the standard library, the string module should be considered the default. It has native implementations of many common operations, and hence is more efficient than the ascii module, which is fully implemented in Move.
Definition
The String type in the std::string module is defined as follows:
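As a sketch of that definition, the struct is a thin wrapper around a single byte vector. The excerpt below assumes the Move 2024 edition, where struct declarations carry the `public` keyword; older editions write `struct` without it, and the doc comment is paraphrased.

```move
/// A `String` holds a sequence of bytes that is guaranteed to be valid UTF-8.
public struct String has copy, drop, store {
    bytes: vector<u8>,
}
```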
Creating a String
To create a new UTF-8 String instance, use the string::utf8 function. For convenience, the Standard Library also provides the alias .to_string() on vector<u8>.
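Both ways of creating a String can be sketched as a small unit test. The named address `book` and the module name are hypothetical, and the example assumes the Move 2024 edition so that method syntax is available.

```move
module book::string_create {
    use std::string::{Self, String};

    #[test]
    fun create_string() {
        // Explicit constructor; the type annotation is optional and shown for clarity
        let hello: String = string::utf8(b"Hello");

        // The `.to_string()` alias on vector<u8> calls the same function
        let same = b"Hello".to_string();

        assert!(hello == same, 0);
    }
}
```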
Common Operations
The UTF-8 String type provides a number of methods for working with strings. The most common operations are concatenation, slicing, and getting the length. For custom string operations, the bytes() method returns a reference to the underlying byte vector.
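The operations listed above might be used as in the following sketch. It assumes the Move 2024 edition (`let mut`, method syntax); the address `book` and the module name are hypothetical, and `is_empty` is an additional std::string function not mentioned in the text.

```move
module book::string_ops {
    #[test]
    fun common_operations() {
        let mut str = b"Hello,".to_string();
        let another = b" World!".to_string();

        // append(String) adds the argument to the end of the string
        str.append(another);

        // length() counts bytes: "Hello, World!" is 13 bytes long
        assert!(str.length() == 13, 0);

        // sub_string(start, end) copies the bytes in the range [start, end)
        assert!(str.sub_string(0, 5) == b"Hello".to_string(), 0);

        // method calls can be chained
        assert!(str.sub_string(0, 5).length() == 5, 0);

        assert!(!str.is_empty(), 0);

        // bytes() borrows the underlying vector for custom operations
        let bytes: &vector<u8> = str.bytes();
        assert!(bytes.length() == 13, 0);
    }
}
```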
Safe UTF-8 Operations
The default utf8 function aborts if the bytes passed into it are not valid UTF-8. If you are not sure that the bytes you are passing are valid, use the try_utf8 function instead. It returns an Option<String>, which contains the string if the bytes are valid UTF-8 and no value otherwise.
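A minimal sketch of the safe variant, written as a unit test. The address `book` and the module name are hypothetical; the byte 0xFF is used as invalid input because it never occurs in well-formed UTF-8.

```move
module book::string_try {
    use std::string;

    #[test]
    fun try_utf8_returns_option() {
        // Valid UTF-8: the Option contains a String
        let hello = string::try_utf8(b"Hello");
        assert!(hello.is_some(), 0);

        // Invalid UTF-8: the Option is empty and nothing aborts
        let invalid = string::try_utf8(x"FF");
        assert!(invalid.is_none(), 0);
    }
}
```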
Hint: a name that starts with try_* indicates that the function returns an Option with the expected result, or none if the operation fails. It is a common naming convention borrowed from Rust.
UTF-8 Limitations
The string module does not provide a way to access individual characters in a string. This is because UTF-8 is a variable-length encoding, and the length of a character can be anywhere from 1 to 4 bytes. Similarly, the length() method returns the number of bytes in the string, not the number of characters.
However, methods like sub_string and insert check character boundaries and will abort when the index is in the middle of a character.
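Both limitations can be illustrated with a two-byte character. The sketch below assumes the Move 2024 edition; the address `book` and the module name are hypothetical. The character "é" is encoded in UTF-8 as the bytes 0xC3 0xA9.

```move
module book::string_limits {
    use std::string;

    #[test]
    fun length_counts_bytes() {
        // One character, but two bytes
        let s = string::utf8(x"C3A9");
        assert!(s.length() == 2, 0);
    }

    #[test]
    #[expected_failure]
    fun sub_string_inside_character_aborts() {
        let s = string::utf8(x"C3A9");

        // Index 1 falls in the middle of the two-byte character,
        // so the boundary check fails and the call aborts
        s.sub_string(0, 1);
    }
}
```

The second test is marked as an expected failure, so it passes only if the call aborts.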