Octave supports a wide range of functions for manipulating strings. Since a string is just a matrix, simple manipulations can be accomplished using standard operators. The following example shows how to replace all blank characters with underscores.
    quote = ...
      "First things first, but not necessarily in that order";
    quote( quote == " " ) = "_"
    ⇒ quote =
        First_things_first,_but_not_necessarily_in_that_order
For more complex manipulations, such as searching, replacing, and general regular expressions, the following functions come with Octave.
Remove trailing blanks and nulls from s. If s is a matrix, deblank trims each row to the length of the longest string. If s is a cell array, operate recursively on each element of the cell array.
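As an illustration of the three kinds of input described for deblank, the following sketch can be tried at the prompt. The variable names are arbitrary and chosen only for this example.

```octave
# Plain string: trailing blanks are removed, leading blanks are kept.
s1 = deblank ("  abc   ");

# Char matrix: all rows must keep the same width, so only the columns
# that are blank in every row are dropped.
m1 = deblank (["ab  "; "abc "]);

# Cell array: each element is processed separately.
c1 = deblank ({"one  ", "two "});
```

Here s1 keeps its two leading blanks, and the first row of m1 keeps one trailing blank because the second row still needs that column.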
Remove leading and trailing blanks and nulls from s. If s is a matrix, strtrim trims each row to the length of the longest string. If s is a cell array, operate recursively on each element of the cell array. For example:
    strtrim (" abc ")
    ⇒ "abc"
    strtrim ([" abc   "; "   def "])
    ⇒ ["abc  "; "  def"]
Truncate the character string s to length n. If s is a char matrix, then the number of columns is adjusted.
If s is a cell array of strings, then the operation is performed on its members and the new cell array is returned.
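Assuming the entry above describes the function strtrunc (the name does not survive in the text here), a minimal sketch of its behavior on a string and on a cell array might look like this. The variable names are illustrative only.

```octave
# A string longer than n is cut to n characters.
t1 = strtrunc ("abcdefg", 3);

# A string already shorter than n is returned unchanged.
t2 = strtrunc ("ab", 3);

# For a cell array of strings, each member is truncated on its own.
t3 = strtrunc ({"abcdef", "xy"}, 4);
```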
Return the vector of all positions in the longer of the two strings s and t where an occurrence of the shorter of the two starts. If the optional argument overlap is nonzero, the returned vector can include overlapping positions (this is the default). For example:
    findstr ("ababab", "a")
    ⇒ [1, 3, 5]
    findstr ("abababa", "aba", 0)
    ⇒ [1, 5]
See also: strfind, strmatch, strcmp, strncmp, strcmpi, strncmpi, find.
Search the string str for occurrences of characters from the set chars. The return value, as well as the n and direction arguments, behave identically to those of find.
This will be faster than using regexp in most cases.
See also: find.
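Assuming the entry above describes the function strchr (the name is not preserved in the text here), the following sketch shows how the n and direction arguments narrow the result in the same way they do for find. The sample string is made up for the illustration.

```octave
str = "hello world";

# Every position in str that holds either "l" or "o".
all_idx = strchr (str, "lo");

# Only the first such position.
first_ix = strchr (str, "lo", 1);

# Only the last such position.
last_ix = strchr (str, "lo", 1, "last");
```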
Return the position of the first occurrence of the string t in the string s, or 0 if no occurrence is found. For example:
    index ("Teststring", "t")
    ⇒ 4
If direction is ‘"first"’, return the first element found. If direction is ‘"last"’, return the last element found. The rindex function is equivalent to index with direction set to ‘"last"’.
Caution: This function does not work for arrays of character strings.
Return the position of the last occurrence of the character string t in the character string s, or 0 if no occurrence is found. For example:
    rindex ("Teststring", "t")
    ⇒ 6
Caution: This function does not work for arrays of character strings.
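The relationship between index and rindex stated above can be checked directly. This is a small sketch using the same sample string as the examples; the variable names are arbitrary.

```octave
s = "Teststring";

first_t = index (s, "t");          # first occurrence
last_t  = index (s, "t", "last");  # last occurrence, via direction
same    = rindex (s, "t");         # should agree with last_t
none    = index (s, "x");          # no occurrence at all
```

A return value of 0, as in none, is the only signal that the search failed, so it is worth testing for before using the result as a subscript.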
Search for pattern in the string str and return the starting index of every such occurrence in the vector idx. If there is no such occurrence, or if pattern is longer than str, then idx is the empty array [].
If a cell array of strings cellstr is specified then idx is a cell array of vectors, as specified above. Examples:
    strfind ("abababa", "aba")
    ⇒ [1, 3, 5]
    strfind ({"abababa", "bebebe", "ab"}, "aba")
    ⇒ ans =
       {
         [1,1] =
            1   3   5
         [1,2] = [](1x0)
         [1,3] = [](1x0)
       }
Return indices of entries of A that match the string s. The second argument A may be a string matrix or a cell array of strings. If the third argument "exact" is not given, then s only needs to match A up to the length of s. Trailing whitespace is ignored. Results are returned as a column vector. For example:
    strmatch ("apple", "apple juice")
    ⇒ 1
    strmatch ("apple", ["apple pie"; "apple juice"; "an apple"])
    ⇒ [1; 2]
    strmatch ("apple", {"apple pie"; "apple juice"; "tomato"})
    ⇒ [1; 2]
See also: strfind, findstr, strcmp, strncmp, strcmpi, strncmpi, find.
Find all characters up to but not including the first character which is in the string delim. If rem is requested, it contains the remainder of the string, starting at the first delimiter. Leading delimiters are ignored. If delim is not specified, space is assumed. For example:
    strtok ("this is the life")
    ⇒ "this"
    [tok, rem] = strtok ("14*27+31", "+-*/")
    ⇒ tok = 14
       rem = *27+31
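Because rem starts at the delimiter and leading delimiters are ignored, strtok can be called repeatedly on its own remainder to walk through a whole string. The loop below is one possible sketch of that idiom; the names toks and delims are chosen for this example only.

```octave
rem    = "14*27+31";
delims = "+-*/";
toks   = {};

# Feed the remainder back in until nothing is left.
while (! isempty (rem))
  [tok, rem] = strtok (rem, delims);
  if (! isempty (tok))
    toks{end+1} = tok;   # skip the empty token left by a trailing delimiter
  endif
endwhile
```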
Split a single string using one or more delimiters and return a cell array of strings. Consecutive delimiters and delimiters at boundaries result in empty strings, unless strip_empty is true. The default value of strip_empty is false.
See also: strtok.
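Assuming the entry above describes the function strsplit, a minimal call looks like the sketch below. The input deliberately avoids consecutive or boundary delimiters, since how those are treated depends on the strip_empty setting described above and may differ between Octave versions.

```octave
# Split a comma-separated record into a cell array of fields.
fields = strsplit ("alpha,beta,gamma", ",");
n = numel (fields);
```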
Read data from a string.
The string str is split into words that are repeatedly matched to the specifiers in format. The first word is matched to the first specifier, the second to the second specifier and so forth. If there are more words than specifiers, the process is repeated until all words have been processed.
The string format describes how the words in str should be parsed. It may contain any combination of the following specifiers:
%s
- The word is parsed as a string.
%d
%f
- The word is parsed as a number.
%*
- The word is skipped.
Parsed words corresponding to the first specifier are returned in the first output argument, and likewise for the rest of the specifiers.
By default, format is "%f", meaning that numbers are read from str.
For example, the string
    str = "\
    Bunny Bugs 5.5\n\
    Duck Daffy -7.5e-5\n\
    Penguin Tux 6"
can be read using
    [a, b, c] = strread (str, "%s %s %f");
The behavior of strread can be changed via property-value pairs. The following properties are recognized:
- "commentstyle"
- Parts of str are considered comments and will be skipped. value is the comment style and can be any of the following.
  - "shell" Everything from # characters to the nearest end of line is skipped.
  - "c" Everything between /* and */ is skipped.
  - "c++" Everything from // characters to the nearest end of line is skipped.
  - "matlab" Everything from % characters to the nearest end of line is skipped.
- "delimiter"
- Any character in value will be used to split str into words.
- "emptyvalue"
- Parts of the output where no word is available are filled with value.
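The way strread cycles words through the format specifiers can be mimicked by hand, which may help when reasoning about what ends up in each output. The sketch below does not call strread. It splits the sample data into words with regexp and deals them out in groups of three, as a "%s %s %f" format would. The names words and nspec are assumptions made for this illustration, and the shapes of the results are not meant to match what strread itself returns.

```octave
str = "Bunny Bugs 5.5\nDuck Daffy -7.5e-5\nPenguin Tux 6";

# Split on any whitespace, newlines included.
words = regexp (str, '\S+', 'match');

# Three specifiers: word 1 goes to the first output, word 2 to the
# second, word 3 to the third, then the cycle repeats with word 4.
nspec = 3;
a = words(1:nspec:end);               # like %s
b = words(2:nspec:end);               # like %s
c = str2double (words(3:nspec:end));  # like %f
```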
Replace all occurrences of the substring ptn in the string s with the string rep and return the result. For example:
    strrep ("This is a test string", "is", "&%$")
    ⇒ "Th&%$ &%$ a test string"
s may also be a cell array of strings, in which case the replacement is done for each element and a cell array is returned.
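The cell array form mentioned above is convenient for batch edits. A brief sketch, with made-up file names:

```octave
names = {"report_draft.txt", "notes_draft.txt"};

# The replacement is applied to every element; an empty replacement
# string simply deletes the matched substring.
final = strrep (names, "_draft", "");
```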
Return the substring of s which starts at character number offset and is len characters long.
If offset is negative, extraction starts that far from the end of the string. If len is omitted, the substring extends to the end of s.
For example:
    substr ("This is a test string", 6, 9)
    ⇒ "is a test"
This function is patterned after AWK. You can get the same result by s(offset : (offset + len - 1)).
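The equivalence between substr and plain range indexing can be verified with a short sketch, assuming substr is available in the Octave installation at hand. The variable names are illustrative.

```octave
s      = "This is a test string";
offset = 6;
len    = 9;

a = substr (s, offset, len);        # AWK-style call
b = s(offset:(offset + len - 1));   # equivalent index range
```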
Regular expression string matching. Search for pat in str and return the positions and substrings of any matches, or empty values if there are none. Note, some features and extended options are only available when Octave is compiled with support for Perl Compatible Regular Expressions (PCRE).
The matched pattern pat can include any of the standard regex operators, including:
.
- Match any character
* + ? {}
- Repetition operators, representing
*
- Match zero or more times
+
- Match one or more times
?
- Match zero or one times
{n}
- Match exactly n times
{n,}
- Match n or more times
{m,n}
- Match between m and n times
[...] [^...]
- List operators. The pattern will match any character listed between "[" and "]". If the first character is "^" then the pattern is inverted and any character except those listed between brackets will match.
With PCRE support, escape sequences defined below can be used inside list operators. For example, a template for a floating point number might be [-+.\d]+. POSIX regular expressions do not use escape sequences and any backslash ‘\’ will be interpreted literally as one of the list of characters to match.
()
- Grouping operator
|
- Alternation operator. Match one of a choice of regular expressions. The alternatives must be delimited by the grouping operator () above.
^ $
- Anchoring operators. Requires pattern to occur at the start (^) or end ($) of the string.
In addition, the following escaped characters have special meaning. Note, it is recommended to quote pat in single quotes, rather than double quotes, to avoid the escape sequences being interpreted by Octave before being passed to regexp.
\b
- Match a word boundary
\B
- Match within a word
\w
- Match any word character
\W
- Match any non-word character
\<
- Match the beginning of a word
\>
- Match the end of a word
\s
- Match any whitespace character
\S
- Match any non-whitespace character
\d
- Match any digit
This sequence is only available with PCRE support. For POSIX regular expressions use the following list operator [0-9].
\D
- Match any non-digit
This sequence is only available with PCRE support. For POSIX regular expressions use the following list operator [^0-9].
The outputs of regexp default to the order given below
- s
- The start indices of each matching substring
- e
- The end indices of each matching substring
- te
- The extents of each matched token surrounded by (...) in pat
- m
- A cell array of the text of each match
- t
- A cell array of the text of each token matched
- nm
- A structure containing the text of each matched named token, with the name being used as the fieldname. A named token is denoted by (?<name>...) and is only available with PCRE support.
Particular output arguments, or the order of the output arguments, can be selected by additional opt arguments. These are strings, and the correspondence between the optional arguments and the output arguments is
    'start'         s
    'end'           e
    'tokenExtents'  te
    'match'         m
    'tokens'        t
    'names'         nm
Additional arguments are summarized below.
- ‘once’
- Return only the first occurrence of the pattern.
- ‘matchcase’
- Make the matching case sensitive. (default)
Alternatively, use (?-i) in the pattern when PCRE is available.
- ‘ignorecase’
- Ignore case when matching the pattern to the string.
Alternatively, use (?i) in the pattern when PCRE is available.
- ‘stringanchors’
- Match the anchor characters at the beginning and end of the string. (default)
Alternatively, use (?-m) in the pattern when PCRE is available.
- ‘lineanchors’
- Match the anchor characters at the beginning and end of the line. Only available when Octave is compiled with PCRE.
Alternatively, use (?m) in the pattern when PCRE is available.
- ‘dotall’
- The pattern . matches all characters including the newline character. (default)
Alternatively, use (?s) in the pattern when PCRE is available.
- ‘dotexceptnewline’
- The pattern . matches all characters except the newline character. Only available when Octave is compiled with PCRE.
Alternatively, use (?-s) in the pattern when PCRE is available.
- ‘literalspacing’
- All characters in the pattern, including whitespace, are significant and are used in pattern matching. (default)
Alternatively, use (?-x) in the pattern when PCRE is available.
- ‘freespacing’
- The pattern may include arbitrary whitespace and also comments beginning with the character ‘#’. Only available when Octave is compiled with PCRE.
Alternatively, use (?x) in the pattern when PCRE is available.
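To tie the regexp output arguments and options together, here is a short sketch. It assumes an Octave built with PCRE support, since it relies on \w and \d. The sample string and variable names are invented for the illustration, and the patterns are written in single quotes as recommended above.

```octave
str = "width=640 height=480";

# Default output order: start and end indices of each match.
[s, e] = regexp (str, '\d+');

# Outputs selected by name are returned in the order requested.
[tok, mat] = regexp (str, '(\w+)=(\d+)', 'tokens', 'match');

# 'once' returns only the first match, as a string instead of a cell.
first = regexp (str, '\d+', 'match', 'once');

# 'ignorecase' relaxes case sensitivity for this call.
hits = regexp (str, 'WIDTH', 'match', 'ignorecase');
```

Each element of tok is itself a cell array holding the text captured by the parenthesized groups of one match, so tok{1}{2} is the number that follows the first key.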
Case insensitive regular expression string matching. Search for pat in str and return the positions and substrings of any matches, or empty values if there are none. See regexp, for details on the syntax of the search pattern.
See also: regexp.
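Assuming the entry above describes regexpi, the difference from regexp can be seen by running the same pattern through both. The sample string is made up for this sketch.

```octave
str = "Hello hello HELLO";

strict = regexp  (str, 'hello', 'match');   # case sensitive
loose  = regexpi (str, 'hello', 'match');   # case insensitive
```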
Replace occurrences of pattern pat in string with repstr.
The pattern is a regular expression as documented for regexp. See regexp.
The replacement string may contain $i, which substitutes for the ith set of parentheses in the match string. For example,
    regexprep ("Bill Dunn", '(\w+) (\w+)', '$2, $1')
returns "Dunn, Bill"
Options in addition to those of regexp are
- ‘once’
- Replace only the first occurrence of pat in the result.
- ‘warnings’
- This option is present for compatibility but is ignored.
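The effect of the ‘once’ option, and of token substitution with $i, is easiest to see side by side. The following is a small sketch that assumes PCRE support for \d; the inputs are invented for the illustration.

```octave
s = "a1b22c333";

# Without options, every run of digits is replaced.
all_runs = regexprep (s, '\d+', '#');

# With 'once', only the first run is replaced.
first_run = regexprep (s, '\d+', '#', 'once');

# $1 and $2 refer to the first and second parenthesized groups.
swapped = regexprep ("2024-05", '(\d+)-(\d+)', '$2/$1');
```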
Translate a string for use in a regular expression. This might include either wildcard replacement or special character escaping. The behavior is controlled by the op argument, which can have the values
- "wildcard"
- The wildcard characters ., * and ? are replaced with wildcards that are appropriate for a regular expression. For example:
    regexptranslate ("wildcard", "*.m")
    ⇒ ".*\.m"
- "escape"
- The characters $.?[], which have special meaning for regular expressions, are escaped so that they are treated literally. For example:
    regexptranslate ("escape", "12.5")
    ⇒ "12\.5"
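The translated strings are meant to be passed on to regexp. The sketch below shows one way that might look: an escaped literal compared against its unescaped form, and a wildcard pattern used to filter a list of names. The file names and the anchoring with ^ and $ are choices made for this example, not part of regexptranslate itself.

```octave
# "escape": the dot must match a literal dot, not any character.
pat_lit = regexptranslate ("escape", "12.5");
lit_hits = regexp ("x12.5y 1235", pat_lit, 'match');
raw_hits = regexp ("x12.5y 1235", "12.5", 'match');

# "wildcard": turn a shell-style glob into a regular expression and
# anchor it so that the whole name has to match.
pat_glob = ["^", regexptranslate("wildcard", "*.m"), "$"];
names = {"foo.m", "bar.txt", "baz.m"};
keep = cellfun (@(n) ! isempty (regexp (n, pat_glob, 'once')), names);
m_files = names(keep);
```

The unescaped pattern also matches "1235" because its dot stands for any character, which is exactly what escaping is meant to prevent.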
Replace TAB characters in t with spaces. The tab width is specified by tw, and defaults to eight. The input, t, may be either a 2-D character array or a cell array of character strings. The output is the same class as the input.
If the optional argument deblank is true, then the spaces will be removed from the end of the character data.
The following example reads a file and writes an untabified version of the same file with trailing spaces stripped.
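    # Read the whole file into a single character row vector.
    fid = fopen ("tabbed_script.m");
    text = char (fread (fid, "uchar")');
    fclose (fid);
    # Split into lines, expand tabs to width 8, strip trailing spaces.
    fid = fopen ("untabified_script.m", "w");
    text = untabify (strsplit (text, "\n"), 8, true);
    # Write the lines back out, one per row.
    fprintf (fid, "%s\n", text{:});
    fclose (fid);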
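For experimenting without touching any files, untabify can also be applied to short literal strings. The sketch below assumes that tabs are expanded to the next multiple of tw columns; the inputs are made up for the illustration.

```octave
# A leading tab with a tab width of 4 pushes "x" to column 5.
a = untabify ("\tx", 4);

# A tab after two characters only needs two spaces to reach column 4.
b = untabify ("ab\tx", 4);

# Cell array input with the deblank argument set: trailing spaces
# are removed after the tabs have been expanded.
c = untabify ({"\tx  "}, 4, true);
```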