API Reference
This module provides a set of vectorized functions for manipulating strings, mostly mimicking the main functionality of the stringr package in R. Because the package uses pyarrow as a bridge to communicate with Rust, it ONLY works for inputs that can be converted to a pyarrow array. In most cases the result is also a pyarrow array.
- stringpy.str_c(array: Array, collapse: str = None) → str
Collapse a vector of strings into a single string
- Parameters:
array (Array) – Strings to join
collapse (str) – String inserted between the elements; by default the elements are joined with nothing in between
Examples
>>> str_c(['abc', 'def', 'ghi'])
'abcdefghi'
>>> str_c(['abc', 'def', 'ghi'], collapse = '-')
'abc-def-ghi'
- Return type:
str
- stringpy.str_combine(*args, sep: str = None) → List
Combine multiple arrays element-wise into one array of strings
- Parameters:
args (Array) – Arrays to combine
sep (str) – Separator inserted between the combined elements
Examples
>>> str_combine(['a', 'b', 'c'], ['d', 'e', 'f'], sep = '-')
['a-d', 'b-e', 'c-f']
- Return type:
Array
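The element-wise behaviour in the example above can be pictured with a small pure-Python sketch. The helper name `combine_sketch` is hypothetical and not part of stringpy; it assumes all inputs have the same length and uses an empty string as the default separator, which is an assumption about what `sep = None` means.

```python
from typing import List, Sequence


def combine_sketch(*arrays: Sequence[str], sep: str = "") -> List[str]:
    """Join the i-th elements of every input with `sep` (illustration only)."""
    # zip() groups elements by position; join() glues each group together.
    return [sep.join(parts) for parts in zip(*arrays)]


print(combine_sketch(["a", "b", "c"], ["d", "e", "f"], sep="-"))
```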
- stringpy.str_count(array: Array, pattern: str = None) → Array
Count the number of times a pattern occurs in each string
- Parameters:
array (Array)
pattern (str)
Examples
>>> str_count(['abc', 'def', 'ghi'], pattern=r'a').to_pylist()
[1, 0, 0]
- Return type:
Array
- stringpy.str_detect(array: Array, pattern: str = None) → Array
Detect whether each string matches a pattern; returns a boolean array
- Parameters:
array (Array)
pattern (str)
Examples
>>> str_detect(['abc', 'def', 'ghi'], pattern=r'a').to_pylist()
[True, False, False]
- Return type:
Array
- stringpy.str_dup(array: Array, times: int | List[int] = 1) → Array
Repeat each string in the array the given number of times
- Parameters:
array (Array)
times (int | List[int]) – Number of repetitions
Examples
>>> str_dup(['abc', 'def', 'ghi'], times = 2).to_pylist()
['abcabc', 'defdef', 'ghighi']
- Return type:
Array
- stringpy.str_ends(array: Array, pattern: str = None, negate: bool = False) → Array
Detect whether each string ends with a pattern; returns a boolean array
- Parameters:
array (Array)
pattern (str) – Expects a literal string, not a regex; all regex special characters will be escaped
negate (bool) – Negate the result
Examples
>>> str_ends(['abc', 'def', 'ghi'], pattern=r'c').to_pylist()
[True, False, False]
>>> str_ends(['ab.c', 'defc', 'ghic'], pattern=r'.c').to_pylist()
[True, False, False]
- Return type:
Array
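To illustrate what "all regex special characters will be escaped" means in practice, here is a rough sketch using Python's `re` module rather than the Rust regex engine that stringpy uses. The function name `ends_with_literal` is hypothetical, and the sketch does not attempt to handle missing values.

```python
import re
from typing import List


def ends_with_literal(strings: List[str], pattern: str, negate: bool = False) -> List[bool]:
    """Treat `pattern` as literal text and test whether each string ends with it."""
    # re.escape() turns metacharacters such as "." into literal characters;
    # \Z anchors the escaped pattern to the very end of the string.
    regex = re.compile(re.escape(pattern) + r"\Z")
    # Comparing with `negate` flips the result when negate is True.
    return [(regex.search(s) is not None) != negate for s in strings]


print(ends_with_literal(["ab.c", "defc", "ghic"], ".c"))
```

Because the dot is escaped, `'.c'` only matches a literal dot followed by `c`, which is why `'defc'` and `'ghic'` do not count as matches.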
- stringpy.str_extract(array: Array, pattern: str = None, group: int = None) → Array
Extract the first match of a pattern from each string in the array
- Parameters:
array (Array)
pattern (str)
group (int) – Capture group number to extract; not used by default
Examples
>>> str_extract(['abc', 'def', 'ghi'], pattern=r'\w').to_pylist()
['a', 'd', 'g']
- Return type:
Array
- stringpy.str_extract_all(array: Array, pattern: str = None, group: int = None) → ListArray
Extract all matches of a pattern from each string in the array; for each input string, returns a list of matches
- Parameters:
array (Array)
pattern (str)
group (int) – Capture group number to extract; not used by default
Examples
>>> str_extract_all(['abc12', 'd13ef', 'gh23i'], pattern=r'\d').to_pylist()
[['1', '2'], ['1', '3'], ['2', '3']]
- Return type:
ListArray
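The role of the `group` parameter can be sketched with Python's `re` module. This is only an analogy: stringpy uses the Rust regex engine, whose syntax differs in places, and the helper name `extract_all_sketch` is hypothetical. The sketch assumes that leaving `group` unset returns the whole match.

```python
import re
from typing import List, Optional


def extract_all_sketch(strings: List[str], pattern: str, group: Optional[int] = None) -> List[List[str]]:
    """Collect every match (or one capture group of every match) per string."""
    regex = re.compile(pattern)
    # Group 0 is the whole match; a positive number selects a capture group.
    index = 0 if group is None else group
    return [[m.group(index) for m in regex.finditer(s)] for s in strings]


print(extract_all_sketch(["abc12", "d13ef", "gh23i"], r"\d"))
print(extract_all_sketch(["a1b2"], r"([a-z])(\d)", group=2))
```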
- stringpy.str_length(array: Array) → Array
Get the length of each string in the array. Length is counted in individual elements (which are often, but not always, letters). For example, the length of “Hà Nội” is 6.
- Parameters:
array (Array)
Examples
>>> str_length(['abc', 'def', 'ghi', None ,'']).to_pylist()
[3, 3, 3, None, 0]
- Return type:
Array
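The count of 6 for “Hà Nội” depends on which unit is counted. The following plain-Python comparison (no stringpy involved) shows three common ways to measure the same text; only the first yields 6, and it assumes the text is stored in precomposed (NFC) form.

```python
import unicodedata

# "Hà Nội" written with escapes so the precomposed (NFC) form is guaranteed.
text = "H\u00e0 N\u1ed9i"

code_points = len(text)                                   # precomposed code points
utf8_bytes = len(text.encode("utf-8"))                    # bytes in UTF-8
decomposed = len(unicodedata.normalize("NFD", text))      # base letters + combining marks

print(code_points, utf8_bytes, decomposed)  # prints: 6 9 9
```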
- stringpy.str_locate(array: Array, pattern: str = None) → Array
Locate the position of the first match of the pattern in each string in the array. Returns -1 if the pattern is not found.
- Parameters:
array (Array)
pattern (str) – Expect a literal string, not a regex, all regex special characters will be escaped
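No example is given for this function, so the following is only a sketch of the described behaviour in plain Python. It assumes positions are counted from 0 (as in str_sub) and that missing values pass through unchanged; neither assumption is confirmed by the text above. The helper name `locate_sketch` is hypothetical.

```python
from typing import List, Optional


def locate_sketch(strings: List[Optional[str]], pattern: str) -> List[Optional[int]]:
    """Literal search for `pattern`; -1 when it does not occur."""
    # str.find() searches for literal text (no regex) and returns -1 on failure.
    return [None if s is None else s.find(pattern) for s in strings]


print(locate_sketch(["abc", "a.c", "xyz", None], "."))
```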
- stringpy.str_match(array: Array, pattern: str | List = None) → Array
Extract any number of matches defined by unnamed or named capture groups in the pattern.
- Parameters:
array (Array)
pattern (str | List) – Regular expression containing unnamed or named capture groups
- Return type:
Array
- stringpy.str_pad(array: Array, width: int | List[int] = None, side: str | List[str] = 'left', pad: str | List[str] = ' ') → Array
Pad each string in the array to a given width
- Parameters:
array (Array) – Strings to pad
width (Union[int, List[int]], optional) – The width of the output string, by default None
side (Union[str, List[str]], optional) – Side on which the padding is added, by default ‘left’
pad (Union[str, List[str]], optional) – String used for padding, by default ‘ ’
- Returns:
The padded strings
- Return type:
Array
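Since no example accompanies this function, here is a hedged plain-Python sketch of what padding to a width usually means. It assumes that `side` names the side where padding is added (as in stringr), that `'right'` is an accepted value, and that `pad` is a single character; none of this is confirmed above. The helper name `pad_sketch` is hypothetical.

```python
from typing import List


def pad_sketch(strings: List[str], width: int, side: str = "left", pad: str = " ") -> List[str]:
    """Pad each string to `width` characters (illustration only)."""
    if side == "left":
        # rjust() right-aligns the text, so the padding lands on the left.
        return [s.rjust(width, pad) for s in strings]
    if side == "right":
        # ljust() left-aligns the text, so the padding lands on the right.
        return [s.ljust(width, pad) for s in strings]
    raise ValueError(f"side not covered by this sketch: {side!r}")


print(pad_sketch(["7", "42", "123"], width=3, pad="0"))
```

Strings that are already at least `width` characters long are returned unchanged in this sketch.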
- stringpy.str_remove(array: Array, pattern: str = None) → Array
Remove the first match of a pattern from each string in the array
- Parameters:
array (Array)
pattern (str)
Examples
>>> str_remove(['abc 12', 'def 23', 'ghi 34'], pattern=r'\d').to_pylist()
['abc 2', 'def 3', 'ghi 4']
- Return type:
Array
- stringpy.str_remove_all(array: Array, pattern: str = None) → Array
Remove all matches of a pattern from each string in the array
- Parameters:
array (Array)
pattern (str)
Examples
>>> str_remove_all(['abc 1', 'def 2', 'ghi 3'], pattern=r'\d').to_pylist()
['abc ', 'def ', 'ghi ']
- Return type:
Array
- stringpy.str_remove_ascent(array: List) → Array
Remove all accents from each string
- Parameters:
array (Array)
Examples
>>> str_remove_ascent(['sài gòn', 'thời tiết', 'cảm lạnh']).to_pylist()
['sai gon', 'thoi tiet', 'cam lanh']
- Return type:
Array
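One common way to strip accents, shown here as a standard-library sketch and not necessarily how stringpy does it, is to decompose each character and drop the combining marks. The helper name `remove_accents_sketch` is hypothetical.

```python
import unicodedata
from typing import List


def remove_accents_sketch(strings: List[str]) -> List[str]:
    """Drop combining marks after canonical decomposition (illustration only)."""
    result = []
    for s in strings:
        # NFD splits "à" into "a" followed by a combining grave accent.
        decomposed = unicodedata.normalize("NFD", s)
        # combining() is non-zero for accent marks, zero for base characters.
        result.append("".join(ch for ch in decomposed if not unicodedata.combining(ch)))
    return result


print(remove_accents_sketch(["sài gòn", "thời tiết", "cảm lạnh"]))
```

Letters that have no canonical decomposition, such as “đ”, pass through this sketch unchanged; how stringpy treats them is not stated above.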
- stringpy.str_replace(array: Array, pattern: str = None, replace: str = None) → Array
Replace the first match of a pattern in each string in the array. The replacement can also refer to capture groups. For example, this splits a string written in CamelCase:
>>> str_replace(['someThing', 'isNot'], pattern='(?P<g1>[a-z])(?P<g2>[A-Z])', replace= '$g1 $g2').to_pylist()
['some Thing', 'is Not']
- Note that the group syntax of the Rust regex crate is a bit different:
(?P<group-name>) to capture a group
$group-name to refer to a group
- Parameters:
array (Array)
pattern (str)
replace (str)
Examples
>>> str_replace(['abc', 'def', 'ghi'], pattern=r'\w', replace = 'x').to_pylist()
['xbc', 'xef', 'xhi']
- Return type:
Array
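For readers more used to Python's `re` module, the CamelCase example can be mirrored as follows. This is a comparison sketch only: in Python the group reference is written `\g<name>` where the Rust syntax described above uses `$name`, and `count=1` limits the substitution to the first match. The helper name `replace_first_sketch` is hypothetical.

```python
import re
from typing import List


def replace_first_sketch(strings: List[str], pattern: str, replace: str) -> List[str]:
    """Replace only the first match in each string (Python `re` analogue)."""
    regex = re.compile(pattern)
    # count=1 stops after the first substitution in each string.
    return [regex.sub(replace, s, count=1) for s in strings]


print(replace_first_sketch(
    ["someThing", "isNot"],
    r"(?P<g1>[a-z])(?P<g2>[A-Z])",
    r"\g<g1> \g<g2>",  # Python spelling of the '$g1 $g2' replacement
))
```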
- stringpy.str_replace_all(array: Array, pattern: str = None, replace: str = None) → Array
Replace all matches of a pattern in each string in the array
- Parameters:
array (Array)
pattern (str)
replace (str)
Examples
>>> str_replace_all(['abc 122', 'def 233', 'ghi 344'], pattern=r'\d', replace = 'x').to_pylist()
['abc xxx', 'def xxx', 'ghi xxx']
- Return type:
Array
- stringpy.str_split(array: Array, pattern: str = None) → ListArray
Split each string by a pattern. Returns a list array in which each inner list corresponds to one string in the input array.
- Parameters:
array (Array)
pattern (str)
- Return type:
ListArray
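The shape of the result (one inner list per input string) can be sketched in plain Python. Whether stringpy treats `pattern` as a regex or as literal text is not stated above; this sketch assumes a regex and uses Python's `re`, so edge cases may differ. The helper name `split_sketch` is hypothetical.

```python
import re
from typing import List


def split_sketch(strings: List[str], pattern: str) -> List[List[str]]:
    """Split every string on `pattern`, keeping one list of pieces per input."""
    regex = re.compile(pattern)
    return [regex.split(s) for s in strings]


print(split_sketch(["a,b,c", "d", "e, f"], r",\s*"))
```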
- stringpy.str_squish(array: Array) → Array
Remove leading and trailing whitespace from each string and collapse repeated whitespace between words into a single space
- Parameters:
array (Array)
Examples
>>> str_squish([' abc def', ' def ghi', 'ijk row ']).to_pylist()
['abc def', 'def ghi', 'ijk row']
- Return type:
Array
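A compact plain-Python equivalent of this kind of whitespace cleanup, given as a sketch and not as stringpy's implementation, splits on runs of whitespace and rejoins with single spaces. The helper name `squish_sketch` is hypothetical.

```python
from typing import List


def squish_sketch(strings: List[str]) -> List[str]:
    """Trim the ends and collapse inner runs of whitespace to one space."""
    # split() with no argument drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return [" ".join(s.split()) for s in strings]


print(squish_sketch(["  abc   def", " def  ghi", "ijk row  "]))
```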
- stringpy.str_starts(array: Array, pattern: str = None, negate: bool = False) → Array
Detect whether each string starts with a pattern; returns a boolean array
- Parameters:
array (Array)
pattern (str) – Expects a literal string, not a regex; all regex special characters will be escaped
negate (bool) – Negate the result
Examples
>>> str_starts(['abc', 'def', 'ghi'], pattern=r'a').to_pylist()
[True, False, False]
>>> str_starts(['a.bc', 'adef', 'aghi'], pattern=r'a.').to_pylist()
[True, False, False]
- Return type:
Array
- stringpy.str_sub(array: Array, start: int | List[int] = None, end: int | List[int] = None) → Array
Get a substring of each string in the array by index, counting from 0. Note that end is exclusive and must be larger than start. A negative index is counted from the end of the string. If start or end falls outside [0, length of string], it is coerced to the boundary.
- Parameters:
array (Array)
start (int) – Start position (inclusive)
end (int) – End position (exclusive)
Examples
>>> str_sub(['abc', 'def', 'ghi'], start = 1, end = 2).to_pylist()
['b', 'e', 'h']
>>> str_sub(['Make', 'you', 'feel'], start = [1,1,1], end= [3,2,3]).to_pylist()
['ak', 'o', 'ee']
- Return type:
Array
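The indexing rules described above closely resemble Python slicing, which also counts from 0, treats the end as exclusive, counts negative indices from the end, and clamps out-of-range bounds. The following sketch uses that resemblance; the helper name `sub_sketch` is hypothetical, and stringpy's handling of edge cases may differ from Python's.

```python
from typing import List, Union


def sub_sketch(strings: List[str], start: Union[int, List[int]], end: Union[int, List[int]]) -> List[str]:
    """Slice each string with either one shared bound or one bound per string."""
    # Broadcast scalar bounds so every string has its own start and end.
    starts = start if isinstance(start, list) else [start] * len(strings)
    ends = end if isinstance(end, list) else [end] * len(strings)
    return [s[a:b] for s, a, b in zip(strings, starts, ends)]


print(sub_sketch(["abc", "def", "ghi"], 1, 2))
print(sub_sketch(["Make", "you", "feel"], [1, 1, 1], [3, 2, 3]))
print(sub_sketch(["abc"], -2, 10))  # negative start, end past the string
```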
- stringpy.str_subset(array: Array, pattern: str = None, negate: bool = False) → Array
Subset (filter) the array with a pattern; returns a string array
- Parameters:
array (Array)
pattern (str) – Pattern to match; regex anchors such as ^ are interpreted, as the first example shows
negate (bool) – Negate the result
Examples
>>> str_subset(['apple', 'banana', 'pear', 'pineapple'], pattern=r'^a').to_pylist()
['apple']
>>> str_subset(['abc', 'def', 'ghi'], pattern=r'a', negate = True).to_pylist()
['def', 'ghi']
- Return type:
Array
- stringpy.str_to_lower(array: Array) → Array
Convert each string to lower case
- Parameters:
array (Array)
Examples
>>> str_to_lower(['ABC', 'Def', 'Ghi']).to_pylist()
['abc', 'def', 'ghi']
- Return type:
Array
- stringpy.str_to_sentence(array: Array) → Array
Convert each string to sentence case
- Parameters:
array (Array)
Examples
>>> str_to_sentence(['i need you here. right now!']).to_pylist()
['I need you here. Right now!']
- Return type:
Array
- stringpy.str_to_title(array: Array) → Array
Convert each string to title case
- Parameters:
array (Array)
Examples
>>> str_to_title(['abc', 'def', 'ghi']).to_pylist()
['Abc', 'Def', 'Ghi']
- Return type:
Array
- stringpy.str_to_upper(array: Array) → Array
Convert each string to upper case
- Parameters:
array (Array)
Examples
>>> str_to_upper(['abc', 'Def', 'Ghi']).to_pylist()
['ABC', 'DEF', 'GHI']
- Return type:
Array
- stringpy.str_trim(array: Array, side='both') → Array
Remove leading and trailing whitespace from each string
- Parameters:
array (Array)
side (str) – Side to trim, by default ‘both’
Examples
>>> str_trim([' abc def', ' def ghi', 'ijk row ']).to_pylist()
['abc def', 'def ghi', 'ijk row']
- Return type:
Array
- stringpy.str_trunc(array: Array, width: int = None, side='left', ellipsis='...') → Array
Truncate each string to a given width. Note that this function does NOT support non-ASCII characters yet.
- Parameters:
array (Array)
width (int)
side (str) – One of ‘left’, ‘right’, ‘center’
ellipsis (str) – Content of the ellipsis that indicates content has been removed.
Examples
>>> str_trunc(['abc def', 'def ghi', 'ijk row'], width = 5).to_pylist()
['abc d...', 'def g...', 'ijk r...']
- Return type:
Array
- stringpy.str_unique(array: Array) → Array
Get the unique strings in the array
- Parameters:
array (Array)
Examples
>>> str_unique(['abc', 'def', 'ghi', 'abc', 'def']).to_pylist()
['abc', 'def', 'ghi']
- Return type:
Array