Regular Expressions in Python

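The term regular expression is popularly shortened to regex. A regex is a sequence of characters that defines a search pattern. Regexes are mainly used to perform find and replace operations in search engines and text processors.

Python provides regex functionality through the re module, which is bundled as part of the standard library.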

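Raw Strings

The patterns passed to the functions in Python's re module are usually written as raw strings. An ordinary string literal becomes a raw string when it is prefixed with 'r' or 'R'.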

Example: Raw String

>>> rawstr = r'Hello! How are you?'
>>> print(rawstr)
Hello! How are you?

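The difference between a normal string and a raw string is that escape sequences such as \n and \t in a normal string are translated into the characters they stand for (a newline, a tab, and so on), so print() displays them that way. In a raw string, the backslash is kept as-is.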

Example: String vs Raw String

str1 = "Hello!\nHow are you?"
print("normal string:", str1)
str2 = r"Hello!\nHow are you?"
print("raw string:",str2)

Output

normal string: Hello!
How are you?
raw string: Hello!\nHow are you?

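In the example above, the \n in str1 (a normal string) is translated into a newline, so the rest of the text is printed on the next line. In str2, a raw string, it is printed literally as \n.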
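Raw strings matter most when a pattern contains backslashes. The following sketch, using a made-up sample sentence, compares the same word-boundary pattern written with and without the r prefix. In a normal string literal, \b is Python's escape for the backspace character, so the re module never receives a word-boundary token.

```python
import re

text = "the cat sat on the category mat"

# In a normal string literal "\b" becomes the backspace character (\x08),
# so this pattern looks for backspace + "cat" + backspace and finds nothing.
plain = re.findall("\bcat\b", text)

# In a raw string the backslash is kept, so re sees \b as a word boundary
# and matches "cat" but not the "cat" at the start of "category".
raw = re.findall(r"\bcat\b", text)

# Doubling the backslash in a normal string yields the same pattern as the raw string.
doubled = re.findall("\\bcat\\b", text)

print(plain)    # []
print(raw)      # ['cat']
print(doubled)  # ['cat']
```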

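Metacharacters

Some characters have a special meaning when they appear as part of a pattern. In Windows command prompts and Linux shells, * and ? act as wildcards, which is similar to how metacharacters work. Python's re module uses the following characters as metacharacters:

. ^ $ * + ? { } [ ] \ | ( )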

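When a set of alphanumeric characters is placed inside square brackets [], any single one of those characters in the target string is matched. Either a range of characters or individual characters can be listed inside the brackets. For example:

Pattern	Description
[abc]	Matches any one of the characters a, b, or c
[a-c]	Uses a range to express the same set of characters
[a-z]	Matches lowercase letters only
[0-9]	Matches digits only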
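A quick sketch of the character classes in the table above, applied with re.findall() (covered later in this article) to a made-up sample string:

```python
import re

s = "cab 42 Dog"

abc = re.findall(r"[abc]", s)      # ['c', 'a', 'b']
a_to_c = re.findall(r"[a-c]", s)   # ['c', 'a', 'b']: the same set written as a range
lower = re.findall(r"[a-z]", s)    # ['c', 'a', 'b', 'o', 'g']: uppercase 'D' is skipped
digits = re.findall(r"[0-9]", s)   # ['4', '2']

print(abc, a_to_c, lower, digits)
```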

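The following special sequences and characters have specific meanings.

Pattern	Description
\d	Matches any decimal digit; this is equivalent to the class [0-9]
\D	Matches any non-digit character
\s	Matches any whitespace character
\S	Matches any non-whitespace character
\w	Matches any alphanumeric character or underscore
\W	Matches any character that \w does not match
.	Matches any single character except the newline '\n'
?	Matches 0 or 1 occurrence of the pattern to its left
+	Matches 1 or more occurrences of the pattern to its left
*	Matches 0 or more occurrences of the pattern to its left
\b	Matches the boundary between a word character and a non-word character; \B is the opposite of \b
[..]	Matches any single character inside the square brackets
\	Escapes characters that have a special meaning, e.g. \. matches a literal period and \+ matches a literal plus sign
{n,m}	Matches at least n and at most m occurrences of the preceding pattern
a|b	Matches a or b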
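The following sketch exercises several entries from the table with re.findall(); the sample strings are made up for illustration.

```python
import re

# \d with + : one or more digits
numbers = re.findall(r"\d+", "Order 66 shipped 3 items")       # ['66', '3']
# \w also matches the underscore
names = re.findall(r"\w+", "hi, there_1!")                      # ['hi', 'there_1']
# ? : the 'u' is optional
spellings = re.findall(r"colou?r", "color colour colouur")     # ['color', 'colour']
# \b : 'is' only as a whole word, not inside 'This', 'his' or 'island'
whole = re.findall(r"\bis\b", "This is his island")             # ['is']
# {n,m} : between 2 and 3 digits, matched greedily
runs = re.findall(r"\d{2,3}", "1 12 1234")                      # ['12', '123']
# | : either alternative
pets = re.findall(r"cat|dog", "hotdog catalog")                 # ['dog', 'cat']
# \. matches a literal period; an unescaped . matches any character
escaped = re.findall(r"3\.14", "pi is 3.14, not 3x14")          # ['3.14']
unescaped = re.findall(r"3.14", "pi is 3.14, not 3x14")         # ['3.14', '3x14']

print(numbers, names, spellings, whole, runs, pets, escaped, unescaped)
```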

The re.match() Function

This function in the re module checks whether the specified pattern occurs at the beginning of the given string.

re.match(pattern, string)

If the pattern is not found at the beginning, the function returns None; if it is found, the function returns a Match object.
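Because re.match() can return None, code that uses the result usually checks it before calling methods on it; calling group() on None would raise an AttributeError. A minimal sketch with made-up sample strings:

```python
import re

texts = ["Welcome to TutorialsTeacher", "You are welcome", "Say Welcome"]
results = []
for text in texts:
    m = re.match(r"Welcome", text)
    # m is a Match object only when "Welcome" occurs at position 0; otherwise it is None
    if m is not None:
        results.append(m.group())
    else:
        results.append(None)

print(results)  # ['Welcome', None, None]
```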

Example: re.match()

from re import match

mystr = "Welcome to TutorialsTeacher"
obj1 = match("We", mystr)
print(obj1)
obj2 = match("teacher", mystr)
print(obj2)

Output

<re.Match object; span=(0, 2), match='We'>
None

The Match object has start() and end() methods.

Example:

>>> print("start:", obj1.start(), "end:", obj1.end())

Output

start: 0 end: 2
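Besides start() and end(), a Match object also offers span(), which returns both positions as a tuple, and group(), which returns the matched text. A short sketch on the same sample string, using a pattern that matches the whole first word:

```python
import re

m = re.match(r"We\w*", "Welcome to TutorialsTeacher")

print(m.group())           # Welcome
print(m.start(), m.end())  # 0 7
print(m.span())            # (0, 7)
```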

The following example uses a character range to determine whether a string starts with 'W' followed by a lowercase letter.

Example: match()

from re import match

strings = ["Welcome to TutorialsTeacher", "weather forecast", "Winston Churchill", "W.G.Grace", "Wonders of India", "Water park"]

for string in strings:
    obj = match("W[a-z]", string)
    print(obj)

Output

<re.Match object; span=(0, 2), match='We'>
None
<re.Match object; span=(0, 2), match='Wi'>
None
<re.Match object; span=(0, 2), match='Wo'>
<re.Match object; span=(0, 2), match='Wa'>

The re.search() Function

The re.search() function searches for the specified pattern anywhere in the given string and stops at the first occurrence.

Example: re.search()

from re import search

string = "Try to earn while you learn"

obj = search("earn", string)
print(obj)
print(obj.start(), obj.end(), obj.group())

Output

<re.Match object; span=(7, 11), match='earn'>
7 11 earn

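This function also returns a Match object, whose start() and end() methods give the position of the match. Its group() method returns the part of the string that the pattern matched.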
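The difference between match() and search() shows up clearly on the same sentence: the pattern is present, but not at position 0.

```python
import re

string = "Try to earn while you learn"

# match() only looks at position 0, where the string starts with "Try"
at_start = re.match("earn", string)
print(at_start)            # None

# search() scans forward and stops at the first occurrence
m = re.search("earn", string)
print(m.span(), m.group()) # (7, 11) earn
```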

The re.findall() Function

Unlike search(), findall() keeps searching for the pattern until the end of the target string is reached. It returns a list of all matches.

Example: re.findall()

from re import findall

string = "Try to earn while you learn"

obj = findall("earn", string)
print(obj)

Output

['earn', 'earn']

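This function can be used to get a list of the words in a sentence. For this we will use the \w* pattern. We will also check which of the words contain no vowels.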

Example: re.findall()

from re import findall, search

words = findall(r"\w*", "Fly in the sky.")
print(words)

for word in words:
    # skip the empty strings that \w* produces, then test for a vowel
    if word != '' and search(r"[aeiou]", word) is None:
        print(word)

Output

['Fly', '', 'in', '', 'the', '', 'sky', '', '']
Fly
sky
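Because \w* can match zero characters, the list above contains empty strings. A sketch of an alternative: \w+ requires at least one word character, so the same vowel check works without filtering out ''.

```python
import re

sentence = "Fly in the sky."

# \w+ needs at least one word character, so no empty strings are produced
words = re.findall(r"\w+", sentence)
print(words)      # ['Fly', 'in', 'the', 'sky']

# keep only the words that contain no lowercase vowel
no_vowels = [w for w in words if re.search(r"[aeiou]", w) is None]
print(no_vowels)  # ['Fly', 'sky']
```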

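The re.finditer() Function

The re.finditer() function returns an iterator of Match objects for all matches in the target string. For each match, the start and end positions can be obtained with the span() method.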

Example: re.finditer()

from re import finditer

string = "Try to earn while you learn"
it = finditer("earn", string)
for match in it:
    print(match.span())

Output

(7, 11)
(23, 27)
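Since finditer() yields full Match objects rather than plain strings, group() and span() can be read together. A sketch on the same sentence, with a pattern that also captures any word characters before "earn":

```python
import re

string = "Try to earn while you learn"

found = []
# finditer() yields Match objects one at a time instead of building a list of strings
for m in re.finditer(r"\w*earn", string):
    found.append((m.group(), m.span()))

print(found)  # [('earn', (7, 11)), ('learn', (22, 27))]
```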

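The re.split() Function

The re.split() function works much like the split() method of Python's str objects, except that it splits the given string at every match of a pattern. In the example below, the pattern is a single space, so the string is split at each space. In the earlier findall() example that extracted all the words, the list also contained empty strings between the words; splitting on the separator with re.split() avoids them.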

Example: re.split()

from re import split

string = "Flat is better than nested. Sparse is better than dense."
words = split(r' ', string)
print(words)

Output

['Flat', 'is', 'better', 'than', 'nested.', 'Sparse', 'is', 'better', 'than', 'dense.']
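The advantage over a fixed separator appears when the spacing is irregular. In this sketch the sample text is the one above with two of the single spaces doubled (a made-up variation); a single-space pattern then leaves empty strings, while \s+ treats any run of whitespace as one separator.

```python
import re

string = "Flat is better  than nested.  Sparse is better than dense."

# Splitting on a single space leaves '' wherever two spaces are adjacent
single = re.split(r' ', string)

# \s+ treats any run of whitespace characters as one separator
words = re.split(r"\s+", string)

print(single)
print(words)
```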

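The re.compile() Function

The re.compile() function returns a pattern object that can be reused with different regex methods. In the following example, the pattern r'[aeiou]' is compiled into a pattern object, and its match() method is applied to each word to check whether the word starts with a vowel.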

Example: re.compile()

import re

pattern = re.compile(r'[aeiou]')
string = "Flat is better than nested. Sparse is better than dense."
words = re.split(r' ', string)
for word in words:
    # pattern.match() only checks whether the word starts with a vowel
    print(word, pattern.match(word))

Output

Flat None
is <re.Match object; span=(0, 1), match='i'>
better None
than None
nested. None
Sparse None
is <re.Match object; span=(0, 1), match='i'>
better None
than None
dense. None

The same pattern object can be reused to search for a vowel anywhere in each word, as shown below.

Example: search()

for word in words:
    print(word, pattern.search(word))

Output

Flat <re.Match object; span=(2, 3), match='a'>
is <re.Match object; span=(0, 1), match='i'>
better <re.Match object; span=(1, 2), match='e'>
than <re.Match object; span=(2, 3), match='a'>
nested. <re.Match object; span=(1, 2), match='e'>
Sparse <re.Match object; span=(2, 3), match='a'>
is <re.Match object; span=(0, 1), match='i'>
better <re.Match object; span=(1, 2), match='e'>
than <re.Match object; span=(2, 3), match='a'>
dense. <re.Match object; span=(1, 2), match='e'>
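A compiled pattern object exposes the same operations as the module-level functions (match(), search(), findall(), split(), and so on), and its pattern attribute holds the original pattern string. A short sketch on one word from the example above:

```python
import re

# Compile once, then reuse the same pattern object with several of its methods
vowel = re.compile(r"[aeiou]")
word = "Sparse"

starts = vowel.match(word)   # None: 'S' at position 0 is not a vowel
first = vowel.search(word)   # Match object for 'a' at span (2, 3)
every = vowel.findall(word)  # ['a', 'e']

print(starts, first, every)
print(vowel.pattern)         # the source pattern string: [aeiou]
```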

Original article by R99CS. If you reproduce it, please credit the source: https://www.506064.com/zh-tw/n/128984.html
