Python正則教程詳解

在數據處理、文本處理以及網絡爬蟲方面，正則表達式是一個不可或缺的工具。Python語言天生支持正則表達式，使得Python在數據處理方面顯得十分高效。本文將從多個方面對Python正則表達式進行詳解。

一、正則表達式入門

1、什麼是正則表達式
正則表達式（Regular Expression 或 regex）是一種用來描述、匹配字符序列的方法。可以用來檢索、替換、分割目標字符串等操作。它的表達式由普通字符和元字符（MetaCharacters）組成。

2、正則表達式中的字符表示含義
`.`：代表除了換行符`\n`之外的任意一個字符；
`^`：匹配目標字符串的開頭；
`$`：匹配目標字符串的結尾；
`[]`：表示可以匹配某一範圍內的一個字符；
`|`：表示或，用於連接多個表達式；
`\`：轉義字符，使得在正則表達式中可以使用普通字符的特殊含義；
`+`：至少匹配前面一個字符一次或多次；
`*`：匹配前面一個字符0次或多次（包括0次）；
`?`：匹配前面一個字符0次或一次，只匹配一次是默認最少匹配，加上「`?`」表示關閉默認設置，變成最多匹配。

3、正則表達式的應用
使用Python的`re`庫來使用正則表達式模塊，模塊主要包含了三個函數，分別是`match`函數、`search`函數以及`sub`函數。

（1）`match`函數
`match`函數從字符串的開頭開始搜索匹配正則表達式，如果匹配成功，則返回一個匹配的對象；否則返回None。

import re
pattern = r"hello"
string = "hello world"
match = re.match(pattern, string)
if match:
    print("匹配成功！")
else:
    print("匹配失敗！")

（2）`search`函數
`search`函數在整個字符串中搜索正則表達式，並且返回第一個匹配的對象。如果沒有找到匹配的對象，則返回None。

import re
pattern = r"world"
string = "hello world"
match = re.search(pattern, string)
if match:
    print("匹配成功！")
else:
    print("匹配失敗！")

（3）`sub`函數
`sub`函數用於對目標字符串執行替換操作，並返回替換後的字符串。

import re
pattern = r"world"
string = "hello world"
sub_str = "python"
new_str = re.sub(pattern, sub_str, string)
print(new_str)

二、正則表達式進階應用

1、匹配中文字符
正則表達式中，如果要匹配中文字符，需要用到中文字符的Unicode編碼。

import re
pattern = r"[\u4e00-\u9fa5]"
string = "這是一個中文句子"
match = re.findall(pattern, string)
print(match)

2、匹配郵箱和電話號碼
使用`[]`來匹配，使用`()`將正則表達式分組，方便後面使用。

import re
pattern_email = r"(\w+)@(163|126)\.(com|cn)"
email = "xxx@126.com"
match_email = re.match(pattern_email, email)

pattern_tel = r"(\d{3})-(\d{8})|(\d{4})-(\d{7})"
tel = "010-12345678"
match_tel = re.match(pattern_tel, tel)

if match_email and match_tel:
    print("匹配成功！")
else:
    print("匹配失敗！")

3、匹配HTML標籤
使用正則表達式來匹配HTML標籤時，我們可以使用“的方式來進行匹配，但是這種方式會存在貪婪匹配的問題。

import re

pattern = r""

string = "
hello world
原創文章，作者：MHTHY，如若轉載，請註明出處：https://www.506064.com/zh-hk/n/334091.html

Python正則教程詳解

一、正則表達式入門

二、正則表達式進階應用

相關推薦

發表回復