Regular Expressions in Python, with Examples

軟件測試test 2020-07-10

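A regular expression (RE) specifies a set of strings (a pattern) that match it.
To work with REs you need to understand metacharacters, which are used throughout the functions of the re module.
There are 14 metacharacters in total (counting each bracket, brace, and parenthesis separately); they are discussed below as they come up in the functions: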

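\     Drops the special meaning of the character that follows it (discussed below)
[]    Represents a character class
^     Matches the beginning
$     Matches the end
.     Matches any character except a newline
?     Matches zero or one occurrence
|     Means OR (matches either of the patterns it separates)
*     Any number of occurrences (including zero)
+     One or more occurrences
{}    Indicates the number of occurrences of the preceding RE to match
()    Encloses a group of REs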
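As a quick illustration of several metacharacters from the list above, here is a minimal sketch. The sample strings and variable names are made up for this example and do not come from the original article.

```python
import re

# ^ anchors the match at the start of the string, $ at the end.
starts_with_py = bool(re.search(r'^Py', 'Python'))
ends_with_on = bool(re.search(r'on$', 'Python'))

# . matches any single character except a newline,
# so 'c\nt' (c, newline, t) is skipped.
dot_matches = re.findall(r'c.t', 'cat cot c\nt cut')

# ? makes the preceding item optional (zero or one occurrence).
optional_u = re.findall(r'colou?r', 'color colour')

# | means OR and () groups the alternatives.
# With one group in the pattern, findall() returns the group's text.
either = re.findall(r'(cat|dog)s', 'cats dogs birds')

# {} gives an explicit repetition count: exactly three digits here.
three_digits = re.findall(r'[0-9]{3}', '12 345 6789')

print(starts_with_py, ends_with_on)
print(dot_matches, optional_u, either, three_digits)
```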

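Function compile()
Compiles a regular expression into a pattern object, which has methods for various operations such as searching for pattern matches or performing string substitutions.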

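    import re

    # compile() creates the regular expression character class [a-e],
    # which is equivalent to [abcde].
    # The class [abcde] matches any of the characters 'a', 'b', 'c', 'd', 'e'.
    p = re.compile('[a-e]')

    # findall() searches for the regular expression and returns a list of all matches.
    print(p.findall("Aye, said Mr. Gibenson Stark"))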

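Output:

['e', 'a', 'd', 'b', 'e', 'a']

Understanding the output:
The first match is the 'e' in "Aye", not the 'A', because matching is case-sensitive.
The next matches are the 'a' and then the 'd' in "said", followed by the 'b' and 'e' in "Gibenson"; the last 'a' matches in "Stark".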

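The backslash metacharacter '\' plays a very important role because it signals various special sequences. To match a literal backslash, without its special meaning as a metacharacter, use '\\'.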
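The backslash rule is easy to get wrong because Python string literals process backslashes as well. The sketch below, using a made-up path string, shows a literal backslash matched with and without a raw string, and a backslash removing the special meaning of the dot.

```python
import re

path = 'C:\\temp'   # the Python string holds C:\temp (one backslash)

# In a normal string literal, the regex \\ has to be written as four backslashes.
found_plain = re.findall('\\\\', path)

# A raw string literal avoids the doubling at the Python level.
found_raw = re.findall(r'\\', path)

# A backslash also removes the special meaning of the next metacharacter.
literal_dot = re.findall(r'\.', 'a.b.c')   # matches only the dots
any_char = re.findall(r'.', 'a.b')         # unescaped: matches every character

print(found_plain == found_raw, literal_dot, any_char)
```

Writing patterns as raw strings (the r prefix) is the usual way to keep them readable.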

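\d    Matches any decimal digit; for ASCII text this is equivalent to the class [0-9]
\D    Matches any non-digit character
\s    Matches any whitespace character
\S    Matches any non-whitespace character
\w    Matches any alphanumeric character or underscore; for ASCII text this is equivalent to the class [a-zA-Z0-9_]
\W    Matches any character that \w does not match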

The character class [\s,.] matches any whitespace character, ',' or '.'.
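A short sketch of the class [\s,.] in use. The sample text is invented for illustration.

```python
import re

text = 'one, two. three\tfour'

# Each whitespace character, comma, or dot is matched individually.
# Inside [...] the dot is a literal dot, not "any character".
separators = re.findall(r'[\s,.]', text)

# Adding + treats a run of separators as a single split point.
words = re.split(r'[\s,.]+', text)

print(separators)
print(words)
```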

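    import re

    # The sample sentence reads: "At 11 a.m. on July 9, 2020, I went to follow
    # the software testing official account."
    # \d is equivalent to [0-9] for ASCII digits.
    p = re.compile(r'\d')
    print(p.findall("我在2020年7月9日上午11時去關注軟件測試公眾號"))

    # \d+ matches a run of one or more digits.
    p = re.compile(r'\d+')
    print(p.findall("我在2020年7月9日上午11時去關注軟件測試公眾號"))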

Output:

['2', '0', '2', '0', '7', '9', '1', '1']
['2020', '7', '9', '11']

    import re

    # \w is equivalent to [a-zA-Z0-9_] for ASCII text.
    p = re.compile(r'\w')
    print(p.findall("Official account: software testing test."))

    # \w+ matches a run of alphanumeric characters.
    p = re.compile(r'\w+')
    print(p.findall("Official account: software testing test."))

    # \W matches non-alphanumeric characters.
    p = re.compile(r'\W')
    print(p.findall("Official account: software testing test."))

Output:

['O', 'f', 'f', 'i', 'c', 'i', 'a', 'l', 'a', 'c', 'c', 'o', 'u', 'n', 't', 's', 'o', 'f', 't', 'w', 'a', 'r', 'e', 't', 'e', 's', 't', 'i', 'n', 'g', 't', 'e', 's', 't']
['Official', 'account', 'software', 'testing', 'test']
[' ', ':', ' ', ' ', ' ', '.']
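The examples above cover \d, \w and \W. For completeness, here is a small sketch of the remaining sequences \D, \s and \S, using an invented sample string.

```python
import re

sample = 'Room 42, Floor 3'

# \D+ matches runs of non-digit characters.
non_digits = re.findall(r'\D+', sample)

# \s matches each whitespace character.
whitespace = re.findall(r'\s', sample)

# \S+ matches runs of non-whitespace characters (punctuation included).
chunks = re.findall(r'\S+', sample)

print(non_digits)
print(whitespace)
print(chunks)
```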

    import re

    # '*' matches zero or more occurrences of the preceding character.
    p = re.compile('ab*')
    print(p.findall("ababbaabbb"))

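Output:

['ab', 'abb', 'a', 'abbb']

Understanding the output:

Our RE is ab*, which means an 'a' followed by any number of 'b's, starting from zero.

The output 'ab' is valid because a single 'a' is followed by one 'b'.
The output 'abb' is valid because a single 'a' is followed by two 'b's.
The output 'a' is valid because a single 'a' is followed by zero 'b's.
The output 'abbb' is valid because a single 'a' is followed by three 'b's.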
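To see how '*' differs from the other repetition metacharacters, the following sketch runs '+', '?' and '{}' against the same input string. The variable names are chosen only for this illustration.

```python
import re

text = 'ababbaabbb'

star = re.findall(r'ab*', text)          # 'a' followed by zero or more 'b'
plus = re.findall(r'ab+', text)          # 'a' followed by one or more 'b'
optional = re.findall(r'ab?', text)      # 'a' followed by zero or one 'b'
counted = re.findall(r'ab{2,3}', text)   # 'a' followed by two or three 'b'

print(star)
print(plus)
print(optional)
print(counted)
```

The lone 'a' in the middle of the string only shows up for '*' and '?', since both allow zero 'b's.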

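Function split()
Splits a string at each occurrence of a character or pattern. The pieces between the matches, including whatever remains after the last match, are returned as a list.
Syntax: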

re.split(pattern, string, maxsplit=0, flags=0)

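The first parameter, pattern, is the regular expression; string is the given string in which pattern is searched for and at which the splits occur. If maxsplit is not provided it defaults to 0, which means no limit; if a non-zero value is provided, at most that many splits occur. With maxsplit=1 the string is split only once, producing a list of length 2. The flags are optional but useful and can help shorten code. For example, with flags=re.IGNORECASE, case is ignored when looking for split points.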
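The effect of maxsplit and flags can be sketched as follows. The second input string is invented for illustration, and both arguments are passed by keyword to keep the calls unambiguous.

```python
import re

text = 'On 12th Jan 2016, at 11:02 AM'

# maxsplit=1: only the first separator is used, so the list has two items.
once = re.split(r'\W+', text, maxsplit=1)

# Without flags, only the lowercase 'x' is a separator.
case_sensitive = re.split(r'x', 'aXbxc')

# With re.IGNORECASE, both 'X' and 'x' are separators.
case_insensitive = re.split(r'x', 'aXbxc', flags=re.IGNORECASE)

print(once)
print(case_sensitive)
print(case_insensitive)
```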

    from re import split

    # '\W+' matches a run of one or more non-alphanumeric characters.
    # split() cuts the string wherever it finds ',' or a space ' '.
    print(split(r'\W+', 'Software test, Software test, Software test'))
    print(split(r'\W+', "Software test"))

    # Here ':', ' ' and ',' are not alphanumeric, so the splits happen there.
    print(split(r'\W+', 'On 12th Jan 2016, at 11:02 AM'))

    # '\d+' matches a run of one or more digits.
    # The splits happen at '2020', '1', '12', '11' and '02'.
    print(split(r'\d+', '2020年1月12日上午11:02'))

Output:

['Software', 'test', 'Software', 'test', 'Software', 'test']
['Software', 'test']
['On', '12th', 'Jan', '2016', 'at', '11', '02', 'AM']
['', '年', '月', '日上午', ':', '']

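Function sub()
Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

The name sub is short for substitute. The function searches the given string (the 3rd argument) for the regular expression pattern and replaces each match with repl (the 2nd argument). count limits how many replacements are made; the default of 0 replaces every match.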

    import re

    # The pattern 'te' matches the string at "testing" and at "test".
    # With re.IGNORECASE, 'te' would also match 'Te' or 'TE'; here it matches twice.
    # Each match is replaced by '~*'.
    print(re.sub('te', '~*', 'Coldrain has focused on software testing test', flags=re.IGNORECASE))

    # Without the flag, matching is case-sensitive. Both occurrences are
    # lowercase, so the result is the same.
    print(re.sub('te', '~*', 'Coldrain has focused on software testing test'))

    # With count=1, at most one replacement is made.
    print(re.sub('te', '~*', 'Coldrain has focused on software testing test', count=1, flags=re.IGNORECASE))

Output:

Coldrain has focused on software ~*sting ~*st
Coldrain has focused on software ~*sting ~*st
Coldrain has focused on software ~*sting test

Function subn()
Syntax:

re.subn(pattern, repl, string, count=0, flags=0)

subn() is like sub() in every respect except the form of its result. Instead of just the string, it returns a tuple (new_string, number_of_substitutions).

    import re

    print(re.subn('te', '~*', '雨寒已經關注了軟件測試test'))
    t = re.subn('te', '~*', '雨寒已經關注了軟件測試test', flags=re.IGNORECASE)
    print(t)
    print(len(t))

    # t[0] is the same string that sub() would return.
    print(t[0])

Output:

('雨寒已經關注了軟件測試~*st', 1)
('雨寒已經關注了軟件測試~*st', 1)
2
雨寒已經關注了軟件測試~*st
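Because subn() returns a two-item tuple, it is convenient to unpack it directly. The sketch below uses invented input strings and variable names, and also shows the case where nothing matches.

```python
import re

# Unpack the (new_string, number_of_substitutions) tuple.
new_text, n_subs = re.subn(r'te', '~*', 'software testing test')

# When the pattern does not match, the string comes back unchanged with a count of 0.
unchanged, zero = re.subn(r'xyz', '-', 'abc')

print(new_text, n_subs)
print(unchanged, zero)
```

The count makes it easy to check whether a replacement happened without comparing the strings.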

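Function escape()
Syntax:

re.escape(pattern)

Returns the string with a backslash in front of every character that could have a special meaning in a regular expression (Python versions before 3.7 escaped every non-alphanumeric character). This is useful when you want to match an arbitrary literal string that may contain regular expression metacharacters.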

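    import re

    # escape() returns the string with a backslash '\' before each special character.
    # In the first case, the spaces and the '.' are escaped
    # (on Python 3.7+ the apostrophe is left as-is).
    # In the second case, the spaces, '[', '-', ']', the tab produced by "\t"
    # and the caret '^' are escaped.
    print(re.escape("I'm still writing at 1 a.m"))
    print(re.escape("I Asked what is this [a-9], he said \t ^WoW"))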

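Output:

I'm\ still\ writing\ at\ 1\ a\.m
I\ Asked\ what\ is\ this\ \[a\-9\],\ he\ said\ \ \ \^WoW

In the second line, the middle one of the three escaped blanks after "said" is the tab character.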
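A typical use of escape() is to search for user-supplied text literally. The following sketch, with an invented search term and sample text, compares the results with and without escaping.

```python
import re

literal = 'a.m'
text = 'at 1 a.m or 1 atm? aXm'

# Unescaped, the dot is a wildcard, so 'atm' and 'aXm' match too.
unescaped = re.findall(literal, text)

# Escaped, the dot is matched literally.
escaped = re.findall(re.escape(literal), text)

print(unescaped)
print(escaped)
```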