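Grammar and Spell Checker in Python

In this tutorial, we will discuss a Python package for LanguageTool (installed as language-tool-python) and learn how to build a simple grammar and spell checker with the Python programming language.

So, let's get started.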

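Understanding the Python LanguageTool Library

LanguageTool is an open-source tool for grammar and spell checking; it is also known as the spell checker for OpenOffice. The package lets programmers detect grammar and spelling mistakes from a Python script or through a command-line interface.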

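How to Install the LanguageTool Library?

To install a Python library, we need pip, a package manager that installs the packages a module requires from a trusted public repository. Once pip is available, we can install the LanguageTool library by running the following command in the Windows Command Prompt (CMD) or a terminal:

Syntax: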


$ pip install language-tool-python

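By default, the language_tool_python library downloads a LanguageTool server as a JAR file and runs it in the background to detect grammar errors locally. LanguageTool also offers a public HTTP proofreading API, which the library supports as well; however, the number of calls to it is limited.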

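Verifying the Installation

Once the library is installed, we can verify it by creating an empty Python program file and writing an import statement in it, as follows:

File: verify.py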


import language_tool_python

Now, save the above file and run it with the following command in a terminal:

Syntax:


$ python verify.py

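If the above Python program file runs without any error, the library is installed correctly. If an exception is raised, try reinstalling the library, and consider consulting the module's official documentation.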
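If you would rather check for the package without triggering an ImportError, the verification step can be sketched with the standard library alone. The helper name is_installed is ours and is not part of language_tool_python. It only reports whether Python can locate the module, not whether the LanguageTool server itself will start.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# 'language_tool_python' is the import name of the language-tool-python package.
if is_installed("language_tool_python"):
    print("language_tool_python is available")
else:
    print("language_tool_python is missing; try: pip install language-tool-python")
```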

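Using the Python LanguageTool Library

In this section, we will work through a practical example to understand how the LanguageTool library works in Python. The text we will check is the passage assigned to my_text in the script below; it contains several grammar and spelling mistakes, such as 'for for', 'too see', and 'dedect'.

Let us consider the following Python script, which detects these mistakes, to understand how the LanguageTool utility works:

Example: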


# importing the package
import language_tool_python

# using the tool
my_tool = language_tool_python.LanguageTool('en-US')

# given text
my_text = """LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the 'Check Text' button. Click the colored phrases for for information on potential errors. or we can use this text too see an some of the issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? Please not that they are not perfect. Style problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 November 2021""" 

# getting the matches
my_matches = my_tool.check(my_text)

# printing matches
print(my_matches)

Output:

[Match({'ruleId': 'ENGLISH_WORD_REPEAT_RULE', 'message': 'Possible typo: you repeated a word', 'replacements': ['for'], 'offsetInContext': 43, 'context': "...Text' button. Click the colored phrases for for information on potential errors. or we ...", 'offset': 165, 'errorLength': 7, 'category': 'MISC', 'ruleIssueType': 'duplication', 'sentence': 'Click the colored phrases for for information on potential errors.'}),
 Match({'ruleId': 'UPPERCASE_SENTENCE_START', 'message': 'This sentence does not start with an uppercase letter.', 'replacements': ['Or'], 'offsetInContext': 43, 'context': '...or for information on potential errors. or we can use this text too see an some of...', 'offset': 206, 'errorLength': 2, 'category': 'CASING', 'ruleIssueType': 'typographical', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
 Match({'ruleId': 'TOO_TO', 'message': 'Did you mean "to see"?', 'replacements': ['to see'], 'offsetInContext': 43, 'context': '...tential errors. or we can use this text too see an some of the issues that LanguageTool...', 'offset': 230, 'errorLength': 7, 'category': 'CONFUSED_WORDS', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
 Match({'ruleId': 'EN_A_VS_AN', 'message': 'Use "a" instead of 'an' if the following word doesn't start with a vowel sound, e.g. 'a sentence', 'a university'.', 'replacements': ['a'], 'offsetInContext': 43, 'context': '...errors. or we can use this text too see an some of the issues that LanguageTool ca...', 'offset': 238, 'errorLength': 2, 'category': 'MISC', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
 Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['detect', 'defect', 'deduct', 'deject'], 'offsetInContext': 43, 'context': '...ome of the issues that LanguageTool can dedect. Whot do someone thinks of grammar chec...', 'offset': 282, 'errorLength': 6, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
 Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['Who', 'What', 'Shot', 'Whom', 'Hot', 'WHO', 'Whet', 'Whit', 'Whoa', 'Whop', 'WHT', 'Wot', 'W hot'], 'offsetInContext': 43, 'context': '...he issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? ...', 'offset': 290, 'errorLength': 4, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Whot do someone thinks of grammar checkers?'}),
 Match({'ruleId': 'PLEASE_NOT_THAT', 'message': 'Did you mean "note"?', 'replacements': ['note'], 'offsetInContext': 43, 'context': '...eone thinks of grammar checkers? Please not that they are not perfect. Style proble...', 'offset': 341, 'errorLength': 3, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Please not that they are not perfect.'}),
 Match({'ruleId': 'PM_IN_THE_EVENING', 'message': 'This is redundant. Consider using "P.M."', 'replacements': ['P.M.'], 'offsetInContext': 43, 'context': '...yle problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 Nov...', 'offset': 414, 'errorLength': 19, 'category': 'REDUNDANCY', 'ruleIssueType': 'style', 'sentence': 'Style problems get a blue marker: It is 7 P.M. in the evening.'})]

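Explanation:

In the above snippet of code, we imported the required library and created a tool that uses the LanguageTool utility to check text for grammar and spelling mistakes. We then defined a string variable holding the passage we want to check. Finally, we used the check() function to retrieve the matches and printed them for the user.

As a result, we get a detailed dictionary-like record for each match, showing the ruleId, message, replacements, offsetInContext, context, offset, and so on. A detailed explanation of each rule ID can be found in the LanguageTool Community.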
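To make those fields easier to read, each match can be condensed to one line. The sketch below is illustrative only: summarize is a helper name we made up, and the SimpleNamespace object stands in for a real match so the snippet runs without starting LanguageTool. It assumes match objects expose the attribute names that appear in the printed output above (ruleId, message, replacements, offset, errorLength).

```python
from types import SimpleNamespace


def summarize(match, max_suggestions=3):
    """Condense one match into a single readable line."""
    start = match.offset
    end = match.offset + match.errorLength
    suggestions = ", ".join(match.replacements[:max_suggestions]) or "(none)"
    return f"[{start}:{end}] {match.ruleId}: {match.message} -> {suggestions}"


# Stand-in built from the first match in the output above; with the real
# library you would loop over my_matches instead.
sample = SimpleNamespace(
    ruleId="ENGLISH_WORD_REPEAT_RULE",
    message="Possible typo: you repeated a word",
    replacements=["for"],
    offset=165,
    errorLength=7,
)

print(summarize(sample))
```

Capping the number of suggestions keeps the line short for spelling matches, which can carry a dozen or more candidates.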

Now that we have found the mistakes, it is time to correct them. Let us consider the following Python script, which demonstrates this:

Example:


# importing the package
import language_tool_python

# using the tool
my_tool = language_tool_python.LanguageTool('en-US')

# given text
my_text = """LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the 'Check Text' button. Click the colored phrases for for information on potential errors. or we can use this text too see an some of the issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? Please not that they are not perfect. Style problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 November 2021""" 

# getting the matches
my_matches = my_tool.check(my_text)

# defining some variables
myMistakes = []
myCorrections = []
startPositions = []
endPositions = []

# using the for-loop
for rules in my_matches:
    if len(rules.replacements) > 0:
        startPositions.append(rules.offset)
        endPositions.append(rules.errorLength + rules.offset)
        myMistakes.append(my_text[rules.offset : rules.errorLength + rules.offset])
        myCorrections.append(rules.replacements[0])

# creating new object
my_NewText = list(my_text) 

# rewriting the correct passage
for start, end, correction in zip(startPositions, endPositions, myCorrections):
    # put the whole correction into the first slot of the erroneous span
    my_NewText[start] = correction
    # blank out the remaining characters of that span
    for i in range(start + 1, end):
        my_NewText[i] = ""

my_NewText = "".join(my_NewText)

# printing the text
print(my_NewText)

Output:

LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the 'Check Text' button. Click the colored phrases for information on potential errors. Or we can use this text to see a some of the issues that LanguageTool can detect. Who do someone thinks of grammar checkers? Please note that they are not perfect. Style problems get a blue marker: It is 7 P.M.. The weather was nice on Monday, 22 November 2021

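Explanation:

In the above snippet of code, we added some new variables to hold the mistakes, the corrections, and the start and end positions. We then used a for-loop to iterate over the rules in my_matches, recording each mistake together with its first suggested replacement. Next, we copied the text into a list of characters. We then used another for-loop to put each correction at the start position of its mistake and blank out the remaining characters of that mistake. Finally, we joined the list elements back into a string and printed the resulting text for the user.

Thus, we have corrected the mistakes found in the earlier snippet. Because only the first suggestion for each match is applied, a few rough spots remain in the output, such as 'a some' and 'P.M..'.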
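The same replacement step can also be written with string slicing. This is a sketch of an alternative, not the approach used in the script above: the helper name apply_corrections and the (offset, length, replacement) tuples are assumptions made for illustration. Working from the last span to the first means an edit never shifts the offsets of spans still waiting to be processed.

```python
def apply_corrections(text, corrections):
    """Apply (offset, length, replacement) edits to text.

    Edits are applied from right to left so that earlier offsets stay valid.
    The spans are assumed not to overlap.
    """
    for offset, length, replacement in sorted(corrections, reverse=True):
        text = text[:offset] + replacement + text[offset + length:]
    return text


# Hand-written spans for a fragment of the sample passage.
fragment = "or we can use this text too see"
edits = [
    (0, 2, "Or"),        # 'or'      -> 'Or'
    (24, 7, "to see"),   # 'too see' -> 'to see'
]

print(apply_corrections(fragment, edits))
```

With real matches, the tuples could be built as (m.offset, m.errorLength, m.replacements[0]) for every match that has at least one replacement.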

Now, let us use the following Python script to view the mistakes we captured earlier along with their respective corrections:

Example:


print(list(zip(myMistakes, myCorrections)))

Output:

[('for for', 'for'), ('or', 'Or'), ('too see', 'to see'), ('an', 'a'), ('dedect', 'detect'), ('Whot', 'Who'), ('not', 'note'), ('P.M. in the evening', 'P.M.')]

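Explanation:

In the above snippet of code, we printed the list of mistakes in the text together with their corresponding corrections.

Automatically Applying Suggestions to the Text

Let us consider a simple example demonstrating how to apply the suggestions to a text automatically using the LanguageTool library in Python.

Example: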


# importing the library
import language_tool_python

# creating the tool
my_tool = language_tool_python.LanguageTool('en-US')

# given text
my_text = 'A quick broun fox jumpps over a a little lazy dog.'

# correction
correct_text = my_tool.correct(my_text)

# printing some texts
print("Original Text:", my_text)
print("Text after correction:", correct_text)

Output:

Original Text: A quick broun fox jumpps over a a little lazy dog.
Text after correction: A quick brown fox jumps over a little lazy dog.

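Explanation:

In the above snippet of code, we imported the required library and created the tool, specifying US English as its language. We then defined a string variable and stored some text in it. Finally, we used the tool's correct() function to fix the mistakes in the text automatically and printed the resulting text for the user.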

Original article by 簡單一點. If you reproduce it, please credit the source: https://www.506064.com/zh-hk/n/128714.html