Efficient String Operations in C++

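String manipulation is one of the most frequently used operations in programming, so knowing how to make it efficient in C++ is an essential skill for every programmer. This article looks at efficient string handling in C++ from several angles.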

I. What Strings Are and Common Operations

A string is a sequence of characters. In C++, strings are usually represented either as char arrays or as std::string. Common string operations include:

1. Concatenation. To join two or more strings, use the + operator or std::string's append method.

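std::string str1 = "Hello";
std::string str2 = "World";
// Using the + operator: builds a new string, str1 is unchanged
std::string str3 = str1 + str2;
// Using append: modifies str1 in place (str1 becomes "HelloWorld")
str1.append(str2);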

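2. Searching. To check whether one string contains another, use std::string's find method, which returns the position of the first match, or std::string::npos if there is none.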

std::string str = "Hello, world!";
if (str.find("world") != std::string::npos) {
    std::cout << "Found" << std::endl;
} else {
    std::cout << "Not found" << std::endl;
}
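find also takes a starting position as its second argument, so every occurrence can be collected by resuming the search just after each match. A minimal sketch; the helper name find_all is hypothetical, not a standard function:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the starting index of every (possibly
// overlapping) occurrence of `needle` in `haystack`.
std::vector<std::string::size_type> find_all(const std::string& haystack,
                                             const std::string& needle) {
    std::vector<std::string::size_type> positions;
    if (needle.empty()) {
        return positions;  // an empty pattern would match everywhere; return nothing
    }
    std::string::size_type pos = haystack.find(needle);
    while (pos != std::string::npos) {
        positions.push_back(pos);
        pos = haystack.find(needle, pos + 1);  // resume one character past the last match
    }
    return positions;
}
```

Advancing by needle.size() instead of 1 would skip overlapping matches, which is what some applications want.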

3. Splitting. To split a string into substrings at a delimiter, combine std::string's find and substr methods.

std::string str = "Hello,world,!";
std::vector<std::string> substrs;
std::string::size_type pos1 = 0, pos2 = 0;
while ((pos2 = str.find(",", pos1)) != std::string::npos) {
    substrs.push_back(str.substr(pos1, pos2 - pos1));
    pos1 = pos2 + 1;
}
substrs.push_back(str.substr(pos1));
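The loop above advances pos1 by one, which is correct only for a single-character delimiter. A sketch generalized to delimiters of any length, advancing by the delimiter's size instead; the helper name split is hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `s` on every occurrence of `delim`.
// Empty fields are kept, so "a,,b" yields {"a", "", "b"}.
std::vector<std::string> split(const std::string& s, const std::string& delim) {
    if (delim.empty()) {
        return {s};  // nothing to split on
    }
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = s.find(delim, start)) != std::string::npos) {
        parts.push_back(s.substr(start, pos - start));
        start = pos + delim.size();  // skip the whole delimiter, not just one character
    }
    parts.push_back(s.substr(start));  // the field after the last delimiter
    return parts;
}
```

For example, split("a::b::c", "::") returns {"a", "b", "c"}.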

II. Efficiency of String Operations

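For everyday string handling, std::string's member functions or hand-written char-array code are perfectly adequate. When strings must be processed at high throughput, however, the cost of each operation starts to matter.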

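1. Concatenation. The + operator builds a new string each time it is used, and repeated append calls may have to reallocate the buffer as the string grows; these dynamic memory allocations and deallocations cost time. std::stringstream or a char array can be used instead, although whether either is actually faster depends on the workload and should be measured. For example, with std::stringstream: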

std::string str1 = "Hello";
std::string str2 = "World";
std::stringstream ss;
ss << str1 << str2;
std::string str3 = ss.str();

Using a char array:

std::string str1 = "Hello";
std::string str2 = "World";
char buf[100];
// snprintf silently truncates the result if it needs more than sizeof(buf) - 1 characters
std::snprintf(buf, sizeof(buf), "%s%s", str1.c_str(), str2.c_str());
std::string str3 = buf;
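A further technique, not mentioned above but widely used, is to compute the final length first and call std::string::reserve, so that the following append calls write into one buffer without reallocating. A minimal sketch; the helper name concat_all is hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: joins all pieces into one string, reserving the
// full capacity up front so the appends never reallocate.
std::string concat_all(const std::vector<std::string>& pieces) {
    std::string::size_type total = 0;
    for (const auto& p : pieces) {
        total += p.size();  // first pass: measure the final length
    }
    std::string result;
    result.reserve(total);  // allocate at most once
    for (const auto& p : pieces) {
        result.append(p);   // second pass: copy into the reserved buffer
    }
    return result;
}
```

For example, concat_all({"Hello", "World"}) returns "HelloWorld". Whether this beats std::stringstream or snprintf on a given workload is best checked with a benchmark like the one in Section III.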

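2. Searching. The C++ standard does not prescribe an algorithm for std::string::find; common implementations (libstdc++, for example) use a straightforward scan whose worst-case time complexity is O(n*m), where n and m are the lengths of the text and the pattern. The KMP (Knuth-Morris-Pratt) algorithm guarantees O(n+m) time, and Boyer-Moore can skip over portions of the text, so either can serve as an optimization. For example, using KMP: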

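std::string str = "Hello, world!";
std::string pattern = "world";
const int n = static_cast<int>(str.size());
const int m = static_cast<int>(pattern.size());
// next[i] is the index of the last character of the longest proper prefix of
// pattern[0..i] that is also a suffix of it (-1 if there is none).
std::vector<int> next(m, -1);
for (int i = 1, j = -1; i < m; ++i) {
    while (j >= 0 && pattern[j + 1] != pattern[i]) {
        j = next[j];  // fall back to a shorter border
    }
    if (pattern[j + 1] == pattern[i]) {
        ++j;
    }
    next[i] = j;
}
// Scan the text once; j is the index of the last matched pattern character.
for (int i = 0, j = -1; m > 0 && i < n; ++i) {
    while (j >= 0 && pattern[j + 1] != str[i]) {
        j = next[j];
    }
    if (pattern[j + 1] == str[i]) {
        ++j;
    }
    if (j == m - 1) {
        std::cout << "Found at " << i - j << std::endl;  // start index of the match
        j = next[j];  // keep going to find further (possibly overlapping) matches
    }
}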
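Boyer-Moore does not need to be written by hand: since C++17 the standard library provides std::boyer_moore_searcher (and the simpler std::boyer_moore_horspool_searcher) in <functional>, which plugs into std::search. A minimal sketch that mirrors find's return convention; the helper name bm_find is hypothetical:

```cpp
#include <algorithm>
#include <functional>
#include <string>

// Hypothetical helper: returns the index of the first occurrence of `pattern`
// in `text`, or std::string::npos, using the standard Boyer-Moore searcher (C++17).
std::string::size_type bm_find(const std::string& text, const std::string& pattern) {
    if (pattern.empty()) {
        return 0;  // same result as std::string::find for an empty pattern
    }
    // Constructing the searcher preprocesses the pattern; when one pattern is
    // searched in many texts, build the searcher once and reuse it.
    std::boyer_moore_searcher searcher(pattern.begin(), pattern.end());
    auto it = std::search(text.begin(), text.end(), searcher);
    if (it == text.end()) {
        return std::string::npos;
    }
    return static_cast<std::string::size_type>(it - text.begin());
}
```

Because of the preprocessing step, a short pattern searched only once may well be found faster by plain find; the searcher tends to pay off with long or reused patterns.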

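3. Splitting. The find/substr approach copies every piece into a new std::string, which means a character copy, and often a heap allocation, per piece. Alternatives are the standard library's std::regex for regular-expression matching, or a hand-written splitting routine. The loop from Section I is already hand-written; to actually avoid the copies, it can return std::string_view pieces (C++17) that point into the original string:

std::string str = "Hello,world,!";
// Each std::string_view refers to a range inside str, so no characters are
// copied; str must outlive the views stored in substrs.
std::string_view view = str;
std::vector<std::string_view> substrs;
std::string_view::size_type pos1 = 0, pos2 = 0;
while ((pos2 = view.find(',', pos1)) != std::string_view::npos) {
    substrs.push_back(view.substr(pos1, pos2 - pos1));
    pos1 = pos2 + 1;
}
substrs.push_back(view.substr(pos1));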
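The text also names std::regex as an option. A minimal sketch using std::sregex_token_iterator, where the submatch index -1 selects the text between delimiter matches; the helper name regex_split is hypothetical. A regular expression makes complex delimiters easy (for instance, a comma or semicolon followed by optional whitespace), but for a single fixed character it is usually considerably slower than a plain scan:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `s` wherever the regular expression
// `delim_pattern` matches.
std::vector<std::string> regex_split(const std::string& s, const std::string& delim_pattern) {
    const std::regex re(delim_pattern);
    // -1 asks the iterator for the pieces between matches rather than the matches.
    std::sregex_token_iterator first(s.begin(), s.end(), re, -1);
    std::sregex_token_iterator last;  // default-constructed end-of-sequence iterator
    return std::vector<std::string>(first, last);
}
```

For example, regex_split("a, b;c", "[,;]\\s*") returns {"a", "b", "c"}.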

III. Performance Comparison of Common String Libraries

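Different string libraries are implemented differently, so their performance differs as well. On a 64-bit Ubuntu system, compiling with g++, we compared the STL, Boost, and folly on string concatenation, search, and splitting (the code also times string-to-integer conversion). The code is as follows: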

#include <chrono>
#include <cstdio>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

#include <boost/algorithm/string.hpp>
#include <boost/lexical_cast.hpp>
#include <folly/Conv.h>    // folly::to
#include <folly/String.h>  // folly::split

// Runs body() 1,000,000 times and prints the elapsed time in microseconds.
// Results that are never used may be optimized away by the compiler,
// so the numbers should be interpreted with care.
template <typename F>
void bench(const char* name, F body) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i) {
        body();
    }
    auto end = std::chrono::steady_clock::now();
    std::cout << name << ": "
              << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count()
              << "us" << std::endl;
}

int main() {
    // String concatenation
    std::string str1 = "Hello";
    std::string str2 = "World";
    bench("std::stringstream", [&] {
        std::stringstream ss;
        ss << str1 << str2;
        std::string result = ss.str();
    });
    bench("std::snprintf", [&] {
        char buf[100];
        std::snprintf(buf, sizeof(buf), "%s%s", str1.c_str(), str2.c_str());
        std::string result = buf;
    });
    bench("std::string", [&] {
        std::string result = str1 + str2;
    });
    bench("std::string::append", [&] {
        // Append to a copy: calling str1.append(str2) directly would grow str1
        // on every iteration and distort this and all later measurements.
        std::string result = str1;
        result.append(str2);
    });
    bench("folly::to", [&] {
        std::string result = folly::to<std::string>(str1, str2);
    });

    // String search: std::string::find versus splitting on ',' with Boost and
    // folly and comparing each piece (the last two mainly measure splitting)
    std::string str = "Hello, world!";
    std::string pattern = "world";
    bench("std::string::find and substr", [&] {
        std::size_t pos = str.find(pattern);
        if (pos != std::string::npos) {
            std::string result = str.substr(pos);
        }
    });
    bench("boost::split and for loop", [&] {
        std::vector<std::string> substrs;
        boost::split(substrs, str, boost::is_any_of(","));
        for (const auto& substr : substrs) {
            if (substr == pattern) {
                std::string result = substr;
            }
        }
    });
    bench("folly::split and for loop", [&] {
        std::vector<std::string> substrs;
        folly::split(",", str, substrs);
        for (const auto& substr : substrs) {
            if (substr == pattern) {
                std::string result = substr;
            }
        }
    });

    // String-to-integer conversion
    std::string num1 = "12345";
    std::string num2 = "67890";
    bench("std::stoi", [&] {
        int value = std::stoi(num1 + num2);
        (void)value;
    });
    bench("boost::lexical_cast", [&] {
        int value = boost::lexical_cast<int>(num1 + num2);
        (void)value;
    });
}
Original article, author: 小藍. If reposting, please credit the source: https://www.506064.com/zh-tw/n/185964.html