Asked by: 小点点

RE2 does not match non-ASCII characters


I can't get RE2 to match a single non-ASCII byte when I write it in hexadecimal or octal notation.

The following snippet illustrates the problem:

const char *test = "abc""\xe2""xyz";
std::string str(test); // "abc\342xyz" . \342 is octal for \xe2
// str.size() == 7

re2::StringPiece string_piece(str); // size is 7, as expected

std::string out;

// extracts the letter 'z' into 'out'. \172 is the octal for z
bool match = re2::RE2::PartialMatch(string_piece, ("(\172)"), &out); // match = true, out = 'z'.

// should extract the character \342...but it doesn't.
match = re2::RE2::PartialMatch(string_piece, ("(\342)"), &out); // match = false

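1 answer

Anonymous user

Set the encoding to Latin-1. RE2 defaults to UTF-8, and a lone \342 byte is not a complete UTF-8 sequence, so in UTF-8 mode the pattern cannot match that byte.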

match = re2::RE2::PartialMatch(string_piece,
          re2::RE2("(\342)", re2::RE2::Latin1),
          &out);
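
To see why the encoding matters, here is a minimal sketch that uses only the standard library (it does not call RE2). The helper looks_like_utf8 is a hypothetical name introduced for illustration, and it is a simplified structural check: it only verifies lead and continuation bytes and does not reject overlong encodings or surrogates. It shows that the byte \342 (0xE2) announces a three-byte UTF-8 sequence, so on its own it is not valid UTF-8, whereas the UTF-8 encoding of U+00E2 is the two bytes \303\242.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of RE2: a simplified structural UTF-8 check.
// Assumption: only lead/continuation byte layout is verified; overlong
// encodings and surrogate code points are not rejected.
bool looks_like_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) {
            extra = 0;                       // 0xxxxxxx: ASCII
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1;                       // 110xxxxx: 2-byte sequence
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2;                       // 1110xxxx: 3-byte sequence (0xE2 lands here)
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3;                       // 11110xxx: 4-byte sequence
        } else {
            return false;                    // stray continuation byte or invalid lead
        }
        if (i + extra >= s.size()) {
            return false;                    // sequence is cut off by the end of the string
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;                // expected a 10xxxxxx continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

Under this check, both the pattern "(\342)" and the subject "abc\342xyz" from the question fail, while "(\172)" and "(\303\242)" pass. With RE2::Latin1 every byte is treated as one character, so no such decoding is involved.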