How do I split text into words using a regular expression?

Cats萌萌 2022-12-31 13:58:13

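OpenFileDialog openFileDialog = new OpenFileDialog();
if (openFileDialog.ShowDialog() == true)
{
    //your code
}

I have .srt files, which follow a fixed text structure. Example:

1
00:00:01,514 --> 00:00:04,185
I'm investigating
Saturday night's shootings.

2
00:00:04,219 --> 00:00:05,754
What's to investigate?
Innocent people

I want to get the individual words, such as "I'm", "investigating", "Saturday". I created the pattern @"[a-zA-Z']", which splits my text almost correctly. However, .srt files also contain some useless tag structures, such as <i>, that I want to remove. How can I build a pattern that separates the text into words and removes all text between "<" and ">" (including the brackets)?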

2 Answers

HUX布斯

Contributed 1876 experience points, received more than 6 likes

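Well, it's hard to do this with a single regular expression (at least for me), but you can do it in two steps.

First, remove the HTML tags from the string, then extract the words from what is left.

Take a look below.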

var text = "00:00:01,514 --> 00:00:04,185 I'm investigating Saturday night's shootings.<i>";

// Remove every <...> tag. Note: a trailing ".*" in the pattern would also delete all text after the first tag, so it is left out here.
var noHtml = Regex.Replace(text, @"<[^>]*>", "");

// Now you can get only the words by matching @"[a-zA-Z']+" against noHtml. You should get "I'm", "investigating", "Saturday", "night's", "shootings".
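To make the two steps concrete, here is a minimal sketch that puts them together in one runnable program. The class name SrtWords and the method Extract are made up for illustration and are not part of the answer; the sketch also assumes tags never contain a ">" inside an attribute value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SrtWords
{
    // Hypothetical helper (not from the answer): strips <...> tags,
    // then collects every run of ASCII letters and apostrophes.
    public static List<string> Extract(string text)
    {
        // Step 1: remove every tag such as <i> or </i>.
        string noTags = Regex.Replace(text, @"<[^>]*>", "");

        // Step 2: collect the words. Digits and punctuation, including
        // the timing lines, contain no letters and are skipped.
        var words = new List<string>();
        foreach (Match m in Regex.Matches(noTags, @"[a-zA-Z']+"))
        {
            words.Add(m.Value);
        }
        return words;
    }

    public static void Main()
    {
        string srt = "1\n00:00:01,514 --> 00:00:04,185\n" +
                     "<i>I'm investigating</i>\nSaturday night's shootings.\n";

        Console.WriteLine(string.Join("|", Extract(srt)));
        // prints: I'm|investigating|Saturday|night's|shootings
    }
}
```

Replacing a tag with an empty string glues together words that were separated only by that tag (for example "foo<br>bar" becomes "foobar"). If that can happen in your files, replacing with a single space instead is probably the safer choice.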

Replied 2022-12-31

LEATH

Contributed 1936 experience points, received more than 7 likes

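You can use negative lookarounds to assert that the word is not preceded by a < followed by a sequence of non-> characters, and that it is not followed by a sequence of non-< characters ending in a >.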

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string input = @"
Hello world, <rubbish>it's a wonderful day.
<trash>
";

        // Match runs of letters and apostrophes that are not inside a <...> tag.
        foreach (Match match in Regex.Matches(input, @"(?<!<[^>]*)[a-zA-Z']+(?![^<]*>)"))
        {
            Console.WriteLine(match.Value);
        }
    }
}

Output:

Hello

world

it's

wonderful

day
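One caveat when applying this pattern to a whole .srt file, as far as I can tell: the lookahead treats the ">" of a later "-->" timing arrow as if it closed a tag, so words that come before a later arrow (with no "<" in between) are silently dropped. The sketch below compares whole-text matching with a line-by-line variant that skips timing lines. The class SrtLookaround and its Extract method are hypothetical names for illustration, and the "-->" check is an assumption about how timing lines look.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SrtLookaround
{
    // The lookaround pattern from the answer above, unchanged.
    public static readonly Regex Word =
        new Regex(@"(?<!<[^>]*)[a-zA-Z']+(?![^<]*>)");

    // Hypothetical helper: applies the pattern one line at a time.
    public static List<string> Extract(string srt)
    {
        var words = new List<string>();
        foreach (string line in srt.Split('\n'))
        {
            // Assumption: any line containing "-->" is a cue timing line.
            if (line.Contains("-->")) continue;

            foreach (Match m in Word.Matches(line))
            {
                words.Add(m.Value);
            }
        }
        return words;
    }

    public static void Main()
    {
        string srt =
            "1\n00:00:01,514 --> 00:00:04,185\n<i>I'm investigating</i>\n" +
            "Saturday night's shootings.\n\n" +
            "2\n00:00:04,219 --> 00:00:05,754\nWhat's to investigate?\nInnocent people\n";

        // Whole text: "Saturday night's shootings" is lost because the
        // second "-->" follows it without any "<" in between.
        Console.WriteLine(Word.Matches(srt).Count);   // prints: 7

        // Line by line: all ten words are found.
        Console.WriteLine(Extract(srt).Count);        // prints: 10
    }
}
```

The line-by-line variant assumes a tag never spans more than one line. If it can, the two-step approach of stripping tags first avoids the problem.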

Replied 2022-12-31
