
Splitting a string by markers in JS

JavaScript

達令說 2023-08-05 19:30:45

I have a string with some markers in it:

'This is {startMarker} the string {endMarker} for {startMarker} example. {endMarker}'

I need to parse it into an array like this:

[
  {marker: false, value: 'This is'},
  {marker: true, value: 'the string'},
  {marker: false, value: 'for'},
  {marker: true, value: 'example.'}
]

So the sentence order is kept, but the marker information is added. Any idea how I can achieve this? Thanks.


3 Answers

互換的青春

Contributed 1797 experience points, received more than 6 likes

This should do the trick:

const my_str = 'This is {startMarker} the string {endMarker} for {startMarker} example.{endMarker}';

const my_arr = my_str.split('{endMarker}').reduce((acc, s) =>
  s.split('{startMarker}').map((a, i) =>
    a && acc.push({
      marker: i ? true : false,
      value: a.trim()
    }))
  && acc, []);

console.log(my_arr)
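The nested split/reduce above is compact but can be hard to follow. As a more explicit take on the same idea, here is a minimal sketch that walks the string and toggles a flag at each marker. The name splitByMarkers and its parameters are hypothetical (they do not come from the answer), and the sketch assumes the markers are properly paired and never nested.

```javascript
// Hypothetical helper: scans for the next expected marker and
// records the text in between, together with whether it was marked.
function splitByMarkers(str, start = '{startMarker}', end = '{endMarker}') {
  const result = [];
  let inside = false; // true while we are between a start and an end marker
  let pos = 0;

  while (pos < str.length) {
    const next = inside ? end : start;          // the marker we expect next
    const idx = str.indexOf(next, pos);
    const stop = idx === -1 ? str.length : idx; // no marker left: take the rest
    const value = str.slice(pos, stop).trim();
    if (value) result.push({ marker: inside, value });
    if (idx === -1) break;
    pos = idx + next.length;
    inside = !inside;
  }
  return result;
}

console.log(splitByMarkers(
  'This is {startMarker} the string {endMarker} for {startMarker} example. {endMarker}'
));
```

Unlike the one-liner, this version also keeps any text that follows the last end marker.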

2023-08-05

繁花不似錦

Contributed 1851 experience points, received more than 4 likes

Just because you're a new contributor...

interface MarkedString {
  marker: boolean
  value: string
}

function markString(text: string): MarkedString[] {
  const array: MarkedString[] = []

  // Text before the first marker (or the whole string if there is none) is unmarked.
  const firstBrace = text.indexOf('{')
  const leading = (firstBrace === -1 ? text : text.slice(0, firstBrace)).trim()
  if (leading.length > 0) {
    array.push({ marker: false, value: leading })
  }

  let match: RegExpExecArray | null
  // The regex has no "g" flag, so every exec() searches the remaining text from its start.
  while ((match = /\{(.+?)\}/.exec(text)) !== null) {
    const marker = match[1]
    const markerEnd = match.index + match[0].length
    // The value runs up to the next marker, or to the end of the text.
    const nextBrace = text.indexOf('{', markerEnd)
    const valueEnd = nextBrace === -1 ? text.length : nextBrace
    const value = text.slice(markerEnd, valueEnd).trim()
    if (value !== '') {
      if (marker === 'startMarker') {
        array.push({ marker: true, value })
      } else if (marker === 'endMarker') {
        array.push({ marker: false, value })
      }
    }
    // Drop everything that has been processed so far.
    text = text.slice(valueEnd)
  }
  return array
}

2023-08-05

阿波羅的戰車

Contributed 1862 experience points, received more than 6 likes

const escapeRegex = s => s.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&");

const extract = (start, end, str) => Array.from(
  str.matchAll(`(.+?)(${escapeRegex(start)}|${escapeRegex(end)}|$)`),
  ([, text, mark]) => ({
    marker: mark === end,
    value: text.trim()
  })
);

console.log(extract(
  "{startMarker}",
  "{endMarker}",
  "This is {startMarker} the string {endMarker} for {startMarker} example. {endMarker}"
));

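Explanation

Regular expression

The string is made up of sections, each consisting of a text segment that ends with one of the two markers. We can extract each section, including its marker.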

This is {startMarker} the string {endMarker}
^______^^___________^^__________^^_________^
| text      mark    ||   text        mark  |
^___________________^^_____________________^
       section               section

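The text becomes the value of the result object, and we can check whether the marker segment is {endMarker} to produce the true or false for the result object.

So, if we can correctly extract the text and the marker of each section, the result is: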

result = {
  marker: marker === "{endMarker}",
  value: text.trim()
}

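The regular expression that can do this for us is:

/(.+?)(\{startMarker\}|\{endMarker\}|$)/g

See Regex101

(.+?) will match and capture the text segment.
(\{startMarker\}|\{endMarker\}|$) will match and capture the marker at the end of the text segment. It also matches the end of the string, in case there is more text after the last marker, as in: for {startMarker} example. {endMarker} more text here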
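To see the two capture groups in action, here is a small sketch that runs the regular expression above against the trailing-text example. The variable names input and groups are only for illustration.

```javascript
const regex = /(.+?)(\{startMarker\}|\{endMarker\}|$)/g;
const input = 'for {startMarker} example. {endMarker} more text here';

// Each match is [fullMatch, text, mark]; keep only the two captured groups.
const groups = Array.from(input.matchAll(regex), m => [m[1], m[2]]);

console.log(groups);
// [
//   [ 'for ', '{startMarker}' ],
//   [ ' example. ', '{endMarker}' ],
//   [ ' more text here', '' ]
// ]
```

The last section ends at the end of the string rather than at a marker, so its second group is the empty string.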

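Generating the regex

More generally, we can take any strings as the start and end markers and escape them to make sure they are matched literally, even if they contain metacharacters such as . or *.

const escapeRegex = s => s.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&");

This way we can take start and end as strings and generate the regular expression with the RegExp constructor: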

const escapeRegex = s => s.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&");

const start = "{startMarker}";
const end = "{endMarker}";

const regex = new RegExp(`(.+?)(${escapeRegex(start)}|${escapeRegex(end)}|$)`, "g");

console.log(regex.toString());
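To illustrate why the escaping matters, here is a small sketch with a pair of made-up markers, (* and *), that consist entirely of regex metacharacters. These markers are an assumption for the sake of the example and are not part of the question.

```javascript
const escapeRegex = s => s.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&");

// Hypothetical markers made up of regex metacharacters
const start = "(*";
const end = "*)";

console.log(escapeRegex(start)); // \(\*
console.log(escapeRegex(end));   // \*\)

// Without escaping, the pattern is not even valid.
let unescapedIsValid = true;
try {
  new RegExp(`(.+?)(${start}|${end}|$)`, "g");
} catch {
  unescapedIsValid = false;
}
console.log(unescapedIsValid); // false

// With escaping, the markers are matched literally.
const regex = new RegExp(`(.+?)(${escapeRegex(start)}|${escapeRegex(end)}|$)`, "g");
const marks = Array.from("a (* b *) c".matchAll(regex), m => m[2]);
console.log(marks); // [ '(*', '*)', '' ]
```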

Matching

The String#matchAll method produces an iterator containing all matches of the regular expression applied to the string.

const escapeRegex = s => s.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&");

const extract = (start, end, str) => {
  const sequence = str.matchAll(`(.+?)(${escapeRegex(start)}|${escapeRegex(end)}|$)`);
  for (const result of sequence) {
    console.log(result);
  }
};

extract(
  "{startMarker}",
  "{endMarker}",
  "This is {startMarker} the string {endMarker} for {startMarker} example. {endMarker}"
);

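The .matchAll() method accepts a string as its argument and automatically converts it to a regular expression using the RegExp constructor, also adding the global flag automatically. However, TypeScript currently does not seem to allow this: the method's type only permits a RegExp object, so for TypeScript only (until the types are fixed) you have to call

str.matchAll(new RegExp(`(.+?)(${escapeRegex(start)}|${escapeRegex(end)}|$)`, "g"))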
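The difference between passing a string and passing a RegExp object can be checked with a short sketch. The sample text and variable names are made up for illustration.

```javascript
const text = "a1b2";

// A string pattern is converted to a global RegExp behind the scenes.
const fromString = Array.from(text.matchAll("\\d"), m => m[0]);
console.log(fromString); // [ '1', '2' ]

// An explicit RegExp object must carry the "g" flag itself...
const fromRegex = Array.from(text.matchAll(/\d/g), m => m[0]);
console.log(fromRegex); // [ '1', '2' ]

// ...otherwise matchAll throws a TypeError.
let threw = false;
try {
  text.matchAll(/\d/);
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true
```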

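Converting to an array

The easiest way to convert an iterable to an array is to use Array.from. Its first argument can be any iterable, which is automatically converted to an array. The second argument is a mapping function that is applied to each element before it is placed in the array.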
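As a minimal sketch of the two-argument form, the following maps every match to an upper-case letter while the array is being built. The input string is made up for illustration.

```javascript
// The iterator returned by matchAll is passed directly as the first argument;
// the second argument maps each match object to the value we want to keep.
const letters = Array.from("a-b-c".matchAll(/[a-z]/g), m => m[0].toUpperCase());
console.log(letters); // [ 'A', 'B', 'C' ]

// The same result with a spread followed by map, at the cost of an intermediate array.
const viaSpread = [..."a-b-c".matchAll(/[a-z]/g)].map(m => m[0].toUpperCase());
console.log(viaSpread); // [ 'A', 'B', 'C' ]
```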

Since we receive regex match results, we can use this function to convert them directly into the desired items:

result => {
  const match = result[1];
  const marker = result[2];
  return {
    marker: marker === end,
    value: match.trim()
  };
}

This gives us the more verbose version:

const escapeRegex = s => s.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&");

const extract = (start, end, str) => {
  return Array.from(
    str.matchAll(`(.+?)(${escapeRegex(start)}|${escapeRegex(end)}|$)`),
    result => {
      const match = result[1];
      const marker = result[2];
      return {
        marker: marker === end,
        value: match.trim()
      };
    }
  );
}

console.log(extract(
  "{startMarker}",
  "{endMarker}",
  "This is {startMarker} the string {endMarker} for {startMarker} example. {endMarker}"
));

Playground Link

However, we can reduce the amount of code needed by using destructuring, so the mapping function becomes:

([, text, mark]) => ({
  marker: mark === end,
  value: text.trim()
})

This finally gives us the initial code from the top (included again to save scrolling up):

const escapeRegex = s => s.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&");

const extract = (start, end, str) => Array.from(
  str.matchAll(`(.+?)(${escapeRegex(start)}|${escapeRegex(end)}|$)`),
  ([, text, mark]) => ({
    marker: mark === end,
    value: text.trim()
  })
);

console.log(extract(
  "{startMarker}",
  "{endMarker}",
  "This is {startMarker} the string {endMarker} for {startMarker} example. {endMarker}"
));

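A final note on ES2020 compatibility

String#matchAll comes from the ES2020 specification. If you are not currently targeting it and do not want to, you can easily roll your own version with a generator function that works very similarly: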

function* matchAll(pattern, text) {
  const regex = typeof pattern === "string"
    ? new RegExp(pattern, "g")  //convert to global regex
    : new RegExp(pattern);      //or make a copy of the regex object to avoid mutating the input

  let result;
  while(result = regex.exec(text)) //apply `regex.exec` repeatedly
    yield result;                  //and produce each result from the iterator
}

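The only notable omission here is that String#matchAll throws an error if it is passed a non-global regex object. That could still be implemented, but I used a slightly shorter implementation for illustration.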
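For completeness, here is one possible sketch of a stricter variant. The name matchAllStrict is hypothetical and this is not the answer's code. It rejects non-global regex objects up front (with the shorter version, a non-global regex would make exec return the same first match forever) and it steps past zero-length matches so that the loop always terminates.

```javascript
// Hypothetical stricter variant of the hand-rolled matchAll above.
function matchAllStrict(pattern, text) {
  // Validate eagerly, like the native method, instead of on the first next() call.
  if (pattern instanceof RegExp && !pattern.global) {
    throw new TypeError("matchAllStrict called with a non-global RegExp");
  }
  const regex = typeof pattern === "string"
    ? new RegExp(pattern, "g")
    : new RegExp(pattern); // copy, so the caller's lastIndex is left untouched

  return (function* () {
    let result;
    while ((result = regex.exec(text)) !== null) {
      yield result;
      // A zero-length match does not advance lastIndex, so move it manually.
      if (result[0] === "") regex.lastIndex++;
    }
  })();
}

console.log(Array.from(matchAllStrict(/\d/g, "a1b2"), m => m[0])); // [ '1', '2' ]
```

This sketch advances by one code unit after an empty match, which is a simplification for regexes that use the u flag.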

With the custom matchAll you can target versions before ES2020 without needing a polyfill:

const escapeRegex = s => s.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&");

function* matchAll(pattern, text) {
  const regex = typeof pattern === "string"
    ? new RegExp(pattern, "g")
    : new RegExp(pattern);

  let result;
  while(result = regex.exec(text))
    yield result;
}

const extract = (start, end, str) => Array.from(
  matchAll(`(.+?)(${escapeRegex(start)}|${escapeRegex(end)}|$)`, str),
  ([, text, mark]) => ({
    marker: mark === end,
    value: text.trim()
  })
);

console.log(extract(
  "{startMarker}",
  "{endMarker}",
  "This is {startMarker} the string {endMarker} for {startMarker} example. {endMarker}"
));

2023-08-05
