
RegEx to split camelCase or TitleCase (advanced)

Java, regular expressions

繁星淼淼 2019-10-08 15:06:15

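I found a brilliant RegEx to extract the parts of a camelCase or TitleCase expression:

(?<!^)(?=[A-Z])

It works as expected:

value -> value
camelValue -> camel / Value
TitleValue -> Title / Value

For example, with Java:

String s = "loremIpsum";
String[] words = s.split("(?<!^)(?=[A-Z])");
// words equals new String[]{"lorem", "Ipsum"}

My problem is that it does not work in some cases:

Case 1: VALUE -> V / A / L / U / E
Case 2: eclipseRCPExt -> eclipse / R / C / P / Ext

To my mind, the result should be:

Case 1: VALUE
Case 2: eclipse / RCP / Ext

In other words, given n uppercase characters:

If the n characters are followed by lowercase characters, the groups should be: (n-1 characters) / (n-th character + lowercase characters).
If the n characters are at the end, the group should be: (n characters).

Any ideas on how to improve this regex?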
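The behaviour described in the question can be reproduced with a short sketch. The class name `OriginalSplitDemo` and the helper `split` are made up for illustration; only the regex itself comes from the question.

```java
class OriginalSplitDemo {
    // Regex from the question: split before every uppercase letter,
    // except at the very start of the string.
    static final String ORIGINAL = "(?<!^)(?=[A-Z])";

    // Hypothetical helper, not part of the original post.
    static String[] split(String s) {
        return s.split(ORIGINAL);
    }

    public static void main(String[] args) {
        String[] samples = {"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"};
        for (String s : samples) {
            // Print in the same "a / b" notation the question uses.
            System.out.println(s + " -> " + String.join(" / ", split(s)));
        }
    }
}
```

Running it should show the two problem cases: every letter of VALUE and of the RCP run ends up in its own group, because the pattern only looks at the character that follows the split point.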

3 answers

拉風的咖菲貓

Contributed 1995 experience points, received over 2 likes

The following regex works for all of the above examples:

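public static void main(String[] args)
{
    // Split before an uppercase letter that is not preceded by the string start
    // or by another uppercase letter, or before an uppercase letter that is
    // followed by a lowercase letter (except at the string start).
    for (String w : "camelValue".split("(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])")) {
        System.out.println(w);
    }
}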

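It works by forcing the negative lookbehind to ignore not only matches at the start of the string, but also matches where the capital letter is preceded by another capital letter. This handles cases like "VALUE".

The first part of the regex on its own fails on "eclipseRCPExt" because it cannot split between "RCP" and "Ext". That is the purpose of the second clause: (?<!^)(?=[A-Z][a-z]). This clause allows a split before every capital letter that is followed by a lowercase letter, except at the start of the string.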
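To see the effect of each clause, the sketch below runs the first clause alone and then the full pattern over the question's test strings. The class `LookbehindSplitDemo` and its constant names are illustrative assumptions; the two regex fragments are taken from this answer.

```java
class LookbehindSplitDemo {
    // First clause only: no split at the start or after another uppercase letter.
    static final String FIRST_CLAUSE = "(?<!(^|[A-Z]))(?=[A-Z])";
    // Full pattern from the answer: first clause OR "uppercase followed by lowercase".
    static final String FULL = FIRST_CLAUSE + "|(?<!^)(?=[A-Z][a-z])";

    // Hypothetical helper that renders a split in the thread's "a / b" notation.
    static String show(String s, String regex) {
        return String.join(" / ", s.split(regex));
    }

    public static void main(String[] args) {
        String[] samples = {"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"};
        for (String s : samples) {
            System.out.println(s + " | first clause: " + show(s, FIRST_CLAUSE)
                    + " | full: " + show(s, FULL));
        }
    }
}
```

With the first clause alone, eclipseRCPExt should come out as eclipse / RCPExt; the second clause adds the missing split before Ext.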

2019-10-08

狐的傳說

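Contributed 1804 experience points, received over 3 likes

It seems you are making this more complicated than it needs to be. For camelCase, the split location is simply anywhere an uppercase letter immediately follows a lowercase letter: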

(?<=[a-z])(?=[A-Z])

Here is how this regex splits your example data:

value -> value
camelValue -> camel / Value
TitleValue -> Title / Value
VALUE -> VALUE
eclipseRCPExt -> eclipse / RCPExt

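The only difference from your desired output is with eclipseRCPExt, which I would argue is correctly split here.

Addendum - improved version

Note: this answer recently got an upvote, and I realized that there is a better way...

By adding a second alternative to the above regex, all of the OP's test cases are split correctly.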

(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])

Here is how the improved regex splits the example data:

value -> value
camelValue -> camel / Value
TitleValue -> Title / Value
VALUE -> VALUE
eclipseRCPExt -> eclipse / RCP / Ext
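Both patterns from this answer can be compared side by side in Java. This is only a sketch: `TwoAlternativeSplitDemo`, `SIMPLE`, `IMPROVED`, and `show` are names invented here, and the regexes are copied from the answer.

```java
class TwoAlternativeSplitDemo {
    // Simple version: split wherever an uppercase letter follows a lowercase letter.
    static final String SIMPLE = "(?<=[a-z])(?=[A-Z])";
    // Improved version: also split inside an uppercase run, before its last letter,
    // when that letter starts a capitalised word.
    static final String IMPROVED = SIMPLE + "|(?<=[A-Z])(?=[A-Z][a-z])";

    // Hypothetical helper that renders a split in the thread's "a / b" notation.
    static String show(String s, String regex) {
        return String.join(" / ", s.split(regex));
    }

    public static void main(String[] args) {
        String[] samples = {"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"};
        for (String s : samples) {
            System.out.println(s + " -> " + show(s, SIMPLE) + "   |   " + show(s, IMPROVED));
        }
    }
}
```

Because both alternatives use positive lookbehinds that require a letter before the split point, no explicit start-of-string check is needed.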

2019-10-08

斯蒂芬大帝

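Contributed 1827 experience points, received over 8 likes

I couldn't get aix's solution to work (and it doesn't work on RegExr either), so I came up with my own, which I've tested and which seems to do exactly what you're looking for: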

((^[a-z]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($))))

Here's an example of using it:

; Regex Breakdown: This will match against each word in Camel and Pascal case strings, while properly handling acronyms.
; (^[a-z]+) Match against any lower-case letters at the start of the string.
; ([A-Z]{1}[a-z]+) Match against Title case words (one upper case followed by lower case letters).
; ([A-Z]+(?=([A-Z][a-z])|($))) Match against multiple consecutive upper-case letters, leaving the last upper case letter out of the match if it is followed by lower case letters, and including it if it's followed by the end of the string.
newString := RegExReplace(oldCamelOrPascalString, "((^[a-z]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($))))", "$1 ")
newString := Trim(newString)

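Here I'm separating each word with a space, so here are some examples of how a string is transformed:

ThisIsATitleCASEString => This Is A Title CASE String

andThisOneIsCamelCASE => and This One Is Camel CASE

The above solution does what the original post asks for, but I also needed a regex to find camel and pascal case strings that include numbers, so I also came up with this variation to include numbers: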
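Since the question is about Java, the same replace-and-trim idea can be sketched with `String.replaceAll`. This is a port made for illustration, not the answerer's code; `SpaceWordsDemo` and `spaceWords` are hypothetical names, while the regex is copied unchanged from the answer.

```java
class SpaceWordsDemo {
    // Regex copied from the answer; group 1 wraps whichever alternative matched.
    static final String WORD =
            "((^[a-z]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($))))";

    // Append a space after every matched word, then trim the trailing space.
    static String spaceWords(String s) {
        return s.replaceAll(WORD, "$1 ").trim();
    }

    public static void main(String[] args) {
        System.out.println(spaceWords("ThisIsATitleCASEString")); // This Is A Title CASE String
        System.out.println(spaceWords("andThisOneIsCamelCASE"));  // and This One Is Camel CASE
    }
}
```

Unlike the split-based answers, this approach matches the words themselves, so any text that no alternative matches is left in place rather than dropped.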

((^[a-z]+)|([0-9]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($)|([0-9]))))

And an example of using it:

; Regex Breakdown: This will match against each word in Camel and Pascal case strings, while properly handling acronyms and including numbers.
; (^[a-z]+) Match against any lower-case letters at the start of the string.
; ([0-9]+) Match against one or more consecutive numbers (anywhere in the string, including at the start).
; ([A-Z]{1}[a-z]+) Match against Title case words (one upper case followed by lower case letters).
; ([A-Z]+(?=([A-Z][a-z])|($)|([0-9]))) Match against multiple consecutive upper-case letters, leaving the last upper case letter out of the match if it is followed by lower case letters, and including it if it's followed by the end of the string or a number.
newString := RegExReplace(oldCamelOrPascalString, "((^[a-z]+)|([0-9]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($)|([0-9]))))", "$1 ")
newString := Trim(newString)

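And here are some examples of how a string with numbers is transformed using this regex:

myVariable123 => my Variable 123

my2Variables => my 2 Variables

3rdVariableIsHere => 3 rdVariable Is Here

12345NumsAtTheStartIncludedToo => 12345 Nums At The Start Included Too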
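In Java, the number-aware pattern can also be used to collect the words into a list instead of inserting spaces. The following is a sketch under that assumption; `NumberAwareWordsDemo` and `words` are invented names, and only the regex comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberAwareWordsDemo {
    // Number-aware regex copied from the answer.
    static final Pattern WORD = Pattern.compile(
            "((^[a-z]+)|([0-9]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($)|([0-9]))))");

    // Collect every match in order; text that no alternative matches is skipped.
    static List<String> words(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(s);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("myVariable123"));
        System.out.println(words("my2Variables"));
        System.out.println(words("12345NumsAtTheStartIncludedToo"));
        // "rd" follows a digit and is not at the string start, so no alternative matches it.
        System.out.println(words("3rdVariableIsHere"));
    }
}
```

One difference from the replace-based version is worth noting: lowercase text that directly follows a digit, such as the "rd" in 3rdVariableIsHere, is matched by none of the alternatives, so it stays in the output when replacing but is missing from the collected list.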

2019-10-08
