
Extracting information from a string

一只斗牛犬 2023-07-17 13:48:54

Given a string of the form https://website-name.some-domain.some-sub-domain.com/resourceId (type 1) or https://website-name.some-sub-domain.com/resourceId?randomContent (type 2), I need to extract just two substrings: website-name as one string and resourceId as another.

I extracted the website name with the following code:

s := "https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent"
w := regexp.MustCompile("https://(.*?)\\.")
website := w.FindStringSubmatch(s)
fmt.Println(website[1])

I have another regular expression to get resourceId:

s := "https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent"
r := regexp.MustCompile("com/(.*?)\\?")
resource := r.FindStringSubmatch(s)
fmt.Println(resource[1])

This works for any string that ends with ? or ?randomContent. When the string has no trailing ? (type 1), however, I cannot handle it. I tried "(com/(.*?)\\?)|(com/(.*?).*)" to get resourceId, but it did not work. I cannot find an elegant way to extract these two substrings.

Note: randomContent is an arbitrarily long substring, and so is resourceId. resourceId never contains a ?, so as soon as a ? is encountered, resourceId has ended. Also, website-name can vary, but the pattern stays the same: arbitrary subdomains, and a .com will appear in the string.

Here is what I tried: https://play.golang.org/p/MGQIT5XRuuh


3 Answers

慕容3067478

Contributed 1773 experience points, received 3+ upvotes

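The example strings you show are ordinary HTTPS URLs, so you can parse them with the net/url package. website-name is the first part of the parsed URL's Hostname(), and resourceId is the parsed URL's Path without the leading /.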

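u, err := url.Parse(s)
if err != nil {
	panic(err)
}
// Hostname() drops any port; the first dot-separated label is the website name.
host := u.Hostname()
first := strings.SplitN(host, ".", 2)[0]
fmt.Printf("website-name: %s\n", first)
// TrimPrefix removes the leading "/" and, unlike u.Path[1:], is safe when the path is empty.
fmt.Printf("resourceId: %s\n", strings.TrimPrefix(u.Path, "/"))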

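https://play.golang.org/p/fnF2RTBuFxR has a complete example that includes both URL strings from the question. This approach keeps working even if the hostname does not end in .com, if the path itself contains that string, or if the URL carries a port number, a hash fragment, or other variations.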
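For readers who cannot open the playground link, the following is a minimal self-contained sketch of the same net/url approach. The helper name extractParts is hypothetical and not taken from the linked example; it assumes the website name is always the first dot-separated label of the hostname.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// extractParts is a hypothetical helper, not part of the original answer.
// It returns the first hostname label and the path without its leading slash.
func extractParts(raw string) (websiteName, resourceID string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	// Hostname() strips any port; SplitN always yields at least one element.
	websiteName = strings.SplitN(u.Hostname(), ".", 2)[0]
	// The query string (?randomContent) is not part of u.Path, so no extra trimming is needed.
	resourceID = strings.TrimPrefix(u.Path, "/")
	return websiteName, resourceID, nil
}

func main() {
	inputs := []string{
		"https://website-name.some-domain.some-sub-domain.com/resourceId",
		"https://website-name.some-sub-domain.com/resourceId?randomContent",
	}
	for _, s := range inputs {
		name, id, err := extractParts(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("website-name: %s, resourceId: %s\n", name, id)
	}
}
```

Returning the error instead of panicking lets the caller decide how to treat malformed input.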

2023-07-17

月關寶盒

Contributed 1772 experience points, received 5+ upvotes

I guess this expression might work:

(?i)https?:\/\/(www\.)?([^.]*)[^\/]*\/([^?\r\n]*)

Test

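package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Group 2 captures the first host label, group 3 the path up to an optional "?".
	var re = regexp.MustCompile(`(?m)(?i)https?:\/\/(www\.)?([^.]*)[^\/]*\/([^?\r\n]*)`)
	// Two test inputs, one per line: type 2 (with query) and type 1 (without).
	var str = `https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent
https://website-name.some-domain.some-sub-domain.com/resourceId`

	// FindAllString returns whole matches only, not the capture groups.
	for i, match := range re.FindAllString(str, -1) {
		fmt.Println(match, "found at index", i)
	}
}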
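The test program above prints whole matches only. To get the two substrings the question asks for, the capture groups can be read with FindStringSubmatch. The sketch below uses the same expression, written without the optional slash escapes and the (?m) flag; the helper name splitURL is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlParts is the expression from the answer above.
// Group 1: optional "www.", group 2: first host label, group 3: path up to "?".
var urlParts = regexp.MustCompile(`(?i)https?://(www\.)?([^.]*)[^/]*/([^?\r\n]*)`)

// splitURL is a hypothetical helper; ok is false when the input does not match.
func splitURL(s string) (websiteName, resourceID string, ok bool) {
	m := urlParts.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[2], m[3], true
}

func main() {
	inputs := []string{
		"https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent",
		"https://website-name.some-domain.some-sub-domain.com/resourceId",
	}
	for _, s := range inputs {
		if name, id, ok := splitURL(s); ok {
			fmt.Printf("website-name: %s, resourceId: %s\n", name, id)
		}
	}
}
```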


2023-07-17

揚帆大魚

Contributed 1799 experience points, received 9+ upvotes

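Perhaps something as simple as this will help.

You can extract the website name with the following regular expression and take the first group:

//([^/.]+)

// matches the literal // that follows the scheme

([^/.]+) captures everything up to the first dot (or slash)

You can extract resourceId with the following regular expression and take the first group:

\.com/([^/?]+)

\.com/ matches the literal .com/ that ends the host

([^/?]+) captures everything up to the first ? or / if one exists, otherwise up to the end of the string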
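A minimal runnable sketch of these two expressions follows. The variable names and the firstGroup helper are hypothetical, and the dot before com is escaped here so that it matches only a literal dot.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// websiteRe captures the first host label after "//".
	websiteRe = regexp.MustCompile(`//([^/.]+)`)
	// resourceRe captures the path segment after ".com/" up to "?", "/" or end of input.
	resourceRe = regexp.MustCompile(`\.com/([^/?]+)`)
)

// firstGroup is a hypothetical helper that returns capture group 1, if any.
func firstGroup(re *regexp.Regexp, s string) (string, bool) {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	inputs := []string{
		"https://website-name.some-domain.some-sub-domain.com/resourceId",
		"https://website-name.some-sub-domain.com/resourceId?randomContent",
	}
	for _, s := range inputs {
		name, okName := firstGroup(websiteRe, s)
		id, okID := firstGroup(resourceRe, s)
		if okName && okID {
			fmt.Printf("website-name: %s, resourceId: %s\n", name, id)
		}
	}
}
```

Unlike the net/url approach, this relies on the host ending in .com, as the question states it will.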


2023-07-17
