4 Answers

Contributed 1744 experience points · received over 4 likes
An option using Scrapy's CSS selectors:
address = response.css("span.address-line1::text, span.address-line2::text, span[itemprop=addressLocality]::text, span[itemprop=addressRegion]::text, span[itemprop=postalCode]::text").extract()  # returns a list of strings
if address:
    # Join the extracted pieces into a single comma-separated string
    address = ", ".join(address)

Contributed 1909 experience points · received over 7 likes
A quick-and-dirty solution using a single-line XPath:
concat(//span[@class='address-line1']/text(),' ',//span[@class='address-line2']/text(),' ',//span[@itemprop='addressLocality']/text(),', ',//span[@itemprop='addressRegion']/text(),//span[@itemprop='postalCode']/text())
Output:
"5835 Post Rd. Suite 217 East Greenwich, RI02818"

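The one-line XPath above relies on concat(), and its output runs "RI" and "02818" together because no separator is passed between the last two arguments. As a rough sketch of the same per-field lookup in plain Python, the following uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath (no concat()). The SNIPPET constant and the build_address helper are hypothetical names introduced here for illustration, and parsing the markup as XML is an assumption that holds for this well-formed sample but not for arbitrary HTML.

```python
import xml.etree.ElementTree as ET

# Sample markup (assumed well-formed enough to parse as XML).
SNIPPET = """<span class="office-address" itemprop="address">
<span itemprop="streetAddress">
<span class="address-line1">5835 Post Rd.</span>
<span class="address-line2">Suite 217</span>
</span>
<span class="city-state-zip">
<span itemprop="addressLocality">East Greenwich</span>, <span itemprop="addressRegion">RI</span> <span itemprop="postalCode">02818</span>
</span>
</span>"""


def build_address(root):
    """Hypothetical helper: look up each field separately, then join them."""

    def field(path):
        # findtext returns None when nothing matches; normalise to ''.
        return (root.findtext(path) or "").strip()

    # Skip a missing second address line instead of leaving a double space.
    street = " ".join(
        part
        for part in (
            field(".//span[@class='address-line1']"),
            field(".//span[@class='address-line2']"),
        )
        if part
    )
    city = field(".//span[@itemprop='addressLocality']")
    region = field(".//span[@itemprop='addressRegion']")
    postal = field(".//span[@itemprop='postalCode']")
    # Unlike the concat() call, put a space between region and postal code.
    return f"{street} {city}, {region} {postal}"


print(build_address(ET.fromstring(SNIPPET)))
# prints: 5835 Post Rd. Suite 217 East Greenwich, RI 02818
```

Looking up each field on its own makes it easy to control the separators, at the cost of being more verbose than the single expression.
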
Contributed 1921 experience points · received over 9 likes
Here is a more future-proof idea, since ids/classes can change over time:
from re import sub
from bs4 import BeautifulSoup as bs
teststr = """<span class="office-address" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">
<span class="address-line1">5835 Post Rd.</span>
<span class="address-line2">Suite 217</span>
</span>
<span class="city-state-zip">
<span itemprop="addressLocality">East Greenwich</span>, <span itemprop="addressRegion">RI</span> <span itemprop="postalCode">02818</span>
</span>
</span>"""
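# Flatten the markup to plain text (the "lxml" parser requires the lxml package)
r = bs(teststr, "lxml").getText().strip()
# Turn line breaks into separators, then collapse runs of commas and spaces
r = sub(r"\n", ", ", r)
r = sub(r"[, ]{2,}", ", ", r)
print(r)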
Result:
5835 Post Rd., Suite 217, East Greenwich, RI 02818
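If installing BeautifulSoup and lxml is not an option, the same flatten-then-clean idea can probably be approximated with the standard library alone. The sketch below swaps in html.parser.HTMLParser for BeautifulSoup; the TextCollector class, the flatten_address function, and the SAMPLE constant are hypothetical names used only for this illustration, and the two regular expressions are the ones from the answer above.

```python
import re
from html.parser import HTMLParser

SAMPLE = """<span class="office-address" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">
<span class="address-line1">5835 Post Rd.</span>
<span class="address-line2">Suite 217</span>
</span>
<span class="city-state-zip">
<span itemprop="addressLocality">East Greenwich</span>, <span itemprop="addressRegion">RI</span> <span itemprop="postalCode">02818</span>
</span>
</span>"""


class TextCollector(HTMLParser):
    """Collect every text node, ignoring tag names, ids, and classes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def flatten_address(html):
    """Hypothetical helper: strip the markup, then tidy the separators."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = "".join(collector.parts).strip()
    # Line breaks become separators; runs of commas/spaces collapse to one.
    text = re.sub(r"\n", ", ", text)
    return re.sub(r"[, ]{2,}", ", ", text)


print(flatten_address(SAMPLE))
# prints: 5835 Post Rd., Suite 217, East Greenwich, RI 02818
```

Because nothing here depends on class names or itemprop values, the function keeps working if those attributes are renamed, but it will also pick up any unrelated text that ends up inside the same block.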