
soup findall class


Using Beautiful Soup: Quickly Extracting Individual Paragraphs from an HTML Document

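Beautiful Soup is a Python library for parsing HTML and XML documents. Its find_all method returns every tag that matches the given filters (the method is spelled find_all; there is no findall method). The class_ keyword argument restricts the results to tags that carry the given CSS class. The trailing underscore is needed because class is a reserved word in Python.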

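1. Basic usage of Beautiful Soup

First, import the BeautifulSoup class from the bs4 package. Then call BeautifulSoup() with the HTML document as the first argument and the parser name as the second, commonly 'html.parser', which ships with Python. The resulting object exposes the methods used to search and navigate the document.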
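The steps above can be sketched as follows. This is a minimal sketch: the sample HTML string and the variable names are made up for illustration, and it assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration.
html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# Parse the document with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string    # text inside the <title> tag
first_paragraph = soup.find("p")  # first <p> tag, or None if there is none

print(title_text)
print(first_paragraph.get_text())
```

find returns a single tag (or None), while find_all, covered next, returns a list of every match.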

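2. Using the find_all method

find_all returns all tags that match the given filters. Its signature is:

soup.find_all(name=None, attrs={}, recursive=True, string=None, limit=None, **kwargs)

Parameters:

name: the tag name to look for (a string, regular expression, list, function, or True).
attrs: an optional dictionary of attribute filters, for tags with specific attribute values.
recursive: if False, only direct children are searched instead of all descendants.
string: an optional filter on the text of the tag.
limit: an optional maximum number of results to return.
**kwargs: any other keyword argument is treated as an attribute filter. class_ is the keyword used to filter by CSS class.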
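As a rough illustration of how these filters behave in bs4, the sketch below applies each one to a small document. The HTML, the data-kind attribute, and the variable names are hypothetical and exist only for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration.
html = """
<div id="main">
  <p class="note" data-kind="intro">First</p>
  <p class="note">Second</p>
  <section><p class="other">Nested</p></section>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
main = soup.find("div", id="main")

# Every <p>, at any depth.
by_name = soup.find_all("p")
# class_ stands in for the reserved word `class`.
by_class = soup.find_all("p", class_="note")
# attrs handles attribute names that are not valid Python keywords.
by_attrs = soup.find_all("p", attrs={"data-kind": "intro"})
# Only direct children of <div id="main">, so the nested <p> is skipped.
direct_only = main.find_all("p", recursive=False)
# Stop after the first match.
first_only = soup.find_all("p", limit=1)
# Match on the text of the tag.
by_string = soup.find_all("p", string="Nested")

for label, tags in [("name", by_name), ("class_", by_class), ("attrs", by_attrs),
                    ("recursive=False", direct_only), ("limit=1", first_only),
                    ("string", by_string)]:
    print(label, [tag.get_text() for tag in tags])
```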

Take a simple HTML document as an example:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document Title</title>
</head>
<body>
    <h1>Heading 1</h1>
    <p class="text1">This is a paragraph with class text1.</p>
    <p class="text2">This is another paragraph with class text2.</p>
    <p class="text3">This is a third paragraph with class text3.</p>
</body>
</html>

We can use find_all to find all paragraph tags whose class starts with text:

from bs4 import BeautifulSoup

html = '''
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document Title</title>
</head>
<body>
    <h1>Heading 1</h1>
    <p class="text1">This is a paragraph with class text1.</p>
    <p class="text2">This is another paragraph with class text2.</p>
    <p class="text3">This is a third paragraph with class text3.</p>
</body>
</html>
'''

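soup = BeautifulSoup(html, 'html.parser')

# class_='text' would match nothing here: the classes are text1, text2 and text3,
# and a string filter has to equal a whole class name.
# A function filter keeps every <p> whose class starts with 'text'.
paragraphs = soup.find_all('p', class_=lambda c: c is not None and c.startswith('text'))

for p in paragraphs:
    print(p.text)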

Output:

This is a paragraph with class text1.
This is another paragraph with class text2.
This is a third paragraph with class text3.
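One detail worth spelling out is that a plain string passed to class_ has to equal one whole class name; it is not a substring or prefix match. The sketch below contrasts the two behaviours on a hypothetical snippet in which one paragraph carries two classes.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second <p> has two classes, "text" and "highlight".
html = '<p class="text1">A</p><p class="text highlight">B</p><p class="text3">C</p>'
soup = BeautifulSoup(html, "html.parser")

# A string filter matches a tag if any one of its classes equals the string.
exact = soup.find_all("p", class_="text")

# A function filter is called with each class name (or None if the tag has
# no class attribute), so it can implement a prefix match.
prefix = soup.find_all("p", class_=lambda c: c is not None and c.startswith("text"))

print([p.get_text() for p in exact])
print([p.get_text() for p in prefix])
```

Here exact contains only paragraph B, while prefix contains all three paragraphs.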

3. Advanced usage of find_all

In the example above, we used find_all to select paragraph tags by class. The class_ argument accepts other kinds of filters as well.

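To find all tags whose class value starts with a given prefix, pass a compiled regular expression to class_ (find_all has no startswith parameter):

import re
paragraphs = soup.find_all('p', class_=re.compile(r'^text'))

Output:

This is a paragraph with class text1.
This is another paragraph with class text2.
This is a third paragraph with class text3.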

To find tags that have any one of several class names, pass a list to class_ (find_all has no any parameter):

paragraphs = soup.find_all('p', class_=['text1', 'text2'])

Output:

This is a paragraph with class text1.
This is another paragraph with class text2.

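To find tags that have a class but exclude specific class names, pass a function to class_ that rejects the unwanted names (find_all has no not_in parameter):

paragraphs = soup.find_all('p', class_=lambda c: c is not None and c not in ('text1', 'text3'))

Output:

This is another paragraph with class text2.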
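The same three selections (by prefix, by any of several names, and by exclusion) can also be written as CSS selectors with select(). This is an alternative sketch rather than the approach used in this article; it assumes a Beautiful Soup version that bundles the soupsieve selector engine, and the shortened HTML string is only for illustration.

```python
from bs4 import BeautifulSoup

# Shortened version of the sample document, for illustration only.
html = """
<h1>Heading 1</h1>
<p class="text1">This is a paragraph with class text1.</p>
<p class="text2">This is another paragraph with class text2.</p>
<p class="text3">This is a third paragraph with class text3.</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Class attribute starts with "text".
prefix = soup.select('p[class^="text"]')
# Class is text1 or text2.
either = soup.select("p.text1, p.text2")
# Any <p> that is neither text1 nor text3.
excluded = soup.select("p:not(.text1):not(.text3)")

for label, tags in (("prefix", prefix), ("either", either), ("excluded", excluded)):
    print(label, [tag["class"][0] for tag in tags])
```

Selectors tend to be more compact for static conditions, while function filters passed to find_all are easier to extend with arbitrary Python logic.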

Author: 呼如林
