
Extracting character roles from Tom Holland's IMDB page with BeautifulSoup


慕絲7291255 2023-04-18 17:09:12
I extracted the following data from Tom Holland's IMDB page and stored it as "movie_contents":

[<div class="filmo-row odd" id="actor-tt10872600"> <span class="year_column">  2021 </span> <b><a href="/title/tt10872600/">Untitled Spider-Man Sequel</a></b> (<a class="in_production" href="https://pro.imdb.com/title/tt10872600?rf=cons_nm_filmo">announced</a>) <br/> Peter Parker / Spider-Man </div>,
<div class="filmo-row even" id="actor-tt1464335"> <span class="year_column">  2021 </span> <b><a href="/title/tt1464335/">Uncharted</a></b> (<a class="in_production" href="https://pro.imdb.com/title/tt1464335?rf=cons_nm_filmo">filming</a>) <br/> Nathan Drake </div>,
<div class="filmo-row odd" id="actor-tt2076822"> <span class="year_column">  2021 </span> <b><a href="/title/tt2076822/">Chaos Walking</a></b> (<a class="in_production" href="https://pro.imdb.com/title/tt2076822?rf=cons_nm_filmo">post-production</a>) <br/> Todd Hewitt </div>,
<div class="filmo-row even" id="actor-tt9130508"> <span class="year_column">  2020/I </span> <b><a href="/title/tt9130508/">Cherry</a></b> (<a class="in_production" href="https://pro.imdb.com/title/tt9130508?rf=cons_nm_filmo">post-production</a>) <br/> Nico Walker </div>,
<div class="filmo-row odd" id="actor-tt7395114"> <span class="year_column">  2020 </span> <b><a href="/title/tt7395114/">The Devil All the Time</a></b> (<a class="in_production" href="https://pro.imdb.com/title/tt7395114?rf=cons_nm_filmo">completed</a>) <br/> Arvin Russell </div>,
<div class="filmo-row even" id="actor-tt7146812"> <span class="year_column">  2020/I </span> <b><a href="/title/tt7146812/">Onward</a></b> <br/> Ian Lightfoot (voice) </div>,
<div class="filmo-row odd" id="actor-tt6673612"> <span class="year_column">  2020 </span> <b><a href="/title/tt6673612/">Dolittle</a></b> <br/> Jip (voice) </div>

My question: how do I extract all of the role names, such as "Peter Parker / Spider-Man", "Nathan Drake", "Todd Hewitt", and so on?

2 Answers

白板的微信

Contributed 1883 experience points, received more than 3 likes

This script prints all of the actor's roles:


import requests
from bs4 import BeautifulSoup

url = 'https://www.imdb.com/name/nm4043618/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Track roles already printed so a recurring character appears only once.
seen = set()
# Each row's role is the first text node after its <br/> tag.
for row in soup.select('#filmo-head-actor + div .filmo-row > br'):
    role = row.find_next(string=True).strip()
    if role not in seen:
        seen.add(role)
        print(role)

Prints:

Peter Parker / Spider-Man
Nathan Drake
Todd Hewitt
Nico Walker
Arvin Russell
Ian Lightfoot (voice)
Jip (voice)
Walter (voice)
Samuel Insull
Brother Diarmuid - The Novice
Jack Fawcett
Bradley Baker
Thomas Nickerson
Tom
Gregory Cromwell
Former Billy (Encore) (uncredited)
Isaac
Eddie (voice)
Boy
Lucas
Shô (UK version, voice)
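The idea of reading the text node that follows each `<br/>` can also be tried offline, without fetching the live page. The sketch below is only an illustration: `extract_roles` and `SAMPLE_HTML` are hypothetical names, the first two rows are copied from the question, and the third row (id `tt0000000`) is made up purely to exercise the de-duplication.

```python
from bs4 import BeautifulSoup

# Rows 1 and 2 come from the question; row 3 is invented for illustration
# so that one role appears twice.
SAMPLE_HTML = """
<div class="filmo-row odd" id="actor-tt10872600">
<span class="year_column"> 2021 </span>
<b><a href="/title/tt10872600/">Untitled Spider-Man Sequel</a></b>
(<a class="in_production" href="https://pro.imdb.com/title/tt10872600?rf=cons_nm_filmo">announced</a>)
<br/>
Peter Parker / Spider-Man
</div>
<div class="filmo-row even" id="actor-tt1464335">
<span class="year_column"> 2021 </span>
<b><a href="/title/tt1464335/">Uncharted</a></b>
(<a class="in_production" href="https://pro.imdb.com/title/tt1464335?rf=cons_nm_filmo">filming</a>)
<br/>
Nathan Drake
</div>
<div class="filmo-row odd" id="actor-tt0000000">
<span class="year_column"> 2019 </span>
<b><a href="/title/tt0000000/">Hypothetical Earlier Film</a></b>
<br/>
Peter Parker / Spider-Man
</div>
"""


def extract_roles(html):
    """Return the unique role names in page order (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    roles = []
    seen = set()
    # The role is the first text node that follows each row's <br/> tag.
    for br in soup.select(".filmo-row > br"):
        role = br.find_next(string=True).strip()
        if role and role not in seen:
            seen.add(role)
            roles.append(role)
    return roles


print(extract_roles(SAMPLE_HTML))
```

Keeping a list next to the set preserves the order in which roles appear on the page, which a set alone would not guarantee.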

Edit: to get the roles into a DataFrame, you can do this:


import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.imdb.com/name/nm4043618/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Collect each distinct role once, in page order.
seen = set()
all_data = []
for row in soup.select("#filmo-head-actor + div .filmo-row > br"):
    role = row.find_next(string=True).strip()
    if role not in seen:
        seen.add(role)
        all_data.append(role)

# One column named "Role", one row per distinct role.
df = pd.DataFrame(all_data, columns=["Role"])
print(df)

Prints:

                                  Role
0            Peter Parker / Spider-Man
1                         Nathan Drake
2                          Todd Hewitt
3                          Nico Walker
4                        Arvin Russell
5                Ian Lightfoot (voice)
6                          Jip (voice)
7                       Walter (voice)
8                        Samuel Insull
9        Brother Diarmuid - The Novice
10                        Jack Fawcett
11                       Bradley Baker
12                    Thomas Nickerson
13                                 Tom
14                    Gregory Cromwell
15  Former Billy (Encore) (uncredited)
16                               Isaac
17                       Eddie (voice)
18                                 Boy
19                               Lucas
20             Shô (UK version, voice)
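If the year and title are wanted next to each role, the same rows can be turned into one record per film. This is a rough sketch under the assumption that every row has the `span.year_column`, `b > a` and `<br/>` structure shown in the question; `extract_records` and `SAMPLE_HTML` are hypothetical names, and the two rows are copied from the question.

```python
from bs4 import BeautifulSoup

# Two rows copied from the question's snippet.
SAMPLE_HTML = """
<div class="filmo-row even" id="actor-tt9130508">
<span class="year_column"> 2020/I </span>
<b><a href="/title/tt9130508/">Cherry</a></b>
(<a class="in_production" href="https://pro.imdb.com/title/tt9130508?rf=cons_nm_filmo">post-production</a>)
<br/>
Nico Walker
</div>
<div class="filmo-row odd" id="actor-tt6673612">
<span class="year_column"> 2020 </span>
<b><a href="/title/tt6673612/">Dolittle</a></b>
<br/>
Jip (voice)
</div>
"""


def extract_records(html):
    """Return one dict per row with Year, Title and Role (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.select("div.filmo-row"):
        # Year and title live inside child tags of the row.
        year = row.select_one("span.year_column").get_text(strip=True)
        title = row.select_one("b > a").get_text(strip=True)
        # The role is the bare text that follows the row's <br/> tag.
        role = row.find("br").find_next(string=True).strip()
        records.append({"Year": year, "Title": title, "Role": role})
    return records


for record in extract_records(SAMPLE_HTML):
    print(record)
```

A list of dicts like this can be passed straight to `pd.DataFrame(records)`, which uses the dict keys as column names.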


HUX布斯

Contributed 1876 experience points, received more than 6 likes

Try:


from bs4 import BeautifulSoup

html = '''<html>
 <div class="filmo-row odd" id="actor-tt10872600">
 <span class="year_column">
  2021
 </span>
 <b><a href="/title/tt10872600/">Untitled Spider-Man Sequel</a></b>
 (<a class="in_production" href="https://pro.imdb.com/title/tt10872600?rf=cons_nm_filmo">announced</a>)
 <br/>
 Peter Parker / Spider-Man
 </div>, <div class="filmo-row even" id="actor-tt1464335">
 <span class="year_column">
  2021
 </span>
 <b><a href="/title/tt1464335/">Uncharted</a></b>
 (<a class="in_production" href="https://pro.imdb.com/title/tt1464335?rf=cons_nm_filmo">filming</a>)
 <br/>
 Nathan Drake
 </div>, <div class="filmo-row odd" id="actor-tt2076822">
 <span class="year_column">
  2021
 </span>
 <b><a href="/title/tt2076822/">Chaos Walking</a></b>
 (<a class="in_production" href="https://pro.imdb.com/title/tt2076822?rf=cons_nm_filmo">post-production</a>)
 <br/>
 Todd Hewitt
 </div>, <div class="filmo-row even" id="actor-tt9130508">
 <span class="year_column">
  2020/I
 </span>
 <b><a href="/title/tt9130508/">Cherry</a></b>
 (<a class="in_production" href="https://pro.imdb.com/title/tt9130508?rf=cons_nm_filmo">post-production</a>)
 <br/>
 Nico Walker
 </div>, <div class="filmo-row odd" id="actor-tt7395114">
 <span class="year_column">
  2020
 </span>
 <b><a href="/title/tt7395114/">The Devil All the Time</a></b>
 (<a class="in_production" href="https://pro.imdb.com/title/tt7395114?rf=cons_nm_filmo">completed</a>)
 <br/>
 Arvin Russell
 </div>, <div class="filmo-row even" id="actor-tt7146812">
 <span class="year_column">
  2020/I
 </span>
 <b><a href="/title/tt7146812/">Onward</a></b>
 <br/>
 Ian Lightfoot (voice)
 </div>, <div class="filmo-row odd" id="actor-tt6673612">
 <span class="year_column">
  2020
 </span>
 <b><a href="/title/tt6673612/">Dolittle</a></b>
 <br/>
 Jip (voice)
 </div>
 '''

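soup = BeautifulSoup(html, 'html.parser')

# Match every row, odd and even alike.
for div in soup.select('div.filmo-row'):
    # recursive=False keeps only the div's own text nodes and skips
    # the text nested inside <span>, <b> and <a>.
    texts = [t.strip() for t in div.find_all(string=True, recursive=False)]
    # The role is the last non-empty text node, i.e. the text after <br/>.
    print([t for t in texts if t][-1])

Prints:

Peter Parker / Spider-Man
Nathan Drake
Todd Hewitt
Nico Walker
Arvin Russell
Ian Lightfoot (voice)
Jip (voice)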
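To see what the `recursive=False` lookup actually returns for a single row, it can help to print the direct text nodes before picking one. The sketch below is illustrative only: `direct_text_nodes` and `role_of` are hypothetical helper names, and the row is copied from the question.

```python
from bs4 import BeautifulSoup

# One row from the question that carries a production-status link.
ROW_WITH_STATUS = """<div class="filmo-row even" id="actor-tt1464335">
<span class="year_column"> 2021 </span>
<b><a href="/title/tt1464335/">Uncharted</a></b>
(<a class="in_production" href="https://pro.imdb.com/title/tt1464335?rf=cons_nm_filmo">filming</a>)
<br/>
Nathan Drake
</div>"""


def direct_text_nodes(div):
    """Return the div's own non-empty text nodes, stripped (hypothetical helper)."""
    nodes = div.find_all(string=True, recursive=False)
    return [node.strip() for node in nodes if node.strip()]


def role_of(div):
    """Return the last direct text node, which follows the <br/> tag."""
    return direct_text_nodes(div)[-1]


div = BeautifulSoup(ROW_WITH_STATUS, "html.parser").div
print(direct_text_nodes(div))
print(role_of(div))
```

The parentheses around the status link show up as separate text nodes, so some filtering is needed. Taking the last non-empty node avoids a length threshold, which would risk dropping short role names such as "Tom" or "Boy".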

