
Removing <br> tags for correct alignment when web scraping with Selenium and Python

Python

qq_花開花謝_0 2023-04-25 16:52:44

I want to remove the HTML <br> tags when scraping a page, but replace doesn't seem to work. I'm not sure whether there is another or better way to do this with Selenium and Python. Thanks in advance.

from selenium import webdriver

from selenium.webdriver.support.ui import Select

from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome("drivers/chromedriver")

driver.get("https://web3.ncaa.org/hsportal/exec/hsAction")

state_drop = driver.find_element_by_id("state")

state = Select(state_drop)

state.select_by_visible_text("New Hampshire")

driver.find_element_by_id("city").send_keys("Moultonborough")

driver.find_element_by_id("name").send_keys("Moultonborough Academy")

driver.find_element_by_class_name("forms_input_button").send_keys(Keys.RETURN)

driver.find_element_by_id("hsSelectRadio_1").click()

courses_subheading = driver.find_elements_by_tag_name("th.header")

print(courses_subheading[0].text, " ", courses_subheading[1].text, " ", courses_subheading[2].text, " ", courses_subheading[3].text, " ", courses_subheading[4].text)

I tried this:

for i in courses_subheading:

    courses_subheading.replace("<br>", " ")

But I get an error:

AttributeError: 'list' object has no attribute 'replace'

Currently, it looks like this:

CourseWeight Title Notes MaxCredits OKThrough DisabilityCourse

But I want it like this:

Course Weight Title Notes Max Credits OK Through Disability Course
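As an aside on the error itself: the find_elements_* calls return a list, while replace is a string method, and because strings are immutable its result has to be stored rather than discarded. A minimal sketch with plain strings standing in for the scraped .text values (the list name header_texts and its contents are illustrative, borrowed from the output quoted in the answers):

```python
# Stand-in for [el.text for el in courses_subheading]; values are illustrative.
header_texts = ["Course\nWeight", "Title", "Max\nCredits"]

# Calling .replace on the list itself fails, as in the question.
try:
    header_texts.replace("\n", " ")
except AttributeError as exc:
    error_message = str(exc)

# str.replace returns a new string, so collect the results per element.
cleaned = [text.replace("\n", " ") for text in header_texts]

print(error_message)
print(cleaned)
```

The replacement targets "\n" rather than "<br>" on the assumption that the element text contains rendered line breaks, not markup, which is what the console output in the first answer suggests.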


2 Answers

肥皂起泡泡

Contributed 1829 experience points; received more than 6 likes

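There is no need to remove the <br> tags; you can simply avoid them. To print the table headers, e.g. Title, Notes, etc., induce WebDriverWait for visibility_of_all_elements_located(), using either of the following locator strategies: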

Using CSS_SELECTOR:

driver.get("https://web3.ncaa.org/hsportal/exec/hsAction")

Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "state")))).select_by_visible_text("New Hampshire")

driver.find_element(By.CSS_SELECTOR, "input#city").send_keys("Moultonborough")

driver.find_element(By.CSS_SELECTOR, "input#name").send_keys("Moultonborough Academy")

driver.find_element(By.CSS_SELECTOR, "input[value='Search']").click()

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='hsCode']"))).click()

print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#approvedCourseTable_1 th.header")))])

Using XPATH:

driver.get("https://web3.ncaa.org/hsportal/exec/hsAction")

Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "state")))).select_by_visible_text("New Hampshire")

driver.find_element(By.XPATH, "//input[@id='city']").send_keys("Moultonborough")

driver.find_element(By.XPATH, "//input[@id='name']").send_keys("Moultonborough Academy")

driver.find_element(By.XPATH, "//input[@value='Search']").click()

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='hsCode']"))).click()

print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='approvedCourseTable_1']//th[@class='header']")))])

Console output:

['Course\nWeight', 'Title', 'Notes', 'Max\nCredits', 'OK\nThrough', 'Disability\nCourse']
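The \n characters in this output appear where the header cells contain a <br>, since the element text reflects the rendered line breaks rather than the markup. If single-line headers are wanted, one option is to collapse whitespace with str.split and str.join. A sketch applied to the list printed above (the names collapse_whitespace, raw_headers and single_line are hypothetical):

```python
def collapse_whitespace(text):
    """Replace every run of whitespace (including newlines) with one space."""
    return " ".join(text.split())


# The header texts exactly as printed in the console output.
raw_headers = ['Course\nWeight', 'Title', 'Notes', 'Max\nCredits', 'OK\nThrough', 'Disability\nCourse']

single_line = [collapse_whitespace(header) for header in raw_headers]
print(single_line)
```

Calling split() with no arguments also trims leading and trailing whitespace, so no separate strip() is needed.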

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait

from selenium.webdriver.common.by import By

from selenium.webdriver.support import expected_conditions as EC

2023-04-25

拉丁的傳說

Contributed 1789 experience points; received more than 8 likes

To complete the answer above: if you really want to remove the br tags, you can use the following (I've fixed your XPath expression):

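import re

from selenium.webdriver.common.by import By

courses_subheading = driver.find_elements(By.XPATH, "(//tr[th[@class='header']])[1]/th")

# Collapse each run of whitespace (including the newline left by <br>) into one space

headers = [re.sub(r'\s+', ' ', el.text) for el in courses_subheading]

print(headers)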

Output:

['Course Weight', 'Title', 'Notes', 'Max Credits', 'OK Through', 'Disability Course']
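Since the question asks for aligned output, the cleaned headers could be padded to per-column widths before printing. A minimal sketch, assuming the headers list shown above; the helper format_table, the two-space gap and the placeholder row are illustrative and not part of the original answer:

```python
def format_table(rows, gap=2):
    """Pad every column to its widest cell so the columns line up."""
    # zip(*rows) regroups the cells by column to measure each column's width.
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    return "\n".join(
        "".join(cell.ljust(width + gap) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


headers = ['Course Weight', 'Title', 'Notes', 'Max Credits', 'OK Through', 'Disability Course']

# Placeholder row for illustration only; not real course data.
sample_row = ['-', '-', '-', '-', '-', '-']

print(format_table([headers, sample_row]))
```

Rows scraped from the table body could be passed in the same way, as long as every row has the same number of cells as the header row.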

2023-04-25
