
Web scraping blocker with <p> tags

Python

SMILET 2023-05-09 09:53:42

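Background: Ultimately I am trying to scrape a recipe website and collect the key information, including the recipe name, ingredients, preparation instructions, cooking time, and prep time. I have broken the project into small pieces. So far I have code that scrapes the ingredients from a recipe page.

Where I need help: I am trying to adapt the code I have written (it currently scrapes the recipe ingredients) so that it also scrapes the recipe steps (or the "Method", as the site calls it).

Code input (1), for scraping the ingredients (this works fine!):

import requests
from bs4 import BeautifulSoup
from scraper_api import ScraperAPIClient
from splinter import Browser
from webdriver_manager.chrome import ChromeDriverManager

executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path)

resp = requests.get("https://www.simplyrecipes.com/recipes/cooking_for_two_strawberry_almond_oat_smoothie/")
soup = BeautifulSoup(resp.text, "html.parser")
div_ = soup.find("div", attrs={"class": "recipe-callout"})
recipes = {"_".join(div_.find("h2").text.split()): [x.text for x in div_.findAll("li", attrs={"class": "ingredient"})]}

Code output (1):

{'Strawberry_Almond_Oat_Smoothie_Recipe': ['1/2 cup uncooked old-fashioned rolled oats', '2 cups frozen strawberries', '1 cup plain yogurt (regular or Greek, any fat percentage)', '1 cup unsweetened vanilla almond milk (or milk of your choice)', '1/2 medium banana, fresh or frozen, sliced', '1/4 teaspoon pure almond extract', '1-2 teaspoons honey (optional)']}

My research: After looking at the HTML of the same recipe page, I have identified the HTML I need to focus on. It looks like I need to target:

- the <div> with id="sr-recipe-callout" and class="recipe-callout"
- the <p> tags that contain a <strong> element. Annoyingly, some <p> tags have no <strong>, and those tags do not contain the recipe method and are of no use.

Where I need help: I do not know how to adapt this code, and in particular how to specify that I only want to extract the <p> tags that contain a <strong> element. I know this is a lot of information, but hopefully it makes sense and someone can guide me in adapting/recycling my current code, which works for the ingredients, to fit the nuances of the HTML for the method.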


1 Answer

幕布斯6054654


from bs4 import BeautifulSoup

# resp is the requests response from the question; parse its HTML text
soup = BeautifulSoup(resp.text, "html.parser")
div = soup.find("div", attrs={"id": "sr-recipe-method"})

# select all <p> tags inside the <div>
for p in div.find_all("p"):
    # keep only the <p> tags that contain a <strong> element
    if p.find("strong"):
        print(p.text)

Output:

1 Combine the ingredients: In a blender, combine the oats, strawberries, yogurt, almond milk, banana, and almond extract.

2 Puree the smoothie: Starting on low speed, puree the ingredients. Turn the blender on high and continue to puree until smooth. Serve right away.
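
To collect the ingredients and the method steps together, the two snippets could be folded into one function along the lines below. This is only a sketch: `parse_recipe` and `SAMPLE_HTML` are hypothetical names, and the sample markup is a made-up stand-in that assumes the page uses the `recipe-callout` class, the `ingredient` list items, and the `sr-recipe-method` id mentioned above. It parses an inline string, so it runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real recipe page. The ids and
# classes mirror the ones named in the question and answer; the text is invented.
SAMPLE_HTML = """
<div id="sr-recipe-callout" class="recipe-callout">
  <h2>Strawberry Almond Oat Smoothie Recipe</h2>
  <ul>
    <li class="ingredient">2 cups frozen strawberries</li>
    <li class="ingredient">1 cup plain yogurt</li>
  </ul>
  <div id="sr-recipe-method">
    <p><strong>1 Combine the ingredients:</strong> In a blender, combine everything.</p>
    <p>A paragraph without a step marker.</p>
    <p><strong>2 Puree the smoothie:</strong> Blend until smooth.</p>
  </div>
</div>
"""


def parse_recipe(html):
    """Return {recipe_key: {"ingredients": [...], "method": [...]}} (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    callout = soup.find("div", attrs={"class": "recipe-callout"})
    if callout is None:
        # No recipe block on this page: return an empty result instead of raising.
        return {}

    title = callout.find("h2")
    key = "_".join(title.get_text().split()) if title else "unknown_recipe"

    ingredients = [
        li.get_text(strip=True)
        for li in callout.find_all("li", attrs={"class": "ingredient"})
    ]

    method = []
    method_div = soup.find("div", attrs={"id": "sr-recipe-method"})
    if method_div is not None:
        for p in method_div.find_all("p"):
            # Keep only the <p> tags that contain a <strong> element.
            if p.find("strong"):
                # Collapse runs of whitespace inside the step text.
                method.append(" ".join(p.get_text().split()))

    return {key: {"ingredients": ingredients, "method": method}}


recipe = parse_recipe(SAMPLE_HTML)
```

With a live page, `parse_recipe(resp.text)` would take the place of the inline sample. The `None` checks are there because `soup.find` returns `None` when the expected element is missing, which would otherwise raise an `AttributeError` on the next call.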

Answered 2023-05-09
