

Scraping per-stock details from Business Insider with Scrapy

嚕嚕噠 2022-12-06 15:26:44
I am trying to extract the "name", "latest price", and "%" fields for each stock on the following site: https://markets.businessinsider.com/index/components/s&p_500. However, even though I have confirmed that my XPaths work for these fields in the Chrome console, I am not getting any data back. For reference, I have been following this guide: https://realpython.com/web-scraping-with-scrapy-and-mongodb/

items.py

from scrapy.item import Item, Field

class InvestmentItem(Item):
    ticker = Field()
    name = Field()
    px = Field()
    pct = Field()

investment_spider.py

from scrapy import Spider
from scrapy.selector import Selector
from investment.items import InvestmentItem

class InvestmentSpider(Spider):
    name = "investment"
    allowed_domains = ["markets.businessinsider.com"]
    start_urls = [
        "https://markets.businessinsider.com/index/components/s&p_500",
    ]

    def parse(self, response):
        stocks = Selector(response).xpath('//*[@id="index-list-container"]/div[2]/table/tbody/tr')
        for stock in stocks:
            item = InvestmentItem()
            item['name'] = stock.xpath('td[1]/a/text()').extract()[0]
            item['px'] = stock.xpath('td[2]/text()[1]').extract()[0]
            item['pct'] = stock.xpath('td[5]/span[2]').extract()[0]
            yield item

Console output:

...
2020-05-26 00:08:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://markets.businessinsider.com/robots.txt> (referer: None)
2020-05-26 00:08:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://markets.businessinsider.com/index/components/s&p_500> (referer: None)
2020-05-26 00:08:33 [scrapy.core.engine] INFO: Closing spider (finished)
2020-05-26 00:08:33 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
...
2020-05-26 00:08:33 [scrapy.core.engine] INFO: Spider closed (finished)

2 Answers

蝴蝶不菲


You are missing the leading "./" in your XPath expressions. I have also simplified your XPaths:


def parse(self, response):
    # Select the table rows directly; note that the raw HTML has no <tbody>,
    # even though Chrome's inspector shows one.
    stocks = response.xpath('//table[@class="table table-small"]/tr')

    # Skip the header row.
    for stock in stocks[1:]:
        item = InvestmentItem()
        # The leading "./" makes each path relative to the current row.
        item['name'] = stock.xpath('./td[1]/a/text()').get()
        item['px'] = stock.xpath('./td[2]/text()[1]').get().strip()
        item['pct'] = stock.xpath('./td[5]/span[2]/text()').get()

        yield item
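To see why the relative path matters, here is a minimal sketch (assuming lxml is installed; it uses the same XPath semantics that Scrapy's selectors are built on). The toy table markup below is invented for illustration; an absolute `//` path ignores the row you call it on, while a `./` path is evaluated against that row only:

```python
from lxml import etree

# A tiny stand-in for the scraped table (hypothetical data).
doc = etree.fromstring(
    "<table>"
    "<tr><td><a>3M</a></td><td>146.44</td></tr>"
    "<tr><td><a>AO Smith</a></td><td>42.22</td></tr>"
    "</table>"
)

rows = doc.xpath('//tr')

# Absolute path: searched from the document root, so the row context is ignored.
print(rows[1].xpath('//td[1]/a/text()'))   # ['3M', 'AO Smith']

# Relative path: anchored at the current row, so only its own cell matches.
print(rows[1].xpath('./td[1]/a/text()'))   # ['AO Smith']
```

This is why every cell path inside the `for stock in stocks` loop starts with `./`.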



Answered 2022-12-06
阿波羅的戰車


XPath version


    def parse(self, response):
        # Address the <tr> rows directly: the raw HTML served to Scrapy
        # contains no <tbody> element.
        rows = response.xpath('//*[@id="index-list-container"]/div[2]/table/tr')
        for row in rows:
            yield {
                'name': row.xpath('td[1]/a/text()').extract(),
                'price': row.xpath('td[2]/text()[1]').extract(),
                'pct': row.xpath('td[5]/span[2]/text()').extract(),
                'datetime': row.xpath('td[7]/span[2]/text()').extract(),
            }

CSS version


    def parse(self, response):
        # Select the table by its container id and class, then iterate its rows.
        table = response.css('div#index-list-container table.table-small')
        rows = table.css('tr')

        for row in rows:
            name = row.css("a::text").get()
            high_low = row.css('td:nth-child(2)::text').get()
            date_time = row.css('td:nth-child(7) span:nth-child(2) ::text').get()

            yield {
                'name': name,
                'high_low': high_low,
                'date_time': date_time,
            }

Results


{"high_low": "\r\n146.44", "name": "3M", "date_time": "05/26/2020 04:15:11 PM UTC-0400"},
{"high_low": "\r\n42.22", "name": "AO Smith", "date_time": "05/26/2020 04:15:11 PM UTC-0400"},
{"high_low": "\r\n91.47", "name": "Abbott Laboratories", "date_time": "05/26/2020 04:15:11 PM UTC-0400"},
{"high_low": "\r\n92.10", "name": "AbbVie", "date_time": "05/26/2020 04:15:11 PM UTC-0400"},
{"high_low": "\r\n193.71", "name": "Accenture", "date_time": "05/26/2020 04:15:11 PM UTC-0400"},
{"high_low": "\r\n73.08", "name": "Activision Blizzard", "date_time": "05/25/2020 08:00:00 PM UTC-0400"},
{"high_low": "\r\n385.26", "name": "Adobe", "date_time": "05/25/2020 08:00:00 PM UTC-0400"},
{"high_low": "\r\n133.48", "name": "Advance Auto Parts", "date_time": "05/26/2020 04:15:11 PM UTC-0400"},
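The `high_low` values above still carry the page's leading `\r\n` whitespace. A minimal sketch of normalizing them after extraction (`clean_price` is a hypothetical helper, not part of the answer):

```python
def clean_price(raw):
    """Normalize a scraped cell like '\r\n146.44' to a float, or None if empty."""
    if raw is None:
        return None
    stripped = raw.strip()
    return float(stripped) if stripped else None

print(clean_price("\r\n146.44"))  # 146.44
print(clean_price(None))          # None
```

In a real project this kind of cleanup usually lives in an item pipeline or an item loader's input processor rather than in `parse()` itself.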




Answered 2022-12-06