I collect a URL in Python (from the POST request) and then insert it into start_urls:

from flask import Flask, jsonify, request
import scrapy
import subprocess


class ClassSpider(scrapy.Spider):
    name = 'mySpider'
    #start_urls = []
    #pages = 0
    news = []

    def __init__(self, url, nbrPage):
        self.pages = nbrPage
        self.start_urls = []
        self.start_urls.append(url)

    def parse(self, response):
        ...

    def run(self):
        subprocess.check_output(['scrapy', 'crawl', 'mySpider', '-a', f'url={self.start_urls}', '-a', f'nbrPage={self.pages}'])
        return self.news


app = Flask(__name__)
data = []


@app.route('/', methods=['POST'])
def getNews():
    mySpiderClass = ClassSpider(request.json['url'], 2)
    return jsonify({'data': mySpiderClass.run()})


if __name__ == "__main__":
    app.run(debug=True)

I get this error:

    raise NotSupported("Unsupported URL scheme %s: %s" %
scrapy.exceptions.NotSupported: Unsupported URL scheme '': no handler available for that scheme

When I add print('my urls List: ' + str(self.start_urls)), it prints a list of URLs, for example:

my urls List: ['www.googole.com']

Any help, please?
1 Answer

臨摹微笑
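I think this happens because you first append url to self.start_urls in __init__, and then the run method passes the whole list self.start_urls to scrapy crawl. Arguments given with -a reach the spider as strings, so the new spider's __init__ appends the text of the list, "['www.googole.com']", instead of the URL itself. You end up with a list containing a stringified list rather than a list of URL strings.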
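To see what the spider actually receives, the following sketch (standard library only, no scrapy needed) rebuilds the -a argument the way run() does and then splits it on the first "=", which is roughly how scrapy crawl turns NAME=VALUE into a spider argument. The variable names are made up for the illustration. It also suggests that the URL from the print output has no scheme (http:// or https://) either, so that is worth checking as well.

```python
from urllib.parse import urlparse

# Value of start_urls in the Flask process, as shown by the question's print()
start_urls = ['www.googole.com']

# run() interpolates the whole list into the -a argument ...
arg_from_list = f'url={start_urls}'
# ... while the suggested fix interpolates the plain URL string
arg_from_url = f'url={start_urls[0]}'

# Everything after the first "=" reaches the spider as a str
received_list = arg_from_list.split('=', 1)[1]
received_url = arg_from_url.split('=', 1)[1]

print(repr(received_list))                   # "['www.googole.com']"
print(repr(received_url))                    # 'www.googole.com'

# Neither value carries a scheme, matching "scheme ''" in the error
print(repr(urlparse(received_list).scheme))  # ''
print(repr(urlparse(received_url).scheme))   # ''
print(urlparse('http://' + received_url).scheme)  # http
```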
To avoid this, change the __init__ method like this:
def __init__(self, url, nbrPage):
    self.pages = nbrPage
    # keep the plain URL string so run() can forward it as-is
    self.url = url
    self.start_urls = []
    self.start_urls.append(url)
Then pass self.url instead of self.start_urls in run:
def run(self):
    # forward the single URL string, not the list
    subprocess.check_output(['scrapy', 'crawl', 'mySpider', '-a', f'url={self.url}', '-a', f'nbrPage={self.pages}'])
    return self.news
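If you also want the Flask side to guard against this, here is a minimal sketch that builds the scrapy crawl argument list from a single URL string. build_crawl_args is a hypothetical helper, and prepending http:// to a bare host is an assumption about what your clients send; the spider name mySpider and the url/nbrPage arguments come from the question.

```python
def build_crawl_args(url, nbr_page):
    """Hypothetical helper: build the argv list for `scrapy crawl mySpider`."""
    # Reject lists early; -a values must be one plain string
    if not isinstance(url, str):
        raise TypeError('url must be a single URL string, not a list')
    # Assumption: default to http:// when the client sends a bare host
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url
    return ['scrapy', 'crawl', 'mySpider',
            '-a', f'url={url}',
            '-a', f'nbrPage={nbr_page}']


print(build_crawl_args('www.googole.com', 2))
# ['scrapy', 'crawl', 'mySpider', '-a', 'url=http://www.googole.com', '-a', 'nbrPage=2']
```

In run() this would be used as subprocess.check_output(build_crawl_args(self.url, self.pages)). Passing a list to subprocess, rather than one shell string, keeps each -a value as a separate argument, so the URL reaches the spider unchanged.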