
Sample of a filtered Twitter API stream

Python

RISEBY 2023-09-26 14:58:53

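I need to get a filtered sample of the Twitter stream. I'm using tweepy, and I've looked at the Stream class's methods for getting a sample stream and for filtering, but I don't understand how I'm supposed to combine them. Should it be stream.filter(track=['']).sample(), stream.sample().filter(track=['']), each call on its own line, or something else? If you have another idea for getting a sample stream based on a keyword filter, please help. Thanks in advance.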


2 Answers

侃侃無極


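The Twitter v2 API includes one endpoint for random sampling and another for filtering tweets. The script below uses the filtered stream (/2/tweets/search/stream): it deletes any existing rules, adds new filter rules, and then writes the matching tweets to JSON files in batches of 100.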
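On the v2 filtered stream, keywords are expressed as rules rather than as a track= list. As a minimal sketch, the hypothetical helper build_keyword_rules turns a keyword list into the same {"add": [...]} payload shape that set_rules in the script posts. It assumes the v2 rule syntax in which OR combines terms and double quotes mark an exact phrase; it does not check any rule-length limits.

```python
def build_keyword_rules(keywords, tag="keywords"):
    """Build an {"add": [...]} payload for the v2 filtered-stream rules endpoint.

    Keywords are joined with OR, so a tweet matching any one of them is delivered.
    Multi-word keywords are wrapped in double quotes so they match as exact phrases.
    (Hypothetical helper, not part of any library.)
    """
    terms = []
    for kw in keywords:
        kw = kw.strip()
        if not kw:
            continue  # ignore empty strings such as track=['']
        terms.append(f'"{kw}"' if " " in kw else kw)
    if not terms:
        raise ValueError("at least one non-empty keyword is required")
    # Group OR-ed terms in parentheses so extra operators can be appended safely
    value = "(" + " OR ".join(terms) + ")" if len(terms) > 1 else terms[0]
    return {"add": [{"value": value, "tag": tag}]}


payload = build_keyword_rules(["python", "machine learning"])
print(payload)
# → {'add': [{'value': '(python OR "machine learning")', 'tag': 'keywords'}]}
```

The resulting dict could be passed as json=payload to the same requests.post call that set_rules uses, in place of the hard-coded sample_rules.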

import requests
import os
import json
import pandas as pd

# To set your environment variables in your terminal run the following line:
# export 'BEARER_TOKEN'='<your_bearer_token>'

data = []
counter = 0


def create_headers(bearer_token):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers


def get_rules(headers, bearer_token):
    response = requests.get(
        "https://api.twitter.com/2/tweets/search/stream/rules", headers=headers
    )
    if response.status_code != 200:
        raise Exception(
            "Cannot get rules (HTTP {}): {}".format(response.status_code, response.text)
        )
    print(json.dumps(response.json()))
    return response.json()


def delete_all_rules(headers, bearer_token, rules):
    if rules is None or "data" not in rules:
        return None
    ids = list(map(lambda rule: rule["id"], rules["data"]))
    payload = {"delete": {"ids": ids}}
    response = requests.post(
        "https://api.twitter.com/2/tweets/search/stream/rules",
        headers=headers,
        json=payload
    )
    if response.status_code != 200:
        raise Exception(
            "Cannot delete rules (HTTP {}): {}".format(
                response.status_code, response.text
            )
        )
    print(json.dumps(response.json()))


def set_rules(headers, delete, bearer_token):
    # You can adjust the rules if needed
    sample_rules = [
        {"value": "dog has:images", "tag": "dog pictures"},
        {"value": "cat has:images -grumpy", "tag": "cat pictures"},
    ]
    payload = {"add": sample_rules}
    response = requests.post(
        "https://api.twitter.com/2/tweets/search/stream/rules",
        headers=headers,
        json=payload,
    )
    if response.status_code != 201:
        raise Exception(
            "Cannot add rules (HTTP {}): {}".format(response.status_code, response.text)
        )
    print(json.dumps(response.json()))


def get_stream(headers, rules_set, bearer_token):
    global data, counter
    response = requests.get(
        "https://api.twitter.com/2/tweets/search/stream", headers=headers, stream=True,
    )
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(
            "Cannot get stream (HTTP {}): {}".format(
                response.status_code, response.text
            )
        )
    for response_line in response.iter_lines():
        if response_line:  # skip empty keep-alive lines
            json_response = json.loads(response_line)
            print(json.dumps(json_response, indent=4, sort_keys=True))
            # Messages without "data" (e.g. error notices) carry no tweet
            if "data" not in json_response:
                continue
            data.append(json_response["data"])
            if len(data) % 100 == 0:
                print('storing data')
                # Save each batch of 100 tweets as tw_example_0.json, tw_example_1.json, ...
                pd.DataFrame(data).to_json(f'tw_example_{counter}.json', orient='records')
                data = []
                counter += 1


def main():
    bearer_token = os.environ.get("BEARER_TOKEN")
    if bearer_token is None:
        raise RuntimeError("BEARER_TOKEN environment variable is not set")
    headers = create_headers(bearer_token)
    rules = get_rules(headers, bearer_token)
    delete = delete_all_rules(headers, bearer_token, rules)
    rules_set = set_rules(headers, delete, bearer_token)  # avoid shadowing built-in set
    get_stream(headers, rules_set, bearer_token)


if __name__ == "__main__":
    main()

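Then load the data into a pandas DataFrame, for example df = pd.read_json('tw_example_0.json', orient='records'). Each batch is saved as tw_example_{counter}.json, so the first file is tw_example_0.json.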
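Because the script writes one file per 100 tweets, reading a single file only gives one batch (and any tweets still buffered in data when the script stops are never written). A minimal stdlib sketch for gathering every batch, assuming the files sit in the working directory with the tw_example_{counter}.json naming used above; load_chunks is a hypothetical helper:

```python
import glob
import json


def load_chunks(pattern="tw_example_*.json"):
    """Read every batch file matching pattern and return one list of records.

    Files are sorted by the numeric counter in their name, so tw_example_10
    comes after tw_example_9 (plain string sorting would put it first).
    (Hypothetical helper.)
    """
    def chunk_number(path):
        suffix = path.rsplit("_", 1)[-1]   # e.g. "10.json"
        return int(suffix.split(".")[0])   # -> 10

    records = []
    for path in sorted(glob.glob(pattern), key=chunk_number):
        with open(path, encoding="utf-8") as f:
            records.extend(json.load(f))   # each file is a JSON array of records
    return records
```

With pandas, pd.DataFrame(load_chunks()) then gives a single DataFrame covering all saved batches.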

2023-09-26

叮當貓咪


I suggest reading tweepy's API documentation.

Based on other code snippets I've read, I believe it should be done like this:

stream.filter(track=['Keyword'])
print(stream.sample())

2023-09-26

千萬里不及你


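As far as I know, tweepy uses the Twitter v1.1 API, which has separate endpoints for real-time sampling and for real-time filtering of tweets.

Twitter API reference: v1 real-time sampling, v1 real-time filtering.

Method 1: Get the filtered stream with a call such as stream.filter(track=['Keyword1', 'Keyword2']), then sample records from the collected data.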

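import tweepy


# tweepy 3.x API; tweepy 4.0 merged StreamListener into Stream, so subclass tweepy.Stream there
class StreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # do data processing and storing here
        print(status.text)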
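Method 1 samples after collecting, which means storing every matching tweet first. As an alternative for a long-running stream, reservoir sampling (a swapped-in technique, not part of the answer) keeps a uniform random sample of fixed size while the data is still arriving; on_status would call sampler.add(status). ReservoirSampler is a hypothetical name, and the sketch below feeds it plain integers standing in for statuses.

```python
import random


class ReservoirSampler:
    """Keep a uniform random sample of at most k items from a stream of
    unknown length (reservoir sampling, Algorithm R). Hypothetical helper."""

    def __init__(self, k, seed=None):
        self.k = k
        self.seen = 0
        self.sample = []
        self._rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.sample) < self.k:
            self.sample.append(item)            # fill the reservoir first
        else:
            j = self._rng.randrange(self.seen)  # 0 <= j < seen
            if j < self.k:
                self.sample[j] = item           # replace with probability k/seen


sampler = ReservoirSampler(k=3, seed=42)
for tweet_id in range(1000):  # stand-in for statuses arriving in on_status
    sampler.add(tweet_id)
print(sampler.seen, len(sampler.sample))  # → 1000 3
```

Every item seen so far ends up in the sample with the same probability k/seen, so the sample stays uniform however long the stream runs.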

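Method 2: Write a program that starts and stops streaming at random times (for example, streaming for a randomly placed 3-minute window in every 15-minute interval).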
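The scheduling part of Method 2 can be sketched independently of tweepy. Assuming the 15-minute interval and 3-minute window from the example, the hypothetical helper random_windows picks one random start time inside each interval; a driver loop (not shown) would sleep until each start, open the filtered stream, and stop it at the end (tweepy's Stream has a disconnect() method for that).

```python
import random


def random_windows(period_min=15, window_min=3, periods=4, seed=None):
    """Return (start, end) offsets in minutes: one randomly placed window of
    window_min minutes inside each consecutive period of period_min minutes.
    (Hypothetical helper.)"""
    if window_min > period_min:
        raise ValueError("window must fit inside the period")
    rng = random.Random(seed)
    windows = []
    for p in range(periods):
        period_start = p * period_min
        # Latest allowed start still lets the window finish inside the period
        start = period_start + rng.uniform(0, period_min - window_min)
        windows.append((start, start + window_min))
    return windows


for start, end in random_windows(seed=0):
    print(f"stream from minute {start:.1f} to {end:.1f}")
```

Randomizing the start within each interval avoids always sampling the same part of every interval, which could bias the data toward particular times.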

Method 3: Use the sampling API to collect data, then filter it by keyword and store only the relevant tweets.
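For Method 3, the keyword filter runs on your side after each sampled tweet arrives. A minimal sketch, assuming you pass it the tweet's text (for example status.text in tweepy 3.x); make_keyword_filter is a hypothetical helper that matches whole words case-insensitively, so "python" does not match "pythonic", and keywords starting with # or @ also work.

```python
import re


def make_keyword_filter(keywords):
    """Return a predicate that is True when text contains any keyword as a
    whole word, ignoring case. (Hypothetical helper.)"""
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty keyword is required")
    # (?<!\w) and (?!\w) act like word boundaries but also work next to # or @
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in cleaned) + r")(?!\w)",
        re.IGNORECASE,
    )
    return lambda text: pattern.search(text) is not None


matches = make_keyword_filter(["python", "tweepy"])
sampled = ["Learning Python today", "pythonic code", "Nothing relevant", "TWEEPY rocks"]
print([t for t in sampled if matches(t)])
# → ['Learning Python today', 'TWEEPY rocks']
```

Compiling the pattern once and reusing the predicate keeps the per-tweet cost low, which matters when every sampled tweet passes through it.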

2023-09-26
