2 Answers

Contributor with 1111 experience points, 0 upvotes
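Assuming that everything after a header belongs to that header, and that the first element is always a header, you can use itertools.groupby.
The key can be whether the element contains a header tag, so that each header's content is grouped right after it: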
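from itertools import groupby

lst = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]
headers = []
content = []
# groupby yields runs of consecutive items that share the same key:
# True for header lines, False for the content lines that follow them.
for key, values in groupby(lst, key=lambda x: "<h" in x):
    if key:
        # extend avoids the TypeError that append(*values) raises
        # when two headers happen to be adjacent.
        headers.extend(values)
    else:
        content.append(" ".join(values))
print(headers)
print(content)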
This gives:
['<h1> question 1', '<h1> answer 1', '<h1> question 2', '<h> answer 2']
['question 1 content question 1 more content', 'answer 1 content answer 1 more content', 'question 2 content', 'answer 2 content']
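To make the grouping step easier to see, here is a minimal sketch that prints the raw runs groupby produces before they are split into two lists. The list `sample` and the name `runs` are made up for illustration; they are a shortened stand-in for `lst`, not part of the original answer.

```python
from itertools import groupby

# Hypothetical, shortened sample in the same shape as lst.
sample = ["<h1> question 1", "a", "b", "<h1> answer 1", "c"]

# groupby only merges *consecutive* items with an equal key, so each
# header forms its own True run and the lines after it form a False run.
runs = [
    (is_header, list(items))
    for is_header, items in groupby(sample, key=lambda x: "<h" in x)
]

for is_header, items in runs:
    print(is_header, items)
```

Because the runs alternate between header and content, the two output lists stay aligned only as long as every header is followed by at least one content line.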
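The problem with your current approach is that you only ever add a single item to content. What you want is to accumulate a temp_content list until the next header is encountered, and only then append it and reset: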
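headers = []
content = []
temp_content = None  # stays None until the first header is seen
for i in lst:
    if "<h" in i:
        # A new header closes the previous block, if there was one.
        if temp_content is not None:
            content.append(" ".join(temp_content))
        temp_content = []
        headers.append(i)
    else:
        temp_content.append(i)
# Flush the block that belongs to the last header.
if temp_content is not None:
    content.append(" ".join(temp_content))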
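As a rough sketch, the accumulate-and-reset loop can be wrapped in a function so its edge cases are easy to check. The name `split_sections` is hypothetical, and skipping items that appear before the first header is an assumption made here, not something the original answer specifies.

```python
def split_sections(items):
    """Split a flat list into parallel header and content lists."""
    headers, content = [], []
    temp_content = None  # None means "no header seen yet"
    for item in items:
        if "<h" in item:
            # Close the previous block before starting a new one.
            if temp_content is not None:
                content.append(" ".join(temp_content))
            temp_content = []
            headers.append(item)
        elif temp_content is not None:
            temp_content.append(item)
        # Assumption: items before the first header are skipped.
    # The last block has no following header, so flush it here.
    if temp_content is not None:
        content.append(" ".join(temp_content))
    return headers, content


headers, content = split_sections(
    ["<h1> question 1", "question 1 content", "<h1> answer 1", "answer 1 content"]
)
print(headers)
print(content)
```

A header with no content lines produces an empty string in `content`, which keeps the two lists the same length.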

Contributor with 1848 experience points, 6 upvotes
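You can collect the headers and their content into a collections.defaultdict while iterating over the list, then split the keys and values into the headers and content lists at the end. A header can be detected simply by checking whether the string starts with "<h", using str.startswith.
I also use the continue statement to move straight to the next iteration once a header is found. A plain else statement would work here just as well.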
from collections import defaultdict

lst = [
    "<h1> question 1",
    "question 1 content",
    "question 1 more content",
    "<h1> answer 1",
    "answer 1 content",
    "answer 1 more content",
    "<h1> question 2",
    "question 2 content",
    "<h> answer 2",
    "answer 2 content",
]
# Maps each header to the list of content lines that follow it.
header_map = defaultdict(list)
header = None
for item in lst:
    if item.startswith("<h"):
        # Remember the current header and skip to the next item.
        header = item
        continue
    header_map[header].append(item)
# Dicts preserve insertion order, so keys and values stay aligned.
headers = list(header_map)
print(headers)
content = [" ".join(v) for v in header_map.values()]
print(content)
Output:
['<h1> question 1', '<h1> answer 1', '<h1> question 2', '<h> answer 2']
['question 1 content question 1 more content', 'answer 1 content answer 1 more content', 'question 2 content', 'answer 2 content']
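For comparison, here is a minimal sketch of the else variant mentioned in this answer, where the content branch replaces continue. The list `sample` is a shortened, made-up stand-in for `lst`.

```python
from collections import defaultdict

# Hypothetical, shortened sample in the same shape as lst.
sample = [
    "<h1> question 1",
    "q1 content",
    "q1 more content",
    "<h1> answer 1",
    "a1 content",
]

header_map = defaultdict(list)
header = None
for item in sample:
    if item.startswith("<h"):
        header = item
    else:
        # Only non-header items reach this branch, so no continue is needed.
        header_map[header].append(item)

headers = list(header_map)
content = [" ".join(v) for v in header_map.values()]
print(headers)
print(content)
```

Two properties of the dictionary approach are worth keeping in mind: a header that has no content lines never becomes a key, and two identical header strings share a single entry.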