Sample text:
FullName;ISO3;ISO1;molecular_weight
Alanine;Ala;A;89.09
Arginine;Arg;R;174.20
Asparagine;Asn;N;132.12
Aspartic_Acid;Asp;D;133.10
Cysteine;Cys;C;121.16
Create the columns using ";" as the separator:
import pandas as pd
f = "aminoacids"  # path to the semicolon-separated file
df = pd.read_csv(f, sep=";")
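To try this without a file on disk, here is a minimal sketch that feeds the sample text above to `pd.read_csv` through `io.StringIO`. The variable name `sample` is only illustrative and is not part of the original answer.

```python
import io

import pandas as pd

# The sample text from above, held in memory instead of in a file.
sample = """FullName;ISO3;ISO1;molecular_weight
Alanine;Ala;A;89.09
Arginine;Arg;R;174.20
Asparagine;Asn;N;132.12
Aspartic_Acid;Asp;D;133.10
Cysteine;Cys;C;121.16
"""

# sep=";" tells pandas to split columns on semicolons instead of commas.
df = pd.read_csv(io.StringIO(sample), sep=";")
print(df)
```

The first line of the text is used as the header, so the frame should end up with four named columns and one row per amino acid.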
Edit: Considering the comments, I think the text looks more like this:
t = """1234; text in written from with multiple sentences going over multiple lines until at some point the next ID is written dwon 2345; then the new Ad-Text begins until the next ID 3456; and so on1234; text in written from with multiple """
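In that case, a regular expression like the one below will split your string into IDs and texts, which you can then use to build a pandas DataFrame. Because the pattern contains a capturing group, re.split keeps the matched IDs in the result list instead of discarding them.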
import re
r = re.compile("([0-9]+);")
re.split(r,t)
Output:
['',
'1234',
' text in written from with multiple sentences going over multiple lines until at some point the next ID is written dwon ',
'2345',
' then the new Ad-Text begins until the next ID ',
'3456',
' and so on',
'1234',
' text in written from with multiple ']
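If you would rather get (ID, text) pairs directly instead of an alternating list, one possible variant is `re.findall` with a lookahead. This is only a sketch under the same assumption as above, namely that an ID is any run of digits followed by ";". The shortened string `t` and the name `pairs` are made up for illustration.

```python
import re

# Shortened stand-in for the ad text, with one text spanning two lines.
t = "1234; first ad text\nspanning two lines 2345; second ad 3456; third"

# Group 1: the ID. Group 2: everything up to the next "digits;" or the end.
# re.DOTALL lets "." match newlines so multi-line texts stay in one piece.
pattern = re.compile(r"([0-9]+);(.*?)(?=[0-9]+;|\Z)", re.DOTALL)

pairs = pattern.findall(t)
for ad_id, text in pairs:
    print(ad_id, repr(text))
```

Like the split-based version, this will cut in the wrong place if an ad text itself contains digits followed by a semicolon.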
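Edit 2: This is a response to the asker's follow-up question in the comments: how to convert this string into a pandas DataFrame with 2 columns, ID and text.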
import pandas as pd
a = re.split(r, t)  # a is the output list from the previous part of this answer
# Create the list of texts. [::2] takes every other item from a list, starting with the FIRST one;
# [1:] then drops the leading empty string.
texts = a[::2][1:]
print(texts)
# Create the list of IDs. [1::2] takes every other item from a list, starting with the SECOND one.
ids = a[1::2]
print(ids)
df = pd.DataFrame({"IDs": ids, "Texts": texts})
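Putting the pieces together, here is a rough end-to-end sketch. The helper name `ads_to_dataframe` is hypothetical, and converting the IDs to integers and stripping whitespace from the texts are extra clean-up steps I am assuming you want; they are not required by the approach above.

```python
import re

import pandas as pd


def ads_to_dataframe(text: str) -> pd.DataFrame:
    """Split 'ID; text ID; text ...' into a DataFrame with IDs and Texts."""
    # The capturing group keeps the IDs in the list returned by re.split.
    parts = re.split(r"([0-9]+);", text)
    # parts[0] is whatever precedes the first ID (usually ''), so skip it.
    ids = [int(p) for p in parts[1::2]]
    texts = [p.strip() for p in parts[2::2]]
    return pd.DataFrame({"IDs": ids, "Texts": texts})


# Shortened stand-in for the real ad text.
t = "1234; first ad text 2345; second ad text 3456; third"
df = ads_to_dataframe(t)
print(df)
```

Note that anything appearing before the first ID is silently dropped here, so check `parts[0]` yourself if that could matter for your data.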