1 Answer

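Some general guidelines:
You are creating a pool. The pool size should depend on the machine, not on the size of the job. For example, you want 4 processes in the pool rather than 10000, even if you have 10000 files to process.
The job that runs in each process should be simple but parameterized. In your case, write a function that takes a file name as input and performs the conversion, then map the input files onto it. Filtering should be done before calling map.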
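To make the two points above concrete, here is a minimal sketch of sizing the pool from the machine and filtering before the map call. The helper names (pool_size, select_dbf, run_all) are made up for illustration, and capping the pool at the number of jobs is my own assumption rather than a rule.

```python
import os
from multiprocessing import Pool


def pool_size(job_count, cpu_count=None):
    """Size the pool from the machine's cores, capped by the number of jobs."""
    cores = cpu_count or os.cpu_count() or 1
    return max(1, min(cores, job_count))


def select_dbf(names, prefix="D", suffix=".DBF"):
    """Filter the candidate file names before they are handed to map."""
    return [n for n in names if n.startswith(prefix) and n.endswith(suffix)]


def run_all(worker, names):
    """Map a picklable, module-level worker over the already-filtered names."""
    jobs = select_dbf(names)
    with Pool(processes=pool_size(len(jobs))) as pool:
        return pool.map(worker, jobs)
```

In a script you would call something like run_all(convert, os.listdir(directory)) under an if __name__ == '__main__': guard, with the worker defined at module level so the child processes can import it.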
So I would turn your code into something like this:
import os
import multiprocessing

import pandas as pd
from dbfread import DBF

directory = 'C:\\Path_to_DBF_Files'  # directory containing the DBF files
output_directory = 'C:\\Path_to_output_directory'  # directory for the CSV files


def convert(file):
    """Read one DBF file and write it out as a CSV file."""
    file_path = os.path.join(directory, file)  # join with the directory, not the file list
    print(f'\nReading in {file}...')
    # Pass the decoding options to the constructor: decode as UTF-8 and ignore undecodable characters
    dbf = DBF(file_path, encoding='utf-8', char_decode_errors='ignore')
    print('\nConverting to DataFrame...')
    df = pd.DataFrame(iter(dbf))  # convert to a pandas DataFrame
    df.columns = df.columns.astype(str)  # astype returns a new index, so assign it back
    print(df)
    print('\nWriting to CSV...')
    # splitext drops only the extension; str.strip('.DBF') strips the characters ., D, B, F from both ends
    dest_path = os.path.join(output_directory, os.path.splitext(file)[0] + '.csv')
    df.to_csv(dest_path, index=False)
    print(f'\nConverted {file} to CSV. Moving to next file...')


if __name__ == '__main__':  # needed on Windows, where worker processes re-import this module
    # Filter before calling map so that only the matching files reach the pool
    files_in = [file for file in os.listdir(directory)
                if file.startswith('D') and file.endswith('.DBF')]
    with multiprocessing.Pool(processes=4) as pool:
        pool.map(convert, files_in)
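One detail worth checking when you derive the CSV name from the DBF name: str.strip takes a set of characters, not a suffix, so it can remove more than the extension. A small sketch of the difference follows. The helper csv_name and the sample file names are hypothetical, and I am assuming you want the output to keep the same stem as the input.

```python
import os


def csv_name(dbf_name):
    """Build the CSV file name by dropping only the extension."""
    stem, _ext = os.path.splitext(dbf_name)
    return stem + ".csv"


# strip(".DBF") removes any of the characters ".", "D", "B", "F" from both ends,
# so the leading "DBF" of this sample name is lost along with the extension.
stripped = "DBF_2020.DBF".strip(".DBF")
print(stripped)                  # _2020
print(csv_name("DBF_2020.DBF"))  # DBF_2020.csv
```

For names like D001.DBF the two approaches happen to give the same stem once the 'D' is added back, which is probably why the problem is easy to miss.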