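A question for fellow V2EX users: I need to incrementally process some large XML files. I found code in the Python Cookbook and adapted it to my use case, but it does not seem to work properly. Memory usage climbs quickly, reaching several GB after processing a little over 100,000 lines. I don't understand why and would appreciate any pointers. The main logic is below.
macOS Big Sur
Python 3.8.12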
from xml.etree.ElementTree import iterparse

def parse_and_remove(filename, path):
    path_parts = path.split('/')
    doc = iterparse(filename, ('start', 'end'))
    # Skip the root element
    next(doc)
    tag_stack = []
    elem_stack = []
    for event, elem in doc:
        if event == 'start':
            tag_stack.append(elem.tag)
            elem_stack.append(elem)
        elif event == 'end':
            if tag_stack == path_parts:
                yield elem
                elem_stack[-2].remove(elem)
            try:
                tag_stack.pop()
                elem_stack.pop()
            except IndexError:
                pass

data = parse_and_remove('my.xml', 'path')
client, table = getMongo()
for pothole in data:
    resDict = {
        # extract the fields I need
    }
    table.insert_one(resDict)
client.close()
2i2Re2PLMaDnghL 2021-11-10 09:46:26 +08:00
1. Try switching to lxml
2. Try using XPath instead of manually iterating and comparing the path
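
One possible cause, assuming most of the file lies outside the matched path: `parse_and_remove` only detaches the elements it yields, so every other element stays attached to the tree and memory keeps growing. Below is a minimal sketch that stays with the standard library (not lxml, as suggested above) and clears the root after each record. The name `iter_records`, matching by tag name rather than by path, and the sample XML are all assumptions made for illustration.

```python
import io
from xml.etree.ElementTree import iterparse


def iter_records(source, tag):
    """Yield each element named `tag`, freeing parsed data as we go.

    Hypothetical helper: matches on the tag name only, not a full path.
    """
    context = iterparse(source, events=("start", "end"))
    # The first event is the 'start' of the root element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Execution resumes here once the caller asks for the next record,
            # so the caller must copy out what it needs before that point.
            # Clearing the root detaches everything built so far, matched or not.
            root.clear()


# Small in-memory sample (assumed structure) standing in for a large file.
sample = b"<root><group><row><id>1</id></row><row><id>2</id></row></group></root>"
ids = [row.findtext("id") for row in iter_records(io.BytesIO(sample), "row")]
```

Because the tree is cleared whenever the generator resumes, each record should be read (for example, turned into `resDict`) inside the loop body and not kept around for later.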