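### Elapsed time
perl: the slowest; I got tired of waiting and killed it before it finished
nodejs: a bit over 1 minute
php: a bit over 30 seconds
ruby: a bit over 30 seconds
python: about 11 seconds
go: about 4 seconds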
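### On time, go and python win
### On functionality: this csv file is non-standard, one field contains a single unpaired double quote
go, nodejs and ruby all raised errors and could not finish. The times listed for them above were measured on a copy of the csv file with that stray quote removed.
php raised no error, but the unpaired double quote made it skip many rows: it treated the double quote as a delimiter.
On functionality python wins. python handles the non-standard csv completely and produces a correct csv at the end, in just a few lines of code.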
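To illustrate why a lenient parser can get through a file like this, here is a minimal sketch of one lenient rule: a double quote is special only when it opens a field, and is an ordinary character anywhere else. It assumes the stray quote sits in the middle of a field (the thread does not show the file). `splitLenient` is a hypothetical helper written for this example; it is not any library's API and not a description of how python's csv module works internally.

```javascript
// Hypothetical helper: split one CSV line, tolerating stray double quotes.
// Written character by character for clarity, not for speed.
function splitLenient(line) {
  const fields = [];
  let field = "";
  let inQuotes = false;
  let atFieldStart = true;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // "" inside a quoted field is an escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"' && atFieldStart) {
      inQuotes = true; // a quote opens a quoted field only at the field start
    } else if (ch === ",") {
      fields.push(field);
      field = "";
      atFieldStart = true;
      continue;
    } else {
      field += ch; // a stray quote in mid-field is kept as plain text
    }
    atFieldStart = false;
  }
  fields.push(field);
  return fields;
}

console.log(splitLenient('a,5" pipe,c'));
```

With this rule the stray quote stays inside the second field and the row still splits into three fields, instead of the quote swallowing everything up to the next quote in the file.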
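### On writing the code, nodejs was the most disgusting
What is nodejs so cocky about? It reminds me of what the Ghostscript author said about perl: perl is like something vomited out of a dog's anus.
After writing this small project, I feel nodejs is the one that is like something vomited out of a dog's anus.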
1
ysc3839 15 days ago via Android
So where is the code?
2
zhouyin OP
3
zhouyin OP
4
hefish 15 days ago
Haha, a very classy way of putting it.
5
gainsurier 15 days ago via iPhone
I'd guess a C version would take about one second?
6
zhouyin OP @gainsurier
Aren't python, php and ruby all implemented in C anyway? python is just implemented better.
7
chenqh 15 days ago
Why is python that fast? Is it a C library underneath?
8
chenqh 15 days ago
Wait, how is nodejs this slow? What about the JIT? It is even slower than php and ruby, which have no JIT?
9
zhouyin OP @gainsurier
Also, nodejs is implemented in C++, and it is not done as well as python.
10
henbf 14 days ago
Before trashing Node.js, ask yourself whether you should first get the basic concepts of I/O and streams straight.
11
zhouyin OP @henbf
I'm no nodejs expert. I updated a.js to use an output stream, but now it fails with a heap out-of-memory error:

```bash
-bash-4.2# node a.js
(node:17974) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [WriteStream]. Use emitter.setMaxListeners() to increase limit
(Use `node --trace-warnings ...` to show where the warning was created)
(node:17974) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [WriteStream]. Use emitter.setMaxListeners() to increase limit
(node:17974) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [WriteStream]. Use emitter.setMaxListeners() to increase limit

<--- Last few GCs --->

[17974:0x1c3dbf0] 40306 ms: Scavenge (reduce) 2046.8 (2082.1) -> 2046.5 (2082.6) MB, 44.4 / 0.0 ms (average mu = 0.342, current mu = 0.316) allocation failure
[17974:0x1c3dbf0] 40396 ms: Scavenge (reduce) 2047.2 (2082.6) -> 2046.8 (2082.8) MB, 31.1 / 0.0 ms (average mu = 0.342, current mu = 0.316) allocation failure

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x7fcfb6136908 node::Abort() [/lib64/libnode.so.93]
 2: 0x7fcfb6024451 [/lib64/libnode.so.93]
 3: 0x7fcfb732a552 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/lib64/libnode.so.93]
 4: 0x7fcfb732a8e7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/lib64/libnode.so.93]
 5: 0x7fcfb74ea305 [/lib64/libnode.so.93]
 6: 0x7fcfb74ea3e5 [/lib64/libnode.so.93]
 7: 0x7fcfb74fe77c v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/lib64/libnode.so.93]
 8: 0x7fcfb74ff0a1 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/lib64/libnode.so.93]
 9: 0x7fcfb7502269 v8::internal::Heap::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/lib64/libnode.so.93]
10: 0x7fcfb75022f7 v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/lib64/libnode.so.93]
11: 0x7fcfb74c27d0 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [/lib64/libnode.so.93]
12: 0x7fcfb74badb4 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [/lib64/libnode.so.93]
13: 0x7fcfb74bcbdf v8::internal::FactoryBase<v8::internal::Factory>::NewRawOneByteString(int, v8::internal::AllocationType) [/lib64/libnode.so.93]
14: 0x7fcfb74c4d5d v8::internal::Factory::NewStringFromUtf8(v8::base::Vector<char const> const&, v8::internal::AllocationType) [/lib64/libnode.so.93]
15: 0x7fcfb733d59d v8::String::NewFromUtf8(v8::Isolate*, char const*, v8::NewStringType, int) [/lib64/libnode.so.93]
16: 0x7fcfb6215390 node::StringBytes::Encode(v8::Isolate*, char const*, unsigned long, node::encoding, v8::Local<v8::Value>*) [/lib64/libnode.so.93]
17: 0x7fcfb6123ef3 [/lib64/libnode.so.93]
18: 0x7fcfb71ba3cc [/lib64/libnode.so.93]
Aborted
```
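The warning in this log reports 11 `drain` listeners on a single `WriteStream`. A common cause is registering a `drain` handler with `on` for every write rather than once per stall, and the heap can fill up when rows keep being queued faster than the disk accepts them. Since a.js itself is not shown in the thread, the following is only a minimal sketch of writing with backpressure, not a fix for that file. `writeRows` and its parameters are hypothetical names introduced for this example.

```javascript
const fs = require("fs");
const { once } = require("events");

// Hypothetical helper: write every row as one line, pausing whenever the
// stream's internal buffer is full instead of queueing more data in memory.
async function writeRows(outputPath, rows) {
  const out = fs.createWriteStream(outputPath);
  for (const row of rows) {
    // write() returns false once the buffered data reaches highWaterMark.
    if (!out.write(row + "\n")) {
      // events.once() adds a single listener and removes it after it fires,
      // so drain listeners never pile up on the stream.
      await once(out, "drain");
    }
  }
  out.end();
  // Wait until everything has been flushed to the file.
  await once(out, "finish");
}
```

Called as `await writeRows("./test.txt", rows)` with any iterable of strings, this keeps at most roughly one buffer's worth of pending output in memory. When the source is itself a stream, `pipe` or `stream.pipeline` handles the same backpressure automatically.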
12
henbf 14 days ago
@zhouyin Your code isn't written correctly.

    const { createReadStream, createWriteStream } = require("fs");
    const { parse } = require("csv-parse");

    const inputPath = "../outpy.csv";
    const outputPath = "./test.txt";

    const readStream = createReadStream(inputPath);
    const writeStream = createWriteStream(outputPath, { flags: "a" });
    // from_line: 2 skips the first line of the input
    const parser = parse({ delimiter: ",", from_line: 2 });

    readStream.pipe(parser);

    // Each parsed row is an array of fields; join it back into one line
    parser.on("data", (row) => {
      writeStream.write(row.join(",") + "\n");
    });
    parser.on("end", () => {
      console.log("finished");
      writeStream.end();
    });
    parser.on("error", (error) => {
      console.error("CSV Parsing Error:", error);
    });
13
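zhouyin OP
At first I wrote it almost exactly the way you did. To my surprise the speed did not improve, so I changed it to that other version, thinking there was a buffer at the write call.
I ran your code without changing a single character. Result: a bit over one minute, nowhere near python.

    -bash-4.2# time node a.js
    finished
    real 1m3.579s
    user 1m4.103s
    sys 0m2.478s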
14
henbf 14 days ago
@zhouyin It also depends on what processing you do on each csv row. With python you only read and write, with no extra processing, which amounts to a copy. With Node.js you convert every row into an array and then convert the array back into a string when writing, so of course it is slower.

    const { createReadStream, createWriteStream } = require("fs");

    const inputPath = "../outpy.csv";
    const outputPath = "./test.txt";

    // Read in 256 KiB chunks and copy them straight to the output
    const readStream = createReadStream(inputPath, { highWaterMark: 256 * 1024 });
    const writeStream = createWriteStream(outputPath, { flags: "a" });

    readStream.pipe(writeStream);

    readStream.on("end", () => {
      console.log("finished");
      writeStream.end();
    });
    readStream.on("error", (err) => {
      console.error("Error reading file:", err);
    });
    writeStream.on("error", (err) => {
      console.error("Error writing file:", err);
    });
17
zhouyin OP @zhouyin
With csvwriter it took over 3 minutes.

    -bash-4.2# time node a.js
    finished
    real 3m45.028s
    user 4m12.751s
    sys 2m59.847s
19
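stabc 14 days ago
1. Parsing csv means splitting and joining character by character. Low-level languages have an absolute advantage here, because they can use the data in place by position, while node creates a new string object every time.
2. python's standard library includes a csv module, so that also runs at a low level. The fact that it is so much slower than go suggests it is not written very well.
3. I just ran a quick test: if you optimize the parsing in node and reduce string concatenation, the total time to parse a 400M csv file can be brought down to under 5 seconds.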
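stabc's test code is not shown, so the following is only a sketch of what "reduce string concatenation" in points 1 and 3 can look like: locate each delimiter with `indexOf` and take one `slice` per field, rather than appending characters to a growing string. It assumes the line contains no quoted fields. `splitLineByIndex` is a hypothetical name, and nothing here is claimed to reproduce the 5-second figure.

```javascript
// Hypothetical helper: split one CSV line that contains no quoted fields.
// It finds the position of each delimiter and takes one slice per field,
// so no per-character string concatenation happens.
function splitLineByIndex(line, delimiter = ",") {
  const fields = [];
  let start = 0;
  let end = line.indexOf(delimiter, start);
  while (end !== -1) {
    fields.push(line.slice(start, end));
    start = end + delimiter.length;
    end = line.indexOf(delimiter, start);
  }
  fields.push(line.slice(start)); // the last field runs to the end of the line
  return fields;
}

console.log(splitLineByIndex("go,4,seconds")); // [ 'go', '4', 'seconds' ]
```

Quoted fields would need extra handling on top of this, which is where most of a real parser's complexity and cost comes from.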
21
julyclyde 13 days ago
@stabc Why? Just because it is "in the standard library", it is "low-level"?
https://github.com/python/cpython/blob/main/Lib/csv.py python's csv module is pure python, not C.
22
stabc 13 days ago
@julyclyde What you linked is the interface layer. The low-level part is here: https://github.com/python/cpython/blob/main/Modules/_csv.c