Description
- Node.js Version: 7.4.0
- v8 Version: 5.6.326.50
- OS: macOS Sierra
- Scope (install, code, runtime, meta, other?): basic string manipulation
- Module (and version) (if relevant): Buffer
Hi, I'm working on VS Code (based on Electron), and I'm looking into improving our memory usage when dealing with large files in microsoft/vscode#30180.
Our buffer implementation is basically using an array of lines. I am aware of the advantages and disadvantages of that, but I would still like to push it to its limits. Our file reading involves reading chunks and pushing those through iconv-lite to handle file encoding. Long story short, we have a bunch of ~64KB strings that we need to split into lines.
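For context, the read path looks roughly like the sketch below. This is a minimal illustration only; the file name, chunk size, and encoding are placeholders rather than our actual reader:

```js
const fs = require('fs');
const iconv = require('iconv-lite');

const lines = [];
let leftover = '';

fs.createReadStream('large-file.txt', { highWaterMark: 64 * 1024 })
  .pipe(iconv.decodeStream('utf8'))
  .on('data', (chunk) => {
    // Each decoded chunk is a ~64KB string that still has to be split into lines.
    const parts = (leftover + chunk).split(/\r\n|\r|\n/);
    leftover = parts.pop(); // the last piece may be an incomplete line
    for (const part of parts) {
      lines.push(part); // these per-line strings are what add up for large files
    }
    // (a real reader also has to deal with a \r\n pair split across two chunks)
  })
  .on('end', () => {
    lines.push(leftover);
    console.log('lines:', lines.length);
  });
```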
The fastest way I've found so far (that doesn't involve a native C++ node module) is a simple str.split(/\r\n|\r|\n/). This works very well, but it ends up creating a (sliced string) for each line, all of which point back to the parent chunk. When dealing with files of 3 million lines, these objects add up, and eliminating them can save an extra few tens of MB of memory.
Our current workaround to rid ourselves of the (sliced string) is here:
```js
var lines = largeStr.split(/\r\n|\r|\n/);
for (var i = 0, len = lines.length; i < len; i++) {
    // Round-trip through a Buffer to replace each sliced string with a flat copy.
    lines[i] = Buffer.from(lines[i]).toString();
}
```

I don't know if the above takes advantage of string interning, or if it is the most efficient way to do this short of writing a native node module.
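For reference, a rough way to compare what the two shapes retain is sketched below. This is only a sketch: the chunk is synthetic, node has to be started with --expose-gc, and the numbers are ballpark at best.

```js
// Rough heap comparison (run with: node --expose-gc compare.js).
function makeChunk() {
  return ('x'.repeat(40) + '\n').repeat(1600); // ~64KB of short lines
}

function retained(label, fn) {
  global.gc();
  const before = process.memoryUsage().heapUsed;
  const result = fn();
  global.gc();
  const after = process.memoryUsage().heapUsed;
  console.log(label, ((after - before) / 1024).toFixed(1), 'KB retained');
  return result;
}

// Plain split: every line is a (sliced string) that keeps the whole chunk reachable.
const slicedLines = retained('sliced   ', () => makeChunk().split(/\r\n|\r|\n/));

// Buffer round-trip: every line becomes an independent flat string, so the chunk can be collected.
const flatLines = retained('flattened', () =>
  makeChunk().split(/\r\n|\r|\n/).map((line) => Buffer.from(line).toString()));

console.log(slicedLines.length, flatLines.length);
```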
Do you have any ideas? Thank you.