Best way to avoid (sliced string) ? #711

@alexdima

  • Node.js Version: 7.4.0
  • v8 Version: 5.6.326.50
  • OS: macOS Sierra
  • Scope (install, code, runtime, meta, other?): basic string manipulation
  • Module (and version) (if relevant): Buffer

Hi, I'm working on VS Code (based on Electron), and I'm looking into improving our memory usage when dealing with large files in microsoft/vscode#30180.

Our buffer implementation is basically using an array of lines. I am aware of the advantages and disadvantages of that, but I would still like to push it to its limits. Our file reading involves reading chunks and pushing those through iconv-lite to handle file encoding. Long story short, we have a bunch of ~64KB strings that we need to split into lines.

The fastest way I've found so far (short of a native C++ node module) is a simple str.split(/\r\n|\r|\n/). This works very well, but it ends up creating a (sliced string) for each line, all of which point to the parent chunk. When dealing with files of 3 million lines, these objects add up, and eliminating them can save a few tens of MB of memory.
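
To make the split concrete, here is a minimal sketch. The helper name splitLines and the sample chunk are made up for illustration and are not taken from the VS Code sources. It only shows that the regular expression treats \r\n, a lone \r, and a lone \n each as a single line terminator:

```javascript
// Hypothetical helper for illustration; not VS Code's actual code.
// Alternation order matters: \r\n is tried first, so a Windows line
// ending is consumed as one terminator rather than two.
function splitLines(chunk) {
    return chunk.split(/\r\n|\r|\n/);
}

var sample = 'first\r\nsecond\rthird\nfourth';
var lines = splitLines(sample);
console.log(lines); // [ 'first', 'second', 'third', 'fourth' ]
```

Note that a chunk ending in a terminator yields a trailing empty string, which matters when a line straddles two chunks.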

Our current workaround to rid ourselves of the (sliced string) is here:

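var lines = largeStr.split(/\r\n|\r|\n/);
for (var i = 0, len = lines.length; i < len; i++) {
    // Round-trip through a Buffer (UTF-8 encode, then decode) so each
    // line becomes a new string that no longer points at largeStr.
    lines[i] = Buffer.from(lines[i]).toString();
}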

I don't know if the above takes advantage of string interning or if it is the most efficient way to do this short of writing a native node module.
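
For anyone who wants to compare the two variants, the following is a rough sketch of how the difference could be measured. All helper names (flattenLines, makeChunk, heapUsed, measure) and the synthetic data are assumptions made for illustration; only the Buffer round-trip comes from the workaround above. The readings are noisy and depend on V8 internals, so treat them as an indication rather than a benchmark:

```javascript
// Hypothetical helper names throughout; only the Buffer round-trip
// itself comes from the workaround in this issue.
function flattenLines(lines) {
    for (var i = 0, len = lines.length; i < len; i++) {
        // Encode to UTF-8 bytes and decode again, yielding a new string.
        lines[i] = Buffer.from(lines[i]).toString();
    }
    return lines;
}

// Build a synthetic chunk of at least 64KB made of short lines.
function makeChunk(seed) {
    var parts = [];
    var size = 0;
    for (var n = 0; size < 64 * 1024; n++) {
        var line = 'chunk ' + seed + ' line ' + n + ': some sample text\n';
        parts.push(line);
        size += line.length;
    }
    return parts.join('');
}

// Force a collection when run with `node --expose-gc`; without the
// flag the reading is considerably noisier.
function heapUsed() {
    if (typeof global.gc === 'function') {
        global.gc();
    }
    return process.memoryUsage().heapUsed;
}

// Split `chunkCount` chunks, keep only the line arrays alive, and
// return the growth in heapUsed (bytes) together with the kept lines.
function measure(chunkCount, flatten) {
    var before = heapUsed();
    var kept = [];
    for (var c = 0; c < chunkCount; c++) {
        var lines = makeChunk(c).split(/\r\n|\r|\n/);
        kept.push(flatten ? flattenLines(lines) : lines);
    }
    return { bytes: heapUsed() - before, kept: kept };
}

var plain = measure(50, false);
var flat = measure(50, true);
console.log('without flattening:', plain.bytes, 'bytes');
console.log('with flattening:   ', flat.bytes, 'bytes');
```

One caveat with the round-trip: it goes through UTF-8, so it assumes well-formed strings. An unpaired surrogate would be replaced with U+FFFD during encoding, which changes the line content.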

Do you have any ideas? Thank you.
