flat internal Event representation #1968

Open
colinsurprenant opened this issue Oct 27, 2014 · 5 comments

@colinsurprenant
Contributor

An idea worth exploring is to move from an internal Hash/hierarchical event representation to a simpler flat representation. This is really just early brainstorming; please contribute ideas, thoughts, and comments.

Overall goals: The LogStash::Event object supports nested structures and a special syntax for accessing nested fields (field references). Internally, the object's data has basically the same shape as its JSON representation: it can be a hash of hashes of hashes, or whatever. From a memory usage perspective, this can require a lot of object references. From a serialization point of view, visiting all the objects can be costly. We are interested in exploring changes to the internal representation that should reduce per-event memory usage and event serialization costs.

Basically, instead of having an internal hierarchical object representation like

{
  "message" => "foo",
  "geoip" => {
    "coords" => {
      "latitude" => 45.5,
      "longitude" => 73.5667
    }
  }
}

we could have something like

{
  "message" => "foo",
  "geoip.coords.latitude" => 45.5,
  "geoip.coords.longitude" => 73.5667,
}

or, using the logstash path convention:

{
  "[message]" => "foo",
  "[geoip][coords][latitude]" => 45.5,
  "[geoip][coords][longitude]" => 73.5667,
}

This could be done while preserving the current Event api, making this backward compatible.
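
As a rough illustration of the conversion described above, here is a minimal sketch that flattens a nested Hash into keys written in the logstash path convention. The helper name `flatten_event` is hypothetical and is not part of LogStash; the sketch only handles nested Hashes (not arrays) and silently drops empty Hashes.

```ruby
# Hypothetical helper, not part of LogStash: flattens a nested Hash into a
# single-level Hash whose keys are field reference strings.
# Assumption: only Hash nesting is handled; an empty Hash produces no key.
def flatten_event(value, prefix = "", out = {})
  if value.is_a?(Hash)
    # Descend into each child, appending "[key]" to the current path.
    value.each { |key, child| flatten_event(child, "#{prefix}[#{key}]", out) }
  else
    # Leaf value: store it under the accumulated path.
    out[prefix] = value
  end
  out
end

nested = {
  "message" => "foo",
  "geoip" => {
    "coords" => { "latitude" => 45.5, "longitude" => 73.5667 }
  }
}

flat = flatten_event(nested)
# flat has the keys "[message]", "[geoip][coords][latitude]" and
# "[geoip][coords][longitude]"
```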

Pros:

  • we could get rid of the whole Accessors class, which caches field paths to inner object values. field access would become essentially a 1:1 lookup, which would speed up lookups and remove the Accessors complexity.
  • event serialization for persistence would become simpler and certainly faster
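
To make the "1:1 lookup" point concrete, here is a minimal sketch of an event backed by a single flat Hash. The class name `FlatEvent` and its `normalize` helper are assumptions made for illustration, not the actual LogStash::Event API. The sketch also does not handle references to intermediate nodes such as `[geoip]`, which would need a scan over key prefixes.

```ruby
# Hypothetical class, not the real LogStash::Event: every field reference
# maps directly to one key of a single flat Hash.
class FlatEvent
  def initialize(fields = {})
    @fields = fields
  end

  # Read a field with one Hash lookup; no path walking, no accessor cache.
  def [](ref)
    @fields[normalize(ref)]
  end

  # Write a field with one Hash store.
  def []=(ref, value)
    @fields[normalize(ref)] = value
  end

  private

  # Assumption: a bare top-level name such as "message" is treated as the
  # field reference "[message]".
  def normalize(ref)
    ref.start_with?("[") ? ref : "[#{ref}]"
  end
end

event = FlatEvent.new({ "[message]" => "foo" })
event["[geoip][coords][latitude]"] = 45.5

message  = event["message"]                    # same field as "[message]"
latitude = event["[geoip][coords][latitude]"]
```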

Cons:

  • the json input codec would need to change to use the Jackson streaming api and "convert" each json object to a flat representation.
  • same idea for the json output codec: we'd need to produce a json object from a flat representation.
  • we'd need to verify all usages of Event#to_hash and would probably have to perform a flat -> hierarchical conversion to create a proper Hash representation of the Event.
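
The flat -> hierarchical conversion mentioned in the last point could look roughly like the sketch below. The helper name `unflatten_event` is hypothetical, and the sketch assumes every key uses the bracketed path convention. Numeric segments are treated as plain Hash keys, so arrays are not rebuilt.

```ruby
# Hypothetical helper, not part of LogStash: rebuilds a nested Hash from a
# flat Hash keyed by field reference strings such as "[geoip][coords][latitude]".
# Assumption: numeric path segments stay Hash keys; arrays are not rebuilt.
def unflatten_event(flat)
  flat.each_with_object({}) do |(ref, value), root|
    # "[a][b][c]" -> ["a", "b", "c"]
    keys = ref.scan(/\[([^\[\]]+)\]/).flatten
    leaf = keys.pop
    # Walk down the path, creating intermediate Hashes as needed.
    node = keys.inject(root) { |hash, key| hash[key] ||= {} }
    node[leaf] = value
  end
end

flat_fields = {
  "[message]" => "foo",
  "[geoip][coords][latitude]" => 45.5,
  "[geoip][coords][longitude]" => 73.5667
}

rebuilt = unflatten_event(flat_fields)
```

An Event#to_hash built on a flat store would pay this cost on each call, which is why the usages would need to be reviewed.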

Thoughts?

@clintongormley
Contributor

I think the only gotcha is: How would you handle arrays of objects?

"visits": [
    { "page": "/", "duration": "5s"},
    { "page": "/help", "duration": "10s"},
    ....
]

@jordansissel
Contributor

@clintongormley arrays in general are possibly difficult here, and we probably can't do it the same way ES does.

The existing fieldref syntax allows array access of your 'duration' field like [visits][0][duration] and [visits][1][duration]

@colinsurprenant
Contributor Author

right, the complete @clintongormley example would become

[visits][0][page] => "/"
[visits][0][duration] => "5s"
[visits][1][page] => "/help"
[visits][1][duration] => "10s"
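
A minimal sketch of how arrays of objects could be flattened this way, using the array index as a path segment. The helper name `flatten_with_arrays` is hypothetical. One caveat of this scheme is that, once flattened, an array and a Hash with the keys "0" and "1" produce identical keys, so the original type cannot be recovered from the keys alone.

```ruby
# Hypothetical helper, not part of LogStash: flattens nested Hashes and
# Arrays, using the array index as a path segment.
# Assumption: empty Hashes and empty Arrays produce no key at all.
def flatten_with_arrays(value, prefix = "", out = {})
  case value
  when Hash
    value.each { |key, child| flatten_with_arrays(child, "#{prefix}[#{key}]", out) }
  when Array
    value.each_with_index { |child, index| flatten_with_arrays(child, "#{prefix}[#{index}]", out) }
  else
    out[prefix] = value
  end
  out
end

visits = {
  "visits" => [
    { "page" => "/", "duration" => "5s" },
    { "page" => "/help", "duration" => "10s" }
  ]
}

flat_visits = flatten_with_arrays(visits)
```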

@colinsurprenant colinsurprenant changed the title flat internal Event representation [logstash] flat internal Event representation Nov 5, 2014
@colinsurprenant colinsurprenant changed the title [logstash] flat internal Event representation flat internal Event representation Nov 5, 2014
@jordansissel
Contributor

@colinsurprenant thoughts on this? We can still address it later, but I think we can close this for now until we want to revisit it.

@colinsurprenant
Contributor Author

I think the idea of exploring an alternate internal Event data representation is worth keeping open, but it is definitely not on the radar for now. Our focus is on the Java Event implementation, with its added serialization benefits, and on the new Java Accessors implementation, which has also improved field reference access performance.
