Fix NaN subpopulation in link vehicle counts #246

Open
syhwawa wants to merge 8 commits into main from fix/miss-subpopulation-in-link-counts

@syhwawa syhwawa commented Mar 10, 2026

Summary

When running count event handlers with the subpopulation attribute in our simulations, we found NaN rows in the output link_vehicle_counts_car_sub_subpopulation CSVs/GeoJSONs.
It was confirmed that the NaN subpopulation rows originate from taxi trips — taxi vehicle IDs follow the pattern {person_id}_taxi, so the code's direct vehicle-ID lookup against the person attributes dictionary always misses.

Some actions in the CI build pipeline were also out of date, which caused the build to fail in GitHub Actions.

This PR fixes both bugs. Taxi vehicles now resolve to their person's subpopulation, and the spurious NaN subpopulation rows are removed from the link vehicle counts outputs and from the outputs of any other relevant event handlers.

Root Cause Analysis

Elara builds an attributes dictionary keyed by person ID from output_plans.xml.gz:

def get_attributes_from_plans(self, elem):
    ident = elem.xpath("@id")[0]   # reads <person id="10628">
    ...
    return ident, attributes

# Result:
attributes = {
    "10628": {"subpopulation": "high", ...},
    "0":     {"subpopulation": "low",  ...},
    ...
}

Why do taxis end up as NaN?

In MATSim, taxis are declared with networkMode="car", so they pass the mode check in process_event() (event_handlers.py), where the lookup happens:

ident = elem.get("vehicle")          # → "10628_taxi"
veh_mode = self.vehicle_mode(ident)  # → "car" (networkMode in vehicles XML)
if veh_mode == self.mode:            # "car" == "car" → True, enters counting
    attribute_class = self.attributes.get(ident, {}) \
                          .get("subpopulation", None)
    # attributes.get("10628_taxi") → NOT FOUND
    # person "10628" exists with subpopulation="high"
    # but key "10628_taxi" ≠ "10628"
    # → returns None → NaN in CSV

The MATSim event XML does carry the person ID:

<event type="vehicle enters traffic" person="10628" vehicle="10628_taxi" .../>

But Elara ignores person= and uses vehicle= for the lookup. For regular car trips, vehicle == person, so it works fine. For taxi and car_passenger trips, vehicle ≠ person, so the lookup silently fails.
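The miss can be reproduced outside Elara in a few lines. This is a minimal sketch rather than the handler code: it parses a hand-written event snippet with the standard library, and the `time` and `link` values, the `resolved` list, and the standalone `veh_to_person` dict are illustrative assumptions. It compares the direct vehicle-ID lookup with a lookup that first maps the vehicle back to the person seen on the vehicle enters traffic event.

```python
import xml.etree.ElementTree as ET

# Attributes keyed by person ID, as built from the plans file.
attributes = {
    "10628": {"subpopulation": "high"},
    "0": {"subpopulation": "low"},
}

# Hand-written events for illustration; only person= and vehicle= matter here.
events_xml = """<events>
  <event time="100.0" type="vehicle enters traffic" person="10628" vehicle="10628_taxi" link="1-2"/>
  <event time="101.0" type="entered link" vehicle="10628_taxi" link="2-3"/>
  <event time="102.0" type="vehicle enters traffic" person="0" vehicle="0" link="1-2"/>
  <event time="103.0" type="entered link" vehicle="0" link="2-3"/>
</events>"""

veh_to_person = {}  # vehicle_id -> person_id, filled as events stream in
resolved = []       # (vehicle, direct lookup result, person-based lookup result)

for elem in ET.fromstring(events_xml):
    event_type = elem.get("type")
    vehicle = elem.get("vehicle")
    if event_type == "vehicle enters traffic":
        # Remember which person is driving this vehicle.
        veh_to_person[vehicle] = elem.get("person")
    elif event_type == "entered link":
        # Direct lookup by vehicle ID: misses for "10628_taxi".
        direct = attributes.get(vehicle, {}).get("subpopulation")
        # Person-based lookup: fall back to the vehicle ID if no mapping exists.
        person = veh_to_person.get(vehicle, vehicle)
        via_person = attributes.get(person, {}).get("subpopulation")
        resolved.append((vehicle, direct, via_person))

for row in resolved:
    print(row)
```

The taxi vehicle resolves to None with the direct lookup and to "high" through the person mapping, while the regular car gives "low" either way.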

Key Changes

Event Handlers (elara/event_handlers.py)

  • veh_to_person cache added to LinkVehicleCounts, LinkVehicleCapacity, and LinkVehicleSpeeds — populates on vehicle enters traffic events, maps vehicle_id → person_id, and is used on subsequent entered link / left link events so that taxi vehicles (e.g. 10628_taxi) correctly resolve to their person (10628) for attribute lookup
  • extract_attribute_values() always includes None: None is now always present as a fallback class regardless of attribute availability, with updated docstring
  • NaN row filter added to finalise() in seven handlers (LinkVehicleCounts, LinkVehicleCapacity, LinkVehicleSpeeds, LinkPassengerCounts, StopPassengerCounts, StopToStopPassengerCounts, and RoutePassengerCounts): drops structural zero-count NaN rows after aggregation while preserving NaN rows that carry real counts (e.g. bus drivers whose person ID is not in attributes)
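The NaN row filter can be pictured with a small pandas example. This is a sketch under assumptions, not the code in finalise(): the helper name `drop_structural_nan_rows`, the `total` column, and the sample frame are hypothetical, and the real handlers may aggregate over different columns.

```python
import numpy as np
import pandas as pd


def drop_structural_nan_rows(counts, attribute="subpopulation", total="total"):
    """Drop rows whose attribute class is NaN and whose count is zero.

    NaN-class rows with a non-zero count are kept, since they represent
    real vehicles or passengers with no matching person attributes.
    """
    is_nan_class = counts[attribute].isna()
    is_empty = counts[total] == 0
    return counts[~(is_nan_class & is_empty)].reset_index(drop=True)


# Hypothetical aggregated output: one row per link and subpopulation class.
counts = pd.DataFrame(
    {
        "link_id": ["1-2", "1-2", "2-3", "2-3"],
        "subpopulation": ["high", np.nan, "low", np.nan],
        "total": [12, 0, 7, 3],
    }
)

filtered = drop_structural_nan_rows(counts)
print(filtered)
```

Here the zero-count NaN row on link 1-2 is dropped, while the NaN row on link 2-3 survives because it carries a real count of 3.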

Tests (tests/test_2_event_handlers.py)

  • Updated existing finalise test assertions to remove np.nan from expected subpopulation value sets — reflects corrected output where zero-count NaN rows are no longer present for car, passenger, route, stop, and stop-to-stop handlers
  • Updated VehiclePassengerGraph pickle fixtures to match new handler output

CI Fixes (build_pipeline.yml)

  • Upgraded actions/checkout v2 → v4, actions/setup-python v1 → v5, actions/cache v1 → v4, Python 3.7 → 3.8
  • Replaced inline AWS credential steps with a shared aws-upload reusable workflow
  • Updated Slack notify action from v2.0.0 → v2.2.0

AWS Credentials Update

The CI was failing due to expired/invalid AWS credentials. The fix required:

  1. Go to the AWS dev account → Secrets Manager and retrieve the secret values for bitbucket-s3
  2. Update the following GitHub repository secrets with the new values:
    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY

Once the secrets were rotated, the CI pipeline passed successfully.

Development

Successfully merging this pull request may close these issues.

Broken CI build pipeline
Blank subpopulation entries in elara outputs
