gnact is a Python package to explore and analyze dense, geospatial datasets. Its primary purposes are to:
- Employ, evaluate, and compare a range of clustering methods across multiple hyperparameters. Density-based clustering algorithms include DBSCAN, OPTICS, ST-DBSCAN, and HDBSCAN. The module also includes a custom implementation of static trip segmentation and provides a wrapper for Scikit-Mobility's excellent dynamic trip segmentation method.
- Translate a dense, geospatial dataset into a network using a list of already known sites, sites found through clustering methods, or a combination of both. Once translated to a network, gnact can help visualize and analyze trends and patterns in the dataset using network analysis tools.
- Predict routes based on the probability of historical occurrences. For example, if a device visits site 1 and then site 2, we can predict the next site by looking at which sites were historically visited after a device visited site 1 and then site 2. An implementation that uses deep learning to predict the next port in a sequence is in development.
- Analyze time series. A module including a range of time-series analysis tools is in development.
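To make the density-based clustering idea above concrete, here is a minimal, dependency-free sketch of DBSCAN over latitude/longitude points using haversine distance. It is not gnact's implementation; the function names `haversine_km` and `dbscan`, the parameter names, and the sample coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def dbscan(points, eps_km, min_samples):
    """Hypothetical minimal DBSCAN: returns one label per point, -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        # Brute-force neighborhood query; the point itself is included.
        return [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
    return labels


# Two tight groups of positions plus one isolated position (made-up coordinates).
positions = [
    (0.0, 0.0), (0.001, 0.0), (0.0, 0.001),
    (1.0, 1.0), (1.001, 1.0), (1.0, 1.001),
    (5.0, 5.0),
]
labels = dbscan(positions, eps_km=1.0, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The brute-force neighborhood query is quadratic in the number of points, so a sketch like this only suits small samples; dense datasets need a spatial index.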
For a demo of GNACT using real-world examples including a large sample of AIS ship tracking data, please see the following Notebooks:
- Clustering demo via Jupyter's NBviewer
- Community Detection and Comparison demo via Jupyter's NBviewer
The package includes the following main modules:
This module contains functions to apply a range of clustering methods to a pandas DataFrame with a lat, lon, time, and id for each position. The results can be summarized as clusters (calc_centers), and key metrics such as precision, recall, and F1 score can be calculated (calc_stats). Key helper functions like calc_nn (which returns a df with the distance to the nearest site for each point) and get_df_stats (which returns a df indicating whether each cluster is a True Positive, False Positive, or False Negative) integrate with other GNACT functions.
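As an illustration of the evaluation idea described above, the following sketch finds the nearest known site for each cluster center and derives precision, recall, and F1 from the resulting true positive, false positive, and false negative counts. It does not reproduce calc_nn, get_df_stats, or calc_stats; the helper names, the `max_km` matching threshold, and the rule that a site counts as found when at least one center falls within that threshold are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_site(point, sites):
    """Return (site_id, distance_km) for the known site closest to point."""
    return min(
        ((site_id, haversine_km(point[0], point[1], lat, lon))
         for site_id, (lat, lon) in sites.items()),
        key=lambda pair: pair[1],
    )


def score_clusters(centers, sites, max_km):
    """Hypothetical scoring: a center within max_km of a known site is a true positive."""
    matched = set()
    tp = fp = 0
    for center in centers:
        site_id, dist = nearest_site(center, sites)
        if dist <= max_km:
            tp += 1
            matched.add(site_id)
        else:
            fp += 1
    fn = len(sites) - len(matched)  # known sites that no cluster center found
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall, "f1": f1}


# Made-up known sites and cluster centers.
known_sites = {"A": (0.0, 0.0), "B": (1.0, 1.0), "C": (2.0, 2.0)}
cluster_centers = [(0.001, 0.001), (1.0, 1.001), (10.0, 10.0)]
stats = score_clusters(cluster_centers, known_sites, max_km=1.0)
print(stats["tp"], stats["fp"], stats["fn"])  # 2 1 1
```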
This module builds on clustering functions to plot clusters, stops, and statistical summaries using Folium.
This module performs static trip segmentation against a list of sites to generate a df of ordered stops. This df can then be treated as an edgelist, and it is used as such by functions that plot specific UIDs and analyze the network the edgelists describe.
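The step from ordered stops to an edgelist can be sketched as follows. This is an illustration only, using plain tuples rather than a DataFrame; the `ordered_stops` and `edgelist` helpers, the `(uid, time, site)` record layout, and the choice to weight edges by trip count are assumptions, not gnact's API.

```python
from collections import Counter


def ordered_stops(visits):
    """Group (uid, time, site) records by uid in time order, collapsing repeated sites."""
    stops = {}
    for uid, _time, site in sorted(visits, key=lambda v: (v[0], v[1])):
        sequence = stops.setdefault(uid, [])
        if not sequence or sequence[-1] != site:
            sequence.append(site)  # a new stop starts only when the site changes
    return stops


def edgelist(stops):
    """Count directed (source, target) pairs between consecutive stops across all uids."""
    edges = Counter()
    for sequence in stops.values():
        edges.update(zip(sequence, sequence[1:]))
    return edges


# Made-up position records that have already been matched to a site.
visits = [
    ("a", 1, "X"), ("a", 2, "X"), ("a", 3, "Y"), ("a", 4, "Z"),
    ("b", 1, "X"), ("b", 2, "Y"),
]
stops = ordered_stops(visits)
edges = edgelist(stops)
print(stops["a"])         # ['X', 'Y', 'Z']
print(edges[("X", "Y")])  # 2
```

The resulting weighted pairs can be handed to a graph library as a directed edgelist for network analysis.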
Given an ordered list of stops from the network module, this module can use an N-gram approach borrowed from NLP to predict the next site visited (N+1) from the historical probability of previously observed site sequences.
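A minimal sketch of N-gram style next-site prediction is shown below. It counts which site historically followed each run of N sites and returns the most frequent follower with its relative frequency. The function names `build_ngram_counts` and `predict_next` and the sample sequences are hypothetical and are not taken from the package.

```python
from collections import Counter, defaultdict


def build_ngram_counts(sequences, n):
    """Count which site follows each run of n consecutive sites."""
    counts = defaultdict(Counter)
    for sequence in sequences:
        for i in range(len(sequence) - n):
            history = tuple(sequence[i:i + n])
            counts[history][sequence[i + n]] += 1
    return counts


def predict_next(counts, history):
    """Return (most likely next site, its historical probability), or (None, 0.0)."""
    followers = counts.get(tuple(history))
    if not followers:
        return None, 0.0  # this history was never observed
    site, hits = followers.most_common(1)[0]
    return site, hits / sum(followers.values())


# Made-up ordered stops for four devices.
history_by_device = [
    ["site1", "site2", "site3"],
    ["site1", "site2", "site3"],
    ["site1", "site2", "site4"],
    ["site2", "site1", "site3"],
]
counts = build_ngram_counts(history_by_device, n=2)
site, probability = predict_next(counts, ["site1", "site2"])
print(site)  # site3
```

Here two of the three devices that visited site1 and then site2 went on to site3, so site3 is predicted with probability 2/3.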
This module includes connections to a PostgreSQL database, several helper functions for managing databases, and a traj_enhance_df function that adds speed and course over ground to a DataFrame that includes lat, lon, and time.
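To show what deriving speed and course over ground from lat, lon, and time involves, here is a dependency-free sketch. It is not the traj_enhance_df implementation: the function names, the use of plain dicts instead of a DataFrame, time expressed in seconds, and speed reported in km/h are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def add_speed_and_course(track):
    """Return copies of time-sorted points with speed_kmh and course_deg added.

    Each point is a dict with lat, lon, and time (seconds). The first point has
    no predecessor, so both of its derived values are None.
    """
    enhanced = []
    previous = None
    for point in track:
        row = dict(point, speed_kmh=None, course_deg=None)
        if previous is not None:
            dist = haversine_km(previous["lat"], previous["lon"], point["lat"], point["lon"])
            hours = (point["time"] - previous["time"]) / 3600.0
            if hours > 0:
                row["speed_kmh"] = dist / hours
            row["course_deg"] = initial_bearing_deg(
                previous["lat"], previous["lon"], point["lat"], point["lon"]
            )
        enhanced.append(row)
        previous = point
    return enhanced


# Made-up track: one hour heading east along the equator, then one hour heading north.
track = [
    {"lat": 0.0, "lon": 0.0, "time": 0},
    {"lat": 0.0, "lon": 1.0, "time": 3600},
    {"lat": 1.0, "lon": 1.0, "time": 7200},
]
enhanced = add_speed_and_course(track)
```

In a real dataset the calculation would be applied per id, so that the last position of one device is never paired with the first position of another.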
Additional work is ongoing to develop a time_series module to include in GNACT.