- Data - A list of (lat,lng) coordinates around San Francisco. Look under gps-fun/data.
- A Frontend Page - To visualize the raw data and clustered results on the map.
- Scripting entries - Run clustering algorithms to generate clusters and find the most representative center of a cluster. Centers can be visualized on the frontend page.
The following steps will help you set up a development environment.
$ sudo easy_install pip
$ sudo pip install virtualenv
$ git clone [email protected]:salami162/gps-fun.git
$ cd gps-fun
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python manage.py --help
python manage.py runserver
This will launch a server on localhost at port 5000. Hit up the index page at http://localhost:5000/
The Start/Stop toggle button in the top right corner starts and stops polling for changes in the trained clusters. Before you hit it the first time, make sure you've run at least one round of clustering so that ./data/trained_output.csv exists. To run one, see the next step.
python manage.py kmeans -c 4 -src './data/wwc_conf_dataset_tiny.csv' -dest './data/trained_output.csv'
Given a CSV file of locations, this command generates clusters and writes the cluster centers to another CSV file. The above command will output 4 clusters, with the lat/lng of the centers in ./data/trained_output.csv.
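To give a rough idea of what a k-means pass over lat/lng points involves, here is a minimal standard-library sketch. It is not the code behind `manage.py kmeans`; the function names (`load_points`, `kmeans`, `write_centers`) are made up for illustration, and it assumes the first two CSV columns are lat and lng, which may not match the repo's actual file layout. It also uses plain Euclidean distance on raw degrees, which is only a reasonable approximation over a small area such as one city.

```python
import csv
import random


def load_points(csv_file):
    """Read (lat, lng) pairs from an open CSV file, skipping rows that don't parse.

    Assumption: lat is column 0 and lng is column 1.
    """
    points = []
    for row in csv.reader(csv_file):
        try:
            points.append((float(row[0]), float(row[1])))
        except (ValueError, IndexError):
            continue  # header line, blank line, or malformed row
    return points


def squared_distance(a, b):
    """Squared Euclidean distance between two (lat, lng) pairs."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct input rows as starting centers
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                lat = sum(m[0] for m in members) / len(members)
                lng = sum(m[1] for m in members) / len(members)
                new_centers.append((lat, lng))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # assignments are stable, so the centers have converged
        centers = new_centers
    return centers


def write_centers(centers, csv_file):
    """Write one 'lat,lng' row per cluster center."""
    writer = csv.writer(csv_file)
    for lat, lng in centers:
        writer.writerow([lat, lng])
```

A call such as `kmeans(load_points(open(src_path)), 4)` would mirror the `-c 4` option above, with `src_path` standing in for whichever input file you choose. Because the starting centers are sampled at random, different seeds can give different results on real data.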
Here's some reading you can do to help familiarize yourself with Clustering, k-means clustering and Hierarchical clustering.
sklearn clustering links to the docs for the Python package that implements various clustering algorithms.