http://phucanhapril.github.io/craigslove/
python craigslove_scraper.py <base url>
where <base url> is a Craigslist city site such as newyork.craigslist.org or sfbay.craigslist.org.
This will scrape personals from Strictly Platonic (stp), Miscellaneous Romance (msr), w4w, w4m, m4w, and m4m.
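As a rough illustration of how the six category codes map onto one city site, here is a minimal sketch. The `/search/<code>` URL pattern and the `search_urls` helper are assumptions made for this example; they are not taken from craigslove_scraper.py.

```python
# Category codes listed above. The URL pattern below is an assumption.
CATEGORIES = ["stp", "msr", "w4w", "w4m", "m4w", "m4m"]


def search_urls(base_url):
    """Return {category code: search URL} for a base url such as
    'newyork.craigslist.org' (hypothetical helper)."""
    # Accept the base url with or without a scheme or trailing slash.
    host = base_url.split("//")[-1].strip("/")
    return dict(
        (cat, "https://{0}/search/{1}".format(host, cat)) for cat in CATEGORIES
    )


if __name__ == "__main__":
    for cat, url in sorted(search_urls("newyork.craigslist.org").items()):
        print(cat, url)
```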
Posts are saved to posts/<city>/<query>.csv (for example: posts/providence/w4w.csv).
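The output path can be derived from the base url and the category code. The sketch below assumes the city is the first label of the hostname; `output_path` is a hypothetical helper, and the real scraper may build the path differently.

```python
import os


def output_path(base_url, query, root="posts"):
    """Build posts/<city>/<query>.csv from a base url (hypothetical helper)."""
    host = base_url.split("//")[-1].strip("/")
    # Assumption: the city is the first label, e.g. 'providence' in
    # 'providence.craigslist.org'.
    city = host.split(".")[0]
    return os.path.join(root, city, query + ".csv")


if __name__ == "__main__":
    print(output_path("providence.craigslist.org", "w4w"))
```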
Install Python 2.7 and pip, then run:
sudo pip install BeautifulSoup4
brew install libxml2
pip install lxml (if that doesn’t work, run 'xcode-select --install' first and try again)
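To check that the dependencies installed correctly, a short script along these lines may help. `missing_modules` is a hypothetical helper; `bs4` is the import name of the BeautifulSoup4 package.

```python
def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means both packages are importable.
    print(missing_modules(["bs4", "lxml"]))
```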
To run the clustering, from within the ml/clustering directory:
python cluster.py -path data/city/sample -c 2 -plot True
This will perform 2-means clustering on each csv file in data/city/sample.
The resulting clusters, in JSON format, are saved to cluster_data/.
It will also create 2-dimensional scatter plots representing the clusters, saved to cluster_plot/.
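For readers unfamiliar with k-means, here is a minimal sketch of the algorithm (Lloyd's iteration) with k = 2. It is not the code in cluster.py, whose feature extraction and library choices are not described here. The 2-D points are made up, and using the first k points as initial centroids is a simplification chosen to keep the example deterministic.

```python
def kmeans(points, k=2, iterations=100):
    """Cluster equal-length numeric tuples; return (labels, centroids)."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                n = float(len(members))
                new_centroids.append(tuple(sum(col) / n for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep as is
        # Stop once neither the assignments nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Made-up sample: two points near the origin, two near (10, 10).
    sample = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    labels, centroids = kmeans(sample, k=2)
    print(labels)
    print(centroids)
```

On the sample above, the first two points end up in cluster 0 and the last two in cluster 1. Real post data would first need to be turned into numeric vectors, which this sketch does not attempt.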
Note: ml/clustering/data/ is populated from the original raw posts/ by running ml/clustering/util/format_posts_for_clustering.py.