GitHub - phucanhapril/craigslove: craigslist personals analyzer

Craigslove: Love in the Time of Craigslist

http://phucanhapril.github.io/craigslove/

craigslist web scraper

python craigslove_scraper.py <base url>

where the base url is newyork.craigslist.org, or sfbay.craigslist.org, etc.

This will scrape personals from Strictly Platonic (stp), Miscellaneous Romance (msr), w4w, w4m, m4w, and m4m.

Posts are saved to posts/<city>/<query>.csv (for example: posts/providence/w4w.csv)
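As a rough illustration of how a base url and the six section codes could map to output files, here is a minimal sketch. It is not taken from craigslove_scraper.py: the helper names (`city_from_base_url`, `plan_scrape`) are hypothetical, and the `/search/<code>` URL pattern is an assumption about how craigslist listing pages were addressed. Only the section codes and the posts/<city>/<query>.csv layout come from the description above.

```python
# Section codes listed in this README.
CATEGORIES = ["stp", "msr", "w4w", "w4m", "m4w", "m4m"]


def city_from_base_url(base_url):
    """Return the city part of a base url, e.g. 'newyork' for newyork.craigslist.org."""
    host = base_url.split("://")[-1].strip("/")  # tolerate an optional scheme
    return host.split(".")[0]


def plan_scrape(base_url):
    """Pair each section's listing URL (assumed pattern) with its CSV output path."""
    host = base_url.split("://")[-1].strip("/")
    city = city_from_base_url(base_url)
    return [
        (
            "https://{}/search/{}".format(host, query),  # assumed URL pattern
            "posts/{}/{}.csv".format(city, query),       # layout described above
        )
        for query in CATEGORIES
    ]


if __name__ == "__main__":
    for url, path in plan_scrape("newyork.craigslist.org"):
        print("{} -> {}".format(url, path))
```

Keeping the URL and path planning separate from the network and parsing code makes it possible to check the file layout without fetching anything.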

setup (mac instructions)

install Python 2.7 and pip

sudo pip install BeautifulSoup4  
brew install libxml2
pip install lxml (if that fails, run 'xcode-select --install' first, then retry)

k-means clustering

ml/clustering/cluster.py

instructions for running within the ml/clustering directory:

python cluster.py -path data/city/sample -c 2 -plot True

This will perform 2-means clustering (k-means with k = 2, matching the -c 2 argument) on each csv file in data/city/sample.
The resulting clusters in json format are saved to cluster_data/.
It will also create 2-dimensional scatter plots representing the clusters, saved to cluster_plot/.
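For readers unfamiliar with k-means, the core loop can be sketched in plain Python. This is a stand-in for illustration, not the code in cluster.py: the function names, the toy 2-dimensional points, the "first k points" initialisation, and the JSON layout are all assumptions. A real run would first turn each post into a numeric feature vector.

```python
import json


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = float(len(points))
    return [sum(column) / n for column in zip(*points)]


def kmeans(points, k, iterations=100):
    """Return (labels, centroids) after running Lloyd's algorithm."""
    # Assumption: deterministic start using the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


# Toy data standing in for post feature vectors (hypothetical values).
points = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
labels, centroids = kmeans(points, k=2)

# Hypothetical JSON layout; the actual files in cluster_data/ may differ.
clusters_json = json.dumps({"labels": labels, "centroids": centroids})
```

With these four points the two groups separate after one update, giving the labels `[0, 1, 0, 1]`.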

note: ml/clustering/data/ is populated from the original raw posts/ by running ml/clustering/util/format_posts_for_clustering.py.

CODE REFERENCED: http://brandonrose.org/clustering
