This folder contains notebooks showcasing concepts covered in the book. Most of the examples only use one of the subfolders in archive (the one that contains data for writers.stackexchange.com).
I've included a processed version of the data as a .csv for convenience.
If you want to generate this data yourself, or generate it for another subfolder, you should:
- Download a subfolder from the stackoverflow archives
- Run `parse_xml_to_csv` to convert it to a DataFrame
- Run `generate_model_text_features` to generate a DataFrame with precomputed features
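
To give a feel for what the XML-to-CSV step involves, here is a minimal, standard-library sketch. It is not the repository's `parse_xml_to_csv`: the function name `xml_rows_to_csv` and the sample data are hypothetical, and it only assumes that a posts file stores one post per `<row>` element with the fields as XML attributes.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample that mimics a one-<row>-per-post XML layout.
SAMPLE_XML = """<posts>
  <row Id="1" PostTypeId="1" Score="5" Title="How do I outline?" Body="&lt;p&gt;Some text&lt;/p&gt;" />
  <row Id="2" PostTypeId="2" Score="3" ParentId="1" Body="&lt;p&gt;An answer&lt;/p&gt;" />
</posts>"""


def xml_rows_to_csv(xml_source, csv_target):
    """Write one CSV line per <row> element, using its attributes as columns.

    Illustrative helper only; returns the number of rows written.
    """
    root = ET.parse(xml_source).getroot()
    rows = [dict(elem.attrib) for elem in root.iter("row")]
    # Rows do not all carry the same attributes, so use the union of keys.
    columns = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(csv_target, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# In-memory buffers stand in for real files here.
out = io.StringIO()
count = xml_rows_to_csv(io.StringIO(SAMPLE_XML), out)
```

With real data you would pass open file handles instead of the in-memory buffers, and the resulting .csv could then be loaded into a DataFrame for the feature generation step.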
The notebooks belong to a few categories of concepts, described below.