Adapter to run Python top2vec topic model in
You need to set up a Clojure REPL with:
- a working libpython-clj
- a working installation of python library top2vec
- I have tried 1.0.26
- a working installation of the PWI metric:
I provide a Dockerfile here that performs the installation above correctly. Using it, a working REPL running in Docker can be started with:
docker run -ti -v $HOME/.m2:/home/user/.m2 -v "$(pwd):/app" -p 12345:12345 -w /app python3 -c "import cljbridge;cljbridge.init_clojure_repl(port=12345,bind='')"
The following code then trains the top2vec model on some texts.
(require '[clojure.test :refer :all]
         '[ :refer :all]
         '[camel-snake-kebab.core :as csk]
         '[tablecloth.api :as tc])
(def raw-data
  (tc/dataset ""
              {:key-fn csk/->kebab-case-keyword
               :file-type :csv
               :gzipped? true}))
(def data
  (-> raw-data
      (tc/shuffle {:seed 123})
      (tc/head 10000)
      (tc/select-columns :text)))
(def train-result-learn
  ( data {:speed :learn
          :model-type :top2vec
          :min_count 1
          :documents-column :text}))
(clojure.pprint/pprint (update-in train-result-learn [:model-data] dissoc :model-as-bytes))
(def top2vec-model-py ( train-result-learn))
The resulting top2vec-model-py is the Python object of the trained model. It can be called from Clojure through libpython-clj, using the model's Python API.
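As a rough illustration of such a call, the sketch below wraps one method of the Python object in a Clojure function. It assumes libpython-clj 2.x (namespace `libpython-clj2.python`) is on the classpath and that the model exposes top2vec's `get_num_topics` method; the helper name `topic-count` is hypothetical and not part of this adapter.

```clojure
(require '[libpython-clj2.python :as py])

;; Hypothetical helper, not part of this library.
;; Assumption: `model` is the Python Top2Vec object (top2vec-model-py above).
(defn topic-count
  "Returns the number of topics by calling the Python method get_num_topics."
  [model]
  ;; Guard against objects that do not expose the method.
  (when (py/has-attr? model "get_num_topics")
    (py/call-attr model "get_num_topics")))
```

With a trained model at hand, `(topic-count top2vec-model-py)` should return the number of topics top2vec found. Other methods of the Python API can be reached the same way through `py/call-attr`.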
For a few cases I provide wrappers around the Python API. A word cloud of a topic (the first one in this case) can be obtained as an SVG string with:
(wc->svg top2vec-model-py (first (get-all-word-scores top2vec-model-py)) 100 100)
Copyright © 2021 Carsten Behring
EPLv1.0 is just the default for projects generated by clj-new: you are not required to open source this project, nor are you required to use EPLv1.0! Feel free to remove or change the LICENSE file and remove or update this section.
Distributed under the Eclipse Public License version 1.0.