<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset.categorical documentation</title><script async="true" src="https://www.googletagmanager.com/gtag/js?id=G-RGTB4J7LGP"></script><script>window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-95TVFC1FEB');</script><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">TMD</span> <span class="project-version">7.042</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="000-getting-started.html"><div class="inner"><span>tech.ml.dataset Getting Started</span></div></a></li><li class="depth-1 "><a href="100-walkthrough.html"><div class="inner"><span>tech.ml.dataset Walkthrough</span></div></a></li><li class="depth-1 "><a href="200-quick-reference.html"><div class="inner"><span>tech.ml.dataset Quick Reference</span></div></a></li><li class="depth-1 "><a href="columns-readers-and-datatypes.html"><div class="inner"><span>tech.ml.dataset Columns, Readers, and Datatypes</span></div></a></li><li class="depth-1 "><a href="nippy-serialization-rocks.html"><div class="inner"><span>tech.ml.dataset And nippy</span></div></a></li><li class="depth-1 "><a href="supported-datatypes.html"><div class="inner"><span>tech.ml.dataset Supported Datatypes</span></div></a></li></ul><h3 class="no-link"><span 
class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-4 branch current"><a href="tech.v3.dataset.categorical.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>categorical</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.clipboard.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clipboard</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column-filters.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column-filters</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>io</span></div></div></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.csv.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>csv</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-5 branch"><a 
href="tech.v3.dataset.io.string-row-parser.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>string-row-parser</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.io.univocity.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>univocity</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.join.html"><div class="inner"><span class="tree" style="top: -145px;"><span class="top" style="height: 154px;"></span><span class="bottom"></span></span><span>join</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.math.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>math</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.metamorph.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>metamorph</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.modelling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>modelling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.print.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>print</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.reductions.apache-data-sketch.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apache-data-sketch</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.rolling.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" 
style="height: 61px;"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.set.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>set</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.tensor.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.zip.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>zip</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -641px;"><span class="top" style="height: 650px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.arrow.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>arrow</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.clj-transit.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clj-transit</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.fastexcel.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>fastexcel</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>guava</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.guava.cache.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cache</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.parquet.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" 
style="height: 61px;"></span><span class="bottom"></span></span><span>parquet</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.poi.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>poi</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.tribuo.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tribuo</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.categorical.html#var-dataset-.3Ecategorical-maps"><div class="inner"><span>dataset->categorical-maps</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.categorical.html#var-fit-categorical-map"><div class="inner"><span>fit-categorical-map</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.categorical.html#var-fit-one-hot"><div class="inner"><span>fit-one-hot</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.categorical.html#var-invert-categorical-map"><div class="inner"><span>invert-categorical-map</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.categorical.html#var-invert-one-hot-map"><div class="inner"><span>invert-one-hot-map</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.categorical.html#var-reverse-map-categorical-xforms"><div class="inner"><span>reverse-map-categorical-xforms</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.categorical.html#var-transform-categorical-map"><div class="inner"><span>transform-categorical-map</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.categorical.html#var-transform-one-hot"><div class="inner"><span>transform-one-hot</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">tech.v3.dataset.categorical</h1><div class="doc"><div 
class="markdown"><p>Conversions of categorical values into numbers and back. Two forms of conversions
are supported, a straight value->integer map and one-hot encoding.</p>
<p>The functions in this namespace manipulate the metadata on the columns of the dataset, which can be inspected via <code>clojure.core/meta</code>.</p>
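<p>A minimal sketch of how that metadata can be inspected, assuming <code>tech.v3.dataset</code> is aliased as <code>ds</code> and this namespace as <code>ds-cat</code>. The dataset <code>raw-ds</code> and its column <code>:x</code> are made up for illustration, and the exact keys present in the metadata map may vary between versions.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical single-column dataset of keyword labels.
(def raw-ds (ds/->dataset {:x [:a :b :a :c]}))

;; Fit a value->integer map on :x, then apply it to the dataset.
(def fit (ds-cat/fit-categorical-map raw-ds :x))
(def num-ds (ds-cat/transform-categorical-map raw-ds fit))

;; Column metadata; the fitted mapping is expected to appear in this map.
(meta (num-ds :x))

;; The same information gathered across every column of the dataset.
(ds-cat/dataset->categorical-maps num-ds)
</code></pre>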
</div></div><div class="public anchor" id="var-dataset-.3Ecategorical-maps"><h3>dataset->categorical-maps</h3><div class="usage"><code>(dataset->categorical-maps dataset)</code></div><div class="doc"><div class="markdown"><p>Given a dataset, return a sequence of categorical map entries.</p>
<pre><code class="language-clojure">user> (ds-cat/dataset->categorical-maps catds)
({:lookup-table {:a 0, :b 1, :c 2, :d 3},
:src-column :x,
:result-datatype :float64}
{:lookup-table {:a 0, :b 1, :c 2, :d 3},
:src-column :y,
:result-datatype :float64})
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/categorical.clj#L145">view source</a></div></div><div class="public anchor" id="var-fit-categorical-map"><h3>fit-categorical-map</h3><div class="usage"><code>(fit-categorical-map dataset colname & [table-args res-dtype])</code></div><div class="doc"><div class="markdown"><p>Given a column, map it into a numeric space via a discrete map of values
to integers. This fits the categorical transformation onto the column and returns
the transformation.</p>
<p>If <code>table-args</code> is not given, the distinct column values will be mapped into 0..x without any specific order.</p>
<p><code>table-args</code> specifies the precise mapping, either as a sequence of <code>[val idx]</code> pairs or as a sorted seq of values.</p>
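<p>The two forms of <code>table-args</code> might be used as sketched below. The dataset <code>sizes</code>, the column <code>:size</code> and its labels are hypothetical, and the second call assumes that a plain seq of values is numbered from 0 in the order given.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical ordinal column; names and values are illustrative only.
(def sizes (ds/->dataset {:size [:medium :low :high :low]}))

;; Explicit [val idx] pairs pin each label to a chosen integer.
(def fit-pairs
  (ds-cat/fit-categorical-map sizes :size [[:low 0] [:medium 1] [:high 2]]))

;; A seq of values in the desired order (assumed to be numbered from 0).
(def fit-ordered
  (ds-cat/fit-categorical-map sizes :size [:low :medium :high]))

;; Inspect the resulting value->integer tables.
(:lookup-table fit-pairs)
(:lookup-table fit-ordered)
</code></pre>
<p>Pinning the order this way keeps the integers stable across runs, which matters when the encoded column is treated as ordinal.</p>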
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/categorical.clj#L78">view source</a></div></div><div class="public anchor" id="var-fit-one-hot"><h3>fit-one-hot</h3><div class="usage"><code>(fit-one-hot dataset colname & [table-args res-dtype])</code></div><div class="doc"><div class="markdown"><p>Fit a one hot transformation to a column. Returns a reusable transformation.
Maps each unique value to a column with 1 every time the value appears in the
original column and 0 otherwise.</p>
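<p>A short sketch of fitting and then applying a one-hot transformation. The dataset <code>colors</code> and column <code>:color</code> are hypothetical, and the names of the generated columns are left to the library rather than assumed here.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical column with two distinct labels.
(def colors (ds/->dataset {:color [:red :green :red]}))

;; Fit once; the result is plain data that can be reused on other datasets.
(def one-hot-fit (ds-cat/fit-one-hot colors :color))

;; Apply the fit: one indicator column per distinct label replaces :color.
(def one-hot-ds (ds-cat/transform-one-hot colors one-hot-fit))

;; Inspect the generated column names.
(ds/column-names one-hot-ds)
</code></pre>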
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/categorical.clj#L200">view source</a></div></div><div class="public anchor" id="var-invert-categorical-map"><h3>invert-categorical-map</h3><div class="usage"><code>(invert-categorical-map dataset {:keys [src-column lookup-table], :as opts})</code></div><div class="doc"><div class="markdown"><p>Invert a categorical map returning the column to the original set of values.</p>
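<p>A round trip can be sketched as follows, assuming the same fit data used for the forward transformation is passed back in. The dataset <code>raw-ds</code> and column <code>:x</code> are hypothetical.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical dataset of keyword labels.
(def raw-ds (ds/->dataset {:x [:a :b :a :c]}))
(def fit (ds-cat/fit-categorical-map raw-ds :x))

;; Encode to numbers, then invert with the same fit data.
(def restored
  (-> raw-ds
      (ds-cat/transform-categorical-map fit)
      (ds-cat/invert-categorical-map fit)))

;; The column should hold the original labels again.
(vec (restored :x))
</code></pre>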
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/categorical.clj#L163">view source</a></div></div><div class="public anchor" id="var-invert-one-hot-map"><h3>invert-one-hot-map</h3><div class="usage"><code>(invert-one-hot-map dataset {:keys [one-hot-table src-column], :as opts})</code></div><div class="doc"><div class="markdown"><p>Invert a one-hot transformation removing the one-hot columns and adding back the
original column.</p>
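<p>The inverse can be sketched as a round trip through <code>fit-one-hot</code> and <code>transform-one-hot</code>. The dataset <code>colors</code> and column <code>:color</code> are hypothetical; column order after the round trip is not assumed.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical column with two distinct labels.
(def colors (ds/->dataset {:color [:red :green :red]}))
(def one-hot-fit (ds-cat/fit-one-hot colors :color))

;; Expand into indicator columns, then collapse them back into :color.
(def restored
  (-> colors
      (ds-cat/transform-one-hot one-hot-fit)
      (ds-cat/invert-one-hot-map one-hot-fit)))

;; The original labels, read back from the restored column.
(vec (restored :color))
</code></pre>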
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/categorical.clj#L286">view source</a></div></div><div class="public anchor" id="var-reverse-map-categorical-xforms"><h3>reverse-map-categorical-xforms</h3><div class="usage"><code>(reverse-map-categorical-xforms dataset)</code></div><div class="doc"><div class="markdown"><p>Given a dataset where we have converted columns from a categorical representation
to either a numeric representation or a one-hot representation, reverse map
back to the original dataset given the reverse mapping of label->number in
the column's metadata.</p>
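<p>As a sketch, a dataset with one integer-mapped column and one one-hot column can be decoded in a single call, since the fit data travels with the columns. The dataset and column names below are hypothetical.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical dataset with two categorical columns.
(def raw-ds (ds/->dataset {:x     [:a :b :a]
                           :color [:red :green :red]}))

;; Encode :x as integers and :color as one-hot indicator columns.
(def encoded
  (-> raw-ds
      (ds-cat/transform-categorical-map (ds-cat/fit-categorical-map raw-ds :x))
      (ds-cat/transform-one-hot (ds-cat/fit-one-hot raw-ds :color))))

;; No fit data is passed here; it is read from the column metadata.
(def decoded (ds-cat/reverse-map-categorical-xforms encoded))

(vec (decoded :x))
(vec (decoded :color))
</code></pre>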
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/categorical.clj#L318">view source</a></div></div><div class="public anchor" id="var-transform-categorical-map"><h3>transform-categorical-map</h3><div class="usage"><code>(transform-categorical-map dataset fit-data)</code></div><div class="doc"><div class="markdown"><p>Apply a categorical mapping transformation fit with fit-categorical-map.</p>
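<p>Because the fit is separate from the transformation, a mapping fitted on one dataset can be applied to another so that labels encode consistently. A minimal sketch, with hypothetical datasets <code>train</code> and <code>test-ds</code> whose labels all appear in the fitted table:</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical training and test datasets sharing the column :x.
(def train   (ds/->dataset {:x [:a :b :c]}))
(def test-ds (ds/->dataset {:x [:c :a]}))

;; Fit on the training data only.
(def fit (ds-cat/fit-categorical-map train :x))

;; Reuse the same fit on the test data.
(def encoded-test (ds-cat/transform-categorical-map test-ds fit))

;; Each encoded value matches the integer assigned during fitting.
(= (mapv long (encoded-test :x))
   (mapv #(long (get (:lookup-table fit) %)) [:c :a]))
</code></pre>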
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/categorical.clj#L111">view source</a></div></div><div class="public anchor" id="var-transform-one-hot"><h3>transform-one-hot</h3><div class="usage"><code>(transform-one-hot dataset one-hot-fit-data)</code></div><div class="doc"><div class="markdown"><p>Apply a one-hot transformation to a dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/categorical.clj#L224">view source</a></div></div></div></body></html>