<!DOCTYPE html>
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name"></span> <span class="project-version">1.000-beta-25</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3 branch current"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-3"><a href="tech.v3.datatype.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datatype</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.argops.html"><div 
class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>argops</span></div></a></li><li class="depth-4"><a href="tech.v3.datatype.functional.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>functional</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -83px;"><span class="top" style="height: 92px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4"><a href="tech.v3.libs.cljs-ajax.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cljs-ajax</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.html#var--.3E.3Edataset"><div class="inner"><span>->>dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var--.3Edataset"><div class="inner"><span>->dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-column"><div class="inner"><span>column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-column-.3Edata"><div class="inner"><span>column->data</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-column-count"><div class="inner"><span>column-count</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-column-map"><div class="inner"><span>column-map</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-column-names"><div class="inner"><span>column-names</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-columns"><div class="inner"><span>columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-concat"><div class="inner"><span>concat</span></div></a></li><li class="depth-1"><a 
href="tech.v3.dataset.html#var-data-.3Ecolumn"><div class="inner"><span>data->column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-data-.3Edataset"><div class="inner"><span>data->dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-dataset-.3Edata"><div class="inner"><span>dataset->data</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-dataset-.3Etransit-str"><div class="inner"><span>dataset->transit-str</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-dataset.3F"><div class="inner"><span>dataset?</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-filter"><div class="inner"><span>filter</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-filter-column"><div class="inner"><span>filter-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-filter-dataset"><div class="inner"><span>filter-dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-group-by"><div class="inner"><span>group-by</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-group-by-column"><div class="inner"><span>group-by-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-head"><div class="inner"><span>head</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-intersect-missing-sets"><div class="inner"><span>intersect-missing-sets</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-merge-by-column"><div class="inner"><span>merge-by-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-missing"><div class="inner"><span>missing</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-remove-columns"><div class="inner"><span>remove-columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-remove-missing"><div 
class="inner"><span>remove-missing</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-remove-rows"><div class="inner"><span>remove-rows</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-rename-columns"><div class="inner"><span>rename-columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-replace-missing"><div class="inner"><span>replace-missing</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-reverse-rows"><div class="inner"><span>reverse-rows</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-row-at"><div class="inner"><span>row-at</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-row-count"><div class="inner"><span>row-count</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-row-map"><div class="inner"><span>row-map</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-rows"><div class="inner"><span>rows</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-rowvec-at"><div class="inner"><span>rowvec-at</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-rowvecs"><div class="inner"><span>rowvecs</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-select"><div class="inner"><span>select</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-select-columns"><div class="inner"><span>select-columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-select-missing"><div class="inner"><span>select-missing</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-select-rows"><div class="inner"><span>select-rows</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-soft-select-columns"><div class="inner"><span>soft-select-columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-sort-by"><div 
class="inner"><span>sort-by</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-sort-by-column"><div class="inner"><span>sort-by-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-tail"><div class="inner"><span>tail</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-transit-file-.3Edataset"><div class="inner"><span>transit-file->dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-transit-read-handler-map"><div class="inner"><span>transit-read-handler-map</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-transit-str-.3Edataset"><div class="inner"><span>transit-str->dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-transit-write-handler-map"><div class="inner"><span>transit-write-handler-map</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-union-missing-sets"><div class="inner"><span>union-missing-sets</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-unique-by"><div class="inner"><span>unique-by</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-unique-by-column"><div class="inner"><span>unique-by-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.html#var-update"><div class="inner"><span>update</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">tech.v3.dataset</h1><div class="doc"><div class="markdown"><p>Dataframe (map of columns) data processing system for clojurescript.
This API is a simplified version of the
<a href="https://techascent.github.io/tech.ml.dataset/">jvm-version's api</a>.</p>
<p>Datasets are maps of columns so assoc will add a new column and dissoc
can remove a column. In addition they allow very fast subrect selection,
filtering, sorting, concatenation and grouping (group-by). The columnwise
analogues are always a lot faster than the general analogues so for instance
<code>sort-by-column</code> is much faster than <code>sort-by</code>.</p>
<p>Datasets serialize and deserialize to transit (or anything else) much faster
than a sequence of maps and they take up less memory overall.</p>
<pre><code class="language-clojure">cljs.user> (require '[tech.v3.dataset :as ds])
nil
cljs.user> (-> (ds/->dataset {:a (range 100)
                              :b (take 100 (cycle ["hey" "you" "goonies"]))})
               (ds/head))
#dataset[unnamed [5 2]
| :a | :b |
|---:|---------|
| 0 | hey |
| 1 | you |
| 2 | goonies |
| 3 | hey |
| 4 | you |]
</code></pre>
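<p>As a minimal sketch of the map-of-columns behaviour described above, the following adds
and removes a column with plain <code>assoc</code> and <code>dissoc</code>. The names <code>d</code>, <code>d+c</code> and the
column <code>:c</code> are made up for illustration, and passing a plain vector as the new column's
data is an assumption rather than something this page states.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical three-row dataset.
(def d (ds/->dataset {:a [1 2 3] :b ["x" "y" "z"]}))

;; assoc returns a new dataset with column :c added (assumes a vector of
;; matching length is accepted as column data).
(def d+c (assoc d :c [10 20 30]))
(ds/column-names d+c)

;; dissoc returns a new dataset without column :b.
(ds/column-names (dissoc d+c :b))
</code></pre>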
</div></div><div class="public anchor" id="var--.3E.3Edataset"><h3>->>dataset</h3><div class="usage"><code>(->>dataset options data)</code><code>(->>dataset data)</code></div><div class="doc"><div class="markdown"><p>data-last analogue of <a href="tech.v3.dataset.html#var--.3Edataset">->dataset</a> for use in <code>->></code> macros.</p>
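<p>A small sketch of the data-last form inside a <code>->></code> pipeline. The input map is invented
for illustration, and the options map reuses the <code>:parser-fn</code> option shown for <code>->dataset</code>.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Options come first; the data arrives as the last argument.
(def n-rows
  (->> {:a [1 2] :b ["hey" "you"]}
       (ds/->>dataset {:parser-fn {:a :int8}})
       (ds/row-count)))
;; n-rows is 2, one row per element of each column.
</code></pre>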
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L136">view source</a></div></div><div class="public anchor" id="var--.3Edataset"><h3>->dataset</h3><div class="usage"><code>(->dataset data options)</code><code>(->dataset data)</code><code>(->dataset)</code></div><div class="doc"><div class="markdown"><p>Convert either a sequence of maps or a map of columns into a dataset.
Options follow the JVM version of tech.v3.dataset as far as <code>:parser-fn</code> is
concerned.</p>
<p>Examples:</p>
<pre><code class="language-clojure">cljs.user> (->> (ds/->dataset {:a (range 100)
                               :b (take 100 (cycle ["hey" "you" "goonies"]))})
                (vals)
                (map (comp :datatype meta)))
(:float64 :string)
cljs.user> (->> (ds/->dataset {:a (range 100)
                               :b (take 100 (cycle ["hey" "you" "goonies"]))}
                              {:parser-fn {:a :int8}})
                (vals)
                (map (comp :datatype meta)))
(:int8 :string)
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L102">view source</a></div></div><div class="public anchor" id="var-column"><h3>column</h3><div class="usage"><code>(column ds k)</code></div><div class="doc"><div class="markdown"><p>Return the column at position k. Failing to find the column is an error.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L189">view source</a></div></div><div class="public anchor" id="var-column-.3Edata"><h3>column->data</h3><div class="usage"><code>(column->data col)</code></div><div class="doc"><div class="markdown"><p>Transform a column into raw data that is safe to pass to transit or edn.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L854">view source</a></div></div><div class="public anchor" id="var-column-count"><h3>column-count</h3><div class="usage"><code>(column-count ds)</code></div><div class="doc"><div class="markdown"><p>Integer column count of the dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L169">view source</a></div></div><div class="public anchor" id="var-column-map"><h3>column-map</h3><div class="usage"><code>(column-map dataset result-colname map-fn res-dtype-or-opts filter-fn-or-ds)</code><code>(column-map dataset result-colname map-fn filter-fn-or-ds)</code><code>(column-map dataset result-colname map-fn)</code></div><div class="doc"><div class="markdown"><p>Produce a new (or updated) column as the result of mapping a fn over columns.</p>
<ul>
<li><code>dataset</code> - dataset.</li>
<li><code>result-colname</code> - Name of new (or existing) column.</li>
<li><code>map-fn</code> - function to map over columns. Same rules as <code>tech.v3.datatype/emap</code>.</li>
<li><code>res-dtype-or-opts</code> - If not given result is scanned to infer missing and datatype.
If using an option map, options are described below.</li>
<li><code>filter-fn-or-ds</code> - A dataset, a sequence of columns, or a <code>tech.v3.dataset.column-filters</code>
column filter function. Defaults to all the columns of the existing dataset.</li>
</ul>
<p>Returns a new dataset with a new or updated column.</p>
<p>Options:</p>
<ul>
<li><code>:datatype</code> - Set the datatype of the result column. If not given, the result is scanned
to infer result datatype and missing set.</li>
<li><code>:missing-fn</code> - if given, columns are first passed to missing-fn as a sequence and
this dictates the missing set. Otherwise the missing set is found by scanning the results
during the inference process. See <code>tech.v3.dataset.column/union-missing-sets</code> and
<code>tech.v3.dataset.column/intersect-missing-sets</code> for example functions to pass in
here.</li>
</ul>
<p>Examples:</p>
<pre><code class="language-clojure">
;;From the tests --
(let [testds (ds/->dataset [{:a 1.0 :b 2.0} {:a 3.0 :b 5.0} {:a 4.0 :b nil}])]
  ;;result scanned for both datatype and missing set
  (is (= (vec [3.0 6.0 nil])
         (:b2 (ds/column-map testds :b2 #(when % (inc %)) [:b]))))
  ;;result scanned for missing set only. Result used in-place.
  (is (= (vec [3.0 6.0 nil])
         (:b2 (ds/column-map testds :b2 #(when % (inc %))
                             {:datatype :float64} [:b]))))
  ;;Nothing scanned at all.
  (is (= (vec [3.0 6.0 nil])
         (:b2 (ds/column-map testds :b2 #(inc %)
                             {:datatype :float64
                              :missing-fn ds-col/union-missing-sets} [:b]))))
  ;;Missing set scanning causes NPE at inc.
  (is (thrown? Throwable
               (ds/column-map testds :b2 #(inc %)
                              {:datatype :float64}
                              [:b]))))
;;Ad-hoc repl --
user> (require '[tech.v3.dataset :as ds])
nil
user> (def ds (ds/->dataset "test/data/stocks.csv"))
#'user/ds
user> (ds/head ds)
test/data/stocks.csv [5 3]:
| symbol | date | price |
|--------|------------|-------|
| MSFT | 2000-01-01 | 39.81 |
| MSFT | 2000-02-01 | 36.35 |
| MSFT | 2000-03-01 | 43.22 |
| MSFT | 2000-04-01 | 28.37 |
| MSFT | 2000-05-01 | 25.45 |
user> (-> (ds/column-map ds "price^2" #(* % %) ["price"])
          (ds/head))
test/data/stocks.csv [5 4]:
| symbol | date | price | price^2 |
|--------|------------|-------|-----------|
| MSFT | 2000-01-01 | 39.81 | 1584.8361 |
| MSFT | 2000-02-01 | 36.35 | 1321.3225 |
| MSFT | 2000-03-01 | 43.22 | 1867.9684 |
| MSFT | 2000-04-01 | 28.37 | 804.8569 |
| MSFT | 2000-05-01 | 25.45 | 647.7025 |
user> (def ds1 (ds/->dataset [{:a 1} {:b 2.0} {:a 2 :b 3.0}]))
#'user/ds1
user> ds1
_unnamed [3 2]:
| :b | :a |
|----:|---:|
| | 1 |
| 2.0 | |
| 3.0 | 2 |
user> (ds/column-map ds1 :c (fn [a b]
                              (when (and a b)
                                (+ (double a) (double b))))
                     [:a :b])
_unnamed [3 3]:
| :b | :a | :c |
|----:|---:|----:|
| | 1 | |
| 2.0 | | |
| 3.0 | 2 | 5.0 |
user> (ds/missing (*1 :c))
{0,1}
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L640">view source</a></div></div><div class="public anchor" id="var-column-names"><h3>column-names</h3><div class="usage"><code>(column-names ds)</code></div><div class="doc"><div class="markdown"><p>Return the column names as a sequence.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L183">view source</a></div></div><div class="public anchor" id="var-columns"><h3>columns</h3><div class="usage"><code>(columns ds)</code></div><div class="doc"><div class="markdown"><p>Return the columns, in order, of the dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L177">view source</a></div></div><div class="public anchor" id="var-concat"><h3>concat</h3><div class="usage"><code>(concat ds & args)</code><code>(concat)</code></div><div class="doc"><div class="markdown"><p>This is a copying concatenation so the result will be realized. Missing columns
will be filled in with missing values.</p>
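<p>To illustrate how a column absent from one input is filled in, here is a hedged sketch with two
made-up datasets, one of which lacks <code>:b</code>. The expectation that the second row of <code>:b</code> lands
in the missing set follows from the description above, not from a verified run.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical inputs: `bottom` has no :b column.
(def top    (ds/->dataset [{:a 1 :b 2}]))
(def bottom (ds/->dataset [{:a 3}]))

;; Copying concatenation; :b has no value for the row that came from `bottom`.
(def both (ds/concat top bottom))
(ds/row-count both)

;; Inspect which row indexes of :b are missing.
(ds/missing (both :b))
</code></pre>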
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L427">view source</a></div></div><div class="public anchor" id="var-data-.3Ecolumn"><h3>data->column</h3><div class="usage"><code>(data->column {:keys [metadata missing data]})</code></div><div class="doc"><div class="markdown"><p>Transform data produced via column->data into a column</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L914">view source</a></div></div><div class="public anchor" id="var-data-.3Edataset"><h3>data->dataset</h3><div class="usage"><code>(data->dataset ds-data)</code></div><div class="doc"><div class="markdown"><p>Given data produced via dataset->data create a new dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L948">view source</a></div></div><div class="public anchor" id="var-dataset-.3Edata"><h3>dataset->data</h3><div class="usage"><code>(dataset->data ds)</code></div><div class="doc"><div class="markdown"><p>Convert a dataset into a pure data structure safe for transit or direct json
serialization. Uses base64 encoding of numeric data.</p>
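<p>A round trip through the pure-data form might look like the sketch below. The dataset <code>d</code> is
invented for illustration; only <code>dataset->data</code> and <code>data->dataset</code> from this page are used.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset.
(def d (ds/->dataset {:a [1 2 3] :b ["x" "y" "z"]}))

;; Plain data that can be handed to transit or a JSON encoder.
(def as-data (ds/dataset->data d))

;; Rebuild a dataset from that data.
(def restored (ds/data->dataset as-data))
(ds/row-count restored)
</code></pre>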
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L877">view source</a></div></div><div class="public anchor" id="var-dataset-.3Etransit-str"><h3>dataset->transit-str</h3><div class="usage"><code>(dataset->transit-str ds & [format handlers])</code></div><div class="doc"><div class="markdown"><p>Write a transit string adding in the dataset write handler</p>
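<p>The sketch below writes a dataset to a transit string and reads it back. Calling
<code>transit-str->dataset</code> with a single argument is an assumption, since its signature is not
shown in this part of the page; the dataset <code>d</code> is made up.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset.
(def d (ds/->dataset {:a [1 2 3] :b ["x" "y" "z"]}))

;; Serialize with the default format and handlers.
(def transit-str (ds/dataset->transit-str d))

;; Read it back (assumes the one-argument arity exists).
(def restored (ds/transit-str->dataset transit-str))
(ds/row-count restored)
</code></pre>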
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L965">view source</a></div></div><div class="public anchor" id="var-dataset.3F"><h3>dataset?</h3><div class="usage"><code>(dataset? ds)</code></div><div class="doc"><div class="markdown"><p>Return true if this is a dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L144">view source</a></div></div><div class="public anchor" id="var-filter"><h3>filter</h3><div class="usage"><code>(filter ds pred)</code></div><div class="doc"><div class="markdown"><p>Filter the dataset. Pred gets passed each row as a map.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L332">view source</a></div></div><div class="public anchor" id="var-filter-column"><h3>filter-column</h3><div class="usage"><code>(filter-column ds colname & [pred])</code></div><div class="doc"><div class="markdown"><p>Filter the dataset by column colname. If pred isn't passed in the column's values
are treated as truthy.</p>
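<p>For comparison, the same selection can be written column-wise with <code>filter-column</code> or row-wise
with <code>filter</code>. This is a sketch over a made-up dataset; the names <code>by-column</code> and <code>by-row</code>
are illustrative only.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset with :a running from 0 to 9.
(def d (ds/->dataset {:a (range 10)
                      :b (take 10 (cycle ["hey" "you"]))}))

;; Column-wise: pred sees each value of column :a.
(def by-column (ds/filter-column d :a #(> % 6)))

;; Row-wise: pred sees each row as a map.
(def by-row (ds/filter d (fn [row] (> (row :a) 6))))

;; Both keep the rows where :a is 7, 8 or 9.
(ds/row-count by-column)
(ds/row-count by-row)
</code></pre>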
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L322">view source</a></div></div><div class="public anchor" id="var-filter-dataset"><h3>filter-dataset</h3><div class="usage"><code>(filter-dataset dataset filter-fn-or-ds)</code></div><div class="doc"><div class="markdown"><p>Filter the columns of the dataset returning a new dataset. This pathway is
designed to work with the tech.v3.dataset.column-filters namespace.</p>
<ul>
<li>If filter-fn-or-ds is a dataset, it is returned.</li>
<li>If filter-fn-or-ds is sequential, then select-columns is called.</li>
<li>If filter-fn-or-ds is :all, all columns are returned.</li>
<li>If filter-fn-or-ds is an instance of IFn, the dataset is passed into it.</li>
</ul>
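<p>The cases listed above can be sketched as follows. The dataset <code>d</code> is invented, and the
function passed in the last call is a hypothetical stand-in for a column-filter function.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical three-column dataset.
(def d (ds/->dataset {:a [1 2] :b ["x" "y"] :c [0.5 1.5]}))

;; Sequential argument: handled via select-columns.
(def a-and-c (ds/filter-dataset d [:a :c]))

;; :all keeps every column.
(def everything (ds/filter-dataset d :all))

;; A function receives the dataset and returns the filtered dataset.
(def only-b (ds/filter-dataset d (fn [dataset] (ds/select-columns dataset [:b]))))
</code></pre>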
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L588">view source</a></div></div><div class="public anchor" id="var-group-by"><h3>group-by</h3><div class="usage"><code>(group-by ds f)</code></div><div class="doc"><div class="markdown"><p>Group the dataset by the values returned from passing f over each row, represented as a
map, of the dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L375">view source</a></div></div><div class="public anchor" id="var-group-by-column"><h3>group-by-column</h3><div class="usage"><code>(group-by-column ds colname)</code></div><div class="doc"><div class="markdown"><p>Group the dataset by column colname</p>
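<p>A hedged sketch of the two grouping calls over a made-up dataset. It assumes, as in the JVM
API, that the result is a map from group key to a dataset of the matching rows; this page does
not state the return type.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset with a repeated key.
(def d (ds/->dataset {:k ["a" "b" "a"] :v [1 2 3]}))

;; Column-wise grouping on :k.
(def by-col (ds/group-by-column d :k))

;; General grouping: f sees each row as a map.
(def by-fn (ds/group-by d (fn [row] (row :k))))

;; Assumption: groups are looked up by key and are themselves datasets.
(ds/row-count (get by-col "a"))
</code></pre>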
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L364">view source</a></div></div><div class="public anchor" id="var-head"><h3>head</h3><div class="usage"><code>(head ds n)</code><code>(head ds)</code></div><div class="doc"><div class="markdown"><p>Return the first n rows of the dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L304">view source</a></div></div><div class="public anchor" id="var-intersect-missing-sets"><h3>intersect-missing-sets</h3><div class="usage"><code>(intersect-missing-sets col-seq)</code></div><div class="doc"><div class="markdown"><p>Intersect the missing sets of the columns</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L772">view source</a></div></div><div class="public anchor" id="var-merge-by-column"><h3>merge-by-column</h3><div class="usage"><code>(merge-by-column lhs rhs colname)</code></div><div class="doc"><div class="markdown"><p>Merge rows assuming left, right have the same columns. Left is taken first then
any right rows that do not appear in left are appended. This is far less general but much
faster than a join operation; it is useful for merging timeseries data.</p>
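<p>As an illustration of the merge rule, the sketch below merges two small made-up time series on a
hypothetical <code>:t</code> column. The expected row count follows from the description above and has
not been checked against a run.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical series sharing the columns :t and :v.
(def lhs (ds/->dataset {:t [1 2 3] :v [10 20 30]}))
(def rhs (ds/->dataset {:t [3 4] :v [99 40]}))

;; All lhs rows are kept. The rhs row with :t 3 already appears in lhs,
;; so only the rhs row with :t 4 should be appended.
(def merged (ds/merge-by-column lhs rhs :t))
(ds/row-count merged)
</code></pre>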
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L477">view source</a></div></div><div class="public anchor" id="var-missing"><h3>missing</h3><div class="usage"><code>(missing ds-or-col)</code></div><div class="doc"><div class="markdown"><p>Return the missing set as a clojure set. The underlying protocol returns
missing sets as js sets as those have superior performance when using numbers.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L151">view source</a></div></div><div class="public anchor" id="var-remove-columns"><h3>remove-columns</h3><div class="usage"><code>(remove-columns ds colnames)</code></div><div class="doc"><div class="markdown"><p>Remove these columns from the dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L277">view source</a></div></div><div class="public anchor" id="var-remove-missing"><h3>remove-missing</h3><div class="usage"><code>(remove-missing ds)</code></div><div class="doc"><div class="markdown"><p>Remove missing rows from a dataset or column</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L242">view source</a></div></div><div class="public anchor" id="var-remove-rows"><h3>remove-rows</h3><div class="usage"><code>(remove-rows ds rowidxs)</code></div><div class="doc"><div class="markdown"><p>Remove these row indexes out of the dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L229">view source</a></div></div><div class="public anchor" id="var-rename-columns"><h3>rename-columns</h3><div class="usage"><code>(rename-columns ds rename-map)</code></div><div class="doc"><div class="markdown"><p>Given a map of old-name->new-name, rename some subset of columns
without changing their column order.</p>
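<p>A short sketch of renaming a subset of columns; the dataset and the new name <code>:id</code> are made up.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset.
(def d (ds/->dataset {:a [1 2] :b ["x" "y"]}))

;; Only :a is renamed; :b is left alone and the column order is kept.
(def renamed (ds/rename-columns d {:a :id}))
(ds/column-names renamed)
</code></pre>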
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L286">view source</a></div></div><div class="public anchor" id="var-replace-missing"><h3>replace-missing</h3><div class="usage"><code>(replace-missing ds colnames & [replace-cmd])</code></div><div class="doc"><div class="markdown"><p>Replace missing values in dataset.</p>
<ul>
<li>colnames - one or more columns on which to run replace-cmd</li>
<li>replace-cmd - one of <code>:first</code> <code>:last</code> <code>:lerp</code> <code>[:value val]</code> ifn</li>
</ul>
<p>If replace-cmd is an ifn it will be given the column datatype, the first and last values
bounding the missing span, and the number of missing elements. Either the first or last may be
nil if the missing span is at the beginning or end. In the case where all values are
missing both arguments may be nil.</p>
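<p>The keyword and <code>[:value val]</code> commands can be sketched as below. The dataset is made up, with
the middle value of <code>:a</code> missing; no output is shown because the results have not been verified.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset; the nil becomes a missing entry in :a.
(def d (ds/->dataset [{:a 1.0} {:a nil} {:a 3.0}]))

;; Linear interpolation across the missing span.
(def lerped (ds/replace-missing d [:a] :lerp))

;; Fill the missing span with a constant.
(def zero-filled (ds/replace-missing d [:a] [:value 0.0]))
</code></pre>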
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L503">view source</a></div></div><div class="public anchor" id="var-reverse-rows"><h3>reverse-rows</h3><div class="usage"><code>(reverse-rows ds)</code></div><div class="doc"><div class="markdown"><p>Reverse the order of the rows of a dataset or a column</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L254">view source</a></div></div><div class="public anchor" id="var-row-at"><h3>row-at</h3><div class="usage"><code>(row-at ds idx)</code></div><div class="doc"><div class="markdown"><p>Get row as a map at index idx. Negative indexes index from the end.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L208">view source</a></div></div><div class="public anchor" id="var-row-count"><h3>row-count</h3><div class="usage"><code>(row-count ds-or-col)</code></div><div class="doc"><div class="markdown"><p>Integer row count of the dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L161">view source</a></div></div><div class="public anchor" id="var-row-map"><h3>row-map</h3><div class="usage"><code>(row-map ds map-fn & [options])</code></div><div class="doc"><div class="markdown"><p>Map a function across the rows of the dataset producing a new dataset
that is merged back into the original potentially replacing existing columns.
Options are passed into the <a href="tech.v3.dataset.html#var--.3Edataset">->dataset</a> function so you can control the resulting
column types by the usual dataset parsing options described there.</p>
<p>Examples:</p>
<pre><code class="language-clojure">cljs.user> (def stocks (ds/transit-file->dataset "test/data/stocks.transit-json"))
#'cljs.user/stocks
cljs.user> (ds/head stocks)
#dataset[https://github.com/techascent/tech.ml.dataset/raw/master/test/data/stocks.csv [5 3]
| :symbol | :date | :price |
|---------|------------|-------:|
| MSFT | 2000-01-01 | 39.81 |
| MSFT | 2000-02-01 | 36.35 |
| MSFT | 2000-03-01 | 43.22 |
| MSFT | 2000-04-01 | 28.37 |
| MSFT | 2000-05-01 | 25.45 |]
cljs.user> (ds/head (ds/row-map stocks (fn [row]
                                         {:symbol (keyword (row :symbol))
                                          :price2 (* (row :price)(row :price))})))
#dataset[https://github.com/techascent/tech.ml.dataset/raw/master/test/data/stocks.csv [5 4]
| :symbol | :date | :price | :price2 |
|---------|------------|-------:|--------------:|
| :MSFT | 2000-01-01 | 39.81 | 1584.83610000 |
| :MSFT | 2000-02-01 | 36.35 | 1321.32250000 |
| :MSFT | 2000-03-01 | 43.22 | 1867.96840000 |
| :MSFT | 2000-04-01 | 28.37 | 804.85690000 |
| :MSFT | 2000-05-01 | 25.45 | 647.70250000 |]
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L778">view source</a></div></div><div class="public anchor" id="var-rows"><h3>rows</h3><div class="usage"><code>(rows ds)</code></div><div class="doc"><div class="markdown"><p>Get a sequence of maps from a dataset</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L197">view source</a></div></div><div class="public anchor" id="var-rowvec-at"><h3>rowvec-at</h3><div class="usage"><code>(rowvec-at ds idx)</code></div><div class="doc"><div class="markdown"><p>Get row as a vec of values at index idx. Negative indexes index from the end.</p>
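<p>A minimal sketch of positive and negative indexing. It assumes the namespace is aliased as <code>ds</code> and that <code>->dataset</code> accepts a sequence of maps; the sample data is hypothetical.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical three-row dataset, for illustration only.
(def prices (ds/->dataset [{:symbol "MSFT" :price 39.81}
                           {:symbol "MSFT" :price 36.35}
                           {:symbol "MSFT" :price 43.22}]))

(ds/rowvec-at prices 0)   ;; first row as a vector of values
(ds/rowvec-at prices -1)  ;; last row; negative indexes count from the end
</code></pre>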
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L214">view source</a></div></div><div class="public anchor" id="var-rowvecs"><h3>rowvecs</h3><div class="usage"><code>(rowvecs ds)</code></div><div class="doc"><div class="markdown"><p>Get a sequence of persistent vectors from a dataset</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L202">view source</a></div></div><div class="public anchor" id="var-select"><h3>select</h3><div class="usage"><code>(select ds cols rows)</code></div><div class="doc"><div class="markdown"><p>Select a subrect of the dataset.</p>
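<p>A minimal sketch of selecting a sub-rectangle. It assumes <code>cols</code> is a sequence of column names and <code>rows</code> a sequence of row indexes; the namespace alias <code>ds</code> and the sample data are hypothetical.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset, for illustration only.
(def prices (ds/->dataset [{:symbol "MSFT" :date "2000-01-01" :price 39.81}
                           {:symbol "MSFT" :date "2000-02-01" :price 36.35}
                           {:symbol "MSFT" :date "2000-03-01" :price 43.22}]))

;; Keep only the :symbol and :price columns of rows 0 and 2
;; (assumed argument shapes, see the note above).
(ds/select prices [:symbol :price] [0 2])
</code></pre>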
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L297">view source</a></div></div><div class="public anchor" id="var-select-columns"><h3>select-columns</h3><div class="usage"><code>(select-columns ds colnames)</code></div><div class="doc"><div class="markdown"><p>Select these columns in this order. This can be used both to select specific columns
and to set the order of columns. Requesting a column that is not found is an error.</p>
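<p>A minimal sketch of using <code>select-columns</code> both to drop and to reorder columns. The namespace alias <code>ds</code> and the sample data are hypothetical, and <code>->dataset</code> is assumed to accept a sequence of maps.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset, for illustration only.
(def prices (ds/->dataset [{:symbol "MSFT" :date "2000-01-01" :price 39.81}
                           {:symbol "MSFT" :date "2000-02-01" :price 36.35}]))

;; Keep two of the three columns, with :price placed before :symbol.
(ds/select-columns prices [:price :symbol])
</code></pre>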
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L262">view source</a></div></div><div class="public anchor" id="var-select-missing"><h3>select-missing</h3><div class="usage"><code>(select-missing ds)</code></div><div class="doc"><div class="markdown"><p>Select the missing rows from a dataset or a column</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L248">view source</a></div></div><div class="public anchor" id="var-select-rows"><h3>select-rows</h3><div class="usage"><code>(select-rows ds rowidxs)</code></div><div class="doc"><div class="markdown"><p>Select these row indexes out of the dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L220">view source</a></div></div><div class="public anchor" id="var-soft-select-columns"><h3>soft-select-columns</h3><div class="usage"><code>(soft-select-columns ds colnames)</code></div><div class="doc"><div class="markdown"><p>Select these columns in this order. Columns not found are quietly ignored. To get errors
for missing columns see <a href="tech.v3.dataset.html#var-select-columns">select-columns</a>.</p>
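<p>A minimal sketch of the lenient behavior described above. The namespace alias <code>ds</code>, the sample data, and the absent <code>:volume</code> column are hypothetical.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset, for illustration only; it has no :volume column.
(def prices (ds/->dataset [{:symbol "MSFT" :price 39.81}
                           {:symbol "MSFT" :price 36.35}]))

;; :volume is not present, so it is quietly skipped rather than raising an error.
(ds/soft-select-columns prices [:price :volume])
</code></pre>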
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L269">view source</a></div></div><div class="public anchor" id="var-sort-by"><h3>sort-by</h3><div class="usage"><code>(sort-by ds keyfn & [comp options])</code></div><div class="doc"><div class="markdown"><p>Sort dataset by keyfn. Keyfn is passed each row as a map.</p>
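<p>A minimal sketch of sorting with a key function. The row is called as a function of the column name, as in the <code>row-map</code> example on this page; the namespace alias <code>ds</code> and the sample data are hypothetical.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset, for illustration only.
(def prices (ds/->dataset [{:symbol "MSFT" :price 39.81}
                           {:symbol "MSFT" :price 36.35}
                           {:symbol "MSFT" :price 43.22}]))

;; keyfn receives each row as a map and returns the value to sort on.
(ds/sort-by prices (fn [row] (row :price)))
</code></pre>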
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L355">view source</a></div></div><div class="public anchor" id="var-sort-by-column"><h3>sort-by-column</h3><div class="usage"><code>(sort-by-column ds colname & [sort-op options])</code></div><div class="doc"><div class="markdown"><p>Sort the dataset by column colname. For sort options and the interaction between
<code>sort-op</code> and the options see <a href="tech.v3.datatype.argops.html#var-argsort">tech.v3.datatype.argops/argsort</a>.</p>
<ul>
<li><code>sort-op</code> - a boolean binary predicate comparison operation such as < or >.</li>
</ul>
<p>Options:</p>
<ul>
<li><code>:nan-strategy</code> - defaults to <code>:last</code> - for numeric columns where to place missing values.
Options are <code>:first</code>, <code>:last</code>, <code>:exception</code>.</li>
<li><code>:comparator</code> - pass in a custom comparator - a function returning -1, 0, or 1. If no
sort-op is passed in this defaults to <code>compare</code>.</li>
</ul>
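<p>A minimal sketch of the positional arguments described above, following the <code>(sort-by-column ds colname & [sort-op options])</code> signature. The namespace alias <code>ds</code> and the sample data are hypothetical.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset, for illustration only.
(def prices (ds/->dataset [{:symbol "MSFT" :price 39.81}
                           {:symbol "MSFT" :price 36.35}
                           {:symbol "MSFT" :price 43.22}]))

;; Sort by :price with no sort-op given.
(ds/sort-by-column prices :price)

;; Pass > as the sort-op to sort from highest to lowest price.
(ds/sort-by-column prices :price >)

;; Same sort-op, with an options map asking for missing values to come first.
(ds/sort-by-column prices :price > {:nan-strategy :first})
</code></pre>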
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L338">view source</a></div></div><div class="public anchor" id="var-tail"><h3>tail</h3><div class="usage"><code>(tail ds n)</code><code>(tail ds)</code></div><div class="doc"><div class="markdown"><p>Return the last n rows of the dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L313">view source</a></div></div><div class="public anchor" id="var-transit-file-.3Edataset"><h3>transit-file->dataset</h3><div class="usage"><code>(transit-file->dataset fname)</code></div><div class="doc"><div class="markdown"><p>Given a file of transit data return a dataset. This only works on Node.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L987">view source</a></div></div><div class="public anchor" id="var-transit-read-handler-map"><h3>transit-read-handler-map</h3><div class="usage"><code>(transit-read-handler-map)</code></div><div class="doc"><div class="markdown"><p>Return a map mapping the dataset tag to a transit read handler.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L973">view source</a></div></div><div class="public anchor" id="var-transit-str-.3Edataset"><h3>transit-str->dataset</h3><div class="usage"><code>(transit-str->dataset json-data & [format handlers])</code></div><div class="doc"><div class="markdown"><p>Parse a transit string adding in the dataset read handler</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L979">view source</a></div></div><div class="public anchor" id="var-transit-write-handler-map"><h3>transit-write-handler-map</h3><div class="usage"><code>(transit-write-handler-map)</code></div><div class="doc"><div class="markdown"><p>Return a map mapping the dataset type to a transit writer handler.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L959">view source</a></div></div><div class="public anchor" id="var-union-missing-sets"><h3>union-missing-sets</h3><div class="usage"><code>(union-missing-sets col-seq)</code></div><div class="doc"><div class="markdown"><p>Union the missing sets of the columns</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L766">view source</a></div></div><div class="public anchor" id="var-unique-by"><h3>unique-by</h3><div class="usage"><code>(unique-by ds f)</code></div><div class="doc"><div class="markdown"><p>Unique-by, taking the first row for each distinct value returned by f.</p>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L411">view source</a></div></div><div class="public anchor" id="var-unique-by-column"><h3>unique-by-column</h3><div class="usage"><code>(unique-by-column ds colname)</code></div><div class="doc"><div class="markdown"><p>Unique-by, taking the first row for each distinct value in column colname.</p>
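<p>A minimal sketch of de-duplicating on one column. It assumes that "taking first" means the first row seen for each distinct column value is the one kept; the namespace alias <code>ds</code> and the sample data are hypothetical.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset with repeated :symbol values, for illustration only.
(def prices (ds/->dataset [{:symbol "MSFT" :price 39.81}
                           {:symbol "MSFT" :price 36.35}
                           {:symbol "AAPL" :price 25.94}]))

;; One row per distinct :symbol (assumed: the first occurrence of each).
(ds/unique-by-column prices :symbol)
</code></pre>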
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L387">view source</a></div></div><div class="public anchor" id="var-update"><h3>update</h3><div class="usage"><code>(update lhs-ds filter-fn-or-ds update-fn & args)</code></div><div class="doc"><div class="markdown"><p>Update this dataset. Filters this dataset into a new dataset,
applies update-fn, then merges the result into the original dataset.</p>
<p>This pathway is designed to work with the tech.v3.dataset.column-filters namespace.</p>
<ul>
<li><code>filter-fn-or-ds</code> is a generalized parameter. May be a function,
a dataset or a sequence of column names.</li>
<li><code>update-fn</code> must take the dataset as the first argument and must return
a dataset.</li>
</ul>
<pre><code class="language-clojure">(ds/bind-> (ds/->dataset dataset) ds
  (ds/remove-column "Id")
  (ds/update cf/string ds/replace-missing-value "NA")
  (ds/update-elemwise cf/string #(get {"" "NA"} % %))
  (ds/update cf/numeric ds/replace-missing-value 0)
  (ds/update cf/boolean ds/replace-missing-value false)
  (ds/update-columnwise (cf/union (cf/numeric ds) (cf/boolean ds))
                        #(dtype/elemwise-cast % :float64)))
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/cnuernber/tmdjs/blob/master/src/tech/v3/dataset.cljs#L613">view source</a></div></div></div></body></html>