DataWorker is designed to manage large datasets client-side, using modern web technologies such as Web Workers and WebSockets. DataWorker allows users to perform expensive operations (such as sorting, joining, searching, filtering, grouping, and much more) in a multi-threaded environment to maintain responsiveness. Data can be provided locally as an array, or remotely via AJAX or WebSockets. DataWorker has no dependencies on external libraries, but works well with others.
Because some of this technology is quite new, not all browsers support every feature (such as using a WebSocket from within a Web Worker). Fallbacks are in place so that data can still be accessed on as many devices as possible.
To get started, include one of the distribution files from the dist/ folder: dataworker.js for development/testing, or dataworker.min.js for production. For example:
<script src="dist/dataworker.min.js"></script>
Local Data
DataWorker takes an array of records to construct the initial dataset. The first record defines column properties.
var dataset = [
[
{
name: "column_a",
title: "Column A",
aggType: "max",
sortType: "alpha"
},
{
name: "column_b",
title: "Column B",
aggType: "max",
sortType: "alpha"
},
{
name: "column_c",
title: "Column C",
aggType: "min",
sortType: "alpha"
}
],
[ "apple", "violin", "music" ],
[ "cat", "tissue", "dog" ],
[ "banana", "piano", "gum" ],
[ "gummy", "power", "star" ]
];
var dw = new DataWorker(dataset);
It is also possible to just supply column names:
var dataset = [
[ "column_a", "column_b", "column_c" ],
[ "apple", "violin", "music" ],
[ "cat", "tissue", "dog" ],
[ "banana", "piano", "gum" ],
[ "gummy", "power", "star" ]
];
var dw = new DataWorker(dataset);
If only column names are supplied, the following default column properties are used:
title: the column name
aggType: "max"
sortType: "alpha"
Additionally, you should give DataWorker an error handler (by default, DataWorker uses console.error), either with the onError method or by adding it as a property on the dataset object:
dataset.onError = function (msg) { alert(msg) };
new DataWorker(dataset);
or
var dw = new DataWorker(dataset).onError(function (errorMsg) {
alert(errorMsg);
});
Error handling
DataWorker can be provided with an error handler:
dw.onError(function (msg) {
alert(msg);
});
By default, errors are printed to the console.
Streaming Data
Via Websockets
Alternatively, DataWorker can stream data from a WebSocket server:
var dw = new DataWorker({
datasource : "ws://127.0.0.1:8888",
authenticate : "{}",
request : "{\"cmd\":\"requestDataset\"}"
});
datasource should be the URL of the WebSocket server.
authenticate is optional. If present, it will be the first message sent to the server upon connecting.
request is sent after authenticate.
After the request is sent, DataWorker will expect a reply in the form of a JSON object with expectedNumRows and columns properties, such as:
{
"expectedNumRows": 10,
"columns": [ "column_a", "column_b", "column_c" ]
}
expectedNumRows should be the number of rows to expect in that dataset.
columns should be the columns of the expected dataset. They can be provided as simple column names or as hashes with custom column properties.
Note that DataWorker cannot proceed without first knowing the columns and the total number of expected rows. However, these two properties may be transmitted separately. For example:
// First message:
{ "expectedNumRows": 10 }
// Second message:
{ "columns": [ "column_a", "column_b", "column_c" ] }
Afterwards, DataWorker will expect arrays of dataset rows from the server. These rows will be appended to the dataset. The reply will be similar to the following:
[
[ "apple", "violin", "music" ],
[ "cat", "tissue", "dog" ],
[ "banana", "piano", "gum" ],
[ "gummy", "power", "star" ]
]
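For illustration, the message order above could be produced on the server by a helper like the following. It is a sketch in plain JavaScript with no WebSocket library: buildStreamMessages and its chunkSize parameter are hypothetical, and each returned string would be handed to whatever send method your server's WebSocket library provides.

```javascript
// Hypothetical server-side helper: builds the JSON strings a WebSocket server
// could send in reply to a dataset request, in the order DataWorker expects.
function buildStreamMessages(columns, rows, chunkSize) {
  if (!(chunkSize > 0)) {
    throw new Error("chunkSize must be a positive number");
  }
  var messages = [];

  // 1. Header: the columns and the expected number of rows come first.
  messages.push(JSON.stringify({
    expectedNumRows: rows.length,
    columns: columns
  }));

  // 2. Body: arrays of rows, each appended to the dataset on arrival.
  for (var i = 0; i < rows.length; i += chunkSize) {
    messages.push(JSON.stringify(rows.slice(i, i + chunkSize)));
  }
  return messages;
}

var messages = buildStreamMessages(
  [ "column_a", "column_b", "column_c" ],
  [
    [ "apple", "violin", "music" ],
    [ "cat", "tissue", "dog" ],
    [ "banana", "piano", "gum" ],
    [ "gummy", "power", "star" ]
  ],
  2
);
// messages[0] is the header; messages[1] and messages[2] hold two rows each.
```

Sending the header first matters because, as noted above, DataWorker cannot append rows until it knows the columns and the expected row count.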
The onReceiveColumns callback will be called when the columns and the expected number of rows are received from the WebSocket server. This callback can be set with the onReceiveColumns method:
dw.onReceiveColumns(function () {
alert("Columns have been received!");
});
Alternatively, this callback can be passed into the constructor:
var dw = new DataWorker({
datasource : "ws://127.0.0.1:8888",
authenticate : "{}",
request : "{\"cmd\":\"requestDataset\"}",
onReceiveColumns : function (numRows) {
alert("Columns have been received!");
}
});
The onReceiveRows callback will be called whenever rows are received from the WebSocket server. This callback can be set with the onReceiveRows method:
dw.onReceiveRows(function (numRows) {
alert("Received " + numRows + " rows.");
});
Alternatively, this callback can be passed into the constructor:
var dw = new DataWorker({
datasource : "ws://127.0.0.1:8888",
authenticate : "{}",
request : "{\"cmd\":\"requestDataset\"}",
onReceiveRows : function (numRows) {
alert(numRows + " rows received.");
}
});
If not set, this callback defaults to doing nothing.
The onAllRowsReceived callback is called when all expected rows have been received from the WebSocket server. This callback can be set with the onAllRowsReceived method:
dw.onAllRowsReceived(function () {
alert("All rows have been received!");
});
Alternatively, this callback can be passed into the constructor:
var dw = new DataWorker({
datasource : "ws://127.0.0.1:8888",
authenticate : "{}",
request : "{\"cmd\":\"requestDataset\"}",
onReceiveRows : function (numRows) {
alert(numRows + " rows received.");
},
onAllRowsReceived : function () {
alert("All rows have been received!");
}
});
If not set, this callback does nothing.
Via AJAX
If WebSockets are not available, data can be received via AJAX as well:
var dw = new DataWorker({
datasource : "http://127.0.0.1:8888",
request : "?this=is;a=query;string",
onAllRowsReceived : function () {
alert("all rows have been received!");
}
});
datasource should be the base URL to send GET requests to.
request should be the query string to append to the base URL. The leading question mark is optional; if it is missing, one will automatically be inserted. Alternatively, request can be a JavaScript object or a JSON string; DataWorker will convert it into a query string.
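A minimal sketch of that conversion, assuming request is a query string (with or without the leading ?), a JSON string, or a plain object. The helper name toQueryString is hypothetical, and DataWorker's own conversion may escape values differently.

```javascript
// Sketch: turn an AJAX `request` value into a query string starting with "?".
// Hypothetical helper; DataWorker's own conversion may encode values differently.
function toQueryString(request) {
  var params = request;

  if (typeof request === "string") {
    // Try to read the string as JSON; anything that is not a JSON object
    // is treated as a ready-made query string.
    try {
      params = JSON.parse(request);
    } catch (e) {
      params = null;
    }
    if (params === null || typeof params !== "object" || Array.isArray(params)) {
      return request.charAt(0) === "?" ? request : "?" + request;
    }
  }

  // Objects become key=value pairs. URLSearchParams is built into modern
  // browsers and Node.js and takes care of URL-encoding.
  return "?" + new URLSearchParams(params).toString();
}

var base = "http://127.0.0.1:8888";
var fromString = base + toQueryString("?this=is;a=query;string");
var withoutMark = base + toQueryString("query=string");
var fromObject = base + toQueryString({ cmd: "requestDataset" });
var fromJson = base + toQueryString("{\"cmd\":\"requestDataset\"}");
// fromObject and fromJson are both "http://127.0.0.1:8888?cmd=requestDataset"
```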
Using fallbacks
If multiple data sources exist, it is possible to provide fallbacks when instantiating DataWorker. This can be useful if a server is down for maintenance or your WebSocket service is not running. Simply provide the possible sources to datasource as an array, in priority order:
var dw = new DataWorker({
request : { cmd: "requestDataset" },
datasource : [
"https://example.com",
"ws://127.0.0.1:8888",
"http://127.0.0.1:9999"
]
});
The sources will be tried in order. If they all fail, DataWorker will throw an error.
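The ordering rule can be sketched as a loop over the sources that stops at the first success. This is not DataWorker's code: connect is a hypothetical stand-in for opening a WebSocket or sending an AJAX request, assumed to return a Promise that rejects when the source is unreachable.

```javascript
// Sketch of "try each source in priority order" (not DataWorker's code).
// `connect` is a hypothetical function that returns a Promise which rejects
// when a source cannot be reached.
async function connectWithFallback(sources, connect) {
  var errors = [];
  for (var i = 0; i < sources.length; i++) {
    try {
      return await connect(sources[i]);   // first success wins
    } catch (err) {
      errors.push(sources[i] + ": " + err.message);
    }
  }
  throw new Error("All datasources failed:\n" + errors.join("\n"));
}

// Fake connector for illustration: only plain-HTTP sources "succeed".
function fakeConnect(source) {
  return source.indexOf("http://") === 0
    ? Promise.resolve("connected to " + source)
    : Promise.reject(new Error("unreachable"));
}

var connection = connectWithFallback([
  "https://example.com",
  "ws://127.0.0.1:8888",
  "http://127.0.0.1:9999"
], fakeConnect);
// connection resolves to "connected to http://127.0.0.1:9999"
```

Collecting every failure message before throwing makes it easier to see why each source was skipped.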
For finer-grained control, authentication messages can be supplied on a per-datasource basis:
var dw = new DataWorker({
request : "{\"cmd\":\"requestDataset\"}",
datasource : [
{
source : "ws://127.0.0.1:8888",
authenticate : "zxcv"
},
{
source : "http://127.0.0.1/data-origin",
authenticate : "asdf"
}
]
});
Requesting further data
A new dataset can be requested using the requestDataset method:
// For WebSockets or AJAX:
dw.requestDataset("{\"cmd\":\"requestDataset\"}");
// For AJAX only:
dw.requestDataset("query=string");
The newly-requested dataset will completely replace the previously-requested dataset.
A new dataset can be requested and appended to the current dataset using the requestDatasetForAppend method:
// For WebSockets or AJAX:
dw.requestDatasetForAppend("{\"cmd\":\"requestDataset\"}");
// For AJAX only:
dw.requestDatasetForAppend("query=string");
Note that the newly-requested dataset should have the same columns as the previously-requested dataset.
Cancelling ongoing requests
Ongoing requests can be cancelled with the cancelOngoingRequests method:
dw.cancelOngoingRequests();
When requestDataset is called, ongoing requests are automatically cancelled (requestDatasetForAppend does not cancel ongoing requests).
When connected to a WebSocket server, DataWorker will send the cancelRequestsCmd message, if one has been provided, to signal the server to cancel the previous requests:
var dw = new DataWorker({
datasource: {
source : "ws://127.0.0.1:8888",
cancelRequestsCmd : "CANCEL",
cancelRequestsAck : "ACK_CANCEL"
}
});
If cancelRequestsAck has been provided, DataWorker will wait for the server to acknowledge with a reply that exactly matches cancelRequestsAck before proceeding.
Triggers
DataWorker can pass messages to and from the remote server via the postMessage method and triggers:
dw.onTrigger(function (reply) {
alert(reply);
}).postMessage("MESSAGE TO SERVER");
In the above example, the string "MESSAGE TO SERVER" is sent to the server, and any reply will be shown as an alert. Server replies that are triggers should be stringified JSON objects that look like this:
{
"trigger": true,
"msg": "REPLY FROM SERVER"
}
The onTrigger callback can be passed into the constructor as well:
var dw = new DataWorker({
datasource : "ws://127.0.0.1:8888",
authenticate : "{}",
onTrigger: function (reply) { alert(reply) }
});
Web Workers
Web Workers are enabled by default to get the most performance out of DataWorker. Without any extra configuration, a Blob will be used to create the Web Worker. However, some browsers do not support remote data from within a Web Worker created this way, even though they work fine when the worker is created from a standard URL. DataWorker will detect these errors and look for a file named dw-helper.js in the same folder as dw.js. The method used to determine this path does not work well in some circumstances, for example when a module loader such as RequireJS is used. To provide alternative sources, extra options can be passed when instantiating DataWorker.
If workerSource is provided, it will be used first. If that source fails to create a Web Worker, or an error occurs when connecting to the datasource, a Blob will be created to try again. If the Blob fails, backupWorkerSource will be used next. If that fails, or wasn't provided, DataWorker will make its best guess as to the location of dw-helper.js. When all else fails, DataWorker will revert to a single-threaded environment. Finally, an error will be thrown if a connection still cannot be made and no local dataset is provided.
new DataWorker({
datasource: "https://example.com/dataset.json",
workerSource: "/path/to/dw-helper.js",
backupWorkerSource: "/other/path/to/dw-helper.js"
});
There is an initial cost to creating a Web Worker, and they are meant to stay alive as long as possible. This can be important for avoiding memory leaks in single-page applications. Rather than terminating a Web Worker, DataWorkerHelper is designed to idle when not in use and be reused when needed. To clean up an instance of DataWorker, call finish(optionalCallback) on the instance. If a callback is included, it will be called once the Web Worker has responded that it is ready to be reused. Creating a new DataWorker afterwards will reuse that Web Worker.
Web Workers are still an experimental technology and may have some issues. If you prefer to avoid Web Workers altogether, you may pass in the forceSingleThread flag to force DataWorker to be single-threaded:
new DataWorker({
datasource: "https://example.com/dataset.json",
forceSingleThread: true
});
Clone
A DataWorker dataset can be deep-copied by calling clone:
dw.clone(function (newD) {
newDataset = newD;
});
Note that any streaming datasources are not copied as part of the cloning process.
Child Rows
DataWorker has a concept of child rows. Child rows are rows that are a subset of another row, and may be used, for example, to provide more detail. The functionality is currently limited to a small subset of functions. Where use of child rows is not explicitly defined, they are treated as if they are distinct rows that have been added to the dataset. These are the currently supported functions:
Display Values
DataWorker can use different values for display and for dataset operations. This enables you to include HTML formatting in cell values without having to consider it when actually working with the dataset. To take advantage of this feature, pass an object with the display and raw properties defined as the cell value:
var dataset = [
[ "column_a", "column_b", "column_c" ],
[ { display: "<i>apple</i>", raw: "apple" }, "violin", "music" ],
[ "cat", "tissue", "dog" ],
[ "banana", "piano", "gum" ]
];
var d = new DataWorker(dataset);
d.getRows(function (rows) {
// rows is [
// [ "<i>apple</i>", "violin", "music" ],
// [ "cat", "tissue", "dog" ],
// [ "banana", "piano", "gum" ]
// ]
});
d.applyFilter(/^apple$/, "column_a"); // Operates on the raw value
d.getRows(function (rows) {
// rows is [
// [ "<i>apple</i>", "violin", "music" ],
// ]
});
The display value will be returned by the methods that allow access to the dataset, while the methods that manipulate the dataset will operate on the raw value. Also note that the two methods of specifying cell values can be mixed.
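The following standalone sketch mirrors that split on plain arrays: the filter tests the raw value, while the returned rows contain display values. rawValue and displayValue are hypothetical helpers, not DataWorker methods.

```javascript
// Sketch: a cell is either a plain value or an object { display, raw }.
function rawValue(cell) {
  return cell !== null && typeof cell === "object" && "raw" in cell ? cell.raw : cell;
}
function displayValue(cell) {
  return cell !== null && typeof cell === "object" && "display" in cell ? cell.display : cell;
}

var rows = [
  [ { display: "<i>apple</i>", raw: "apple" }, "violin", "music" ],
  [ "cat", "tissue", "dog" ],
  [ "banana", "piano", "gum" ]
];

// Filter on the raw value of the first column, then return display values.
var visible = rows
  .filter(function (row) { return /^apple$/.test(rawValue(row[0])); })
  .map(function (row) { return row.map(displayValue); });
// visible is [ [ "<i>apple</i>", "violin", "music" ] ]
```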
Alter columns
Column properties can be altered after DataWorker is instantiated.
Alter column name
To change the name of column_a to column_a1:
dw.alterColumnName("column_a", "column_a1");
Alter column title
To change the title of column_a to Things I Love:
dw.alterColumnTitle("column_a", "Things I Love");
Alter column aggregate type
To change the aggregate type of column_a to min:
dw.alterColumnAggregateType("column_a", "min");
Valid aggregate types are:
max
min
sum
Alter column sort type
To change the sort type of column_a to num:
dw.alterColumnSortType("column_a", "num");
Valid sort types are:
alpha
num
Alternatively, you may pass in a sort function that takes two arguments (a, b) and returns -1 for a < b, 1 for a > b, or 0 for a == b.
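As an example of that contract, here is a sort function that compares case-insensitively and ignores a leading "the ". It follows the same rules as a comparator for Array.prototype.sort, so it can be tried out without DataWorker; registering it through alterColumnSortType is shown only as a comment, on the assumption that the function is passed in place of "alpha" or "num".

```javascript
// A custom sort function following the contract above: -1, 1, or 0.
// Compares case-insensitively and ignores a leading "the ".
function titleSort(a, b) {
  var x = String(a).toLowerCase().replace(/^the /, "");
  var y = String(b).toLowerCase().replace(/^the /, "");
  if (x < y) return -1;
  if (x > y) return 1;
  return 0;
}

// The same contract as Array.prototype.sort, so it can be checked directly.
var sorted = [ "The Zebra", "apple", "Mango" ].sort(titleSort);
// sorted is [ "apple", "Mango", "The Zebra" ]

// Presumably registered with DataWorker like this:
// dw.alterColumnSortType("column_a", titleSort);
```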
Prepend column names
To prepend a_ to all column names:
dw.prependColumnNames("a_");
Append
The append method is used to concatenate two datasets. The following appends dataset2 to dataset1:
var dataset1 = [
[ "column_a", "column_b", "column_c" ],
[ "apple", "violin", "music" ],
[ "cat", "tissue", "dog" ],
[ "banana", "piano", "gum" ],
];
var dataset2 = [
[ "column_a", "column_b", "column_c" ],
[ "gummy", "power", "apple" ],
[ "car", "screen", "phone" ],
[ "sign", "bagel", "chips" ]
];
var dw = new DataWorker(dataset1);
dw.append(dataset2);
Alternatively, you may also append a DataWorker dataset:
var dataset1 = [
[ "column_a", "column_b", "column_c" ],
[ "apple", "violin", "music" ],
[ "cat", "tissue", "dog" ],
[ "banana", "piano", "gum" ]
];
var dataset2 = [
[ "column_a", "column_b", "column_c" ],
[ "gummy", "power", "apple" ],
[ "car", "screen", "phone" ],
[ "sign", "bagel", "chips" ]
];
var d1 = new DataWorker(dataset1);
var d2 = new DataWorker(dataset2);
d1.append(d2);
Note that column names must match up; an error will be thrown otherwise.
Filter
The applyFilter method is used to filter out rows that do not match the specified regex. The following filters out any row that does not contain the word "apple":
var dataset = [
[ "column_a", "column_b", "column_c" ],
[ "apple", "red", "fuji" ],
[ "apple", "green", "granny smith" ],
[ "apple", "yellow", "golden delicious" ],
[ "banana", "green", "unripe" ],
[ "banana", "yellow", "ripe" ],
[ "banana", "brown", "beyond ripe" ],
[ "banana", "black", "rotten/frozen" ]
], dw = new DataWorker(dataset);
dw.applyFilter(/\bapple\b/);
/* This results in the following rows:
[ "apple", "red", "fuji" ],
[ "apple", "green", "granny smith" ],
[ "apple", "yellow", "golden delicious" ]
*/
You may also filter only on certain columns:
dw.applyFilter(/\bapple\b/, "column_a", "column_b");
Note that the following also works (and results in the exact same dataset):
dw.applyFilter(/\bapple\b/, [ "column_a", "column_b" ]);
Complex Filters
You may also use the complex syntax, in which all filters provided must find a match for the row to be visible. The complex filters can take any of the following arguments:
columns (array of strings or single string): Columns on which to filter (defaults to all columns)
column (single string): Single column on which to filter (defaults to all columns). If both columns and column are defined, the latter will be used.
matchAll (boolean): If this flag is set to true, all columns provided must match the filter for the row to stay visible. By default, the row stays visible if any of the columns match.
accentInsensitive (boolean): If this flag is set to true, characters with accent marks will be treated as normal ASCII characters (e.g., applé matches both applé and apple).
regex (string or RegExp): Columns must match this regular expression (a string will be converted to a RegExp)
!regex (string or RegExp): Columns must not match this regular expression (a string will be converted to a RegExp)
eq (value): Columns must equal (==) this value
ne (value): Columns must not equal (!=) this value
gte (value): Columns must be greater than or equal to (>=) this value
gt (value): Columns must be greater than (>) this value
lte (value): Columns must be less than or equal to (<=) this value
lt (value): Columns must be less than (<) this value
For example:
dw.applyFilter(
{
column : "column_a",
eq : "apple"
},
{
column : "column_b",
regex : /yellow/,
"!regex" : /blue/
},
{
columns : [ "column_a", "column_b" ],
gte: "apple",
lt: "zebra"
}
);
/* This results in the single row:
[ "apple", "yellow", "golden delicious" ]
*/
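To make the matching rules above concrete, here is a small sketch that evaluates complex filters against plain arrays. It is not DataWorker's implementation. It assumes that every operator inside one filter object must hold for a column to match, and it leaves out accentInsensitive.

```javascript
// Sketch of complex-filter semantics on plain arrays (not DataWorker's code).
// Assumption: all operators in one filter object must hold for a column to match.
function columnMatches(value, f) {
  if (f.regex !== undefined && !new RegExp(f.regex).test(value)) return false;
  if (f["!regex"] !== undefined && new RegExp(f["!regex"]).test(value)) return false;
  if ("eq" in f && !(value == f.eq)) return false;    // loose equality, as documented
  if ("ne" in f && !(value != f.ne)) return false;
  if ("gte" in f && !(value >= f.gte)) return false;
  if ("gt" in f && !(value > f.gt)) return false;
  if ("lte" in f && !(value <= f.lte)) return false;
  if ("lt" in f && !(value < f.lt)) return false;
  return true;
}

// A row stays visible only if every filter object matches it.
function applyComplexFilters(header, rows, filters) {
  return rows.filter(function (row) {
    return filters.every(function (f) {
      // `column` wins over `columns`; both default to all columns.
      var names = f.column ? [ f.column ] : [].concat(f.columns || header);
      var results = names.map(function (name) {
        return columnMatches(row[header.indexOf(name)], f);
      });
      return f.matchAll ? results.every(Boolean) : results.some(Boolean);
    });
  });
}

var header = [ "column_a", "column_b", "column_c" ];
var rows = [
  [ "apple", "red", "fuji" ],
  [ "apple", "green", "granny smith" ],
  [ "apple", "yellow", "golden delicious" ],
  [ "banana", "green", "unripe" ],
  [ "banana", "yellow", "ripe" ],
  [ "banana", "brown", "beyond ripe" ],
  [ "banana", "black", "rotten/frozen" ]
];
var kept = applyComplexFilters(header, rows, [
  { column: "column_a", eq: "apple" },
  { column: "column_b", regex: /yellow/, "!regex": /blue/ },
  { columns: [ "column_a", "column_b" ], gte: "apple", lt: "zebra" }
]);
// kept is [ [ "apple", "yellow", "golden delicious" ] ]
```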
The filter can be cleared by calling the clearFilters method:
dw.clearFilters();
Filters stack on top of each other:
dw.applyFilter(/banana/);
/* This results in the following rows:
[ "banana", "green", "unripe" ],
[ "banana", "yellow", "ripe" ],
[ "banana", "brown", "beyond ripe" ],
[ "banana", "black", "rotten/frozen" ]
*/
dw.applyFilter(/yellow/);
/* The results are further filtered down to one row:
[ "banana", "yellow", "ripe" ],
*/
dw.clearFilters().applyFilter(/yellow/);
/* Old filters are cleared and a new one is applied:
[ "apple", "yellow", "golden delicious" ],
[ "banana", "yellow", "ripe" ],
*/
To permanently remove rows from a dataset with a filter, use the filter method:
dw.filter(/\bapple\b/);
Search
Similar to filters, the search method filters the currently visible dataset. The difference is that search does not modify the dataset; the matching rows are passed to a callback instead. The simple form takes a callback and a search term, which can be a regular expression, a string, or an array of complex filters.
dw.search(function (results) { console.log(results); }, /apple/);
It's also possible to pass in extra, optional arguments as an object. The valid options are:
columns (array of strings or single string): Columns to be searched and returned (defaults to all columns)
searchOn (array of strings or single string): Columns to be searched (defaults to columns, which defaults to all columns)
returnColumns (array of strings or single string): Columns to be returned (defaults to columns, which defaults to all columns)
sortOn (array of strings or single string): Column(s) on which rows will be sorted; these may be hidden columns
limit (integer): Maximum number of rows to return. Data will be sorted before the results are limited.
fromRow (integer): The zero-based row number of the result set from which to start returning rows. This can be used, for example, in conjunction with limit to get paged results.
allRows (boolean): Allows the search to include hidden rows. Defaults to false.
getDistinct (boolean): Only returns rows that are unique from each other. Defaults to false.
For example:
dw.search(function (results) { console.log(results); }, /a.*le/i, {
columns: [ "column_a", "column_c" ],
sortOn: "-column_b",
fromRow: 6,
limit: 3
});
Group
Similar to grouping in SQL, the group method allows you to group rows together.
dw.group("column_a");
You may also group by multiple columns:
dw.group("column_a", "column_b");
Note that the following is also valid:
dw.group([ "column_a", "column_b" ]);
Rows with the same value for the specified column(s) will be combined; the column property aggType determines how values for the non-specified columns are combined.
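The following sketch shows what grouping with aggType means on plain arrays. It is an illustration rather than DataWorker's code; groupRows is a hypothetical name, and keeping groups in order of first appearance is an assumption.

```javascript
// Sketch of SQL-style grouping with per-column aggregate types
// ("max", "min", "sum") on plain arrays. Not DataWorker's implementation.
function groupRows(columns, rows, groupOn) {
  var idx = groupOn.map(function (name) {
    return columns.findIndex(function (c) { return c.name === name; });
  });
  var groups = new Map();   // Map keeps groups in order of first appearance

  rows.forEach(function (row) {
    var key = JSON.stringify(idx.map(function (i) { return row[i]; }));
    var acc = groups.get(key);
    if (!acc) {
      groups.set(key, row.slice());   // first row of a group seeds the result
      return;
    }
    columns.forEach(function (col, i) {
      if (idx.indexOf(i) !== -1) return;   // grouped columns are already equal
      if (col.aggType === "sum") acc[i] += row[i];
      else if (col.aggType === "min") acc[i] = row[i] < acc[i] ? row[i] : acc[i];
      else acc[i] = row[i] > acc[i] ? row[i] : acc[i];   // "max" (the default)
    });
  });
  return Array.from(groups.values());
}

var columns = [
  { name: "fruit", aggType: "max" },
  { name: "qty", aggType: "sum" },
  { name: "price", aggType: "min" }
];
var grouped = groupRows(columns, [
  [ "apple", 3, 1.2 ],
  [ "banana", 5, 0.5 ],
  [ "apple", 2, 0.9 ]
], [ "fruit" ]);
// grouped is [ [ "apple", 5, 0.9 ], [ "banana", 5, 0.5 ] ]
```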
Join
DataWorker also supports joining via the join method. It can inner join, left outer join, or right outer join.
The following inner joins d1 with d2 on column_a from d1 and column_d from d2:
d1.join(d2, "column_a", "column_d");
The following left outer joins d1 with d2 on column_a from d1 and column_d from d2:
d1.join(d2, "column_a", "column_d", "left");
The following right outer joins d1 with d2 on column_a from d1 and column_d from d2:
d1.join(d2, "column_a", "column_d", "right");
Joins can also be performed on multiple columns:
d1.join(d2, [ "column_a", "column_b" ], [ "column_d", "column_e" ]);
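For readers less familiar with the three join types, this sketch implements them on arrays of row objects. It is not DataWorker's code: joinRows is hypothetical, and the way columns are merged here (right-hand values overwrite left-hand ones on a name clash) is an assumption, since this section does not say how DataWorker names or orders the joined columns.

```javascript
// Sketch of inner / left outer / right outer joins on arrays of row objects.
function joinRows(left, right, leftOn, rightOn, type) {
  var leftKeys = [].concat(leftOn), rightKeys = [].concat(rightOn);
  function keyOf(row, keys) {
    return JSON.stringify(keys.map(function (k) { return row[k]; }));
  }

  // Index the right-hand rows by their join key.
  var index = new Map();
  right.forEach(function (r) {
    var k = keyOf(r, rightKeys);
    if (!index.has(k)) index.set(k, []);
    index.get(k).push(r);
  });

  var out = [], matchedRight = new Set();
  left.forEach(function (l) {
    var matches = index.get(keyOf(l, leftKeys)) || [];
    matches.forEach(function (r) {
      matchedRight.add(r);
      out.push(Object.assign({}, l, r));
    });
    // A left outer join keeps left rows that found no match.
    if (matches.length === 0 && type === "left") out.push(Object.assign({}, l));
  });
  // A right outer join keeps right rows that were never matched.
  if (type === "right") {
    right.forEach(function (r) {
      if (!matchedRight.has(r)) out.push(Object.assign({}, r));
    });
  }
  return out;
}

var d1 = [ { column_a: "apple", column_b: 1 }, { column_a: "pear", column_b: 2 } ];
var d2 = [ { column_d: "apple", column_e: "red" }, { column_d: "kiwi", column_e: "green" } ];
var inner = joinRows(d1, d2, "column_a", "column_d");            // apple only
var leftJoined = joinRows(d1, d2, "column_a", "column_d", "left");   // apple, pear
var rightJoined = joinRows(d1, d2, "column_a", "column_d", "right"); // apple, kiwi
```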
Limit
The applyLimit method limits the number of visible rows in the dataset. The following allows only the first 10 rows in the dataset to be visible:
dw.applyLimit(10);
The limit can be cleared by calling the clearFilters method:
dw.clearFilters();
To permanently remove rows from a dataset with a limit, use the limit method:
dw.limit(10);
Remove columns
You may completely delete columns from a dataset with the removeColumns method:
dw.removeColumns("column_a", "column_b");
Note that the following is also valid:
dw.removeColumns([ "column_a", "column_b" ]);
Hide columns
Instead of permanently deleting the columns, you may also temporarily hide them from view using the hideColumns method:
dw.hideColumns("column_a", "column_b");
Note that the following is also valid:
dw.hideColumns([ "column_a", "column_b" ]);
Hidden columns can be shown with the showColumns method:
dw.showColumns("column_a", "column_b");
Note that the following is also valid:
dw.showColumns([ "column_a", "column_b" ]);
The hideColumns and showColumns methods may also take a regex as an argument. Any column name matching the regex will be hidden or shown, respectively.
dw.hideColumns(/^column_[ab]$/i);
All columns can be hidden with the hideAllColumns method:
dw.hideAllColumns();
All hidden columns can be revealed with the showAllColumns method:
dw.showAllColumns();
Alternatively, you may retrieve all columns (visible AND non-visible) by using the getAllColumns method:
dw.getAllColumns(function (columns) {
allColumns = columns;
});
Clear Dataset
If you want to completely clear the dataset so that you can add new data while leaving any custom handlers intact, call clearDataset:
dw.clearDataset();
This will remove references to all columns and rows. The function takes no parameters.
Sort
The following sorts the dataset on column_a:
dw.sort("column_a");
To reverse sort, prefix the column name with a -:
dw.sort("-column_a");
You may also sort on multiple columns:
dw.sort("column_a", "-column_b");
In this case, the sort will fall back to column_b if the contents of column_a are equal.
Note that the following does the same thing:
dw.sort([ "column_a", "-column_b" ]);
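A minimal sketch of the ordering rules above (a leading - reverses a column, later columns break ties), using plain < and > comparisons in the spirit of the alpha sort type. It does not handle child rows, and sortRows is a hypothetical name rather than part of DataWorker.

```javascript
// Sketch of multi-column sorting with "-" for descending order.
function sortRows(header, rows, specs) {
  var keys = [].concat(specs).map(function (spec) {
    var desc = spec.charAt(0) === "-";
    return { index: header.indexOf(desc ? spec.slice(1) : spec), dir: desc ? -1 : 1 };
  });
  return rows.slice().sort(function (a, b) {
    for (var i = 0; i < keys.length; i++) {
      var x = a[keys[i].index], y = b[keys[i].index];
      if (x < y) return -keys[i].dir;
      if (x > y) return keys[i].dir;
      // Equal on this column: fall back to the next one.
    }
    return 0;
  });
}

var sorted = sortRows([ "column_a", "column_b" ], [
  [ "b", 1 ],
  [ "a", 1 ],
  [ "a", 2 ]
], [ "column_a", "-column_b" ]);
// sorted is [ [ "a", 2 ], [ "a", 1 ], [ "b", 1 ] ]
```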
When child rows exist in the dataset, parents and children are kept together. The dataset will be sorted first by the parent, then by the children, using the same column. Using the example from Add Child Rows, dw.sort("-numbers"); will produce the following dataset:
[
[ "xyz", 789 ],
[ "xyz", 789 ],
[ "abc", 123 ],
[ "abc", 579 ],
[ "abc", 456 ],
[ "def", 0 ]
]
Add Child Rows
Child rows can be added to the dataset by calling addChildRows. The columns must be the same as in the original dataset, as in the following example:
var dataset = [
[ "letters", "numbers" ],
[ "abc", 579 ],
[ "def", 0 ],
[ "xyz", 789 ]
], childRows = [
[ "abc", 123 ],
[ "abc", 456 ],
[ "xyz", 789 ]
], dw = new DataWorker(dataset);
The call to addChildRows expects a dataset of child rows, followed by the name of a column that will be used to determine which row the children belong to.
You may either pass in an array of rows:
dw.addChildRows(childRows, "letters");
or you may pass in another DataWorker object:
var d2 = new DataWorker([[ "letters", "numbers" ]].concat(childRows));
dw.addChildRows(d2, "letters");
In either case, the result passed into the callback for getRows will be the same:
[
[ "abc", 123 ],
[ "abc", 456 ],
[ "abc", 579 ],
[ "def", 0 ],
[ "xyz", 789 ],
[ "xyz", 789 ]
]
The default visibility for a child row depends on its parent. If the parent row was set to hidden then the child row will still be added to the parent, but will be hidden as well.
If a parent row cannot be found for the children, those child rows will be ignored. If multiple parent rows exist for a given child row, the result is undefined.
Get rows
Visible dataset rows can be retrieved via the getRows method.
If called with just the callback function, getRows will get all rows. The next two arguments are the start and end of the range; if unspecified, they default to the start and end of the dataset. These arguments are 0-based, so a dataset with 15 rows will have rows 0 - 14. If a number larger than the last row is used, DataWorker will simply return everything in the range up to (and including) the last row.
var dataset = [
[ "column_a", "column_b", "column_c" ],
[ "apple", "violin", "music" ],
[ "cat", "tissue", "dog" ],
[ "banana", "piano", "gum" ],
[ "gummy", "power", "star" ]
];
var dw = new DataWorker(dataset);
var records;
dw.getRows(function (result) { records = result; });
The following is the contents of records:
[
[ "apple", "violin", "music" ],
[ "cat", "tissue", "dog" ],
[ "banana", "piano", "gum" ],
[ "gummy", "power", "star" ]
]
If you would only like certain columns, you may provide those columns after the range of rows, either as an array or as extra arguments. All of the following calls to getRows are valid examples:
var callback = function (rows) { /* Do something */ };
dw.getRows(callback);
dw.getRows(callback, 5);
dw.getRows(callback, undefined, 10);
dw.getRows(callback, 5, 10);
dw.getRows(callback, 5, 10, "column_a", "column_b");
dw.getRows(callback, 5, 10, [ "column_a", "column_b" ]);
dw.getRows(callback, undefined, undefined, [ "column_a", "column_b" ]);
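The range rules can be summed up in a few lines. This is a sketch, assuming (as the description of rows 0 - 14 suggests) that the end index is inclusive; rowsInRange is a hypothetical helper, not part of DataWorker.

```javascript
// Sketch of the getRows range rules: 0-based, inclusive bounds that default
// to the whole dataset, with an end past the last row clamped to the last row.
function rowsInRange(rows, start, end) {
  var from = start === undefined ? 0 : start;
  var to = end === undefined ? rows.length - 1 : Math.min(end, rows.length - 1);
  return rows.slice(from, to + 1);
}

var fifteen = Array.from({ length: 15 }, function (_, i) { return [ "row " + i ]; });
var all = rowsInRange(fifteen);                  // 15 rows (0 - 14)
var middle = rowsInRange(fifteen, 5, 10);        // 6 rows (5 through 10)
var tail = rowsInRange(fifteen, 10, 99);         // 5 rows (10 through 14)
var head = rowsInRange(fifteen, undefined, 3);   // 4 rows (0 through 3)
```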
Get hashed rows
The getHashedRows method makes it possible to get records as hashes keyed by column name instead of as simple arrays. It works exactly the same as getRows but returns the data in a different format. In the following example
var dataset = [
[ "column_a", "column_b", "column_c" ],
[ "apple", "violin", "music" ],
[ "cat", "tissue", "dog" ],
[ "banana", "piano", "gum" ],
[ "gummy", "power", "star" ]
];
var dw = new DataWorker(dataset);
var records;
dw.getHashedRows(function (result) { records = result; });
the contents of records will be:
[
{
"column_a": "apple",
"column_b": "violin",
"column_c": "music"
},
{
"column_a": "cat",
"column_b": "tissue",
"column_c": "dog"
},
{
"column_a": "banana",
"column_b": "piano",
"column_c": "gum"
},
{
"column_a": "gummy",
"column_b": "power",
"column_c": "star"
}
]
Get columns
The getColumns method is used to get the visible columns of the dataset:
dw.getColumns(function (columns) {
visibleColumns = columns;
});
Use getAllColumns to retrieve both visible and non-visible columns.
Get columns and records
Visible columns may be retrieved simultaneously with visible records with getColumnsAndRecords:
dw.getColumnsAndRecords(function (columns, records) {
// Do something.
});
Columns will be given as a dictionary with the columnName as the key and its properties (also in a dictionary) as the value.
Records will be returned the same as in getRows.
Get number of records
The getNumberOfRecords method returns the number of visible rows currently in the dataset:
dw.getNumberOfRecords(function (num) {
numberOfRows = num;
});
Note that for streaming datasets, this value will be the current number of rows received (not the total number of rows expected). Use getExpectedNumberOfRecords to determine the total number of rows in a streaming dataset.
Get expected number of records
The getExpectedNumberOfRecords method returns the expected number of records in a streaming dataset.
dw.getExpectedNumberOfRecords(function (num) {
expectedNumberOfRows = num;
});
Get distinct
Get distinct consecutive rows
Distinct rows can be retrieved using getDistinctConsecutiveRows. The function takes a callback and a columnName as its parameters:
function doSomethingInteresting(records) { }
dw.getDistinctConsecutiveRows(doSomethingInteresting, "column_a");
The value passed into the callback is an array of records. Each record contains three values: the value of the column, the startRow of that value, and the endRow of that value.
If values are repeated again later, but not consecutively, another record will exist in the results. For example, with the following dataset:
[
[ "column_a", "column_b" ],
[ "abc", "123" ],
[ "abc", "456" ],
[ "abc", "789" ],
[ "def", "123" ],
[ "ghi", "123" ],
[ "ghi", "456" ],
[ "def", "456" ],
[ "def", "789" ]
]
the getDistinctConsecutiveRows function will pass the following records into the callback as its sole parameter if "column_a" is passed in as columnName:
[
[ "abc", 0, 2 ],
[ "def", 3, 3 ],
[ "ghi", 4, 5 ],
[ "def", 6, 7 ]
]
but if "column_b" is passed in as columnName, the results would be:
[
[ "123", 0, 0 ],
[ "456", 1, 1 ],
[ "789", 2, 2 ],
[ "123", 3, 4 ],
[ "456", 5, 6 ],
[ "789", 7, 7 ]
]
The number of records returned will always be equal to or less than the number of records in the original dataset.
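The run-collapsing behaviour can be sketched as a single pass over the rows. This is an illustrative helper (distinctConsecutive is a hypothetical name, not a DataWorker method) that returns the same [ value, startRow, endRow ] records as the examples above.

```javascript
// Sketch: collapse consecutive runs of equal values in one column into
// [ value, startRow, endRow ] records (row numbers are 0-based, header excluded).
function distinctConsecutive(header, rows, columnName) {
  var col = header.indexOf(columnName), runs = [];
  rows.forEach(function (row, i) {
    var last = runs[runs.length - 1];
    if (last && last[0] === row[col]) {
      last[2] = i;                      // same value: extend the current run
    } else {
      runs.push([ row[col], i, i ]);    // new value: start a new run
    }
  });
  return runs;
}

var header = [ "column_a", "column_b" ];
var rows = [
  [ "abc", "123" ], [ "abc", "456" ], [ "abc", "789" ], [ "def", "123" ],
  [ "ghi", "123" ], [ "ghi", "456" ], [ "def", "456" ], [ "def", "789" ]
];
var byA = distinctConsecutive(header, rows, "column_a");
// byA is [ [ "abc", 0, 2 ], [ "def", 3, 3 ], [ "ghi", 4, 5 ], [ "def", 6, 7 ] ]
var byB = distinctConsecutive(header, rows, "column_b");
```

Because each run covers at least one row, the result can never be longer than the dataset.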
Pagination
Pagination eases incremental access to the records. The following sets DataWorker to display 10 rows per page:
dw.paginate(10);
Get next page
Now you may grab the next 10 rows using the getNextPage method:
var next10Rows, currentPage;
dw.getNextPage(function (result, pageNumber) {
next10Rows = result;
currentPage = pageNumber;
});
The callback will provide the requested rows as well as the current page number.
The pagination system will not let you change to a page outside of the dataset. Grabbing the next page when you're on the last page will still return the last page.
Get previous page
The previous 10 rows can be grabbed using the getPreviousPage method:
var previous10Rows, currentPage;
dw.getPreviousPage(function (result, pageNumber) {
previous10Rows = result;
currentPage = pageNumber;
});
Grabbing a previous page from the 1st page will simply return the 1st page again.
Get page
You may jump to a specific page using the getPage method. The following grabs page 4:
var page, currentPage;
dw.getPage(function (result, pageNumber) {
page = result;
currentPage = pageNumber; // pageNumber == 4
}, 4);
Note that this also sets your current page to the page you grab.
Ask for specific columns
The previous three functions (getNextPage, getPreviousPage, and getPage) can all take extra arguments defining which columns to return. This allows you to get a reduced dataset or to request columns that would normally be hidden. The columns can be given as additional names or as an array of names. The following are both valid:
var callback = function (result, pageNumber) { /* Do something */ };
dw.getNextPage(callback, "column_a", "column_b");
dw.getNextPage(callback, [ "column_b", "column_c" ]);
Set page
Use the setPage method to set a new current page. Attempting to set the page to 0 or a negative number will set it to page 1, and setting a page past the last page will set it to the last page. The following sets your current page to page 4:
dw.setPage(4);
Get number of pages
Use the getNumberOfPages method to get the total number of pages in the dataset with the current pagination. Note that this is also the number of the last page in the dataset.
var lastPage;
dw.getNumberOfPages(function (totalNumberOfPages) {
lastPage = totalNumberOfPages;
});
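The page arithmetic described in this section can be sketched as a small synchronous pager. It returns values instead of using callbacks, makePager is a hypothetical name, and starting before page 1 (so that the first getNextPage returns page 1) is an assumption rather than documented DataWorker behaviour.

```javascript
// Sketch of the pagination rules: 1-based pages, out-of-range page numbers
// clamped to [1, last page], and a possibly shorter final page.
function makePager(rows, perPage) {
  var numPages = Math.max(1, Math.ceil(rows.length / perPage));
  var current = 0;   // assumption: nothing fetched yet
  function clamp(p) {
    return Math.min(Math.max(p, 1), numPages);
  }
  function pageAt(p) {
    current = clamp(p);
    var start = (current - 1) * perPage;
    return { rows: rows.slice(start, start + perPage), pageNumber: current };
  }
  return {
    getNumberOfPages: function () { return numPages; },
    setPage: function (p) { current = clamp(p); },
    getPage: pageAt,
    getNextPage: function () { return pageAt(current + 1); },
    getPreviousPage: function () { return pageAt(current - 1); }
  };
}

var pager = makePager(Array.from({ length: 25 }, function (_, i) { return [ i ]; }), 10);
var first = pager.getNextPage();   // page 1, rows 0-9
pager.setPage(99);                 // clamped to the last page (3)
var last = pager.getNextPage();    // still page 3, which holds 5 rows
```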
Partition
DataWorker can partition its dataset into multiple smaller datasets. The partitioned datasets can be retrieved afterwards using their partition key(s). The following partitions the dataset by the contents of column_a and uses the getPartitioned method to push each partition onto the partitioned array:
var partitioned = [];
dw.partition("column_a");
dw.getPartitionKeys(function (keys) {
keys.forEach(function (key) {
dw.getPartitioned(function (result) { partitioned.push(result); }, key);
});
});
The getPartitioned method returns an array of records.
You may also partition by multiple keys:
dw.partition("column_a", "column_b");
The following does the same thing:
dw.partition([ "column_a", "column_b" ]);
You may also sort partitions with the sortPartition method:
dw.sortPartition(partitionKey, columnsToSortOn);
Render
The render method allows you to pass DataWorker a function to render the dataset. When render is called without arguments, DataWorker will call the rendering function that was passed in, or do nothing if no rendering function has been set.
dw.render(function () { /* code for rendering the dataset */ });
/* Make some changes to the dataset. */
dw.render(); // Renders the new dataset according to the user-defined function.
Compile
To generate the distribution files, we use the Node module grunt. If you would like to experiment with the source and create your own distribution files, you must first have Node.js and npm installed on your machine. From DataWorker's root directory, run the command npm install to get the latest dev dependencies for the package. Then type grunt dist to generate the distribution files. You may also type grunt watch instead, which will automatically generate new distribution files whenever you change one of the source files.