DataWorker is designed to manage large datasets client-side, using modern web technologies such as Web Workers and WebSockets. DataWorker allows users to perform expensive operations (such as sorting, joining, searching, filtering, grouping, and much more) in a multi-threaded environment to maintain responsiveness. Data can be provided locally as an array, or remotely via AJAX or WebSockets. DataWorker has no dependencies on external libraries, but works well with others.

Not all browsers fully support every provided feature (such as using a WebSocket from within a Web Worker) because some of the technology is quite new, so fallbacks are in place to make sure data can still be accessed on as many devices as possible.

To get started, you will need to include one of the distribution files in the dist/ folder: dataworker.js for development/testing, and dataworker.min.js for production. For example:

    <script src="dist/dataworker.min.js"></script>

Local Data

DataWorker takes an array of records to construct the initial dataset. The first record defines column properties.

    var dataset = [
        [
            {
                name: "column_a",
                title: "Column A",
                aggType: "max",
                sortType: "alpha"
            },
            {
                name: "column_b",
                title: "Column B",
                aggType: "max",
                sortType: "alpha"
            },
            {
                name: "column_c",
                title: "Column C",
                aggType: "min",
                sortType: "alpha"
            }
        ],

        [ "apple",      "violin",    "music" ],
        [ "cat",        "tissue",      "dog" ],
        [ "banana",      "piano",      "gum" ],
        [ "gummy",       "power",     "star" ]
    ];

    var dw = new DataWorker(dataset);

It is also possible to just supply column names:

    var dataset = [
        [ "column_a", "column_b", "column_c" ],

        [ "apple",      "violin",    "music" ],
        [ "cat",        "tissue",      "dog" ],
        [ "banana",      "piano",      "gum" ],
        [ "gummy",       "power",     "star" ]
    ];

    var dw = new DataWorker(dataset);

If only column names are supplied, the following default column properties are used:

  • title: columnName
  • aggType: "max"
  • sortType: "alpha"
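
The defaults listed above can be pictured with a small sketch. The helper name normalizeColumn is hypothetical and is not part of DataWorker; it only illustrates how a bare column name could be expanded into the documented default properties.

```javascript
// Hypothetical helper (not part of DataWorker): expands a column given as a
// plain name into an object carrying the documented default properties.
function normalizeColumn(column) {
    if (typeof column === "string") {
        column = { name: column };
    }

    return {
        name     : column.name,
        title    : column.title    || column.name, // title defaults to the name
        aggType  : column.aggType  || "max",
        sortType : column.sortType || "alpha"
    };
}

var normalized = [
    "column_a",
    { name: "column_b", aggType: "sum" }
].map(normalizeColumn);
// normalized[0] is
// { name: "column_a", title: "column_a", aggType: "max", sortType: "alpha" }
```

Explicitly supplied properties are kept, so the second column above retains its "sum" aggregate type.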

Additionally, you should give DataWorker an error handler (by default DataWorker uses console.error) using the onError method or by adding it as a property on the dataset object:

    dataset.onError = function (msg) { alert(msg) };
    new DataWorker(dataset);

or

    var dw = new DataWorker(dataset).onError(function (errorMsg) {
        alert(errorMsg);
    });

Error handling

DataWorker can be provided with an error handler:

    dw.onError(function (msg) {
        alert(msg);
    });

By default, errors are printed to the console.

Streaming Data

Via WebSockets

Alternatively, DataWorker can stream data from a WebSocket server:

    var dw = new DataWorker({
        datasource   : "ws://127.0.0.1:8888",
        authenticate : "{}",
        request      : "{\"cmd\":\"requestDataset\"}"
    });

datasource should be the URL of the WebSocket server.

authenticate is optional. If present, it will be the first message sent to the server upon connecting.

request is sent after authenticate.

After the request is sent, DataWorker will expect a reply in the form of a JSON object with expectedNumRows and columns properties such as:

    {
        expectedNumRows : 10,
        columns         : [ "column_a", "column_b", "column_c" ]
    }

expectedNumRows should be the number of rows to expect in that dataset.

columns should be the columns of the expected dataset. They can be provided as simple column names or as hashes with custom column properties.

Note DataWorker cannot proceed without first knowing the columns and the total number of expected rows. That being said, these two properties may be transmitted separately. For example:

    // First message:
    { "expectedNumRows": 10 }

    // Second message:
    { "columns": [ "column_a", "column_b", "column_c" ] }

Afterwards, DataWorker will expect arrays of dataset rows from the server. These rows will be appended to the dataset. The reply will be similar to the following:

    [
        [ "apple",      "violin",    "music" ],
        [ "cat",        "tissue",      "dog" ],
        [ "banana",      "piano",      "gum" ],
        [ "gummy",       "power",     "star" ]
    ]
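
The message sequence described above can be sketched as a small piece of client-side bookkeeping. This is an illustration only, not DataWorker's internal code: the names createStreamState, handleMessage, and allRowsReceived are hypothetical, and throwing when rows arrive before the header information is an assumption.

```javascript
// Hypothetical bookkeeping for the documented message sequence.
function createStreamState() {
    return { columns: null, expectedNumRows: null, rows: [] };
}

// Feed one raw message (a JSON string) into the state.
function handleMessage(state, rawMessage) {
    var msg = JSON.parse(rawMessage);

    if (Array.isArray(msg)) {
        // Assumption: rows are only accepted once both header values are known.
        if (state.columns === null || state.expectedNumRows === null) {
            throw new Error("Rows received before columns and expectedNumRows");
        }
        state.rows = state.rows.concat(msg);
    } else {
        // columns and expectedNumRows may arrive together or separately.
        if ("columns" in msg) { state.columns = msg.columns; }
        if ("expectedNumRows" in msg) { state.expectedNumRows = msg.expectedNumRows; }
    }

    return state;
}

function allRowsReceived(state) {
    return state.expectedNumRows !== null &&
        state.rows.length >= state.expectedNumRows;
}

var state = createStreamState();
handleMessage(state, '{ "expectedNumRows": 2 }');
handleMessage(state, '{ "columns": [ "column_a", "column_b" ] }');
handleMessage(state, '[ [ "apple", "violin" ] ]');
var doneAfterFirstBatch = allRowsReceived(state);  // false
handleMessage(state, '[ [ "cat", "tissue" ] ]');
var doneAfterSecondBatch = allRowsReceived(state); // true
```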

The onReceiveColumns callback will be called when columns and the expected number of rows are received from the WebSocket server. This callback can be set with the onReceiveColumns method:

    dw.onReceiveColumns(function () {
        alert("Columns have been received!");
    });

Alternatively, this callback can be passed into the constructor:

    var dw = new DataWorker({
        datasource       : "ws://127.0.0.1:8888",
        authenticate     : "{}",
        request          : "{\"cmd\":\"requestDataset\"}",
        onReceiveColumns : function (numRows) {
            alert("Columns have been received!");
        }
    });

The onReceiveRows callback will be called whenever rows are received from the WebSocket server. This callback can be set with the onReceiveRows method:

    dw.onReceiveRows(function (numRows) {
        alert("Received " + numRows + " rows.");
    });

Alternatively, this callback can be passed into the constructor:

    var dw = new DataWorker({
        datasource    : "ws://127.0.0.1:8888",
        authenticate  : "{}",
        request       : "{\"cmd\":\"requestDataset\"}",
        onReceiveRows : function (numRows) {
            alert(numRows + " rows received.");
        }
    });

If not set, this callback defaults to doing nothing.

The onAllRowsReceived callback is called when all expected rows have been received from the WebSocket server. This callback can be set with the onAllRowsReceived method:

    dw.onAllRowsReceived(function () {
        alert("All rows have been received!");
    });

Alternatively, this callback can be passed into the constructor:

    var dw = new DataWorker({
        datasource        : "ws://127.0.0.1:8888",
        authenticate      : "{}",
        request           : "{\"cmd\":\"requestDataset\"}",
        onReceiveRows     : function (numRows) {
            alert(numRows + " rows received.");
        },
        onAllRowsReceived : function () {
            alert("All rows have been received!");
        }
    });

If not set, this callback does nothing.

Via AJAX

If WebSockets are not available, data can be received via AJAX as well:

    var dw = new DataWorker({
        datasource        : "http://127.0.0.1:8888",
        request           : "?this=is;a=query;string",
        onAllRowsReceived : function () {
            alert("all rows have been received!");
        }
    });

datasource should be the base URL to send GET requests to.

request should be the query string to append to the base URL. The leading question mark is optional; if it is missing, one will automatically be inserted. Optionally, request can be a JavaScript object or a JSON string; DataWorker will convert this into a query string.
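
As a rough sketch of the conversion described above: the helper toQueryString is hypothetical and is not DataWorker's actual code. Joining pairs with "&" and URI-encoding keys and values are assumptions; the library may format the string differently.

```javascript
// Hypothetical helper: turns the accepted forms of `request` (query string,
// JSON string, or plain object) into a query string that starts with "?".
function toQueryString(request) {
    if (typeof request === "string") {
        try {
            var parsed = JSON.parse(request);
            if (parsed !== null && typeof parsed === "object") {
                request = parsed;
            }
        } catch (e) {
            // Not JSON: treat it as an ordinary query string.
        }
    }

    if (typeof request === "string") {
        // Insert the question mark only when it is missing.
        return request.charAt(0) === "?" ? request : "?" + request;
    }

    var pairs = Object.keys(request).map(function (key) {
        return encodeURIComponent(key) + "=" + encodeURIComponent(request[key]);
    });

    return "?" + pairs.join("&");
}

var fromString = toQueryString("this=is;a=query;string"); // "?this=is;a=query;string"
var fromObject = toQueryString({ cmd: "requestDataset" }); // "?cmd=requestDataset"
```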

Using fallbacks

If multiple data sources exist, it is possible to provide fallbacks when instantiating DataWorker. This can be useful if you have a server down for maintenance or your WebSocket service is not running. Simply provide the possible sources as an array to datasource in priority order:

    var dw = new DataWorker({
        request    : { cmd: "requestDataset" },
        datasource : [
            "https://example.com",
            "ws://127.0.0.1:8888",
            "http://127.0.0.1:9999"
        ]
    });

The sources will be tried in order. If they all fail, DataWorker will throw an error.
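
The priority-order behaviour can be illustrated with a simplified, synchronous sketch. Real connections are asynchronous, and the names firstWorkingSource and fakeConnect are hypothetical stand-ins rather than DataWorker functions.

```javascript
// Hypothetical sketch: try each source in priority order and return the
// first one for which `connect` succeeds. `connect` is supplied by the
// caller and is expected to throw when a source is unavailable.
function firstWorkingSource(sources, connect) {
    var errors = [];

    for (var i = 0; i < sources.length; i++) {
        try {
            connect(sources[i]);
            return sources[i];
        } catch (e) {
            errors.push(sources[i] + ": " + e.message);
        }
    }

    // Every source failed, so report all of the failures at once.
    throw new Error("All datasources failed (" + errors.join("; ") + ")");
}

// Simulated connector: only the local WebSocket server is reachable.
function fakeConnect(source) {
    if (source !== "ws://127.0.0.1:8888") {
        throw new Error("unreachable");
    }
}

var chosen = firstWorkingSource(
    [ "https://example.com", "ws://127.0.0.1:8888", "http://127.0.0.1:9999" ],
    fakeConnect
);
// chosen is "ws://127.0.0.1:8888"
```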

For finer-grain control, authentications can be supplied on a per-datasource basis:

    var dw = new DataWorker({
        request    : "{\"cmd\":\"requestDataset\"}",
        datasource : [
            {
                source       : "ws://127.0.0.1:8888",
                authenticate : "zxcv"
            },
            {
                source       : "http://127.0.0.1/data-origin",
                authenticate : "asdf"
            }
        ]
    });

Requesting further data

A new dataset can be requested using the requestDataset method:

    // For WebSockets or AJAX:
    dw.requestDataset("{\"cmd\":\"requestDataset\"}");

    // For AJAX only:
    dw.requestDataset("query=string");

The newly-requested dataset will completely replace the previously-requested dataset.

A new dataset can be requested and appended to the current dataset using the requestDatasetForAppend method:

    // For WebSockets or AJAX:
    dw.requestDatasetForAppend("{\"cmd\":\"requestDataset\"}");

    // For AJAX only:
    dw.requestDatasetForAppend("query=string");

Note that the newly-requested dataset should have the same columns as the previously-requested dataset.

Cancelling ongoing requests

Ongoing requests can be cancelled with the cancelOngoingRequests method:

    dw.cancelOngoingRequests();

When requestDataset is called, ongoing requests are automatically cancelled (requestDatasetForAppend does not cancel ongoing requests).

When connected to a WebSocket server, DataWorker will send the cancelRequestsCmd message, signalling the server to cancel the previous requests, if such a message has been provided:

    var dw = new DataWorker({
        datasource: {
            source            : "ws://127.0.0.1:8888",
            cancelRequestsCmd : "CANCEL",
            cancelRequestsAck : "ACK_CANCEL"
        }
    });

If cancelRequestsAck has been provided, DataWorker will wait for the server to acknowledge with a reply that matches cancelRequestsAck exactly before proceeding.

Triggers

DataWorker can pass messages to and from the remote server via the postMessage method and triggers:

    dw.onTrigger(function (reply) {
        alert(reply);
    }).postMessage("MESSAGE TO SERVER");

In the above example, the string "MESSAGE TO SERVER" is sent to the server, and any reply will be shown as an alert. Server replies that are triggers should be stringified JSON objects that look like this:

    {
        trigger: true,
        msg: "REPLY FROM SERVER"
    }
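
To make the reply format concrete, the sketch below shows one way a raw server message could be recognized as a trigger. The function parseTrigger is hypothetical, and handing only the msg property to the callback is an assumption about how DataWorker behaves.

```javascript
// Hypothetical helper: returns the trigger payload when the raw message is a
// trigger reply of the documented shape, and null for anything else.
function parseTrigger(rawMessage) {
    var msg;

    try {
        msg = JSON.parse(rawMessage);
    } catch (e) {
        return null; // not JSON, so not a trigger
    }

    if (msg !== null && typeof msg === "object" && msg.trigger === true) {
        return msg.msg;
    }

    return null;
}

var reply = parseTrigger('{ "trigger": true, "msg": "REPLY FROM SERVER" }');
// reply is "REPLY FROM SERVER"
var notATrigger = parseTrigger('[ [ "apple", "violin", "music" ] ]');
// notATrigger is null
```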

The onTrigger callback can be passed into the constructor as well:

    var dw = new DataWorker({
        datasource   : "ws://127.0.0.1:8888",
        authenticate : "{}",
        onTrigger    : function (reply) { alert(reply); }
    });

Web Workers

Web Workers are enabled by default to get the most performance out of DataWorker. Without any extra configuration, a Blob will be used to create the Web Worker. However, some browsers do not support remote data from within a Web Worker created in that fashion, though they work fine when the worker is created using a standard URL. DataWorker will detect these errors and look for a file named dw-helper.js in the same folder as dw.js. The method used to determine the source path does not work well in some circumstances, for example when using a module loader like RequireJS. To provide alternative sources, extra options can be provided when instantiating DataWorker.

If workerSource is provided, it will be used first. If that source fails to create a Web Worker, or an error occurs on connection to the datasource, a Blob will be created next to try again. If the Blob fails, then backupWorkerSource will be used next. If that fails, or wasn't provided, DataWorker will make its best guess as to the location of dw-helper.js. When all else fails, DataWorker will revert to a single-threaded environment. Finally, an error will be thrown if a connection still cannot be made and no local dataset is provided.

    new DataWorker({
        datasource: "https://example.com/dataset.json",
        workerSource: "/path/to/dw-helper.js",
        backupWorkerSource: "/other/path/to/dw-helper.js"
    });

There is an initial cost to creating a Web Worker, and they are meant to stay alive as long as possible. This can be important for avoiding memory leaks in single page applications. Rather than terminating a Web Worker, DataWorkerHelper is designed to idle when not in use and be reused when needed. To clean up an instance of DataWorker, simply call finish(optionalCallback) on the instance. If a callback is included it will be called once the Web Worker has responded that it is ready to be reused. Simply creating a new DataWorker will reuse that Web Worker.

Web Workers are still an experimental technology, and may have some issues. If you prefer to avoid using Web Workers at all, you may pass in a flag to force it to be single-threaded.

    new DataWorker({
        datasource: "https://example.com/dataset.json",
        forceSingleThread: true
    });

Clone

A DataWorker dataset can be deep-copied by calling clone:

    dw.clone(function (newD) {
        newDataset = newD;
    });

Note that any streaming datasources are not copied as part of the cloning process.

Child Rows

DataWorker has a concept of child rows. Child rows are rows that are a subset of another row, and may be used, for example, to provide more detail. The functionality is currently limited to a small subset of functions. Where use of child rows is not explicitly defined, they are treated as if they are distinct rows that have been added to the dataset. These are the currently supported functions:

Display Values

DataWorker can use differing values for display and dataset operations. This enables you to include HTML formatting for cell values but not have to consider them when actually working with the dataset. To take advantage of this feature, simply pass the cell value an object with the display and raw properties defined:

    var dataset = [
        [ "column_a",                                "column_b", "column_c" ],

        [ { display: "<i>apple</i>", raw: "apple" }, "violin",    "music"   ],
        [ "cat",                                     "tissue",    "dog"     ],
        [ "banana",                                  "piano",     "gum"     ]
    ];

    var d = new DataWorker(dataset);

    d.getRows(function (rows) {
        // rows is [
        //      [ "<i>apple</i>", "violin",    "music"   ],
        //      [ "cat",          "tissue",    "dog"     ],
        //      [ "banana",       "piano",     "gum"     ]
        // ]
    });

    d.applyFilter(/^apple$/, "column_a"); // Operates on the raw value

    d.getRows(function (rows) {
        // rows is [
        //      [ "<i>apple</i>", "violin",    "music"   ],
        // ]
    });

The display value will be returned by the methods that allow access to the dataset, while the methods that manipulate the dataset will operate on the raw value. Also note that the two methods of specifying cell values can be mixed.
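
The split between display and raw values can be pictured with two small accessors. These are hypothetical helpers written for illustration, not part of DataWorker's API.

```javascript
// Hypothetical accessors: a cell is either a plain value or an object with
// `display` and `raw` properties, so both forms can be mixed in one row.
function isRichCell(cell) {
    return cell !== null && typeof cell === "object" &&
        "display" in cell && "raw" in cell;
}

function displayValue(cell) {
    return isRichCell(cell) ? cell.display : cell;
}

function rawValue(cell) {
    return isRichCell(cell) ? cell.raw : cell;
}

var row = [ { display: "<i>apple</i>", raw: "apple" }, "violin", "music" ];

// Read access sees the display values...
var shown = row.map(displayValue); // [ "<i>apple</i>", "violin", "music" ]

// ...while a filter is tested against the raw value.
var matches = /^apple$/.test(rawValue(row[0])); // true
```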

Alter columns

Column properties can be altered after DataWorker is instantiated.

Alter column name

To change the name of column_a to column_a1:

    dw.alterColumnName("column_a", "column_a1");

Alter column title

To change the title of column_a to Things I Love:

    dw.alterColumnTitle("column_a", "Things I Love");

Alter column aggregate type

To change the aggregate type of column_a to min:

    dw.alterColumnAggregateType("column_a", "min");

Valid aggregate types are:

  • max
  • min
  • sum

Alter column sort type

To change the sort type of column_a to num:

    dw.alterColumnSortType("column_a", "num");

Valid sort types are:

  • alpha
  • num

Alternatively, you may pass in a sort function that takes two arguments (a, b) and returns -1 for a < b, 1 for a > b or 0 for a == b.
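
A comparator of the shape described above might look like the following sketch, which orders strings by length. The name byLength is illustrative. Presumably the function is passed in place of the sort type string, for example dw.alterColumnSortType("column_a", byLength), but that call shape is an assumption; the block below only exercises the comparator with the standard Array sort.

```javascript
// Comparator that orders strings by length, shortest first. It returns -1,
// 1, or 0 as described for custom sort functions.
function byLength(a, b) {
    if (a.length < b.length) { return -1; }
    if (a.length > b.length) { return 1; }
    return 0;
}

var sorted = [ "banana", "cat", "apple", "gummy" ].sort(byLength);
// sorted is [ "cat", "apple", "gummy", "banana" ]
```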

Prepend column names

To prepend a_ to all column names:

    dw.prependColumnNames("a_");

Append

The append method is used to concatenate two datasets together. The following appends dataset2 to dataset1:

    var dataset1 = [
        [ "column_a", "column_b", "column_c" ],

        [ "apple",      "violin",    "music" ],
        [ "cat",        "tissue",      "dog" ],
        [ "banana",      "piano",      "gum" ]
    ];
    var dataset2 = [
        [ "column_a", "column_b", "column_c" ],

        [ "gummy",       "power",    "apple" ],
        [ "car",        "screen",    "phone" ],
        [ "sign",        "bagel",    "chips" ]
    ];

    var dw = new DataWorker(dataset1);
    dw.append(dataset2);

Alternatively, you may also append a DataWorker dataset:

    var dataset1 = [
        [ "column_a", "column_b", "column_c" ],

        [ "apple",      "violin",    "music" ],
        [ "cat",        "tissue",      "dog" ],
        [ "banana",      "piano",      "gum" ]
    ];
    var dataset2 = [
        [ "column_a", "column_b", "column_c" ],

        [ "gummy",       "power",    "apple" ],
        [ "car",        "screen",    "phone" ],
        [ "sign",        "bagel",    "chips" ]
    ];

    var d1 = new DataWorker(dataset1);
    var d2 = new DataWorker(dataset2);
    d1.append(d2);

Note that column names must match up; an error will be thrown otherwise.

Filter

The applyFilter method is used to hide rows that do not match the specified regex. The following filters out any row that does not contain the word "apple":

    var dataset = [
        [ "column_a", "column_b", "column_c"         ],

        [ "apple",    "red",      "fuji"             ],
        [ "apple",    "green",    "granny smith"     ],
        [ "apple",    "yellow",   "golden delicious" ],

        [ "banana",   "green",    "unripe"           ],
        [ "banana",   "yellow",   "ripe"             ],
        [ "banana",   "brown",    "beyond ripe"      ],
        [ "banana",   "black",    "rotten/frozen"    ]
    ], dw = new DataWorker(dataset);

    dw.applyFilter(/\bapple\b/);

    /* This results in the following rows:
        [ "apple", "red",    "fuji"             ],
        [ "apple", "green",  "granny smith"     ],
        [ "apple", "yellow", "golden delicious" ]
    */

You may also filter only on certain columns:

    dw.applyFilter(/\bapple\b/, "column_a", "column_b");

Note that the following also works (and results in the exact same dataset):

    dw.applyFilter(/\bapple\b/, [ "column_a", "column_b" ]);

Complex Filters

You may also use the complex syntax, in which all filters provided must find a match for the row to be visible. The complex filters can take any of the following arguments:

  • columns (array of strings or single string): Columns on which to filter (defaults to all columns)
  • column (single string): Single column on which to filter (defaults to all columns) (note: if both columns and column are defined, the latter will be used)
  • matchAll (boolean): If this flag is set to true then all columns provided must match the filter for the row to stay visible. By default, if any of the columns match the row will stay visible
  • accentInsensitive (boolean): If this flag is set to true then characters with accent marks will be treated as their plain ASCII equivalents (e.g., applé matches both applé and apple).
  • regex (string or RegExp): Columns must match this regular expression (if a string is provided it will be converted to a RegExp)
  • !regex (string or RegExp): Columns must not match this regular expression (if a string is provided it will be converted to a RegExp)
  • eq (value): Columns must equal (==) this value
  • ne (value): Columns must not equal (!=) this value
  • gte (value): Columns must be greater than or equal to (>=) this value
  • gt (value): Columns must be greater than (>) this value
  • lte (value): Columns must be less than or equal to (<=) this value
  • lt (value): Columns must be less than (<) this value

For example:

    dw.applyFilter(
        {
            column : "column_a",
            eq     : "apple"
        },
        {
            column   : "column_b",
            regex    : /yellow/,
            "!regex" : /blue/
        },
        {
            columns : [ "column_a", "column_b" ],
            gte: "apple",
            lt: "zebra"
        }
    );

    /* This results in the single row:
        [ "apple", "yellow", "golden delicious" ]
    */
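
The rules for complex filters can be sketched in plain JavaScript as follows. This is a hypothetical re-implementation for illustration, not DataWorker's code. It assumes that a cell must satisfy every operator listed in a single filter object, and it leaves out accentInsensitive.

```javascript
// One test function per documented operator.
var operators = {
    "regex"  : function (cell, re) { return new RegExp(re).test(cell); },
    "!regex" : function (cell, re) { return !new RegExp(re).test(cell); },
    "eq"     : function (cell, v) { return cell == v; },
    "ne"     : function (cell, v) { return cell != v; },
    "gte"    : function (cell, v) { return cell >= v; },
    "gt"     : function (cell, v) { return cell > v; },
    "lte"    : function (cell, v) { return cell <= v; },
    "lt"     : function (cell, v) { return cell < v; }
};

// Assumption: all operators present in one filter object must pass.
function cellPasses(cell, filter) {
    return Object.keys(operators).every(function (op) {
        return !(op in filter) || operators[op](cell, filter[op]);
    });
}

function rowPasses(row, columnNames, filters) {
    return filters.every(function (filter) {
        // `column` wins over `columns`; the default is every column.
        var names = [].concat(filter.column || filter.columns || columnNames);
        var test = function (name) {
            return cellPasses(row[columnNames.indexOf(name)], filter);
        };
        return filter.matchAll ? names.every(test) : names.some(test);
    });
}

var columnNames = [ "column_a", "column_b", "column_c" ];
var rows = [
    [ "apple",  "red",    "fuji"             ],
    [ "apple",  "yellow", "golden delicious" ],
    [ "banana", "yellow", "ripe"             ]
];
var visible = rows.filter(function (row) {
    return rowPasses(row, columnNames, [
        { column: "column_a", eq: "apple" },
        { column: "column_b", regex: /yellow/, "!regex": /blue/ },
        { columns: [ "column_a", "column_b" ], gte: "apple", lt: "zebra" }
    ]);
});
// visible is [ [ "apple", "yellow", "golden delicious" ] ]
```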

The filter can be cleared by calling the clearFilters method:

    dw.clearFilters();

Filters stack on top of each other:

    dw.applyFilter(/banana/);
    /* This results in the following rows:
        [ "banana", "green",  "unripe"        ],
        [ "banana", "yellow", "ripe"          ],
        [ "banana", "brown",  "beyond ripe"   ],
        [ "banana", "black",  "rotten/frozen" ]
    */

    dw.applyFilter(/yellow/);
    /* The results are further filtered down to one row:
        [ "banana", "yellow", "ripe" ],
    */

    dw.clearFilters().applyFilter(/yellow/);
    /* Old filters are cleared and a new one is applied:
        [ "apple",  "yellow", "golden delicious" ],
        [ "banana", "yellow", "ripe"             ],
    */

To permanently remove rows from a dataset with a filter, use the filter method:

    dw.filter(/\bapple\b/);

Group

Similar to grouping in SQL, the group method allows you to group rows together.

    dw.group("column_a");

You may also group by multiple columns:

    dw.group("column_a", "column_b");

Note that the following is also valid:

    dw.group([ "column_a", "column_b" ]);

Rows with the same value for the specified column(s) will be combined; the column property aggType determines how values for non-specified columns are combined.
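
The effect of aggType during grouping can be sketched on plain arrays. The function groupRows and the sample columns below are hypothetical, written only to illustrate the rule; they are not DataWorker's implementation.

```javascript
// One combiner per documented aggregate type.
var aggregators = {
    max : function (a, b) { return a > b ? a : b; },
    min : function (a, b) { return a < b ? a : b; },
    sum : function (a, b) { return a + b; }
};

// Hypothetical sketch: rows sharing the same values in the groupBy columns
// are merged; every other column is combined according to its aggType.
function groupRows(columns, rows, groupBy) {
    var names = columns.map(function (c) { return c.name; });
    var keyIdx = groupBy.map(function (name) { return names.indexOf(name); });
    var groups = {}, order = [];

    rows.forEach(function (row) {
        var key = JSON.stringify(keyIdx.map(function (i) { return row[i]; }));

        if (!(key in groups)) {
            groups[key] = row.slice(); // first row of a new group
            order.push(key);
            return;
        }

        columns.forEach(function (column, i) {
            if (keyIdx.indexOf(i) === -1) {
                groups[key][i] = aggregators[column.aggType](groups[key][i], row[i]);
            }
        });
    });

    return order.map(function (key) { return groups[key]; });
}

var groupColumns = [
    { name: "fruit",  aggType: "max" },
    { name: "price",  aggType: "min" },
    { name: "amount", aggType: "sum" }
];
var grouped = groupRows(groupColumns, [
    [ "apple",  3, 10 ],
    [ "banana", 1,  5 ],
    [ "apple",  2,  7 ]
], [ "fruit" ]);
// grouped is [ [ "apple", 2, 17 ], [ "banana", 1, 5 ] ]
```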

Join

DataWorker also supports joining via the join method. It can inner join, left outer join, or right outer join.

The following inner joins d1 with d2 on column_a from d1 and column_d from d2:

    d1.join(d2, "column_a", "column_d");

The following left outer joins d1 with d2 on column_a from d1 and column_d from d2:

    d1.join(d2, "column_a", "column_d", "left");

The following right outer joins d1 with d2 on column_a from d1 and column_d from d2:

    d1.join(d2, "column_a", "column_d", "right");

Joins can also be performed on multiple columns:

    d1.join(d2, [ "column_a", "column_b" ], [ "column_d", "column_e" ]);
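
The three join types can be illustrated on plain arrays of rows. The function joinRows is a hypothetical sketch and not DataWorker's implementation; in particular, padding unmatched rows with null and placing the columns of the second dataset after those of the first are assumptions.

```javascript
// Hypothetical sketch of inner, left outer, and right outer joins on a
// single key column, identified here by index for brevity.
function joinRows(left, right, leftKey, rightKey, type) {
    var blank = function (width) {
        var cells = [];
        for (var i = 0; i < width; i++) { cells.push(null); }
        return cells;
    };
    var leftWidth = left.length ? left[0].length : 0;
    var rightWidth = right.length ? right[0].length : 0;
    var matchedRight = {};
    var result = [];

    left.forEach(function (l) {
        var found = false;

        right.forEach(function (r, j) {
            if (l[leftKey] === r[rightKey]) {
                result.push(l.concat(r));
                matchedRight[j] = true;
                found = true;
            }
        });

        // A left outer join keeps unmatched rows from the left side.
        if (!found && type === "left") {
            result.push(l.concat(blank(rightWidth)));
        }
    });

    // A right outer join keeps unmatched rows from the right side.
    if (type === "right") {
        right.forEach(function (r, j) {
            if (!matchedRight[j]) { result.push(blank(leftWidth).concat(r)); }
        });
    }

    return result;
}

var d1Rows = [ [ "apple", 1 ], [ "cat", 2 ] ];
var d2Rows = [ [ "apple", "fruit" ], [ "dog", "animal" ] ];

var inner = joinRows(d1Rows, d2Rows, 0, 0);
// [ [ "apple", 1, "apple", "fruit" ] ]
var leftOuter = joinRows(d1Rows, d2Rows, 0, 0, "left");
// [ [ "apple", 1, "apple", "fruit" ], [ "cat", 2, null, null ] ]
var rightOuter = joinRows(d1Rows, d2Rows, 0, 0, "right");
// [ [ "apple", 1, "apple", "fruit" ], [ null, null, "dog", "animal" ] ]
```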

Limit

The applyLimit method limits the number of visible rows in the dataset. The following allows only the first 10 rows in the dataset to be visible:

    dw.applyLimit(10);

The limit can be cleared by calling the clearFilters method:

    dw.clearFilters();

To permanently remove rows from a dataset with a limit, use the limit method:

    dw.limit(10);

Remove columns

You may completely delete columns from a dataset with the removeColumns method.

    dw.removeColumns("column_a", "column_b");

Note that the following is also valid:

    dw.removeColumns([ "column_a", "column_b" ]);

Hide columns

Instead of permanently deleting the columns, you may also temporarily hide them from view using the hideColumns method:

    dw.hideColumns("column_a", "column_b");

Note that the following is also valid:

    dw.hideColumns([ "column_a", "column_b" ]);

Hidden columns can be shown with the showColumns method:

    dw.showColumns("column_a", "column_b");

Note that the following is also valid:

    dw.showColumns([ "column_a", "column_b" ]);

The hideColumns and showColumns methods may also take a regex as an argument. Any column name matching the regex will be hidden/shown, respectively.

    dw.hideColumns(/^column_[ab]$/i);

All columns can be hidden with the hideAllColumns method:

    dw.hideAllColumns();

All hidden columns can be revealed with the showAllColumns method:

    dw.showAllColumns();

Alternatively, you may retrieve all columns (visible AND non-visible) by using the getAllColumns method:

    dw.getAllColumns(function (columns) {
        allColumns = columns;
    });

Clear Dataset

If you want to completely clear the dataset so that you can add new data while leaving any custom handlers intact, you may call clearDataset.

    dw.clearDataset();

This will remove references to all columns and rows. The function takes no parameters.

Sort

The following sorts the dataset on column_a:

    dw.sort("column_a");

To reverse sort, prepend the column name with a -:

    dw.sort("-column_a");

You may also sort on multiple columns:

    dw.sort("column_a", "-column_b");

In this case, the sort will fall back to column_b if the contents of column_a are equal.

Note that the following does the same thing:

    dw.sort([ "column_a", "-column_b" ]);

When child rows exist in the dataset, parents and children are kept together. The dataset will be sorted first by the parent, then by the children, using the same column. Using the example from Add Child Rows, dw.sort("-numbers"); will produce the following dataset:

    [
        [ "xyz", 789 ],
            [ "xyz", 789 ],
        [ "abc", 579 ],
            [ "abc", 456 ],
            [ "abc", 123 ],
        [ "def", 0   ]
    ]

Add Child Rows

Child rows can be added to the dataset by calling addChildRows. The columns must be the same as the original dataset as in the following example:

    var dataset = [
        [ "letters", "numbers" ],

        [ "abc",     579       ],
        [ "def",     0         ],
        [ "xyz",     789       ]
    ], childRows = [
        [ "abc",     123       ],
        [ "abc",     456       ],
        [ "xyz",     789       ]
    ], dw = new DataWorker(dataset);

The call to addChildRows expects a column that will be used to determine to which row the children belong. It also expects a dataset of child rows.

You may either pass in an array of rows:

    dw.addChildRows(childRows, "letters");

or you may pass in another DataWorker object:

    var d2 = new DataWorker([[ "letters", "numbers" ]].concat(childRows));

    dw.addChildRows(d2, "letters");

In either case, the result passed into the callback for getRows will be the same:

    [
        [ "abc", 579 ],
            [ "abc", 123 ],
            [ "abc", 456 ],
        [ "def", 0   ],
        [ "xyz", 789 ],
            [ "xyz", 789 ]
    ]

The default visibility for a child row depends on its parent. If the parent row was set to hidden then the child row will still be added to the parent, but will be hidden as well.

If a parent row cannot be found for the children, those child rows will be ignored. If multiple parent rows exist for a given child row, the result is undefined.

Get rows

Visible dataset rows can be retrieved for use via the getRows method.

If called with just the callback function, getRows will get all rows. The next two arguments are the start and end of the range. If unspecified, they are the start and end of the dataset. These arguments are 0-based, so a dataset with 15 rows will have rows 0 - 14. If a number larger than the last row is used DataWorker will simply return anything in the range up to (and including) the last row.

    var dataset = [
        [ "column_a", "column_b", "column_c" ],

        [ "apple",      "violin",    "music" ],
        [ "cat",        "tissue",      "dog" ],
        [ "banana",      "piano",      "gum" ],
        [ "gummy",       "power",     "star" ]
    ];

    var dw = new DataWorker(dataset);
    var records;

    dw.getRows(function (result) { records = result; });

The following is the contents of records:

    [
        [ "apple",      "violin",    "music" ],
        [ "cat",        "tissue",      "dog" ],
        [ "banana",      "piano",      "gum" ],
        [ "gummy",       "power",     "star" ]
    ]

If you would only like certain columns, you may provide those columns after the range of rows, either as an array or as extra arguments. All of the following calls to getRows are valid examples:

    var callback = function (rows) { /* Do something */ };

    dw.getRows(callback);
    dw.getRows(callback, 5);
    dw.getRows(callback, undefined, 10);
    dw.getRows(callback, 5, 10);
    dw.getRows(callback, 5, 10, "column_a", "column_b");
    dw.getRows(callback, 5, 10, [ "column_a", "column_b" ]);
    dw.getRows(callback, undefined, undefined, [ "column_a", "column_b" ]);
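
The range handling described for getRows can be sketched as below. The helper sliceRows is hypothetical, and treating the end of the range as inclusive is an assumption based on the description of 0-based row numbers.

```javascript
// Hypothetical sketch of the row range rules: both bounds are 0-based,
// missing bounds default to the start and end of the dataset, and an end
// past the last row is clamped to the last row.
function sliceRows(rows, start, end) {
    var first = start === undefined ? 0 : start;
    var last = end === undefined
        ? rows.length - 1
        : Math.min(end, rows.length - 1);

    return rows.slice(first, last + 1);
}

var sampleRows = [
    [ "apple",  "violin", "music" ],
    [ "cat",    "tissue", "dog"   ],
    [ "banana", "piano",  "gum"   ],
    [ "gummy",  "power",  "star"  ]
];

var middle = sliceRows(sampleRows, 1, 2);   // rows 1 and 2
var tail = sliceRows(sampleRows, 2, 100);   // rows 2 and 3 (end is clamped)
var everything = sliceRows(sampleRows);     // all four rows
```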

Get hashed rows

The getHashedRows function makes it possible to get records as a hash with the column names instead of as a simple array. It works exactly the same as getRows but returns the data in a different format. In the following example

    var dataset = [
        [ "column_a", "column_b", "column_c" ],

        [ "apple",      "violin",    "music" ],
        [ "cat",        "tissue",      "dog" ],
        [ "banana",      "piano",      "gum" ],
        [ "gummy",       "power",     "star" ]
    ];

    var dw = new DataWorker(dataset);
    var records;

    dw.getHashedRows(function (result) { records = result; });

the contents of records will be:

    [
        {
            "column_a": "apple",
            "column_b": "violin",
            "column_c": "music"
        },
        {
            "column_a": "cat",
            "column_b": "tissue",
            "column_c": "dog"
        },
        {
            "column_a": "banana",
            "column_b": "piano",
            "column_c": "gum"
        },
        {
            "column_a": "gummy",
            "column_b": "power",
            "column_c": "star"
        }
    ]

Get columns

The getColumns method is used to get the visible columns of the dataset:

    dw.getColumns(function (columns) {
        visibleColumns = columns;
    });

Use getAllColumns to retrieve both visible and non-visible columns.

Get columns and records

Visible columns may be retrieved simultaneously with visible records with getColumnsAndRecords.

    dw.getColumnsAndRecords(function (columns, records) {
        // Do something.
    });

Columns will be given as a dictionary with the columnName as the key and its properties (also in a dictionary) as the value.

Records will be returned the same as in getRows.

Get number of records

The getNumberOfRecords method returns the number of visible rows currently in the dataset:

    dw.getNumberOfRecords(function (num) {
        numberOfRows = num;
    });

Note that for streaming datasets, this value will be the current number of rows it has (and not the total number of rows expected). Use getExpectedNumberOfRecords to determine the total number of rows in a streaming dataset.

Get expected number of records

The getExpectedNumberOfRecords method returns the expected number of records in a streaming dataset.

    dw.getExpectedNumberOfRecords(function (num) {
        expectedNumberOfRows = num;
    });

Get distinct

Get distinct consecutive rows

Distinct rows can be retrieved using getDistinctConsecutiveRows. The function takes a callback and columnName as its parameters:

    function doSomethingInteresting(records) { }
    dw.getDistinctConsecutiveRows(doSomethingInteresting, "column_a");

The value passed into the callback is an array of records. Each record contains three values: value of the column, startRow of that value, and endRow of that value.

If values are repeated again later, but not consecutively, another record will exist in the results. For example, with the following dataset:

    [
        [ "column_a", "column_b" ],

        [ "abc",      "123"      ],
        [ "abc",      "456"      ],
        [ "abc",      "789"      ],
        [ "def",      "123"      ],
        [ "ghi",      "123"      ],
        [ "ghi",      "456"      ],
        [ "def",      "456"      ],
        [ "def",      "789"      ]
    ]

the function getDistinctConsecutiveRows will pass the following records into the callback as a sole parameter if "column_a" were passed in as columnName:

    [
        [ "abc", 0, 2 ],
        [ "def", 3, 3 ],
        [ "ghi", 4, 5 ],
        [ "def", 6, 7 ]
    ]

but if "column_b" were passed in as columnName then the results would be:

    [
        [ "123", 0, 0 ],
        [ "456", 1, 1 ],
        [ "789", 2, 2 ],
        [ "123", 3, 4 ],
        [ "456", 5, 6 ],
        [ "789", 7, 7 ]
    ]

The number of records returned will always be equal to or less than the number of records in the original dataset.
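
The run detection can be sketched in a few lines of plain JavaScript. The function distinctConsecutive is a hypothetical stand-in that takes a column index instead of a column name; applied to the example dataset above, it yields the same records.

```javascript
// Hypothetical sketch: collapse consecutive rows that share the same value
// in one column into [ value, startRow, endRow ] records.
function distinctConsecutive(rows, columnIndex) {
    var runs = [];

    rows.forEach(function (row, i) {
        var last = runs[runs.length - 1];

        if (last && last[0] === row[columnIndex]) {
            last[2] = i; // extend the current run
        } else {
            runs.push([ row[columnIndex], i, i ]); // start a new run
        }
    });

    return runs;
}

var exampleRows = [
    [ "abc", "123" ], [ "abc", "456" ], [ "abc", "789" ],
    [ "def", "123" ], [ "ghi", "123" ], [ "ghi", "456" ],
    [ "def", "456" ], [ "def", "789" ]
];

var runsA = distinctConsecutive(exampleRows, 0);
// [ [ "abc", 0, 2 ], [ "def", 3, 3 ], [ "ghi", 4, 5 ], [ "def", 6, 7 ] ]
var runsB = distinctConsecutive(exampleRows, 1);
```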

Pagination

Pagination eases incremental access to the records. The following sets DataWorker to display 10 rows per page:

    dw.paginate(10);

Get next page

Now you may grab the next 10 rows using the getNextPage method:

    var next10Rows, currentPage;

    dw.getNextPage(function (result, pageNumber) {
        next10Rows = result;
        currentPage = pageNumber;
    });

The callback will provide the requested rows as well as the current page number.

The pagination system will not let you change to a page outside of the dataset. Grabbing the next page when you're on the last page will still return the last page.

Get previous page

The previous 10 rows can be grabbed using the getPreviousPage method:

    var previous10Rows, currentPage;

    dw.getPreviousPage(function (result, pageNumber) {
        previous10Rows = result;
        currentPage = pageNumber;
    });

Grabbing a previous page from the 1st page will simply return the 1st page again.

Get page

You may jump to a specific page using the getPage method. The following grabs page 4:

    var page, currentPage;

    dw.getPage(function (result, pageNumber) {
        page = result;
        currentPage = pageNumber; // pageNumber == 4
    }, 4);

Note that this also sets your current page to the page you grab.

Ask for specific columns

The previous three functions (getNextPage, getPreviousPage, and getPage) can all take extra arguments defining which columns to return. This will allow you to get a reduced dataset or specify columns that would normally be hidden. The columns can be defined as additional names, or as an array of names. The following are both valid:

    var callback = function (result, pageNumber) { /* Do something */ };

    dw.getNextPage(callback, "column_a", "column_b");
    dw.getNextPage(callback, [ "column_b", "column_c" ]);
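
To make the two calling styles concrete, here is a minimal sketch of how a list of column names could be accepted either as separate arguments or as a single array, and then used to reduce each record. The helper pickColumns is hypothetical and is not part of DataWorker's API.

```javascript
// Illustrative sketch only; pickColumns is a hypothetical helper.
// columnNames: names of the columns, in record order.
// rows: array of data records.
// Remaining arguments: either "a", "b", ... or a single array [ "a", "b" ].
function pickColumns(columnNames, rows) {
    var extra   = Array.prototype.slice.call(arguments, 2),
        wanted  = Array.isArray(extra[0]) ? extra[0] : extra,
        indexes = wanted.map(function (name) {
            return columnNames.indexOf(name);
        });

    // Keep only the requested columns, in the requested order.
    return rows.map(function (row) {
        return indexes.map(function (i) { return row[i]; });
    });
}

var columnNames = [ "column_a", "column_b", "column_c" ];
var rows = [
    [ "apple", "violin", "music" ],
    [ "cat",   "tissue", "dog"   ]
];

var viaNames = pickColumns(columnNames, rows, "column_a", "column_b");
var viaArray = pickColumns(columnNames, rows, [ "column_b", "column_c" ]);
```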

Set page

Use the setPage method to set a new current page. Attempting to set the page to 0 or a negative number sets it to page 1. Setting it to a page past the last page sets it to the last page. The following sets your current page to page 4:

    dw.setPage(4);
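
The clamping rules described for getNextPage, getPreviousPage, and setPage can be sketched as follows. The helpers clampPage and pageOf are hypothetical, and treating an empty dataset as having one page is an assumption made for this sketch, not something DataWorker documents.

```javascript
// Illustrative sketch only; both helpers are hypothetical.

// Clamp a requested page number into the range [1, lastPage].
function clampPage(requested, totalRows, rowsPerPage) {
    // Assumption: an empty dataset is treated as having a single page.
    var lastPage = Math.max(1, Math.ceil(totalRows / rowsPerPage));

    return Math.min(Math.max(requested, 1), lastPage);
}

// Return the rows on the (clamped) page together with its page number.
function pageOf(rows, rowsPerPage, requested) {
    var page  = clampPage(requested, rows.length, rowsPerPage),
        start = (page - 1) * rowsPerPage;

    return {
        pageNumber: page,
        rows: rows.slice(start, start + rowsPerPage)
    };
}

var rows = [];
for (var i = 0; i < 25; i++) {
    rows.push([ "row" + i ]);
}

var last   = pageOf(rows, 10, 3);  // the last page holds the remaining rows
var beyond = pageOf(rows, 10, 4);  // past the end: clamped to the last page
var before = pageOf(rows, 10, 0);  // zero or negative: clamped to page 1
```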

Get number of pages

Use the getNumberOfPages method to get the total number of pages in the dataset with the current pagination. Note that this is also the page number of the last page in the dataset.

    var lastPage;

    dw.getNumberOfPages(function (totalNumberOfPages) {
        lastPage = totalNumberOfPages;
    });

Partition

DataWorker can partition its dataset into multiple smaller datasets. The partitioned datasets can be retrieved afterwards using their partition key(s). The following partitions the dataset by the contents of column_a and uses the getPartitioned method to push each partition onto the partitioned array:

    var partitioned = [];

    dw.partition("column_a");

    dw.getPartitionKeys(function (keys) {
        keys.forEach(function (key) {
            dw.getPartitioned(function (result) { partitioned.push(result); }, key);
        });
    });

The getPartitioned method returns an array of records.

You may also partition by multiple keys:

    dw.partition("column_a", "column_b");

The following does the same thing:

    dw.partition([ "column_a", "column_b" ]);
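
As a rough picture of what partitioning does, the sketch below groups plain records by the values in one or more columns. The helper partitionRows is hypothetical, and joining the values with "|" to form a composite key is an assumption made for illustration; DataWorker's actual key format may differ.

```javascript
// Illustrative sketch only; partitionRows is a hypothetical helper.
// columnNames: names of the columns, in record order.
// rows: array of data records.
// keyColumns: array of column names to partition by.
function partitionRows(columnNames, rows, keyColumns) {
    var indexes = keyColumns.map(function (name) {
            return columnNames.indexOf(name);
        }),
        partitions = {};

    rows.forEach(function (row) {
        // Assumption: a composite key is the key column values joined by "|".
        var key = indexes.map(function (i) { return row[i]; }).join("|");

        if (!partitions[key]) {
            partitions[key] = [];
        }
        partitions[key].push(row);
    });

    return partitions;
}

var columnNames = [ "column_a", "column_b" ];
var rows = [
    [ "abc", "123" ],
    [ "abc", "456" ],
    [ "def", "123" ],
    [ "ghi", "123" ],
    [ "def", "456" ]
];

var byA  = partitionRows(columnNames, rows, [ "column_a" ]);
var byAB = partitionRows(columnNames, rows, [ "column_a", "column_b" ]);
```

Partitioning by column_a alone yields one group per distinct value of that column, while partitioning by both columns yields one group per distinct pair of values.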

You may also sort partitions with the sortPartition method:

    dw.sortPartition(partitionKey, columnsToSortOn);

Render

The render method allows you to pass DataWorker a function to render the dataset. When render is called without arguments, DataWorker will call the rendering function that the user passed in, or do nothing if the user has not set a rendering function.

    dw.render(function () { /* code for rendering the dataset */ });

    /* Make some changes to the dataset. */

    dw.render(); // Renders the new dataset according to the user-defined function.
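
The set-or-call behaviour described above can be pictured with a small stand-in object. Renderable is hypothetical and is not DataWorker's code; the sketch also assumes that passing a function only stores it and does not render immediately, which the text above does not state either way.

```javascript
// Illustrative sketch only; Renderable is a hypothetical stand-in.
function Renderable() {
    this._renderFn = null;
}

Renderable.prototype.render = function (fn) {
    if (typeof fn === "function") {
        // Assumption: passing a function stores it without calling it.
        this._renderFn = fn;
    } else if (this._renderFn) {
        this._renderFn();   // no argument: call the stored function
    }
    // No argument and no stored function: do nothing.
};

var calls = 0;
var view  = new Renderable();

view.render();                          // nothing stored yet, so nothing happens
view.render(function () { calls++; });  // store the rendering function
view.render();                          // runs the stored function
view.render();                          // runs it again
```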

Compile

To generate the distribution files, we use the Node module grunt. If you would like to play around with the source and create your own distribution files, you must first have Node.js and npm installed on your machine. From DataWorker's root directory, run the command npm install to get the latest dev dependencies for the package. Then type grunt dist to generate the distribution files. You may also type grunt watch instead, which will automatically generate new distribution files whenever you change one of the source files.