small improvements to the duckdb node module
Dec 21, 2021
I’ve begun building data tools that use DuckDB, an in-process columnar database designed for fast analytic queries. It’s such a powerful paradigm for manipulating and recasting datasets that are small enough to download to your computer. Also, the logo is pretty cute.
The node.js library for DuckDB, however, is missing a number of features. While Node isn’t currently quite as “data-native” as R and Python, it’s obviously still a powerful ecosystem and application development runtime environment. A solid Node module for DuckDB means DuckDB is available and accessible for wider data tool development.
A wishlist of things I’m looking for / working on / have already fixed:
- parquet by default – the parquet extension should be available to the node.js bindings by default (Hannes Mühleisen merged a PR for this, thankfully!)
- promisify by default – there should be a Promise-based approach to complement the traditional Node callbacks.
- full support for all data types – I recently submitted a set of Mocha tests to provide test coverage for the data types I’ve been adding. This would be a good way to ensure that all data types are supported. Someone who’s better at C++ and the Node API will probably know how to swoop in and add generic coverage for all data types. This includes `INTERVAL` (both for type conversion and for binding back; PR merged), `BOOLEAN`, which strangely enough isn’t supported (PR also merged), `MAP` (for use of `histogram(column)`), and other arbitrary lists / structs.
- support for binding struct-based data types – e.g. binding an `INTERVAL` with a javascript object that has a `months`, `days`, and `micros` field. This isn’t the highest priority since one can just concatenate a string to produce an `INTERVAL`.
- a way to interrupt any running queries – I would love a way to be able to send a signal to a query and have it stop running. I see in `interrupt.test.js` there is use of `db.interrupt()`. I don’t know the status of this functionality, but it doesn’t seem to actually stop my queries when I test it myself.
- documentation – the node.js API should be better documented. The test suite should not be the definitive source of documentation. This one should be easy enough for anyone to do.
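To make the "promisify by default" item concrete, here is a minimal sketch of what a Promise-based wrapper could look like today using Node's built-in `util.promisify`. It assumes a database object exposing a callback-style `all(sql, callback)` method with a Node-style `(err, rows)` callback. `FakeDatabase` and `makeAsyncAll` are hypothetical names used only so the example runs without the native module; this is not the module's own API.

```javascript
const { promisify } = require("node:util");

// Hypothetical stand-in for a database handle, so the sketch runs without
// the native module. It mimics a callback-style all(sql, callback) method
// that reports results through a Node-style (err, rows) callback.
class FakeDatabase {
  all(sql, callback) {
    // Respond asynchronously, as a real driver would.
    setImmediate(() => {
      if (typeof sql !== "string" || sql.trim() === "") {
        callback(new Error("empty query"));
      } else {
        callback(null, [{ answer: 42 }]);
      }
    });
  }
}

// Wrap the callback-style method so it returns a Promise instead.
// bind() keeps `this` pointing at the database object.
function makeAsyncAll(db) {
  return promisify(db.all.bind(db));
}

async function main() {
  const all = makeAsyncAll(new FakeDatabase());
  const rows = await all("SELECT 42 AS answer");
  console.log(rows);
}

main();
```

The same wrapper should in principle apply to a real database handle whose `all` method takes the callback as its last argument, though that is an assumption to check against the bindings rather than something this sketch verifies.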
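The string workaround mentioned in the struct-binding item can be sketched as a small helper that turns an object with `months`, `days`, and `micros` fields into interval text. The helper name `intervalToString` is hypothetical, and the exact text format (and whether DuckDB accepts it when cast to `INTERVAL`) is an assumption worth verifying against your DuckDB version.

```javascript
// Build interval text from an object with months, days, and micros fields.
// The field names follow the wishlist item; the helper name and the output
// format are assumptions for illustration.
function intervalToString({ months = 0, days = 0, micros = 0 }) {
  // Reject anything that is not an integer so arbitrary text can never be
  // smuggled into the resulting string.
  for (const [name, value] of Object.entries({ months, days, micros })) {
    if (!Number.isInteger(value)) {
      throw new TypeError(`${name} must be an integer, got ${value}`);
    }
  }
  return `${months} months ${days} days ${micros} microseconds`;
}

const text = intervalToString({ months: 1, days: 2, micros: 500 });

// Intended use (an untested assumption): bind the text as an ordinary
// string parameter and let the query cast it to an INTERVAL.
const sql = "SELECT CAST(? AS INTERVAL) AS span";
console.log(sql, [text]);
```

Binding the text as a parameter rather than splicing it into the SQL keeps the query itself fixed, which is usually the safer choice when the values come from user input.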