DataTurbine programming gotchas and tips
Submitted by hubbard on Tue, 01/22/2008 - 21:37.
- DataTurbine is divided into Sources, Sinks and Plugins.
- Sources generate data (writers)
- Sinks consume data (readers)
- Plugins operate on data and produce derived output, similar to a Unix pipe.
- Names in the turbine are hierarchical. The simplest is Source/Channel where a source can have more than one channel, but a channel belongs to exactly one source.
- If you have parent-child routing, names take the form Parent/Child/Source/Channel.
- Loops in the graph are very possible due to shortcuts and mirroring. Tree traversal algorithms should expect and deal with cycles.
- If you connect with a duplicate Source name, DataTurbine will silently give you an auto-incremented name, e.g. Source_1. However, if you specify "append" when connecting, and you are the only instance, it'll reconnect to the old data and append.
- All sources should call Detach before Close, otherwise your data will vanish when you disconnect. This loss is rarely the desired behaviour.
- Server names can be a TCP name or user-defined from the command line via the -n argument.
- Source data can be in one of several native types: INT16/32, float, double, string or blob.
- Numeric data is usually stored as float or double (e.g. instrument data)
- Audio data, currently experimental, is stored as INT16
- Event markers are XML, stored as string type
- Video is actually discrete JPGs, stored as binary blobs, one per image
- Time synchronization is critical - all machines must be NTP-synced. That includes the people running data viewers like RDV! See this writeup for more details.
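- Timestamps are stored as doubles: the integer part is time_t-style seconds and the fractional part carries sub-second time. A double is a 64-bit floating-point value rather than a fixed 32/32 bit split, so for present-day epoch times the fractional resolution is well under a microsecond.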
- RDV can't display data sampled faster than 1 kHz at present.
- Metadata in DataTurbine is split into two types.
- Invariant. These are things that are set once and never change, for example units. They are set using the PutUserInfo call, where the contents are "name=value,name2=value2..." pairs in a string.
- Time-varying. These are stored as normal source feeds. An example would be GPS position of a datalogger.
- Each and every source defines its own cache parameters. When a source connects, you specify cache and archive size. This allows you to tune server usage, source by source.
- Source data is aggregated into 'frames', which are pushed to the server via the Flush call. You can aggregate (buffer) data to make larger packets, reduce server/network load and increase efficiency.
- DataTurbine exposes its internal metrics as 'hidden' channels in the _Metrics folder - memory usage, bandwidth, disk used and more. Very useful.
- It also sends its logfiles out as hidden text channels in the _Logs source.
- Think of DataTurbine as a very robust abstraction layer: once data is sent, no sink need worry what kind of device it came from, or how it got there. Enforced device abstraction, network transparency and more!
- The DataTurbine server has IP-level access control (read, write) similar to /etc/hosts.deny. There's also a currently-unused mechanism for requiring passwords for sources, but this really needs encryption to be useful and secure.
- Getting data into DataTurbine is often the easy part. Once there, you need a good viewer that lets users interact with the data in ways that they find useful.
- There are many clients (sinks) as well as DataTurbine->SQL code, file writers, etc., so you can use existing tools.
- The ChannelMap.PutDataAsXXXX calls do not copy the data. They just save a reference to it, so be sure to leave the data in a valid, unmodified variable until you call Flush. Otherwise, if you reuse one buffer for several channels, they will all end up with the exact same value, which is very puzzling.
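
To make that last gotcha concrete, here is a minimal Java sketch of the aliasing problem. It deliberately does not use the DataTurbine API: ReferenceHoldingMap, put and flush are hypothetical stand-ins, written only to mimic a map that keeps references and reads the values at flush time. Treat it as an illustration of the failure mode under that assumption, not as a description of how the server or client library is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in (NOT the DataTurbine API): it stores references to
// the caller's arrays and only reads their contents when flush() is called.
class ReferenceHoldingMap {
    private final List<double[]> pending = new ArrayList<>();

    // Mimics a PutDataAs... call: keeps a reference, makes no copy.
    void put(double[] data) {
        pending.add(data);
    }

    // Mimics Flush: the values are read (and snapshotted) only now.
    List<double[]> flush() {
        List<double[]> sent = new ArrayList<>();
        for (double[] d : pending) {
            sent.add(d.clone());
        }
        pending.clear();
        return sent;
    }
}

class AliasingDemo {
    // Buggy pattern: one buffer is reused for two channels before flushing.
    static List<double[]> reuseBuffer() {
        ReferenceHoldingMap map = new ReferenceHoldingMap();
        double[] buf = new double[1];
        buf[0] = 1.0;
        map.put(buf);      // intended value for the first channel
        buf[0] = 2.0;      // overwrites the value the first put still points at
        map.put(buf);      // intended value for the second channel
        return map.flush();
    }

    // Safe pattern: each channel gets its own array, left untouched until flush.
    static List<double[]> freshBuffers() {
        ReferenceHoldingMap map = new ReferenceHoldingMap();
        map.put(new double[] {1.0});
        map.put(new double[] {2.0});
        return map.flush();
    }

    public static void main(String[] args) {
        List<double[]> bad = reuseBuffer();
        List<double[]> good = freshBuffers();
        System.out.println("reused buffer: " + bad.get(0)[0] + " " + bad.get(1)[0]);
        System.out.println("fresh buffers: " + good.get(0)[0] + " " + good.get(1)[0]);
    }
}
```

With the reused buffer both entries come out as 2.0, which is the "multiple channels with the exact same value" symptom. Allocating a separate array per channel (or copying before the put) avoids it.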