Big Industries recently entered into a partnership with StreamSets. This San Francisco-headquartered company, founded in 2014, has already built up momentum with its open source offering, Data Collector.
StreamSets' Data Collector is a tool for building and operating data movement pipelines. It was conceived at a time when data flows had become quite complicated: data was being moved between organizations' data platforms and analytic tools at different speeds, whether batch or real-time, and in different formats, whether structured, semi-structured or unstructured.
The founders of StreamSets, with their backgrounds in both the data integration (Informatica) and Hadoop (Cloudera) worlds, set out to build a new technology designed with these latest data movement, monitoring and optimization challenges in mind. One of the specific challenges it aims to overcome is the problem of 'data drift': because integrations are often brittle, a data pipeline can be disrupted when a particular data source changes. Cleverly, Data Collector is able to deal with this either in real time, by sitting in the data pipeline thanks to its in-memory architecture, or in batch, when sitting on Hadoop.
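To make 'data drift' concrete, here is a minimal Python sketch of the underlying idea: compare the fields in each incoming record against the fields a pipeline expects, and report what has gone missing or appeared unexpectedly. This is an illustration only and does not use StreamSets' API or describe how Data Collector works internally. The function name `detect_drift`, the `EXPECTED` field set and the sample records are all hypothetical.

```python
def detect_drift(expected_fields, record):
    """Compare a record's field names against the expected set.

    Hypothetical helper for illustration; not part of any StreamSets API.
    """
    actual = set(record)
    return {
        "missing": sorted(expected_fields - actual),
        "unexpected": sorted(actual - expected_fields),
    }


# Assumed schema that a downstream consumer was built against.
EXPECTED = {"id", "name", "email"}

# A record that still matches the expected schema.
ok = detect_drift(EXPECTED, {"id": 1, "name": "Ann", "email": "ann@example.com"})

# The source renamed "name" to "full_name": a simple case of drift.
drifted = detect_drift(EXPECTED, {"id": 2, "full_name": "Bob", "email": "bob@example.com"})

print(ok)
print(drifted)
```

A brittle integration would simply fail, or silently drop the renamed field, on the second record. Detecting the mismatch as it happens is what allows a pipeline to raise an alert or adapt instead.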
Recently, support was added for the Hadoop distributions from Cloudera, MapR and Hortonworks. Data Collector is also certified with the MapR Converged Data Platform, including extended support for MapR Streams. Other integrations exist with MongoDB, Cassandra and Kafka.
StreamSets customers include Forbes Global 2000 pharmaceutical and financial services firms, as well as a government agency that recently moved from Apache NiFi to StreamSets Data Collector - all of which are using StreamSets to modernize their data movement infrastructure.