Hadoop Matters Blog

Big Industries' blog

Announcing a New Partnership with StreamSets


Big Industries recently entered into a partnership with StreamSets. This San Francisco-headquartered company, founded in 2014, has already built up momentum with its open-source offering, Data Collector.


Hadoop Hands-on Workshop Organised Together with ba4all and MapR

Hadoop Hands-on Workshop


Building Real-Time Data Pipelines with Apache Kafka


Apache Kafka is a distributed publish-subscribe messaging system designed to be fast, scalable, and durable. This open-source project, licensed under the Apache License 2.0, has gained popularity within the Hadoop ecosystem and across multiple industries. Its key strength is the ability to make high-volume data available as a real-time stream to systems with very different requirements: batch systems like Hadoop, real-time systems that need low-latency access, and stream processing engines like Apache Spark Streaming that transform the data streams as they arrive. This flexibility makes Kafka well suited to a wide variety of use cases, from replacing traditional message brokers to collecting user activity data, aggregating logs, and gathering operational application metrics and device instrumentation data.
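To make the idea of one stream serving very different consumers a little more concrete, here is a toy, in-memory sketch of the log model Kafka is built around: a topic split into append-only partitions, where each consumer group tracks its own read offset. This is not the Kafka client API. The class `ToyTopic`, its `publish`/`poll` methods, and the CRC32-based key partitioning are hypothetical stand-ins chosen for illustration; real Kafka runs as a distributed broker cluster, and its default Java partitioner hashes keys with murmur2.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory model of a Kafka-style topic (illustration only)."""

    def __init__(self, num_partitions: int) -> None:
        # One append-only log per partition; each record is a (key, value) pair.
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]
        # Every consumer group keeps its own read position (offset) in each partition.
        self.offsets: dict[str, list[int]] = defaultdict(lambda: [0] * num_partitions)

    def publish(self, key: str, value: str) -> tuple[int, int]:
        """Append a record and return (partition, offset)."""
        # A deterministic hash of the key sends all records with the same key
        # to the same partition, so per-key order is preserved.
        partition = zlib.crc32(key.encode()) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def poll(self, group: str, max_records: int = 100) -> list[tuple[str, str]]:
        """Return up to max_records unread records for a group and advance its offsets."""
        batch: list[tuple[str, str]] = []
        positions = self.offsets[group]
        for p, log in enumerate(self.partitions):
            while positions[p] < len(log) and len(batch) < max_records:
                batch.append(log[positions[p]])
                positions[p] += 1
        return batch


# Usage: two consumer groups read the same data independently.
topic = ToyTopic(num_partitions=3)
for i in range(5):
    topic.publish(key=f"user-{i % 2}", value=f"click-{i}")

realtime = topic.poll("realtime-alerts", max_records=2)  # small, frequent reads
batch = topic.poll("hadoop-batch-load")                   # one large read later on
print(len(realtime), len(batch))  # 2 5
```

Because reading does not delete data and each group advances only its own offsets, a low-latency consumer and a batch loader can work from the same records at their own pace, which is the property the paragraph above describes.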
