Hadoop Matters Blog

Big Industries' blog

Cloudera Hadoop Essentials Live

On Friday, September 16, 2016, Cloudera, with the support of the Belgium Cloudera User Group, organized a Cloudera Hadoop Essentials live session at the DIAMANT Conference & Business Centre in Brussels.

Read More →

Creating a Data Pipeline using Flume, Kafka, Spark and Hive


The aim of this post is to help you get started with creating a data pipeline using Flume, Kafka and Spark Streaming that will enable you to fetch Twitter data and analyze it in Hive.

Read More →

Hadoop Hands-on Workshop organised together with ba4all and MapR

Read More →

Building Real Time Data Pipelines with Apache Kafka


Apache Kafka is a distributed publish-subscribe messaging system designed to be fast, scalable, and durable. This open source project, licensed under the Apache license, has gained popularity within the Hadoop ecosystem and across multiple industries. Its key strength is its ability to make high-volume data available as a real-time stream for consumption by systems with very different requirements: batch systems like Hadoop, real-time systems that require low-latency access, and stream processing engines like Apache Spark Streaming that transform data streams as they arrive. This flexibility makes Kafka ideal for a wide variety of use cases, from replacing traditional message brokers to collecting user activity data, aggregating logs, and gathering operational application metrics and device instrumentation.
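The reason one Kafka stream can serve such different consumers is its core abstraction: an append-only, partitioned log in which each consumer tracks its own read offset, so a slow batch job and a low-latency stream processor can read the same data at their own pace. As a rough sketch of that offset model (plain Python for illustration only, not the actual Kafka API, and with hypothetical class names):

```python
# Toy sketch of Kafka's append-only log and per-consumer offsets.
# Illustration of the model only -- not the real Kafka client API.

class ToyLog:
    """An append-only log: producers append records, consumers read by offset."""
    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class ToyConsumer:
    """Each consumer keeps its own offset, so a slow (batch) consumer and a
    fast (real-time) consumer can share one log without interfering."""
    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = ToyLog()
for event in ["click", "view", "purchase"]:
    log.append(event)

fast = ToyConsumer(log)
slow = ToyConsumer(log)
print(fast.poll())    # ['click', 'view', 'purchase']
print(slow.poll(1))   # ['click'] -- the slow consumer lags independently
```

Because records are never removed on read, adding a new downstream system is just a matter of attaching another consumer with its own offset, which is exactly what makes Kafka suitable for feeding batch and streaming systems at once.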

Read More →

Big Data processing with Apache Spark


What is Spark?

Apache Spark is an open source big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 in UC Berkeley’s AMPLab, open sourced in 2010, and later became an Apache project.

Read More →