Hadoop Matters Blog

Big Industries' blog

Why Do I Need a Data Lake?

What is a data lake?

A Data Lake is an enterprise-wide system for storing and analyzing data from disparate sources in its native format. A Data Lake might combine sensor data, social media data, click-streams, location data, log files, and much more with traditional data from existing RDBMSes. The goal is to break down the information silos in an enterprise by bringing all the data into a single place for analysis, without the restrictions of schema, security, or authorization. Data Lakes are designed to store vast amounts of data, even petabytes, in local or cloud-based clusters built from commodity hardware.

Cloudera User Group Meetup

Big Industries is the main sponsor of and driving force behind the Belgian chapter of the Cloudera User Group. The group is open to Cloudera customers and anyone in Belgium interested in Cloudera solutions who wants to network, share best practices, and exchange ideas around the Cloudera Big Data platform and ecosystem.

Cloudera Hadoop Essentials Live

On Friday, September 16, 2016, Cloudera, with the support of the Belgian Cloudera User Group, organized a Cloudera Hadoop Essentials live session at the DIAMANT Conference & Business Centre in Brussels.

Creating a Data Pipeline using Flume, Kafka, Spark and Hive

The aim of this post is to help you get started with creating a data pipeline using Flume, Kafka, and Spark Streaming that will enable you to fetch Twitter data and analyze it in Hive.

Hadoop Hands-on workshop organised together with ba4all and MapR

Happy Birthday, Hadoop: Celebrating 10 Years
