Hadoop Matters Blog

Big Industries' blog

How to read compressed MNIST data when using TensorFlow on Hadoop

Our team at Big Industries has recently been working on implementing the open-source TensorFlow deep learning system on a Cloudera Hadoop cluster. One of the challenges we ran into was reading the compressed MNIST data files from the Hadoop Distributed File System (HDFS). The example code that ships with TensorFlow assumes that the GZIP-compressed files reside on a local filesystem:

Read More →
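As an illustration of the problem described in this teaser, here is a minimal sketch (not the code from the full post) of parsing a GZIP-compressed MNIST image file from an arbitrary binary stream rather than from a local path. The function names `parse_idx_images` and `make_sample` are hypothetical, and the sketch assumes the standard MNIST IDX image layout: a 16-byte big-endian header (magic number 2051, image count, rows, columns) followed by one byte per pixel.

```python
import gzip
import io
import struct

IDX_IMAGE_MAGIC = 2051  # magic number of MNIST IDX image files


def parse_idx_images(stream):
    """Parse gzip-compressed IDX image data from a binary file-like object.

    Returns a list of bytes objects, one per image, each rows * cols long.
    """
    with gzip.GzipFile(fileobj=stream) as gz:
        # Header: four big-endian unsigned 32-bit integers.
        magic, count, rows, cols = struct.unpack(">IIII", gz.read(16))
        if magic != IDX_IMAGE_MAGIC:
            raise ValueError(f"unexpected magic number: {magic}")
        size = rows * cols
        pixels = gz.read(count * size)
    if len(pixels) != count * size:
        raise ValueError("stream ended before all pixel data was read")
    return [pixels[i * size:(i + 1) * size] for i in range(count)]


def make_sample(count=2, rows=2, cols=2):
    """Build a tiny synthetic gzip-compressed IDX image file (for the demo only)."""
    header = struct.pack(">IIII", IDX_IMAGE_MAGIC, count, rows, cols)
    body = bytes(range(count * rows * cols))
    return gzip.compress(header + body)


if __name__ == "__main__":
    # io.BytesIO stands in for a stream opened through an HDFS client.
    images = parse_idx_images(io.BytesIO(make_sample()))
    print(len(images), "images of", len(images[0]), "pixels each")
```

Because the parser only needs a binary file-like object, the same function should work with a stream obtained from whichever HDFS client library is in use, as long as that stream supports `read`.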

Cloudera User Group: Big Data analytics

Belgium Cloudera User Group Meetup

As main sponsor of the Belgium Cloudera User Group, Big Industries organised a Meetup on Wednesday, May 31st, 2017, at our offices at Cronos in Kontich, with Big Data Analytics as the central topic.

Read More →

Apache Spark market survey

Read More →

Infrastructure as Code: Managing Servers in the Cloud

Read More →

Big Data Architectures: beyond Hadoop

Read More →

Why Do I need a Data Lake?

What is a data lake?

A Data Lake is an enterprise-wide system for storing and analyzing disparate sources of data in their native formats. A Data Lake might combine sensor data, social media data, click-streams, location data, log files, and much more with traditional data from existing RDBMSes. The goal is to break the information silos in an enterprise by bringing all the data into a single place for analysis without the restrictions of schema, security, or authorization. Data Lakes are designed to store vast amounts of data, even petabytes, in local or cloud-based clusters consisting of commodity hardware.

Read More →