Big Industries, a member of the Cronos Group, is Belgium’s leading Big Data & Analytics systems integrator, delivering technology solutions at scale. We offer full-service expert consulting for architecture definition, Data Lake deployments on-premise or in the Cloud, integration and operation, advanced DevOps, Data Engineering, and Data-Driven Application Development. We hold strong partnerships with the leading Cloud Providers AWS and Azure, with Hadoop vendor Cloudera, and with Confluent, the company supporting Apache Kafka.
We Love Big Data
Big Data means lots of data in all kinds of forms: raw data, structured data, streaming data, and more. Many Use Cases drive the need for next-generation Data Lakes that can handle large volumes of (streaming) data from sources like sensors, smart meters, and machine logs, combined with structured data offloaded from traditional Data Warehouses.
In short: Customers come to us when they want to start their Data Lake Journey.
In the current big data world, two main roles stand out among junior profiles: a DevOps role and a Data Engineering role. During the internship you will be immersed in both roles to give you a broad overall experience.
The assignment will revolve around an end-to-end project that you will develop and deliver. The first step of the assignment is to create a Data Lake and ingest a large open dataset. This can be achieved with both streaming and batch technologies. By setting up the infrastructure of the Data Lake you will encounter several AWS networking tools as well as the concepts of serverless computing and streaming. The next step is more focused on the data engineering role: you will explore, clean, structure and analyze the data. In a third step you will visualize the data.
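To give a feel for the "clean and structure" step, here is a minimal Python sketch. It assumes a hypothetical smart-meter dataset; the field names (`meter_id`, `kwh`, `ts`) are illustrative assumptions, not part of the assignment.

```python
import json
from datetime import datetime, timezone

def clean_reading(raw: str) -> dict:
    """Parse one raw JSON smart-meter event and normalise it.

    Field names (meter_id, kwh, ts) are illustrative assumptions.
    """
    event = json.loads(raw)
    return {
        # Normalise identifiers so joins and partitions behave predictably.
        "meter_id": str(event["meter_id"]).strip().upper(),
        # Coerce the measurement to a float with a fixed precision.
        "kwh": round(float(event["kwh"]), 3),
        # Store timestamps as UTC ISO-8601 so query engines can parse them.
        "ingested_at": datetime.fromtimestamp(
            event["ts"], tz=timezone.utc
        ).isoformat(),
    }

raw = '{"meter_id": " m-042 ", "kwh": "1.23456", "ts": 1700000000}'
print(clean_reading(raw))
```

In the actual project, a function like this would typically run inside a Lambda or a Spark job rather than as a standalone script.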
In the assignment there is room for improvisation and we would definitely love to see some personal touches in the project. The solution you develop will form the basis for a technical blog post on our website. In this way you can show off your work to the big data world.
During this internship you have the chance to work with cutting-edge technologies that matter today. You are going to use the AWS stack as the basis of your project. An overview of the technologies and tools you are going to interact with:
- Networking: VPC, Subnet, Security Groups
- Serverless: Lambda
- Data lake: S3
- Streaming: Kinesis
- Data exploration: Glue
- Data Analytics: Athena
- Data Visualization: QuickSight
- Data analytics/Batch processing: EMR (Hadoop/Spark/Zeppelin/PySpark)
- Monitoring: CloudWatch | SES
- Version control: Git (Gitlab)
- CI/CD: Gitlab Pipelines, Ansible, CloudFormation
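As a sketch of how the streaming side of the stack could look, the snippet below builds a record for Kinesis and sends it with boto3, the AWS SDK for Python. The stream name, partition key, and payload format are assumptions for illustration, not prescribed by the assignment.

```python
import json

def build_kinesis_record(event: dict, partition_field: str = "meter_id") -> dict:
    """Build the argument dict for kinesis.put_record (pure helper).

    The stream name and partition field are illustrative assumptions.
    """
    return {
        "StreamName": "sensor-ingest",       # assumed stream name
        "Data": json.dumps(event).encode(),  # Kinesis expects bytes
        "PartitionKey": str(event[partition_field]),
    }

def send(event: dict) -> None:
    """Send one event to Kinesis (requires AWS credentials to run)."""
    import boto3  # AWS SDK for Python; imported here so the helper stays testable offline

    client = boto3.client("kinesis")
    client.put_record(**build_kinesis_record(event))
```

Keeping the record-building logic separate from the AWS call makes it easy to unit-test without touching a live stream.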
The technologies and tools listed above are a good indication of what you are going to use, but it’s definitely not written in stone. There is always room for alternatives and your opinion is also important to us.
In our vision, the goal of an internship is to create a real-world working experience. To achieve this, you are going to deliver an end-to-end project that will be approached in an agile way. This means that you, in close collaboration with your mentor, will tackle the project in sprints. Each sprint will start with a sprint planning session, where we define the user stories and goals we want to achieve, and end with a sprint retrospective.
What do we expect from you?
Interns are expected to be highly motivated with a healthy appetite for big data and problem solving.
We expect you to be a self-starter who can think outside the box and manage a small project. And don’t forget: the most important thing is that you learn a lot yourself.