1 The internship
In the current big data world, two main roles stand out in junior profiles: a DevOps role and a Data Engineering role. In this internship you will be immersed in both roles to give you a broad overall experience.
1.1 The assignment
The assignment for the internship will revolve around an end-to-end project that you will develop and deliver. The first step of the assignment is creating a data lake from a large open dataset. This can be achieved with streaming technologies, batch technologies, or both. While setting up the infrastructure of the data lake you will encounter several AWS networking tools as well as the concepts of serverless computing and streaming. The next step will be more focused on the data engineering role: you will explore, clean, structure and analyze the data. In the third step you will visualize the data.
In the assignment there is room for improvisation, and we would definitely love to see some personal touches in the project. The solution you develop will form the basis for a technical blog post on our website. In this way you can show off your work to the big data world.
During this internship you have the chance to work with cutting-edge technologies that matter today. You are going to make use of the AWS stack as the basis of your project. An overview of the technologies and tools you are going to interact with:
- Networking: VPC, Subnet, Security Groups
- Serverless: Lambda
- Data lake: S3
- Streaming: Kinesis
- Data exploration: Glue
- Data analytics: Athena
- Data visualization: QuickSight
- Data analytics/Batch processing: EMR (Hadoop/Spark/Zeppelin/PySpark)
- Monitoring: CloudWatch | SES
- Version control: Git (GitLab)
- CI/CD: GitLab Pipelines, Ansible, CloudFormation
The technologies and tools listed above are a good indication of what you are going to use, but they are definitely not set in stone. There is always room for alternatives and extensions.
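To give a feel for how the streaming, serverless and data lake pieces might fit together, here is a minimal Python sketch of the logic a Lambda function could run when it is triggered by a Kinesis stream. It is an illustration only, not the prescribed solution. The function names, the `raw` prefix and the date-partitioned key layout are assumptions made for this example, and the sketch assumes each record carries a JSON payload. It only plans the writes; in a real function each planned object would be handed to an S3 client, for example with boto3's `put_object`.

```python
import base64
import json
from datetime import datetime, timezone


def decode_record(record):
    """Decode the base64-encoded payload of one Kinesis record.

    Assumes the producer sent JSON; other formats need a different parser.
    """
    raw = base64.b64decode(record["kinesis"]["data"])
    return json.loads(raw)


def build_object_key(prefix, arrival_ts, sequence_number):
    """Build a date-partitioned object key (hypothetical layout).

    The year=/month=/day= folders follow the Hive-style partition naming
    that Glue and Athena can pick up as partitions.
    """
    ts = datetime.fromtimestamp(arrival_ts, tz=timezone.utc)
    return (
        f"{prefix}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{sequence_number}.json"
    )


def plan_writes(event, prefix="raw"):
    """Turn a Kinesis-triggered Lambda event into (key, body) pairs."""
    writes = []
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        key = build_object_key(
            prefix,
            kinesis["approximateArrivalTimestamp"],
            kinesis["sequenceNumber"],
        )
        writes.append((key, json.dumps(decode_record(record))))
    return writes
```

Keeping the decoding and key-building logic free of AWS calls makes it easy to unit test locally before deploying it through a CI/CD pipeline.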
1.3 Working environment
In our view, the goal of an internship is to give the student a real-world working experience. To achieve this, you are going to deliver an end-to-end project that will be approached in an agile way. This means that you, in close collaboration with your mentor, will tackle the project in sprints. Each sprint will start with a sprint planning, where we define the user stories and goals we want to achieve, and end with a sprint retrospective.
2 What do we expect?
Interns are expected to be highly motivated, with a healthy appetite for big data and problem solving. We expect you to be a self-starter who can think outside the box and manage a small project.