The Business Case
Millions of customers bank with ING and use their ING accounts to transfer money and make payments. Apart from these transactions, ING also records personal data and, where customers have given permission via cookies, their browsing behavior on the ING website.
This data makes it possible to give customers tailored information and special offers. For example, information about savings or making down payments on a mortgage can be shared with customers who actually have a mortgage, and 15-year-old customers will not receive offers regarding pensions.
ING has been looking at Big Data technologies to pool large amounts of data and run calculations on this data that, until recently, would have taken too much time. These technologies have multiple uses, such as helping to prevent fraud. Imagine someone uses a debit card to make a payment in location ‘A’, and shortly afterwards another payment is made 200 kilometers away with a debit card belonging to the same person. Using Big Data allows ING to contact the customer to find out whether this constitutes fraud and whether subsequent measures (such as blocking the debit card) are necessary. Not so long ago, these kinds of improvements in customer service were not possible.
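The debit card scenario above can be sketched as a simple rule: flag two consecutive payments on the same card when the distance between them could not plausibly have been travelled in the time that elapsed. The Python below is only an illustration under stated assumptions, not ING's actual fraud logic. The `Payment` record, the `suspicious_pairs` function and the 150 km/h speed threshold are all hypothetical, and the great-circle (haversine) distance is one possible choice of distance measure.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
# Hypothetical threshold: fastest speed at which a cardholder is assumed to travel.
MAX_PLAUSIBLE_SPEED_KMH = 150.0


@dataclass(frozen=True)
class Payment:
    """Hypothetical payment record; field names are assumptions for this sketch."""
    card_id: str
    timestamp: datetime
    lat: float  # latitude of the payment location, in degrees
    lon: float  # longitude of the payment location, in degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def suspicious_pairs(payments, max_speed_kmh=MAX_PLAUSIBLE_SPEED_KMH):
    """Return (previous, current) payment pairs on the same card that are
    farther apart than the cardholder could have travelled in the elapsed time."""
    last_by_card = {}
    flagged = []
    for p in sorted(payments, key=lambda pay: pay.timestamp):
        prev = last_by_card.get(p.card_id)
        if prev is not None:
            hours = (p.timestamp - prev.timestamp).total_seconds() / 3600
            dist = haversine_km(prev.lat, prev.lon, p.lat, p.lon)
            if dist > max_speed_kmh * hours:
                flagged.append((prev, p))
        last_by_card[p.card_id] = p
    return flagged


# Example: one card is used in Amsterdam and ten minutes later near Brussels,
# while a second card makes the same trip a full day apart.
payments = [
    Payment("card-1", datetime(2024, 1, 1, 12, 0), 52.37, 4.90),
    Payment("card-1", datetime(2024, 1, 1, 12, 10), 50.85, 4.35),
    Payment("card-2", datetime(2024, 1, 1, 12, 0), 52.37, 4.90),
    Payment("card-2", datetime(2024, 1, 2, 12, 0), 50.85, 4.35),
]
flagged = suspicious_pairs(payments)
```

A flagged pair would only be a reason to contact the customer, as described above, not proof of fraud. In a setting like the one described here, such a check would more likely run continuously over streaming data than over an in-memory list.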
Implementing these use cases is not easy: managing more data sources, combining data streams and, ultimately, making sense of both internal and external data is a complex task.
In ING’s case, to bring this data together, the firm operates several data lakes and enterprise data warehouses, where static (batch) and streaming data are processed. Raw data is archived, while structured data goes to enterprise data warehouses and is fed into a Hadoop cluster for ad-hoc analytics. The open source technologies used include Spark, Scala, Kafka and Flink, among others.
Our consultants, as part of the dedicated Advanced Analytics Scrum team, were actively involved in securing the installation, configuration and monitoring setup of the Hadoop environment. We were influential in designing the Lambda architecture to efficiently process and query large volumes of data. We worked together with data owners to design and build data pipelines, and wrote Java, Scala and shell code to distribute machine learning algorithms on Hadoop and Spark. We teamed up with the bank’s data scientists to understand their desired algorithm behavior.