Finding links in data: It’s all about the WHYs

Published July 26, 2021 | Team Crayon Data

We can complicate discussions on data till eternity. But at the root of it all, data can help us solve two primary clusters of problems.

  • Finding patterns
  • Establishing a causal link

Finding patterns is the easy part. Machines, algorithms, and even ‘analysts who do not grasp the business’ are all very good at this. Common instances include search engines, image analytics and suggestions to add your neighbor as a friend on social media.

On a large scale, you can dissect the video of a 100-member orchestra to find which violin was a scale lower, or who had a food stain on their dress. But it takes an exceptional algorithm or machine to determine if the violin is playing the artist or the artist is playing the violin. Is the melody because of the violin, the player, or the hall?

Leaders need answers from data, not patterns

Cut to the world of financial services. Business leaders are often frustrated that they are being shown patterns when they’re actually looking for answers. They want to know what is causing their issues and what they need to do about it. Despite client acquisitions being on target, balance sheet targets get missed. Sales productivity is sluggish, but the competition seems to be doing well. Data science teams are often handed such problems, and the verdict is usually “Nothing special came out of it” or “It stated the obvious”.

‘Causal Inference’ in terms of establishing X -> Y is a very deep science. Determining whether X causes Y and, if so, how strongly X drives Y usually takes a lot of experiments and subsequent math. Pharma is the king of such experiments, but these are invariably difficult to simulate in the world of finance.

Building experiments in causal links

A common way to look at it is through a series of counterfactuals, like measuring the instances when X was absent, yet Y happened. Or vice versa. Firms naively launch a group of data scientists around this problem. They can see Y as an apparent result, but do not know where to look for their library of Xs. They struggle to find scenarios that can help them simulate their experiments.
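To make the counterfactual idea concrete, here is a minimal sketch in Python. It simply counts the four combinations of "X present or absent" and "Y happened or not", then compares how often Y happens with and without X. The function names and the sample records are hypothetical, made up for illustration only, and the resulting number measures association, not a proven causal effect.

```python
from collections import Counter


def contingency(records):
    """Count the four X/Y combinations.

    records: iterable of (x_present, y_happened) pairs of booleans.
    """
    counts = Counter((bool(x), bool(y)) for x, y in records)
    return {
        "x_and_y": counts[(True, True)],
        "x_no_y": counts[(True, False)],
        "no_x_but_y": counts[(False, True)],  # X was absent, yet Y happened
        "neither": counts[(False, False)],
    }


def rate_difference(table):
    """Return P(Y | X) - P(Y | not X), or None if a group is empty.

    This is a measure of association only; confounders can still
    explain the gap.
    """
    with_x = table["x_and_y"] + table["x_no_y"]
    without_x = table["no_x_but_y"] + table["neither"]
    if with_x == 0 or without_x == 0:
        return None
    return table["x_and_y"] / with_x - table["no_x_but_y"] / without_x


# Hypothetical observations: (x_present, y_happened)
records = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False),
]

table = contingency(records)
print(table)
print(rate_difference(table))
```

A gap between the two rates is only a starting point. The hard part, as the rest of this piece argues, is knowing which Xs are worth putting into the table in the first place.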

Take this problem. A portfolio of clients who were actively investing suddenly becomes disengaged and passive. A deep understanding of financial returns, as well as performance in comparison to peers and benchmarks, is mandatory. A data science professional needs to run their experiments across returns, asset performance, risk-reward ratio, and diversification to give the business manager something of value. Make no mistake, the same person can calculate a crowded list of 15 ratios and show patterns, but that will not explain why the clients disengage!

Another common mistake is bringing in the obvious. Clients who revolve on their card have a higher chance of moving to a lower-priced 6-month installment plan. Or if they engage on their app, they will pick installments. Identifying the right set of independent variables is as much an art as a science. From unsupervised learning models to simple dashboards, choosing genuinely independent variables is a very simple, yet commonly misunderstood step.

Yes, it’s indeed about the data. But don’t get in without answers to the WHYs.

More from the #BankingOnVidhya series here.