
Alpha Data: the revolution in financial services

How AI-powered 'alpha data' is overtaking traditional data sources, and why charging your phone could help you take out a loan.

Michael Lewis’s 2014 book Flash Boys investigated the phenomenon of high-frequency trading and the lengths financial organizations will go to in order to outpace the competition. One example: spending $300 million to build an 827-mile fiber-optic cable that cut straight through mountains and rivers from Chicago to New Jersey, with the sole goal of reducing data transmission time from 17 to 13 milliseconds. It wasn’t crazy, either. According to various insiders, that 4-millisecond advantage made traders billions.

This is an example of ‘alpha’ – the ability of financial companies to find an edge that they can leverage against competitors and outperform markets.

Today, the infrastructure battle has leveled off. The new alpha lies in the changing nature of data, and in advances in AI that can identify probabilistic structure and patterns in complex data at massive scale and speed.

For decades, the finance sector recruited the best maths and science talent from top universities and set them to work on complex regression techniques to model and forecast the future. Feeds from Reuters and Bloomberg terminals were their bedrock data, and the ‘edge’ was the quality of the talent they were able to recruit.

But times are changing. As Yann LeCun, Professor of AI at New York University and VP and Chief AI Scientist at Meta, states: “Most of the data in the world will be created by machines and will reside within machines, so we need new techniques to access and interpret what’s going on”.

This is where the edge lies today: in the new world of complex, automated, heterogeneous, unstructured data at massive scale. It can be a trove of new opportunity for forward-thinking organizations, or an existential threat to those that cannot make the pivot.

The data explosion in financial services

The big change for banking and financial services is the volume of what’s called ‘alternative’ data: data that doesn’t come from traditional sources. It is often non-numerical (text, speech, images, geospatial), unstructured or semi-structured, and can come from almost anywhere: customer interactions, social media, news articles, financial announcements. According to Grand View Research, the alternative data market is now worth US$11.65 billion and is projected to grow at a compound annual growth rate of 63.4 per cent from 2025 to 2030.

The expansion of artificial intelligence (AI) and machine learning technologies is fueling this growth by making it easier to turn this complex data into applied insight. Alternative data, which is often unstructured and arrives in many different formats, can be processed and analyzed with AI to uncover patterns and trends that were previously difficult to detect.

As AI and machine learning become more sophisticated, they will enhance the value of alternative data by providing deeper insights faster, especially in areas like customer intent, sentiment, changes in behavior and other fluctuating market dynamics.

The growing volume of data generated by the ‘Internet of Things’, via connected devices, sensors and wearable technology, will give companies an even deeper understanding of individual consumer behavior.

Take domestic car insurance as an example. Pricing models were built on historical actuarial data, with risk calculated annually to generate the next year’s premium. Now, geospatial data from a black box in the car generates real-time data unique to every driver, and premiums for new drivers can be adjusted monthly (or even weekly) as the insurer begins to understand their driving patterns. This is what allowed insurance disruptor Ingenie to create a market specifically for learner and new drivers, a segment traditionally too complex for many established insurers.
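To illustrate how black-box data could feed a monthly re-price, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `MonthlyDriving` fields, the weights in `risk_multiplier` and the six-month ‘credibility’ ramp in `adjusted_premium` are hypothetical and are not Ingenie’s (or any insurer’s) actual pricing model.

```python
from dataclasses import dataclass


@dataclass
class MonthlyDriving:
    """One month of (hypothetical) black-box telematics for a single driver."""
    miles: float
    harsh_brakes: int      # harsh-braking events flagged by the device
    speeding_events: int   # speeding events flagged by the device
    night_miles: float     # miles driven late at night


def risk_multiplier(month: MonthlyDriving) -> float:
    """Map one month of driving to a price multiplier (hypothetical weights)."""
    if month.miles <= 0:
        return 1.0  # no driving observed: leave the price unchanged
    events_per_100_miles = (month.harsh_brakes + 2 * month.speeding_events) * 100 / month.miles
    night_share = month.night_miles / month.miles
    raw = 0.85 + 0.05 * events_per_100_miles + 0.3 * night_share
    return max(0.7, min(1.5, raw))  # cap how far a single month can move the price


def adjusted_premium(base_monthly: float, months: list[MonthlyDriving]) -> float:
    """Blend the standard new-driver price with the driver's own observed record."""
    if not months:
        return base_monthly
    personal = sum(risk_multiplier(m) for m in months) / len(months)
    # Credibility weight: trust the personal record more as months accumulate
    # (a hypothetical six-month ramp from zero to full weight)
    weight = min(len(months) / 6, 1.0)
    return round(base_monthly * (weight * personal + (1 - weight)), 2)


careful = MonthlyDriving(miles=400, harsh_brakes=0, speeding_events=0, night_miles=0)
print(adjusted_premium(100.0, [careful] * 3))  # partly personalised after three months
print(adjusted_premium(100.0, [careful] * 6))  # fully personalised after six months
```

The ramp means a new driver starts at the standard price and moves towards a personal rate as evidence accumulates, which is one simple way to express an insurer gradually learning someone’s driving patterns.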

Who's ready for alpha data?

Things always move fast in financial services, so it’s hardly surprising that three-quarters of financial services institutions are already using AI in one way or another, with the rest planning to do so very soon. But the application of AI still lags well behind its potential. For all the excitement and buzz, most firms still haven’t worked out exactly how to grasp this opportunity fully, or how to find the alpha in new data sources. Nearly two-thirds say the impact of AI on their work has been limited, mostly confined to productivity improvements rather than reinventing business models or designing new products and services. Most financial services firms are still ‘stuck in the AI sandbox’, as IBM puts it.

This is a very costly missed opportunity, because the revolution in data science has the potential to transform the industry: it is perhaps the biggest revolution in risk analysis since Scottish Widows published the world’s first ever actuarial report, 'The Rise and Progress of the Fund', nearly 300 years ago.

The age of ‘alpha data’ is exciting, since its animating principle is experimentation. It’s about finding and using new – and sometimes unexpected – data to spot novel patterns, drive personalisation, and create new dynamic products.

The practical application of alpha data is best understood through an example. Take lending. Typically, lending institutions such as banks ask applicants a series of questions, whose answers are turned into features and, ultimately, a credit score. An algorithm then predicts how likely that person is to repay the loan and places them in one of a dozen or so pre-designed risk categories. The applicant is made an offer (or not) based on whichever bucket they’ve landed in.
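A toy version of this traditional pipeline shows how coarse the buckets are. The sketch below is a hypothetical points-based scorecard in Python: the questions, point values, score cut-offs and APRs are invented for illustration and do not represent any real lender’s model.

```python
from bisect import bisect_right


def score_applicant(age: int, years_at_address: int, annual_income: float, missed_payments: int) -> int:
    """Turn questionnaire answers into a points-based score (hypothetical points)."""
    score = 300
    score += 40 if age >= 25 else 10
    score += 50 if years_at_address >= 3 else 15
    score += min(int(annual_income // 10_000) * 20, 150)  # income bands, capped
    score -= 60 * missed_payments
    return score


# Pre-designed risk buckets: every applicant lands in exactly one band
CUTOFFS = [400, 500, 600]  # band boundaries (hypothetical)
BUCKETS = ["decline", "high risk", "standard", "prime"]
OFFER_APR = {"decline": None, "high risk": 0.249, "standard": 0.129, "prime": 0.069}


def make_offer(score: int):
    """Look up the bucket for a score and return (bucket, APR or None)."""
    bucket = BUCKETS[bisect_right(CUTOFFS, score)]
    return bucket, OFFER_APR[bucket]


for applicant in [(30, 5, 60_000, 0), (23, 1, 25_000, 1)]:
    s = score_applicant(*applicant)
    print(s, make_offer(s))
```

Every applicant scoring between 500 and 599 receives exactly the same offer, however different their circumstances; that rigidity is what the alpha data approach described next tries to move beyond.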

This model has served us well for decades. But for how much longer?

Alpha data offers a very different way of approaching the problem. Consider the Chinese company Smart Finance, which offers AI-driven micro-loans. They don’t use a standardized questionnaire or credit score to measure risk. Instead, they grab as much data as they can about applicants – currently about 1,200 data points per person – and let a machine learning algorithm figure it out.

Rather than relying on a series of carefully designed, finance-related questions, Smart Finance draws these data points from almost anywhere: the applicant’s history of loan repayments, of course, but also things we humans would assume are irrelevant, such as how charged the applicant’s phone is, the number of errors corrected in the application, what other apps are on the user’s phone, and so on.

By analyzing 1,200 data points per person, plus repayment records for millions of customers, Smart Finance has generated around 100,000 risk profiles for customers. In February 2024 alone they granted 1.2 million loans, and making a decision on each applicant takes just eight seconds.
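To make ‘let a machine learning algorithm figure it out’ concrete, here is a minimal sketch of one common approach: logistic regression fitted by gradient descent on a handful of heterogeneous signals. The source does not describe Smart Finance’s actual model, so the technique, the three feature names and the six-row toy dataset are all assumptions for illustration; a production system would use far more features, far more data and proper validation.

```python
import math

# Hypothetical signals per applicant (illustrative, not Smart Finance's features):
# battery level (0-1), errors corrected while filling in the form, repaid a previous loan (0/1)
FEATURES = ["battery_level", "errors_corrected", "repaid_before"]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def repay_probability(x, w, b) -> float:
    """Score one applicant: a single weighted sum passed through the sigmoid."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up toy data: 1 = repaid, 0 = defaulted
rows = [(0.9, 0, 1), (0.8, 1, 1), (0.7, 1, 1), (0.2, 3, 0), (0.3, 2, 0), (0.1, 4, 0)]
labels = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(rows, labels)
print(dict(zip(FEATURES, (round(wj, 2) for wj in w))))
print(repay_probability((0.95, 0, 1), w, b))  # a new, hypothetical applicant
```

The learned weights, rather than an analyst, decide how much each signal matters, and once they are fitted, scoring a new applicant is a single weighted sum, which is part of why automated decisions can be made so quickly.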

Unsupervised machine learning: let the algorithm explore the data

In a world of millions of connected devices, billions of signals and trillions of data points, data has become too large and unstructured for traditional applications to cope with. As many businesses will tell you, the problem is no longer having data; it’s separating the signal from the noise. An AI technique called ‘unsupervised learning’ looks for patterns, anomalies and change-points in data beyond human scale without being given a specific objective, allowing patterns and correlations to emerge from the data itself.

An unsupervised learning algorithm finds structure in unlabeled data on its own, reducing reliance on the biases and heuristics inherent in much human decision-making. The domain could be loan repayment, risk categorization or identifying good drivers. And it will find things that a human would never think of, and wouldn’t have the time to calculate even if they did.
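k-means clustering is one of the simplest unsupervised techniques, and a minimal pure-Python sketch shows the core idea: no labels or target are supplied, and groups emerge from the data alone. The two customer features and the toy points are hypothetical. Note that k-means still needs a human to choose the number of clusters `k`, and the deterministic start used here is a simplification of the randomised seeding (such as k-means++) that practical implementations often use.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then move centroids, repeat."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Simplified deterministic start: k evenly spaced input points
    step = len(points) // k
    centroids = [tuple(points[i * step]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


def is_anomaly(point, centroids, max_dist):
    """Flag a point that sits far from every discovered group."""
    return min(math.dist(point, c) for c in centroids) > max_dist


# Hypothetical unlabeled customers, two already-scaled behavioural features each
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(customers, k=2)
print(labels)
print(is_anomaly((20.0, 0.0), centroids, max_dist=3.0))
```

Nobody told the algorithm what the two groups mean; interpreting them (and deciding whether they are useful risk categories) is still a human step.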

That’s how Smart Finance realized that phone battery level was a good indicator of whether a person would pay back a loan. No human analyst reasoned that a well-charged phone might be the mark of an organized person; a machine had trawled through millions of data points from people with loans and spotted the pattern.

Smart Finance, like countless other challenger banks in China and beyond, shows how newcomers to well-established markets can suddenly gain the upper hand with unsupervised learning techniques. They have lower sunk costs and fewer legacy systems to slow them down, and they are willing to try new approaches. The same pattern holds in countless industries where pricing risk is a key part of decision-making: think insurance, think investment management.

The new world of alpha data

Alpha data opens up a world of opportunity: a new way to find patterns you never knew were there, based on data you never realized was relevant. There is a wealth of data out there that can be brought to bear on understanding new trends.

Every investment firm, of course, uses Bloomberg and Reuters market data in its analytics, and has done for decades. But how about modeling this against weather patterns (an increasing headache for insurance providers)? Macroeconomic trend data? Footfall data? Sports results? LinkedIn posts? Open-source government data? Or data sources that you or I couldn’t possibly imagine could move or make markets?

This is why 40% of Goldman Sachs analysts are now computer and data scientists rather than professional traders. The firm has realized that understanding data is the key to making smarter decisions: Goldman estimates that one data engineer can now replace five traders. ‘The masters of the universe’ used to be brokers. Most people haven’t yet realized they are now data scientists.

Here’s one example I was involved in. A while back I worked with an insurance firm, helping them make the most of their data. Not being a subject matter specialist was an advantage, because I didn’t know what I was looking for. But the data itself revealed something no one expected: a certain proportion of drivers were following highly unusual, but extremely regular, routes every day. It turned out they were all delivery drivers, operating on domestic rather than business insurance.

The company was then able to move them onto the correct business insurance, generating additional revenue. The same principle is used to identify moped riders (most of whom are learners) working for fast-food delivery services.
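Once a pattern like this has been spotted, it can be turned into a simple screening rule. In this hypothetical Python sketch, each day’s driving is reduced to a set of visited locations (for instance, coarse map cells), route regularity is measured as the average Jaccard similarity between days, and the thresholds are invented; it is an illustration, not the method used in the project described above.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap between two days' sets of visited locations (1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def route_profile(days: list[set[str]]) -> tuple[float, float]:
    """Return (average distinct stops per day, average day-to-day similarity)."""
    avg_stops = sum(len(d) for d in days) / len(days)
    pairs = list(combinations(days, 2))
    regularity = sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0
    return avg_stops, regularity


def looks_like_delivery_route(days: list[set[str]], min_stops: int = 8,
                              min_regularity: float = 0.7) -> bool:
    """Many stops per day (unusual for a private car) that repeat day after day (regular)."""
    avg_stops, regularity = route_profile(days)
    return avg_stops >= min_stops and regularity >= min_regularity


# Hypothetical week of data, each day reduced to the set of map cells visited
commuter = [{"home", "office"} for _ in range(5)]
courier = [{f"cell_{i}" for i in range(12)} for _ in range(5)]
leisure = [{f"day{d}_cell_{i}" for i in range(12)} for d in range(5)]
for name, days in [("commuter", commuter), ("courier", courier), ("leisure", leisure)]:
    print(name, looks_like_delivery_route(days))
```

Requiring both conditions matters: a commuter is regular but has few stops, while a leisure driver may visit many places but rarely the same ones.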

A major advantage alpha data offers is the potential for deeper personalisation of products and services. The pre-made buckets and categories most large financial services institutions use to measure risk (‘sub-prime loan prospect’ or ‘high-risk driver’) can be replaced with far more precise offers for each individual, based on their own unique behaviors.

As discussed earlier, car insurance can be tailored to an individual’s actual driving rather than their age and postcode (which is good news if you’re a very diligent 17-year-old who drives a fast car). Decisions are based on real-time, dynamic, personalized risk profiles, not generalized, historical systems.
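The difference between bucketed and individual pricing can be shown in a few lines. In this hypothetical Python sketch, each driver already has an estimated annual claim probability (however it was produced); the band boundaries, band rates, average claim cost and loading are invented numbers, and real premiums involve many more factors.

```python
# Hypothetical pricing inputs (illustrative only)
AVG_CLAIM_COST = 4_000.0   # assumed average cost of a claim, in £
LOADING = 1.2              # assumed margin for expenses and profit
# Pre-made risk bands: (upper bound on claim probability, rate charged to everyone in the band)
BANDS = [(0.05, 0.03), (0.15, 0.10), (1.0, 0.25)]


def bucket_premium(p_claim: float) -> float:
    """Everyone in a band pays the same price, based on the band's assumed average risk."""
    for upper, band_rate in BANDS:
        if p_claim <= upper:
            return round(band_rate * AVG_CLAIM_COST * LOADING, 2)
    raise ValueError("claim probability cannot exceed 1")


def personal_premium(p_claim: float) -> float:
    """Price each driver on their own estimated risk: expected claim cost plus loading."""
    return round(p_claim * AVG_CLAIM_COST * LOADING, 2)


for p in (0.06, 0.14):
    print(p, bucket_premium(p), personal_premium(p))
```

Here two drivers whose estimated risk differs by more than a factor of two pay the same banded price, while individual pricing separates them.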

There are data privacy issues, of course, particularly in highly regulated sectors and under GDPR and the EU AI Act. Collecting data is easier in China than in the UK, which is one reason why Smart Finance started there. But having worked in data science for years, I think most of these issues can be resolved. We are moving quickly towards products and services that provide high levels of abstraction, allowing complex processes to be operationalised at scale so that companies can extract maximum value from their enterprise data in ways that were not previously possible.

For years we have spoken about the importance of being ‘data-led’ and the product and service innovation that comes from it. Small, agile tech-disruptor firms are developing sophisticated alpha data techniques and building increasingly personalized and dynamic products.

In a highly competitive world where consumers now expect products to be tailored to their needs, those who don’t take advantage of this new approach will find themselves uncompetitive and unattractive sooner than they think.