Life insurers are the original practitioners of data science

There are huge opportunities for the life insurance industry to apply new statistical modelling techniques and, in doing so, to reclaim its original role in advancing data analysis.

Rapid advances in computer science over the last two decades now allow powerful statistical techniques to be applied by small teams on large datasets at low cost and in real time. It is my career experience that these advances have happened out of sight of the life insurance industry and are waiting to be discovered and put to good use.

In the coming years the full application of data analytics and machine learning will be essential if companies are to create and/or maintain enduring competitive advantages.


It's reasonable to say that life insurance was the first commercial application of data analysis: the first life tables were compiled by the astronomer and physicist Edmond Halley in the 1690s [1]. It was not until the mid-1700s, however, that the mathematical tools were in place to allow Edward Mores to establish the world's first mutual life insurer based on sound actuarial principles.

If we cut forward 250 years to the present time, the fundamental techniques used to calculate premiums and monitor the financial health of a life insurance company remain broadly similar, albeit these days the heavy lifting is done mainly by computers.

The ability to properly assess the risk of financial products, the health of a population demographic and the impacts of macroeconomic trends is essential to the strategy and business of life insurance. The history of actuarial science shows that whilst professional standards and proven mathematical models are critical, it is just as vital to use the latest computational tools and analytical techniques in order to advance the practice and stay ahead of the competition.

Technological Advances

The last five decades have seen constantly accelerating advances in computer hardware [2]. Today the smartphone in your pocket has computing power comparable to a 1990s-era supercomputer, and the average desktop computer has computational power unimaginable twenty years ago.

Huge innovations have also occurred in how software is designed, developed, released, used and maintained. The move - particularly in academia - away from proprietary software toward open source alternatives has led to the rapid development of industrial-strength software tools freely available to all. Open source software has especially low barriers to entry, thus speeding up development cycles and ensuring it is seen by many eyes - ironing out bugs and defining many different use cases beyond those envisaged by the original creators.

Access to dramatically cheaper and faster hardware coupled with best-in-class open source software, enabled and enhanced by the internet, allows a virtuous cycle of rapid iteration in applied mathematical science. This process of publishing, revising and reusing software tools and techniques can seem turbulent to those bound by proprietary systems, but it delivers tremendous power and flexibility, creating both great opportunities and serious threats to the status quo.

Enter Machine Learning

Machine learning is a field of computer science combining algorithmic data processing, expert learning systems and applied statistical modelling. It has always been present in computer science to some extent, but advances in hardware capability and interdisciplinary statistical research over the last 20 years have brought it to the fore.

In industry, the term 'machine learning' has become broadly interchangeable with other descriptions including predictive analytics, informatics, artificial intelligence and of course, data science. The overriding principle is that we human experts can design, build and utilise artificial systems that learn patterns from data and can be used to make predictions.

For example, in the context of life insurance, we might use machine learning to:

  • Discover the general attributes of our policyholders that are most indicative of a failure to maintain premium payments. This insight can be used to take preventative action, such as restructuring insurance products or offering payment holidays.
  • Understand the survival and risk profiles of the general population using census data, news reports, customer enquiries etc. This can be used to position the company's risk profile and approach to market.
  • Model macroeconomic cycles and the financial exposure of the various funds. This can be used to help position the company's balance sheet and reinsurance requirements.
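As a rough illustration of the first item, here is a minimal sketch that fits a logistic regression by plain gradient descent to a tiny, hand-made set of policy records. Everything in it is an assumption made for illustration: the two attributes (`missed_payment_last_year`, `pays_by_direct_debit`), the records themselves and the training settings are hypothetical, and a real analysis would use the company's own data and an established statistics library.

```python
import math

def make_records():
    """Build hypothetical records: (missed_payment_last_year, pays_by_direct_debit, lapsed)."""
    records = []
    for missed in (0, 1):
        for direct_debit in (0, 1):
            # Assumed for illustration: 4 in 5 lapse after a missed payment, else 1 in 5.
            n_lapsed = 4 if missed else 1
            for i in range(5):
                records.append((missed, direct_debit, 1 if i < n_lapsed else 0))
    return records

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(records, learning_rate=0.5, n_steps=2000):
    """Fit an intercept and one coefficient per attribute by batch gradient descent."""
    n_features = len(records[0]) - 1
    weights = [0.0] * n_features
    intercept = 0.0
    n = len(records)
    for _ in range(n_steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for *features, label in records:
            p = sigmoid(intercept + sum(w * x for w, x in zip(weights, features)))
            error = p - label  # gradient of the log-loss with respect to the linear score
            grad_b += error
            for j, x in enumerate(features):
                grad_w[j] += error * x
        intercept -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return intercept, weights

records = make_records()
intercept, weights = fit_logistic(records)
for name, w in zip(("missed_payment_last_year", "pays_by_direct_debit"), weights):
    print(f"{name}: coefficient {w:+.2f}")
```

In this made-up data the lapse rate depends only on whether a payment was missed, so the fitted coefficient for that attribute is large while the one for payment method is close to zero. Ranking attributes by the size of their coefficients in this way is one simple route to the "most indicative" attributes mentioned above.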

We have found that, in most cases, the life company's existing data is of high quality and sufficient to apply most data analysis techniques. In effect this means that life insurance companies are sitting on large stores of underused data that can readily be put to profitable use.

There is a lot of discussion about using machine learning within life insurance, for example this 2013 report from McKinsey and this article by The Boston Consulting Group. This is a young field and there are many pitfalls for the unwary, including data acquisition & storage, security & privacy, experimentation & repeatability, and depth of insight & knowledge transfer. No one solution or supplier will cover all bases, and it's wise to seek an impartial advisor.

In Summary

More than ever before, it is possible to learn valuable insights from a wide variety and huge scale of proprietary and openly available data, and to apply this knowledge to everyday business operations and strategy, through both one-off analysis and ongoing systems, to generate a competitive edge.

Due to an understandably conservative approach, the insurance industry, and particularly the life insurance industry, has been slow to adopt machine learning and data science to improve its operations. It's worth remembering, though, that the entire industry was born from breakthroughs in applied mathematics, and the rapid advances in today's world mean that life insurers must endeavour to keep up or be swept aside by entities more adept at handling and quickly learning from large volumes of data.

  1. Halley was involved in a surprising amount of analytical research and as a contemporary of Flamsteed, Hooke, Wren & Newton etc. made a tremendous contribution to mathematics. This book is a good place to start reading more.

  2. Intel's Gordon Moore was remarkably prescient: his observation that the number of transistors on a processor doubles approximately every two years - an exponential expansion - has held quite stable since the 1970s.

Michael Crawford

Michael trained as an actuary and has over 20 years' experience designing, developing & managing bespoke IT solutions for financial companies in Europe.