Plate notation is a useful visual method for describing graphical models, but the software for drawing it can be awkward. Here we demonstrate daft-pgm, a pure-Python solution.
It's 2016 and most financial services companies are at least starting to implement a data science capability. Here are nine questions to define the maturity of yours.
Several years into the fintech revolution, the insurance world is waking up to the disruptive possibilities of new technologies. So what's hype and what's actually useful?
Over the past year we've refined this simple model to help map, evaluate and improve our clients' data science capabilities; it might work for you too.
In the final article of this technical series we demonstrate hierarchical linear regression using PyMC3 to compare vehicle NOx emissions for a range of car manufacturers.
In the second article of this technical series we demonstrate the flexible syntax of PyMC3 with regularized linear modelling of car emissions data and model evaluation.
Bayesian inference bridges the gap between white-box model introspection and black-box predictive performance. This technical series describes some methods using PyMC3, an inferential framework in Python.
To follow our post on technical user groups, here's a hat-tip to meetups & conferences throughout the UK, Ireland and Europe that we enjoyed attending in 2015.
Practical data science projects often include an aspect of anonymisation to carefully remove sensitive information prior to analysis; here we demonstrate several complementary techniques and principles.
In this technical article we explain why and how to use Singular Value Decomposition (SVD) for feature reduction: making large datasets more compact whilst preserving information.
Data science doesn't just lead to insights and products: here we define SPEACS, a generalised analytical process that highlights the many business benefits at every stage.
Visualising data is important for aiding intuition & good understanding, but high-dimensional datasets can be hard to display. Here we demonstrate techniques to tackle the issue.
Like any collaborative business effort involving research & development, a data science function should be built carefully in order to enable the best expertise and technologies.
Here we demonstrate a standard semi-parametric regression method to create a model of hard drive failures. This model can be tested for accuracy and used for prediction.
We now have a clean, prepared, real-world dataset regarding the failures of thousands of hard drives; let's see what we can learn from a basic survival analysis.
We've reviewed the basic theory of survival analysis and discussed why it's a useful technique; now let's acquire, explore and prepare a real dataset for analysis.