14.06.22
Where is the value in analysing data?
If you’re testing a horse against a car, you don’t need an A/B test.
→
24.05.22
Towards general intelligence
DeepMind have built Gato, a new ML model that can do lots of tasks.
→
01.03.22
A visual introduction to machine learning
A visual and intuitive introduction to some tricky concepts
→
01.03.22
Data distribution shifts and monitoring
One of the many ways your machine learning model can go wrong
→
19.02.22
Low cost MRI scans using deep learning
Using deep learning to remove electromagnetic interference in MRI scanners
→
07.02.22
Siuba: Dplyr style dataframes in Python
Pandas' group by operations are a pain to use. Siuba fixes that.
→
05.02.22
Healthcare AI is stuck in POC hell
Machine learning has been shown to be accurate in many areas of healthcare, but clinical usage of ML is still very low.
→
03.02.22
Ggplot2 style plots for Python Seaborn
Seaborn was already my favourite Python visualisation library - it just got better!
→
22.01.22
Post theory science?
Machine Learning is being used to drive scientific process. Is it changing how science works?
→
20.01.22
How does TikTok's algorithm work?
TikTok, more than any other company, is built on its recommender. Here's how it works (sort of):
→
18.01.22
There's no such thing as raw data
The promise of deep learning was to eliminate feature engineering pipelines. That's probably a myth.
→
13.12.21
How not to sort by average rating
You are a web programmer. You have users. Your users rate stuff on your site. You want to put the highest-rated stuff at the top and lowest-rated at the bottom. You need some sort of “score” to sort by.
→
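A plain average rates an item with one 5-star vote above an item with ninety good votes out of a hundred. One common fix, which the teaser above does not name and which I am assuming here, is to sort by the lower bound of the Wilson score interval on the share of positive ratings. This is a minimal sketch; the function name, the 95% confidence level (`z = 1.96`) and the example vote counts are all made up for illustration.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a proportion.

    positive: number of positive ratings
    total:    total number of ratings
    z:        normal quantile for the confidence level (1.96 is roughly 95%)
    """
    if total == 0:
        return 0.0  # no ratings yet, so no evidence the item is good
    p_hat = positive / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


# Hypothetical items: (positive ratings, total ratings)
items = {"A": (1, 1), "B": (90, 100), "C": (4, 5)}

# Highest score first: well-evidenced items beat lucky ones.
ranked = sorted(items, key=lambda k: wilson_lower_bound(*items[k]), reverse=True)
print(ranked)  # ['B', 'C', 'A']
```

Item A has a perfect average but only one vote, so its lower bound is small and it drops to the bottom.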
09.12.21
Validation for Pandas dataframes
Pandas data frames are used widely by data scientists. Pandera adds the ability to validate them.
→
07.12.21
DeepMind teach an AI pure maths
The clever people at DeepMind have built a system to assist pure mathematicians.
→
06.10.21
How to use ML in your business
Using ML in business raises many considerations outside of the normal content of ML textbooks.
→
04.10.21
Generic AI gives generic results
Machine learning models tend not to generalise from one task to another. Buying access to generic AI solutions is unlikely to give good results.
→
30.09.21
Just talk!
A child learns to walk before they can run, and to talk before they can read. Maybe AI should do the same?
→
28.09.21
Start without machine learning
Don't head straight for a machine learning solution; first deploy something simple.
→
18.09.21
Now they are called foundation models
Stanford University have set up a new group to study what they are calling foundation models.
→
17.09.21
What kind of bird makes that sound?
If you have ever wondered how to identify a bird from its call, this ML powered tool is for you.
→
16.09.21
IBM Watson never really existed
A decade ago IBM built an AI system called Watson that won the quiz show Jeopardy. It was supposed to change the IT giant's fortunes. What happened?
→
15.09.21
Should I be using a data mesh?
Data Mesh is the latest new buzzword in data architecture - what is it?
→
14.09.21
New SQL package from the maker of FastAPI
A new package makes integrating SQL databases with Python code easy.
→
22.06.21
Don't use deep learning for tabular data
Most business data is tabular (think Excel or SQL type data), and deep learning is generally not the best tool for modelling it.
→
22.06.21
The 5 types of recommender system
Personalisation and recommendation are among the most effective applications of machine learning.
→
22.06.21
Easy data transformation with dbt
Data Build Tool (dbt) is a tool that allows easy data transformations.
→
22.06.21
Do algorithms dream of simulated sheep?
Dreams are a way to stop the brain overfitting, according to a new theory.
→
22.06.21
Open super big language model
A new, open alternative to the super big language model GPT-3 has been released. It's got 6 billion parameters and is trained on The Pile using JAX. It wrote the following about itself:
→
25.05.21
ARGH! AI is going to kill us all!
The dinosaurs were killed off by an asteroid. We'll probably be killed off by ourselves.
→
25.05.21
The really open NLP fightback
Large language models have become the property of big tech firms. Open science is starting to fight back.
→
25.05.21
How to learn MLOps
Coursera and Andrew Ng have launched a new course on machine learning engineering for production or MLOps.
→
25.05.21
Tools to make things happen at the right time in the right order
Making things happen at the right time in the right order is much harder than it ought to be. Dagster is the latest attempt to make it easier.
→
11.05.21
The 28 billion dollar private AI companies
There are 28 privately held ML, AI and data companies that are nearing IPO. Their total valuation is around $119B.
→
11.05.21
Should we stop hiring data scientists?
Over the last decade, companies have hired thousands of data scientists. These teams often fail to deliver.
→
11.05.21
Keep it simple! Deploy a model on a single machine
Deploying ML models doesn't always need GPUs or Kubernetes clusters. Sometimes a simple, single CPU machine is plenty.
→
11.05.21
Your brain is an internet
The brain has been compared to a plumbing system, a watch, a computer and now the internet.
→
11.05.21
What the hell is a feature store?
A feature store is an emerging concept for storing and retrieving data in Machine Learning applications.
→
27.04.21
New EU AI laws
The European Union has published a proposal for a new set of laws regulating the use of artificial intelligence.
→
27.04.21
Microsoft buys $20bn speech recognition firm
Microsoft has announced it will buy the speech recognition firm Nuance Communications for $20bn.
→
27.04.21
What's wrong with MLOps
MLOps is the practice of building and maintaining production machine learning systems. It's new, and it's not all going well.
→
27.04.21
First, deploy your model
Should the first step in a machine learning project be to build a production system?
→
27.04.21
What is attention?
Why do some things grab our attention while we ignore others? What is going on in our brains? A recent article by AI/ML researcher Ekrem Aksoy attempts to describe the latest thinking in both neuroscience and machine learning.
→
13.04.21
Billion dollar computer vision startups
Computer vision is where it all began. In 2012 Geoff Hinton and his team built a neural network that massively outperformed all other approaches to recognising images. This breakthrough started the deep learning revolution.
→
13.04.21
Ray: Easier distributed computing
Ray is a framework that makes distributed computing using Python easy to set up and run.
→
13.04.21
Andrew Ng says: "Sort out your data!"
Machine learning is algorithms + data. A lot of focus goes on improving algorithms, not enough goes on improving data.
→
13.04.21
Video - the next frontier of AI
We perceive the world more like video - a stream of audio and visual signals from a single point of view - than any other medium. Most internet traffic is video and a large proportion of time online is spent looking at video.
→
13.04.21
Easier speech to text with Hugging Face + Wav2Vec
Alexa, Siri and the rest have made speaking to computers a natural thing to do. The big tech firms have spent lots of money creating automatic speech recognition (ASR) models that convert speech to text.
→
15.12.20
What is the protein folding problem?
Last week, Google offshoot DeepMind announced it had effectively solved the "protein folding problem".
→
15.12.20
Infinite Bad Guy
The best minds of my generation are working out how to make an infinite loop of Billie Eilish cover videos.
→
15.12.20
Data platforms
What's the difference between a data lake and a data warehouse? An analyst or a scientist?
→
15.12.20
Ethical implications of AI
It can be helpful to categorise AI-driven tasks according to their complexity and ethical implications.
→
15.12.20
Rapid antigen testing?
Recently the UK government tried testing everybody in Liverpool for Covid-19. Some of this testing was done using fast turnaround rapid antigen tests.
→
01.12.20
Learning like people
Until recently, models learnt using labelled data and humans learnt using language. This is starting to change.
→
01.12.20
NeurIPS 2020
NeurIPS, the Thirty-fourth Annual Conference on Neural Information Processing Systems, takes place next week. This year you can get a ticket (virtually).
→
01.12.20
scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn
Almost all data has some kind of time-related element. Often this is ignored.
→
01.12.20
AI for enterprise platforms
There has been lots of recent activity amongst 'AI platform' companies planning IPOs: see Databricks, DataRobot, Domino, Palantir, and now C3.ai.
→
17.11.20
Frantic man live codes a neural network library
Joel Grus live codes a neural network library in Python.
→
25.10.20
Cambridge Analytica had no role in Brexit
Cambridge Analytica had no impact on Brexit and simply did bog standard online targeted advertising work, according to a report from the Information Commissioner.
→
25.10.20
Data infrastructure is maturing
As data becomes the key to business success, the infrastructure that supports it is maturing.
→
25.10.20
A new way to understand reinforcement learning
Reinforcement learning can be considered a type of supervised learning.
→
25.10.20
GPT-3 performance on a laptop
A team from Germany has bucked the recent trend for ever bigger NLP models by using a clever design.
→
25.10.20
AI system helps stop sepsis by working with humans
Building AI systems in healthcare is hard because it involves interacting with and changing complex human organisations.
→
11.10.20
What is it like to be a smartphone?
Can we say for sure a smartphone isn't conscious? What about a bacterium? A bat?
→
11.10.20
Keep your brain on its toes
A Google employee was bored with how great his life was (poor thing!) so he started leaving lots of decisions to chance. His life suddenly got even better. Woo hoo!
→
11.10.20
AI investor reviews 2020
It's been a funny old year so far, and apparently it's already time to start doing roundups. Last week AI investor Matt Turck produced his review of AI and data in 2020.
→
11.10.20
AI in healthcare lags behind
Very little progress has been made in modelling electronic healthcare data in recent years.
→
27.09.20
Machine learning engineering book
A new book has been published about building machine learning systems.
→
27.09.20
Making language models more efficient
In the last few years, transformer-based language models like GPT-3, BERT and others have revolutionised natural language processing. Here is how to make them more efficient.
→
27.09.20
How much computation does a brain need?
How much computation does a brain use? According to a new estimate, it's about 10^14 to 10^17 FLOP/s.
→
27.09.20
DuckDB - a lightweight analytics database
DuckDB is a new lightweight database, designed to support data science.
→
27.09.20
Neil Ferguson on modelling Covid-19
Prof Neil Ferguson ('Professor Lockdown') gives an overview of his life in science.
→
13.09.20
Why it's hard to scale AI
Businesses that use AI in their products are harder to scale, argues a recent article from US-based VC Andreessen Horowitz.
→
13.09.20
Types of data visualisation
Engaging and popular visualisations are not always the easiest to read. Andrew Gelman always has useful things to say. In a Wired article he talks about different types of data visualisation. The best ones often draw in the viewer and make them engage in the scientific process as if discovering something for themselves.
→
13.09.20
Covid corner: A levels algorithm fiasco
The recent A level results fiasco gives data science a bad name. It's a really useful illustration of how important it is to consider how algorithms can propagate existing inequalities.
→
13.09.20
How to do speech recognition with Python
Easily convert speech to text in Python with the SpeechRecognition library.
→
13.09.20
Embodied AI at Facebook
Embodied AI involves making software based agents that interact with and operate in the physical world.
→
16.08.20
The field of natural language processing is chasing the wrong goal
A recent article by NLP researcher Jesse Dunietz argues that the rapidly evolving field of Natural Language Processing is getting better and better at solving benchmark problems while not actually building much that is useful.
→
16.08.20
Git for data: not a silver bullet?
Recently a lot of effort has gone into improving the quality of 'production' machine learning systems. One approach that has gained a lot of momentum, and that I am a fan of, is Data Version Control, a kind of git for data.
→
16.08.20
The U.S. has artificial intelligence competition all wrong
AI competition is about data, algorithms and compute power. This article argues that compute power is the most important of those.
→
16.08.20
Police built an AI to predict violent crime. It was seriously flawed
Police forces around the world keep trying to use machine learning to 'predict' crime. It rarely ends well.
→
16.08.20
Covid Corner: UK infection data APIs
Public Health England have released a new API (available for Python, JavaScript, R, .Net and Elixir) that allows direct access to Coronavirus data. Good news. This should make it easy to play with, interrogate and monitor data.
→
02.08.20
Shortcuts: How Neural Networks Love to Cheat
If you’ve ever built a machine learning model that failed to work in production, it might have been taking a shortcut.
→
02.08.20
Regulating Technology
Because of the growing importance of tech in our lives, and some big failures, there are growing calls to increase government oversight and regulation. According to this article, it's not going to be easy to do well.
→
02.08.20
Continuous Machine Learning
CML (short for 'Continuous Machine Learning') is a new piece of software from the makers of DVC (Data Version Control) that makes it easier to use continuous integration approaches in machine learning projects. Continuous integration is an approach to building software where all members of a team integrate their work regularly. Integrations are verified by automated builds including tests so that errors get picked up quickly. This approach can make it much easier for teams to work together on larger projects.
→
02.08.20
How to build RESTful APIs
A REST API is a common method for interacting with a computer program over the internet. REST (short for representational state transfer) is a set of guidelines and principles for creating Web services.
→
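To make the idea concrete, here is a minimal sketch of the REST convention: a URL path names a resource, and the HTTP method says what to do with it. It uses no web framework, just a plain function that takes a method and a path and returns a status code and a JSON body. The `/items` resource, the `handle` function and the in-memory store are all hypothetical names chosen for illustration, not taken from the article.

```python
import json
from typing import Dict, Optional, Tuple

# Hypothetical in-memory "database" of items, keyed by integer id.
_items: Dict[int, dict] = {}
_next_id = 1


def _error(status: int, message: str) -> Tuple[int, str]:
    """Build a JSON error response."""
    return status, json.dumps({"error": message})


def handle(method: str, path: str, body: Optional[str] = None) -> Tuple[int, str]:
    """Map an HTTP method and path to an action on the items collection."""
    global _next_id
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "items" or len(parts) > 2:
        return _error(404, "not found")

    if len(parts) == 1:  # the collection: /items
        if method == "GET":  # list every item
            return 200, json.dumps(list(_items.values()))
        if method == "POST":  # create a new item from a JSON object
            item = json.loads(body or "{}")
            item["id"] = _next_id
            _items[_next_id] = item
            _next_id += 1
            return 201, json.dumps(item)
        return _error(405, "method not allowed")

    # a single resource: /items/<id>
    if not parts[1].isdecimal() or int(parts[1]) not in _items:
        return _error(404, "not found")
    item_id = int(parts[1])
    if method == "GET":  # read one item
        return 200, json.dumps(_items[item_id])
    if method == "DELETE":  # remove one item
        del _items[item_id]
        return 204, ""
    return _error(405, "method not allowed")
```

Calling `handle("POST", "/items", '{"name": "widget"}')` creates an item, and `handle("GET", "/items/1")` reads it back. A real service would put a web framework and a proper database behind the same method-plus-path pattern.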
02.08.20
Neural Nets with mixed data types
pytorch-widedeep is a Python package from London-based machine learning specialist Javier Rodriguez Zaurin. It can be used for building neural networks that use structured (tabular) data combined with images and text. It is based on Google’s 'Wide and Deep' algorithm and uses the PyTorch framework. The algorithm was first used to power the recommender system of Google's Android Play store.
→
10.03.20
Covid Corner: Test results
Paper from the BMJ with an interactive infographic on Covid-19 testing. Useful for understanding diagnosis, and classification in general. Interpreting test results is something doctors, patients and the rest all struggle with.
→
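The reason test results are so easy to misread is that the chance a positive result is correct depends on how common the disease is, not only on how good the test is. Here is a minimal sketch of that calculation using Bayes' rule. The sensitivity, specificity and prevalence figures are invented for illustration and are not taken from the BMJ paper.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a person with a positive test really has the disease.

    sensitivity: P(test positive | disease)
    specificity: P(test negative | no disease)
    prevalence:  P(disease) in the group being tested
    """
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: 80% sensitive, 99% specific.
ppv_low = positive_predictive_value(0.80, 0.99, prevalence=0.01)   # mass screening
ppv_high = positive_predictive_value(0.80, 0.99, prevalence=0.20)  # symptomatic patients

print(f"PPV at 1% prevalence:  {ppv_low:.0%}")   # about 45%
print(f"PPV at 20% prevalence: {ppv_high:.0%}")  # about 95%
```

With these made-up numbers the same test is right about a positive result less than half the time when screening a low-prevalence population, and about 95% of the time among people who are already likely to be infected.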
10.03.20
Cybersecurity data science: an overview from machine learning perspective
Useful survey of cybersecurity in data science. There is an increasing trend for using big datasets to automate and augment aspects of online security.
→
10.03.20
Facial Recognition and Policing
A man is wrongly accused of a crime by a facial recognition algorithm. The man is black, but the algorithms were largely trained on white faces. Where did things go wrong?
→
10.03.20
How to teach yourself AI
The brilliant Professor Andrew Ng teaches his Stanford class how to teach themselves AI. A really useful meta-guide on an often neglected topic, and useful for teaching yourself anything.
→
10.03.20
Innovation: Why Does DARPA Work?
An in-depth look at how and why the US military research organisation has been so successful at supporting innovation. DARPA is an outlier organization in the world of turning science fiction into reality. Since 1958, it has been a driving force in the creation of weather satellites, GPS, personal computers, modern robotics, the Internet, autonomous cars, and voice interfaces, to name a few. ... Which emulatable attributes contributed to DARPA’s outlier results? What does a domain-independent “ARPA Model” look like? Is it possible to build other organizations that generate equally huge results in other domains by riffing on that model?
→
19.11.19
4 Things I Learnt at ODSC Europe 2019 (you may disagree with 4)
I went to the Open Data Science Conference in London last week. I really enjoyed it. There was lots of really practical, technical content and not too much bullshit/sales and lots of up to date data science.
→
31.10.19
My attempt at standup comedy
This week I performed a standup comedy show. This is the story of how I got there...
→
24.09.19
CRAP talk on machine learning
On Tuesday I spoke at CRAP Talks MCR. CRAP is a snappily titled meetup about 'Conversion Rate, Analytics and Product'. There was a range of people in attendance mostly working either in e-commerce or digital agencies. I'd been once before and I liked the range of topics and mix of people there: a bit off my usual machine learning and data science beat.
→
03.09.19
Machine Learning 101 Talk
I gave this talk at the Barclays AI Eagles Lab event on August 30th. The aim was to give a gentle introduction to some ideas of machine learning, using some GCSE (an English age 15 school qualification) maths. It's aimed at people without much ML background and is deliberately fairly simple. I started with a straight line as an example of a very simple mathematical model and built up some ideas about learning from data, non-linearity, and learning functions. I gave some advice on when to use and not use ML and some examples where I have seen things work. It seemed to go down fairly well at the event.
→
12.07.19
How to do feature engineering in R with the recipes package
I was excited to start using the new set of 'tidymodels' packages from Max Kuhn (creator of caret): rsample, recipes, yardstick, parsnip and dials. These are still under development but seem promising. The one I have so far found most useful is recipes. Here I'll give a quick overview of how you use it to do some simple data preparation for machine learning.
→
29.03.19
What is Data Science?
Data Science is a young field and it means different things to different people. Here I'll give my take on what it is and isn't.
→
22.03.19
Data Summit, Edinburgh, 2019
I have just been to the Data Summit in Edinburgh for the first time. It was a good event - a single track of fairly high quality speakers on topics around data, tech and AI, organised by the DataLab. Here are some notes on a few of the speakers that stood out to me.
→
15.03.19
Welcome to Ortom
I've set up Ortom with the goal of providing high quality data science services to any organisation. At the moment it's just me. I've been doing data science for the last 10 years in one way or another, most recently as head of data science at Peak.
→