Ortom | Data Science & AI consultancy based in Manchester

28.06.22 The end of Big Data

**Databricks is a $38 billion mistake.**

Read→

28.06.22 Is AI really starting a new scientific revolution?

AlphaFold is great, but is it a revolution?

Read→

28.06.22 The worst AI ever

Man trains a bot on 4chan.

Read→

28.06.22 The imitation of consciousness

Have we passed the Turing test already?

Read→

28.06.22 There’s no such thing as data

Data's not the new oil: it's worthless.

Read→

14.06.22 Neon: Postgres as a service

Setting up a database should be easy

Read→

14.06.22 Peter Norvig data science book

Data science needs some grounding principles

Read→

14.06.22 Where is the value in analysing data?

If you’re testing a horse against a car, you don’t need an A/B test.

Read→

14.06.22 An analytics setup guide

You probably need this analytics setup

Read→

14.06.22 Unbundling unbundling

Stop the MLOps tools! All you need is SQL

Read→

24.05.22 Towards general intelligence

DeepMind have built a new ML model that can do lots of tasks, called Gato.

Read→

24.05.22 Human in the Loop AI

AI is used mostly for reducing routine human work

Read→

24.05.22 Supervised clustering for better cluster analysis

Making useful clusters using SHAP values

Read→

24.05.22 A new approach to the Travelling Salesperson Problem

Faster solutions using graph neural networks

Read→

24.05.22 Learning Transformers

All you need to understand the Transformer neural architecture

Read→

25.04.22 Do you use an optimisation solver?

Choosing a tool can be hard

Read→

25.04.22 Playing with DALL·E 2

What's the point in DALL·E 2?

Read→

25.04.22 A year with R

Does base R suck?

Read→

25.04.22 Unravelling complex projects

Some tips for getting to know a new software project

Read→

25.04.22 What is churn?

How do you define and measure churn?

Read→

01.03.22 A visual introduction to machine learning

A visual and intuitive introduction to some tricky concepts

Read→

01.03.22 ML experiment tracking with Guild AI

A lightweight approach to experiment tracking

Read→

01.03.22 Data distribution shifts and monitoring

One of the many ways your machine learning model can go wrong

Read→

01.03.22 Push-Ups with Python!

Computer says: drop and give me 50!

Read→

01.03.22 Serving many models

How do you serve and manage thousands of ML models?

Read→

23.02.22 Learning prompts for language models

Write a short summary of language model prompting techniques:

Read→

21.02.22 How does a data business work?

What is a data business? How does one work?

Read→

19.02.22 Low cost MRI scans using deep learning

Using deep learning to remove electromagnetic interference in MRI scanners

Read→

17.02.22 A new ML algorithm for speech, vision, and text

One model to rule them all!

Read→

15.02.22 Bayesian or bust!

How to do Bayesian statistics in Python with PyMC3

Read→

09.02.22 IBM Watson obituary

IBM Watson Health gets taken to the morgue.

Read→

07.02.22 Siuba: Dplyr style dataframes in Python

Pandas' group by operations are a pain to use. Siuba fixes that.

Read→

05.02.22 Healthcare AI is stuck in POC hell

Machine learning has been shown to be accurate in many areas of healthcare, but clinical usage of ML is still very low.

Read→

03.02.22 Ggplot2 style plots for Python Seaborn

Seaborn was already my favourite Python visualisation library - it just got better!

Read→

01.02.22 ML and NLP Research Highlights of 2021

2021 saw some significant advances in ML and NLP research.

Read→

26.01.22 Does anybody regularly use Julia for ML?

These people use Julia, and this is why.

Read→

24.01.22 ZenML: Open source MLOps framework

A promising framework for flexible, simple MLOps.

Read→

22.01.22 Post theory science?

Machine Learning is being used to drive scientific process. Is it changing how science works?

Read→

20.01.22 How does TikTok's algorithm work?

TikTok, more than any other company, is built on its recommender. Here's how it works (sort of):

Read→

18.01.22 There's no such thing as raw data

The promise of deep learning was to eliminate feature engineering pipelines. That's probably a myth.

Read→

15.12.21 Stop using boxplots

Only statisticians understand box plots.

Read→

13.12.21 How not to sort by average rating

You are a web programmer. You have users. Your users rate stuff on your site. You want to put the highest-rated stuff at the top and lowest-rated at the bottom. You need some sort of “score” to sort by.

Read→

11.12.21 Will cloud providers just rent hardware?

Speculation on how cloud vendors might evolve.

Read→

09.12.21 Validation for Pandas dataframes

Pandas data frames are widely used by data scientists; Pandera adds the ability to validate them.

Read→

07.12.21 DeepMind teach an AI pure maths

The clever people at DeepMind have built a system to assist pure mathematicians.

Read→

06.10.21 How to use ML in your business

Using ML in business raises many considerations outside of the normal content of ML textbooks.

Read→

04.10.21 Generic AI gives generic results

Machine learning models tend not to generalise from one task to another. Buying access to generic AI solutions is unlikely to give good results.

Read→

02.10.21 Why? Causal learning

What happens when you increase prices? More sales or less?

Read→

30.09.21 Just talk!

A child learns to walk before they can run, and to talk before they can read. Maybe AI should do the same?

Read→

28.09.21 Start without machine learning

Don't head straight for a machine learning solution; first deploy something simple.

Read→

18.09.21 Now they are called foundation models

Stanford University have set up a new group to study what they are calling foundation models.

Read→

17.09.21 What kind of bird makes that sound?

If you have ever wondered how to identify a bird from its call, this ML powered tool is for you.

Read→

16.09.21 IBM Watson never really existed

A decade ago IBM built an AI system called Watson that won the quiz show Jeopardy. It was supposed to change the IT giant's fortunes. What happened?

Read→

15.09.21 Should I be using a data mesh?

Data Mesh is the latest new buzzword in data architecture - what is it?

Read→

14.09.21 New SQL package from the maker of FastAPI

A new package makes integrating SQL databases with Python code easy.

Read→

22.06.21 The 5 types of recommender system

Personalisation and recommendation are among the most effective applications of machine learning.

Read→

22.06.21 Easy data transformation with dbt

Data Build Tool (dbt) is a tool that allows easy data transformations.

Read→

22.06.21 Don't use deep learning for tabular data

Most business data is tabular (think Excel or SQL type data), and deep learning is generally not the best tool for modelling it.

Read→

22.06.21 Do algorithms dream of simulated sheep?

Dreams are a way to stop the brain overfitting, according to a new theory.

Read→

22.06.21 Open super big language model

A new, open alternative to the super big language model GPT-3 has been released. It's got 6 billion parameters and is trained on The Pile using JAX. It wrote the following about itself:

Read→

25.05.21 ARGH! AI is going to kill us all!

The dinosaurs were killed off by an asteroid. We'll probably be killed off by ourselves.

Read→

25.05.21 The really open NLP fightback

Large language models have become the property of big tech firms. Open science is starting to fight back.

Read→

25.05.21 How to learn MLOps

Coursera and Andrew Ng have launched a new course on machine learning engineering for production or MLOps.

Read→

25.05.21 Google's new, new AI platform - Vertex

Google has launched another new AI platform.

Read→

25.05.21 Tools to make things happen at the right time in the right order

Making things happen at the right time in the right order is much harder than it ought to be. Dagster is the latest attempt to make it easier.

Read→

11.05.21 The 28 billion dollar private AI companies

There are 28 privately held ML, AI and data companies that are nearing IPO. Their total valuation is around $119B.

Read→

11.05.21 Keep it simple! Deploy a model on a single machine

Deploying ML models doesn't always need GPUs or Kubernetes clusters. Sometimes a simple, single CPU machine is plenty.

Read→

11.05.21 Should we stop hiring data scientists?

Over the last decade, companies have hired thousands of data scientists. These teams often fail to deliver.

Read→

11.05.21 Your brain is an internet

The brain has been compared to a plumbing system, a watch, a computer and now the internet.

Read→

11.05.21 What the hell is a feature store?

A feature store is an emerging concept for storing and retrieving data in Machine Learning applications.

Read→

27.04.21 Microsoft buys $20bn speech recognition firm

Microsoft has announced it will buy the speech recognition firm Nuance Communications for $20bn.

Read→

27.04.21 New EU AI laws

The European Union has published a proposal for a new set of laws regulating the use of artificial intelligence.

Read→

27.04.21 First, deploy your model

Should the first step in a machine learning project be to build a production system?

Read→

27.04.21 What's wrong with MLOps

MLOps is the practice of building and maintaining production machine learning systems. It's new, and it's not all going well.

Read→

27.04.21 What is attention?

Why do some things grab our attention while we ignore others? What is going on in our brains? A recent article by AI/ML researcher Ekrem Aksoy attempts to describe the latest thinking in both neuroscience and machine learning.

Read→

13.04.21 Billion dollar computer vision startups

Computer vision is where it all began. In 2012 Geoff Hinton and his team built a neural network that massively outperformed all other approaches to recognising images. This breakthrough started the deep learning revolution.

Read→

13.04.21 Ray: Easier distributed computing

Ray is a framework that makes distributed computing using Python easy to set up and run.

Read→

13.04.21 Andrew Ng says: "Sort out your data!"

Machine learning is algorithms + data. A lot of focus goes on improving algorithms, not enough goes on improving data.

Read→

13.04.21 Easier speech to text with Hugging Face + Wav2Vec

Alexa, Siri and the rest have made speaking to computers a natural thing to do. The big tech firms have spent lots of money creating automatic speech recognition (ASR) models that convert speech to text.

Read→

13.04.21 Video - the next frontier of AI

We perceive the world more like video - a stream of audio and visual signals from a single point of view - than any other medium. Most internet traffic is video and a large proportion of time online is spent looking at video.

Read→

15.12.20 What is the protein folding problem?

Last week, Google offshoot DeepMind announced it had effectively solved the "protein folding problem".

Read→

15.12.20 Infinite Bad Guy

The best minds of my generation are working out how to make an infinite loop of Billie Eilish cover videos.

Read→

15.12.20 Data platforms

What's the difference between a data lake and a data warehouse? An analyst or a scientist?

Read→

15.12.20 Ethical implications of AI

It can be helpful to categorise AI-driven tasks according to their complexity and ethical implications.

Read→

15.12.20 Rapid antigen testing?

Recently the UK government tried testing everybody in Liverpool for Covid-19. Some of this testing was done using fast turnaround rapid antigen tests.

Read→

01.12.20 NeurIPS 2020

NeurIPS, the Thirty-fourth Annual Conference on Neural Information Processing Systems, takes place next week. This year you can get a ticket (virtually).

Read→

01.12.20 Learning like people

Until recently, models learnt using labelled data and humans learnt using language. This is starting to change.

Read→

01.12.20 scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn

Almost all data has some kind of time-related element. Often this is ignored.

Read→

01.12.20 AI for enterprise platforms

There has been lots of recent activity amongst 'AI platform' companies planning IPOs: see Databricks, DataRobot, Domino, Palantir, and now C3.ai.

Read→

01.12.20 Vaccine science by press release

AstraZeneca kind of messed up its vaccine announcement.

Read→

17.11.20 Data discovery platforms

Platforms are being developed to aid data discovery.

Read→

17.11.20 Frantic man live codes a neural network library

Joel Grus live codes a neural network library in Python.

Read→

17.11.20 Post election post

Polls got the US election really wrong again.

Read→

17.11.20 How to hire people

Hiring people in startups is important but is often done badly.

Read→

17.11.20 Catboost for big data

For structured, heterogeneous data, gradient boosting is the way to go.

Read→

25.10.20 Data infrastructure is maturing

As data becomes the key to business success, the infrastructure that supports it is maturing.

Read→

25.10.20 A new way to understand reinforcement learning

Reinforcement learning can be considered a type of supervised learning.

Read→

25.10.20 GPT-3 performance on a laptop

A team from Germany has bucked the recent trend for ever bigger NLP models by using a clever design.

Read→

25.10.20 AI system helps stop sepsis by working with humans

Building AI systems in healthcare is hard because it involves interacting with and changing complex human organisations.

Read→

25.10.20 Cambridge Analytica had no role in Brexit

Cambridge Analytica had no impact on Brexit and simply did bog-standard online targeted advertising work, according to a report from the Information Commissioner.

Read→

11.10.20 What is it like to be a smartphone?

Can we say for sure a smartphone isn't conscious? What about a bacterium? A bat?

Read→

11.10.20 AI investor reviews 2020

It's been a funny old year so far, and apparently it's already time to start doing roundups. Last week AI investor Matt Turck produced his review of AI and data in 2020.

Read→

11.10.20 Tidy modelling book published

A new book on modelling the tidy way in R has been published.

Read→

11.10.20 AI in healthcare lags behind

Very little progress has been made in modelling electronic healthcare data in recent years.

Read→

11.10.20 Keep your brain on its toes

A Google employee was bored with how great his life was (poor thing!) so he started leaving lots of decisions to chance. His life suddenly got even better. Woo hoo!

Read→

27.09.20 Machine learning engineering book

A new book has been published about building machine learning systems.

Read→

27.09.20 Making language models more efficient

In the last few years, transformer-based language models like GPT-3, BERT and others have revolutionised natural language processing. Here is how to make them more efficient.

Read→

27.09.20 How much computation does a brain need?

How much computation does a brain use? According to a new estimate it's about 10^14 - 10^17 FLOP/s.

Read→

27.09.20 DuckDB - a lightweight analytics database

DuckDB is a new lightweight database, designed to support data science.

Read→

27.09.20 Neil Ferguson on modelling Covid-19

Prof Neil Ferguson ('Professor Lockdown') gives an overview of his life in science.

Read→

13.09.20 Why it's hard to scale AI

Businesses that use AI in their products are harder to scale, argues a recent article from US-based VC Andreessen Horowitz.

Read→

13.09.20 Embodied AI at Facebook

Embodied AI involves making software-based agents that interact with and operate in the physical world.

Read→

13.09.20 Types of data visualisation

Engaging and popular visualisations are not always the easiest to read. Andrew Gelman always has useful things to say. In a Wired article he talks about different types of data visualisation. The best ones often draw in the viewer and make them engage in the scientific process as if discovering something for themselves.

Read→

13.09.20 Covid corner: A levels algorithm fiasco

The recent A level results fiasco gives data science a bad name. It's a really useful illustration of how important it is to consider how algorithms can propagate existing inequalities.

Read→

13.09.20 How to do speech recognition with Python

Easily convert speech to text in Python with the SpeechRecognition library.

Read→

16.08.20 The field of natural language processing is chasing the wrong goal

A recent article by NLP researcher Jesse Dunietz argues that the rapidly evolving field of Natural Language Processing is getting better and better at solving benchmark problems while not actually building much that is useful.

Read→

16.08.20 Git for data: not a silver bullet?

Recently a lot of effort has gone into improving the quality of 'production' machine learning systems. One approach that has gained a lot of momentum, and that I am a fan of, is Data Version Control: a kind of git for data.

Read→

16.08.20 Police built an AI to predict violent crime. It was seriously flawed

Police forces around the world keep trying to use machine learning to 'predict' crime. It rarely ends well.

Read→

16.08.20 The U.S. has artificial intelligence competition all wrong

AI competition is about data, algorithms and compute power. This article argues that compute power is the most important of those.

Read→

16.08.20 Covid Corner: UK infection data APIs

Public Health England have released a new API (available for Python, JavaScript, R, .NET and Elixir) that allows direct access to Coronavirus data. Good news. This should make it easy to play with, interrogate and monitor data.

Read→

02.08.20 Shortcuts: How Neural Networks Love to Cheat

If you’ve ever built a machine learning model that failed to work in production, it might have been taking a shortcut.

Read→

02.08.20 How to build RESTful APIs

A REST API is a common method for interacting with a computer program over the internet. REST (short for representational state transfer) is a set of guidelines and principles for creating Web services.

Read→

02.08.20 Continuous Machine Learning

CML (short for 'Continuous Machine Learning') is a new piece of software from the makers of DVC (Data Version Control) that makes it easier to use continuous integration approaches in machine learning projects. Continuous integration is an approach to building software where all members of a team integrate their work regularly. Integrations are verified by automated builds including tests so that errors get picked up quickly. This approach can make it much easier for teams to work together on larger projects.

Read→

02.08.20 Neural Nets with mixed data types

pytorch-widedeep is a Python package from London-based machine learning specialist Javier Rodriguez Zaurin. It can be used for building neural networks that use structured (tabular) data combined with images and text. It is based on Google’s 'Wide and Deep' algorithm and uses the PyTorch framework. The algorithm was first used to power the recommender system of Google's Android Play store.

Read→

02.08.20 Regulating Technology

Because of the growing importance of tech in our lives, and some big failures, there are growing calls to increase government oversight and regulation. According to this article, it's not going to be easy to do well.

Read→

10.03.20 Cybersecurity data science: an overview from machine learning perspective

Useful survey of cybersecurity in data science. There is an increasing trend for using big datasets to automate and augment aspects of online security.

Read→

10.03.20 Covid Corner: Test results

Paper from the BMJ with an interactive infographic on Covid-19 testing. Useful for understanding diagnosis, and classification in general. Interpreting test results is something doctors, patients and the rest all struggle with.

Read→

10.03.20 Facial Recognition and Policing

A man is wrongly accused of a crime by a facial recognition algorithm. The man is black, but the algorithms were largely trained on white faces. Where did things go wrong?

Read→

10.03.20 Innovation: Why Does DARPA Work?

An in-depth look at how and why the US military research organisation has been so successful at supporting innovation. DARPA is an outlier organization in the world of turning science fiction into reality. Since 1958, it has been a driving force in the creation of weather satellites, GPS, personal computers, modern robotics, the Internet, autonomous cars, and voice interfaces, to name a few. ... Which emulatable attributes contributed to DARPA’s outlier results? What does a domain-independent “ARPA Model” look like? Is it possible to build other organizations that generate equally huge results in other domains by riffing on that model?

Read→

10.03.20 How to teach yourself AI

The brilliant Professor Andrew Ng teaches his Stanford class how to teach themselves AI. A really useful meta-guide on an often neglected topic, and useful for teaching yourself anything.

Read→

19.11.19 4 Things I Learnt at ODSC Europe 2019 (you may disagree with 4)

I went to the Open Data Science Conference in London last week. I really enjoyed it. There was lots of really practical, technical content, not too much bullshit/sales, and lots of up-to-date data science.

Read→

31.10.19 My attempt at standup comedy

This week I performed a standup comedy show. This is the story of how I got there...

Read→

24.09.19 CRAP talk on machine learning

On Tuesday I spoke at CRAP Talks MCR. CRAP is a snappily titled meetup about 'Conversion Rate, Analytics and Product'. There was a range of people in attendance mostly working either in e-commerce or digital agencies. I'd been once before and I liked the range of topics and mix of people there: a bit off my usual machine learning and data science beat.

Read→

03.09.19 Machine Learning 101 Talk

I gave this talk at the Barclays AI Eagles Lab event on August 30th. The aim was to give a gentle introduction to some ideas of machine learning, using some GCSE (an English age 15 school qualification) maths. It's aimed at people without much ML background and is deliberately fairly simple. I started with a straight line as an example of a very simple mathematical model and built up some ideas about learning from data, non-linearity, and learning functions. I gave some advice on when to use and not use ML and some examples where I have seen things work. It seemed to go down fairly well at the event.

Read→

12.07.19 How to do feature engineering in R with the recipes package

I was excited to start using the new set of 'tidymodels' packages from Max Kuhn (creator of caret): rsample, recipes, yardstick, parsnip and dials. These are still under development but seem promising. The one I have so far found most useful is recipes. Here I'll give a quick overview of how you use it to do some simple data preparation for machine learning.

Read→

29.03.19 What is Data Science?

Data Science is a young field and it means different things to different people. Here I'll give my take on what it is and isn't.

Read→

22.03.19 Data Summit, Edinburgh, 2019

I have just been to the Data Summit in Edinburgh for the first time. It was a good event - a single track of fairly high quality speakers on topics around data, tech and AI organised by the DataLab. Here are some notes on a few of the speakers that stood out to me.

Read→

15.03.19 Welcome to Ortom

I've set up Ortom with the goal of providing high quality data science services to any organisation. At the moment it's just me. I've been doing data science for the last 10 years in one way or another, most recently as head of data science at Peak.

Read→