Machine Learning & AI Libraries; Getting Started / Absurd

Introduction

Machine Learning (ML) is a revolutionary force, redefining the boundaries of technology and human interaction. At its core, ML harnesses vast amounts of data, enabling machines to learn, predict, and evolve in ways we once deemed science fiction. However, navigating ML can be daunting, especially for newcomers. With so many models, libraries, and terms to choose from, it is often unclear where to start.

Machine Learning is used in a wide variety of applications, such as:

Spam filtering
Fraud detection
Recommendation systems
Self-driving cars
Medical diagnosis
Chatbots

In our latest blog, we aim to demystify ML and showcase its applications. We provide a comprehensive overview of popular ML libraries, comparing their strengths, weaknesses, and best use cases.

Machine Learning Models

There are a number of model types that make up Machine Learning. Understanding what each one is best suited to will help you find the right model for your project.

Supervised Learning - We train the model with data and answers for it to learn the patterns. For example, teaching an email program to detect ‘spam’ automatically.
Unsupervised Learning - We submit data and ask the model to organise it. For example, organising photos based on the people within them without telling the model who is in them.
Semi-supervised Learning - Where there’s a lot of data, but only some of it is labelled. For example, YouTube might have a few videos that are labelled ‘music video’ or ‘tutorial’ and use these to help organise the larger set of videos.
Reinforcement Learning - The model tries different actions and we reward it when it gets things right and penalise it when it gets things wrong. For example, training a computer to play video games against a human.
Deep Learning - This mimics how our brain works, in a very simplified way. It’s great for things like images, speech, and text. Voice assistants like Siri or Alexa use Deep Learning models to understand our speech.
Ensemble Learning - Several models each produce their own prediction and the results are combined, for example by averaging them or taking a vote, to give a more reliable answer than any single model.
Time Series Forecasting - Used for data that is measured over time to predict what comes next. For example, Time Series Forecasting could be used to predict tomorrow’s weather based on the last 10 years of weather.
Anomaly Detection - Used to spot unusual patterns that do not conform to expected behaviour, like fraud detection that picks up on irregular spending patterns.
Natural Language Processing - This helps computers understand and interpret human language in a way that’s valuable. Often used in customer service chatbots.
Recommendation Systems - Uses previous behaviour to make suggestions, such as recommended products in your Amazon shopping cart or homepage personalisation based on previous browsing history.
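
To make one of these concrete, here is a minimal sketch of Anomaly Detection using only Python's standard library. It flags values that sit far from the average (a simple z-score check), which is a deliberately basic stand-in for the techniques real fraud systems use. The function name find_anomalies, the daily_spend figures and the threshold of 2.0 are all made up for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return values that sit more than `threshold` standard deviations from the mean."""
    average = mean(values)
    spread = stdev(values)
    if spread == 0:
        # Every value is identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - average) / spread > threshold]


# Hypothetical daily card spend: seven ordinary days and one very unusual one.
daily_spend = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 43.5, 980.0]

print(find_anomalies(daily_spend))  # prints [980.0]
```

With only a handful of data points a single extreme value also inflates the standard deviation, so the threshold usually needs tuning for the data you have.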

These models can either be used in isolation or combined to create powerful products and services. For example, Voice Assistants won’t just use Deep Learning; they will combine it with Natural Language Processing, Recommendation Systems and even Anomaly Detection.

Machine Learning Libraries

There are lots of libraries and frameworks that can be adopted to take advantage of the different types of Machine Learning available, each with its own pros and cons.

TensorFlow

TensorFlow is a library developed by Google. It is great for large-scale, production-ready machine learning applications, but it is also complex and has quite verbose syntax, so it can be overwhelming for smaller projects.

For example, a factory could use TensorFlow to analyse sensor data from machinery to predict when a machine will fail and manage maintenance schedules.

Airbnb uses TensorFlow for creating personalised recommendations to users based on their previous search history and bookings.

Pros

The codebase is fairly versatile and extremely comprehensive making it suitable for multiple applications.
There’s great documentation to build and learn with, and a really strong community for helping solve problems.
Scalability - TensorFlow is designed to be deployed at scale, as its use at companies like Airbnb shows.

Cons

Learning curve - it’s not the most straightforward library to deploy, with lots of dependencies to manage.
Verbose Syntax - creating the neural network model, defining layers, setting up data, and training the model can involve a relatively large amount of detailed code in comparison to the other libraries.

PyTorch

PyTorch is an open source library developed by Facebook’s AI Research lab. Researchers tend to opt for PyTorch because it’s fairly simple to debug and can be tweaked on the fly - this makes it good for models that are rapidly evolving.

Naturally, as the creators of PyTorch, Facebook heavily relies on it for personalising your news feed, auto tagging images and detecting hate speech or harmful content in posts.

Pros

An intuitive codebase and syntax makes it easy to deploy and tweak.
PyTorch uses dynamic computation graphs, which means that the graph is built on the fly as operations are executed. This makes debugging and tweaking on-the-fly easier, which makes it perfect for R&D.
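
As a rough sketch of what a dynamic computation graph means in practice, the example below lets an ordinary Python if statement decide which operations are recorded, then asks PyTorch for the gradient. The function names and input values are hypothetical and chosen only for illustration; it assumes PyTorch is installed.

```python
import torch


def squared_then_maybe_doubled(x):
    """Square x, then double the result only when it is larger than 10."""
    y = x * x
    # Plain Python control flow: the graph is built from whichever branch runs.
    if y.item() > 10:
        y = y * 2
    return y


def gradient_at(value):
    """Return dy/dx at `value` for the function above."""
    x = torch.tensor(value, requires_grad=True)
    squared_then_maybe_doubled(x).backward()
    return x.grad.item()


small = gradient_at(3.0)  # 3*3 = 9, not doubled, so y = x^2 and dy/dx = 2x
large = gradient_at(4.0)  # 4*4 = 16, doubled, so y = 2x^2 and dy/dx = 4x

print(small, large)  # prints 6.0 16.0
```

Because the graph is rebuilt on every call, you can step through the function with a normal debugger or add print statements, which is a large part of why it suits experimentation.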

Cons

Historically it has been difficult to deploy in comparison to libraries like TensorFlow. This is changing, but it has still got a way to go.
The memory consumption can be high because of its dynamic graphs - this means resource usage could end up being more expensive than with other libraries.
The community and ecosystem aren’t as large as those of some of the other libraries, such as TensorFlow.

Scikit-learn

Scikit-learn is a lightweight library focused on traditional machine learning algorithms. It is very user-friendly and has loads of documentation, making it a great choice for beginners, but when we say lightweight we mean it’s not designed for deep learning or large datasets.

Scikit-learn has traditionally been used behind a lot of email providers’ spam detection. Spotify also uses it for its music recommendation algorithms.

Pros

Easy to use.
Excellent documentation.

Cons

Limited Deep Learning capabilities.
Not suited to large datasets.

Keras

Keras is a high-level neural network API that is excellent for beginners due to its simplicity and ease of use. It is designed to enable fast experimentation with deep neural networks, but offers less control and slightly lower performance compared to other libraries or APIs.
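
To give a feel for that simplicity, here is a minimal sketch of defining, training and using a small neural network in Keras. The data is randomly generated and the layer sizes are arbitrary, so treat it as an illustration of the workflow rather than a useful model. It assumes TensorFlow is installed and that Keras is imported through it.

```python
import numpy as np
from tensorflow import keras

# Synthetic data for illustration only: 120 rows, 4 features, 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("int32")

# Define the network as a simple stack of layers.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),
])

# Choose how the model learns and what it reports, then train it.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(X, y, epochs=5, batch_size=16, verbose=0)

# Each row of the output holds one probability per class.
probabilities = model.predict(X[:3], verbose=0)
print(probabilities.shape)
```

The whole workflow is define, compile, fit and predict, which is why Keras is often suggested as a first step into deep learning.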

Netflix has one of the most famous deployments of Keras; it uses the library in its recommendation algorithms.

Pros

Extremely flexible back end and API make it easy to integrate with applications.
Great for beginners or simple requirements.

Cons

The library offers limited customisation, which makes it less suited to more advanced applications.
Performance can be slightly lower than with other libraries or APIs.

ML.NET

ML.NET is a machine learning framework developed by Microsoft for .NET applications. It’s very versatile and can suit a number of different learning models deployed directly into .NET applications.

We recently used ML.NET in an application for predicting break-out social media trends; we’ve found ML.NET to be robust in its output but also flexible enough to change training models on-the-fly.

Alibaba is one of the most high-profile adopters of ML.NET; it’s used in their recommendations feature.

Pros

Native .NET integration means the library can be deployed directly into applications.
It’s built to scale efficiently.
Supports Classic ML models as well as more robust Deep Learning.

Cons

The communities and resources are more limited compared to the Python libraries.
If you’re not a .NET developer, this is going to be less intuitive.

IBM Watson

It’s fairer to call Watson a suite than a library. It is one of the most famous Artificial Intelligence services and brings together a number of applications and tools to help organisations make predictions, automate complex processes, optimise operations and analyse large amounts of unstructured data.

IBM Watson was the first service we used for creating a virtual customer service agent for a global technology client.

The most advanced implementation we know of is at Woodside Energy, an Australian oil and gas company, which uses Watson to provide answers to technical questions based on decades of operational data.

Pros

IBM Watson offers a wide range of pre-built AI services, including natural language processing, visual recognition and data analytics.
A tight integration with IBM cloud allows for seamless scaling.
Strong security features make Watson a popular choice for industries with stringent data protection requirements.

Cons

Watson can become expensive at scale.
It can get complex quickly.

Azure Cognitive Services

Azure Cognitive Services is a set of APIs, SDKs, and services that make it simpler for developers to add AI such as vision, speech and language analysis to their applications.

We have traditionally used Azure’s Cognitive Services for more advanced translations when launching clients into new regions. It offers a much better level of translation than everyday translation engines, such as Google. Whilst it is not as good as native translation, it’s good enough to prove a business case for deploying more resources.

Azure can be used for lots of application types; Uber uses Azure Cognitive Services to verify the identity of its drivers through facial recognition.

Pros

Azure offers a wide range of pre-built services that can be deployed simply.
The services are designed to easily integrate into applications via simple API calls.
It’s built to scale and integrates well with other Azure services.

Cons

It can get expensive based on the usage; as you scale and the number of API calls increases so will the cost.
Whilst Azure can be a convenient way of getting something set up, the pre-built models cannot be tailored to your needs as much as a custom model can.

Machine Learning; Where to Start

For complete beginners, Scikit-learn is an excellent starting point. Its extensive documentation is very supportive and its API is very beginner-friendly.

Start by getting comfortable with some of the basic concepts like regression, classification, and clustering. Once you’re comfortable with these concepts, we’d suggest looking at Keras to further enhance your models; again, this is a relatively simple place to start and has an easy-to-use API.
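
As a starting point, here is a minimal sketch of two of those concepts, classification and clustering, using Scikit-learn and its built-in iris flower dataset. The choice of LogisticRegression and KMeans, and the settings passed to them, are just one reasonable option among many rather than a recommendation.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A small built-in dataset: 150 flowers, 4 measurements each, 3 species.
X, y = load_iris(return_X_y=True)

# Classification (supervised): learn from labelled examples, test on unseen ones.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
accuracy = accuracy_score(y_test, classifier.predict(X_test))

# Clustering (unsupervised): group the same flowers without using the labels.
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(X)
cluster_sizes = sorted(int((cluster_ids == k).sum()) for k in range(3))

print(f"Classification accuracy on held-out data: {accuracy:.2f}")
print(f"Cluster sizes: {cluster_sizes}")
```

Try swapping in a different classifier or changing the number of clusters and see how the results move; the fit and predict pattern stays the same across most of the library.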

Conclusion

This isn’t an exhaustive list of libraries but a good place to start; choosing a library largely depends on your needs and goals. Start simple with Scikit-learn and Keras or build scalable, production-ready solutions with TensorFlow and ML.NET - although, if you find Python more intuitive, PyTorch might be the library for you.

The best way to learn is by doing, analysing and tweaking, and going again.

Get in touch if you'd like support with AI models and their implementation.

Glossary

ML Library: An ML (Machine Learning) library is a collection of pre-written code, tools, and functions that facilitate the development, training, and deployment of machine learning algorithms. These libraries are designed to help developers and data scientists streamline the process of integrating ML capabilities into their applications without having to write algorithms from scratch, e.g. TensorFlow, PyTorch, and Scikit-learn.

ML Model: An ML model refers to the output of a machine learning algorithm trained on data. It represents what the algorithm has learned from the training process. Once an ML model is trained on a specific set of data, it can start making predictions or decisions without being explicitly programmed to perform the task. These models can vary in complexity from simple linear regressions to sophisticated neural networks.