The exponential increase in computer processing power and storage capabilities has sparked significant interest in artificial intelligence and opened up many related fields. The most prominent of these fields in recent years has been machine learning, owing to its application across various domains. However, machine learning is known to most as a field for experts, practised by talented developers, statisticians, mathematicians, data scientists and the like, which creates a barrier for non-experts who could benefit from basic machine-learning solutions. This has led to the development of machine-learning platforms, which allow users with little to no data science experience to build predictive models, apply data-centric solutions, and identify useful insights from data. At SovTech, we embrace clients who are planning to add machine-learning platforms to their software portfolio. We provide integration services to major platforms, allowing our customers to import their data with ease and integrate their databases with their platform of choice.
1. Processes on Machine-learning Platforms
Machine-learning platforms are built differently from one another, and we have yet to see a “one size fits all” solution. For instance, BigML is a simple platform supported by popular machine-learning libraries in order to provide more openness to users; Amazon and Microsoft have combined their internal machine-learning tools to form their platforms; and a few others, such as Innovance and Quantopian, have tailored their platforms for specific clienteles: retail foreign exchange investors for the former and quantitative analysts for the latter.
Regardless of the approach to market, there are similarities in their processes, especially in terms of tasks performed by users. The following tasks are the most prominent on machine-learning platforms.
The first process, data preparation, consists of collecting and manipulating data. It is crucial for the success of any machine-learning project, because the quality of the input data has a considerable effect on the effectiveness of a machine-learning model. It is important to have a clean dataset, ready for modelling. The main activities that take place during this process are:
a. Data Collection
There are multiple ways and channels through which one can acquire data, and the right choice depends on the problem. For example, if a salesperson wishes to improve customer retention rates, looking at data from our Sales and CRM Cloud would be appropriate; if a web developer wishes to analyse web traffic and find browsing patterns, the developer should access the website’s web log files.
Machine-learning platforms facilitate data collection from relevant sources when data is presented in a suitable format. Otherwise, the user has to convert the dataset into a format that the platform can process. Some platforms have embedded conversion features while others do not. To promote their ease of use, these platforms also provide users with sample datasets and problems that they can tackle for a test run.
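Where a platform has no embedded conversion feature, the conversion can often be done with a short script. The following is a minimal sketch in Python, assuming the source data is a JSON array of flat records and the platform accepts CSV; the function name `json_records_to_csv` and the sample records are made up for illustration.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    # Collect every key that appears, preserving first-seen order,
    # so that records with extra fields do not break the header.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing fields are written as empty cells
    return buffer.getvalue()


# Hypothetical sample data: the second record has an extra "region" field.
raw = '[{"customer": "A", "visits": 3}, {"customer": "B", "visits": 5, "region": "ZA"}]'
csv_text = json_records_to_csv(raw)
print(csv_text)
```

Records that lack a field are written with an empty cell, which most platforms then treat as a missing value to be handled during data processing.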
b. Data Processing
In most cases the acquired data is in a raw format and unsuitable for modelling. It needs to be analysed and prepared, which involves addressing missing data and outliers and performing other data manipulation tasks. This stage is a vital step in the conversion of raw data into high-quality material for modelling. In most cases, it is also the most laborious and time-consuming step: data scientists are often said to spend up to 80 percent of their time acquiring and preparing data. The saying “garbage in, garbage out” illustrates the importance of this part of the process, since the quality of the data and its preparation strongly affect the success of a model. Multiple data manipulation techniques are available. For instance, factor analysis, correlation analysis, and principal component analysis are useful techniques for identifying and demonstrating the relationships between variables. Features should also be analysed in order to determine which of them are relevant to the model for the intended solution.
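As an illustration of correlation analysis, the sketch below ranks candidate features by the absolute value of their Pearson correlation with the outcome. It is only one simple way of judging relevance, and the feature names and values are invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: two candidate features and a binary outcome (1 = rain).
features = {
    "humidity": [30, 45, 60, 75, 90],
    "day_of_month": [3, 17, 8, 25, 12],
}
target = [0, 0, 1, 1, 1]

# Rank features from most to least correlated with the outcome.
ranking = sorted(
    features,
    key=lambda name: abs(pearson(features[name], target)),
    reverse=True,
)
print(ranking)
```

In this toy dataset humidity is far more strongly correlated with the outcome than the day of the month, so it is ranked first. A weak correlation does not prove that a feature is useless, so a ranking like this is a starting point rather than a verdict.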
Machine-learning platforms provide embedded features for data acquisition and preparation. Before commencing the modelling process, the user is asked to perform data preparation tasks such as selecting predictive features, dealing with missing values, or transforming data. Some platforms allow users to insert programming scripts to implement data manipulation techniques, especially while dealing with outliers.
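A script of the kind a user might insert could look like the following sketch. It assumes missing values are represented as `None`, fills them with the median, and clips outliers to fences based on the interquartile range; both choices are common conventions rather than what any particular platform does, and the sample ages are invented.

```python
import statistics


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Made-up customer ages with one missing value and one implausible entry.
ages = [34, 29, None, 41, 38, 250, 31]
cleaned = clip_outliers(impute_missing(ages))
print(cleaned)
```

Here the missing age is replaced by the median of the observed ages (36) and the implausible value of 250 is pulled down to the upper fence (56). Whether to clip, remove, or keep an outlier depends on the problem, so the factor `k` is a parameter to be tuned rather than a fixed rule.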
Modelling is the process by which predictive models are developed, and it is iterative. Users conduct experiments with different models in order to find the most accurate one. They attempt to determine the right approach and identify the right algorithm to be used for prediction, based on the desired outcome and the data they have gathered. For instance, if the outcome is binary, such as predicting whether or not it will rain the following day, machine-learning algorithms such as decision trees, logistic regression, and neural networks could be applied.
Models need to be tested on new data in order to ensure their accuracy. For this reason, the dataset is usually divided into two parts, one for training and the other for testing.
The need to improve a model compels users to review their data manipulation process, change algorithms, or change the parameters of an algorithm. Users are not required to code algorithms: some platforms let users choose from a set of algorithms, while others analyse the dataset and automatically apply their own proprietary algorithms. Platforms such as Azure Machine Learning and BigML allow users to modify algorithms by inserting programming scripts.
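The idea of holding back part of the dataset can be sketched in a few lines. The example below shuffles a made-up rain dataset, trains a decision stump (a one-level decision tree, chosen here only because it is short) on the training part, and measures accuracy on the part the model has never seen. The 25 percent test fraction and the fixed seed are arbitrary assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_stump(train):
    """Pick the threshold that classifies the most training rows correctly.

    The stump predicts rain when the feature value exceeds the threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold, _ in train:
        correct = sum((x > threshold) == bool(y) for x, y in train)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(threshold, rows):
    """Fraction of rows for which the stump's prediction matches the label."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)


# Made-up rows: (humidity in percent, 1 if it rained the following day).
data = [(35, 0), (42, 0), (50, 0), (55, 0), (58, 0), (61, 0),
        (64, 1), (70, 1), (77, 1), (83, 1), (88, 1), (95, 1)]

train, test = train_test_split(data)
threshold = fit_stump(train)
print("training accuracy:", accuracy(threshold, train))
print("test accuracy:", accuracy(threshold, test))
```

The test accuracy is the figure to trust, since a model can fit its training rows perfectly and still generalise poorly to new data.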
Deployment consists of putting the final iteration of the model into production, where it can be used to score transactions or to drive the decision-making process. Models are deployed in various ways depending on the problem at hand.
One option is to integrate the data transformation and predictive algorithm directly into an existing system, e.g. by integrating a predictive algorithm into a credit score system in order to identify the clients most likely to pay off their mortgages on time.
b. Application Programming Interfaces
The predictive algorithm could be deployed as an application programming interface (API) that can be attached to other existing systems. For example, a machine-learning model that extracts users’ web browsing patterns and social graph data through Facebook’s “Like” button could be deployed as an API and then attached to an existing recommendation system to yield better results.
c. Web Service
Models can also be deployed as a web service that can be invoked from any application and accessed on any device. This mostly occurs when the model operates in a cloud environment. One example would be a web service that constantly scans the forex market in order to identify specific trading patterns and send signals through the broker’s servers into a retail trader’s account.
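To make the idea concrete, here is a minimal sketch of a scoring web service built with Python's standard library. The model is a stand-in (a single hypothetical humidity threshold), and the names `score`, `ScoringHandler` and `serve`, the port, and the JSON field names are all assumptions made for the example, not the interface of any real platform.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

THRESHOLD = 61  # hypothetical model parameter learned during modelling


def score(payload: dict) -> dict:
    """Apply the stand-in model to one request payload."""
    humidity = float(payload["humidity"])
    return {"rain_expected": humidity > THRESHOLD}


class ScoringHandler(BaseHTTPRequestHandler):
    """Answer POST requests carrying a JSON body with a prediction."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body, status = score(payload), 200
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or a missing/invalid "humidity" field.
            body, status = {"error": "expected JSON like {\"humidity\": 70}"}, 400
        encoded = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(encoded)))
        self.end_headers()
        self.wfile.write(encoded)


def serve(port: int = 8000) -> None:
    """Run the service locally until interrupted."""
    HTTPServer(("127.0.0.1", port), ScoringHandler).serve_forever()


# The scoring function can be exercised directly, without the network:
print(score({"humidity": 70}))
```

Calling `serve()` starts the service on the local machine, after which any application able to send an HTTP POST request can obtain a prediction. A production service would also need authentication, logging and input validation, which are left out of this sketch.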
Machine-learning platforms have reduced the process of deploying a model to just a few clicks. In most cases, a model built on one platform cannot be imported onto a different platform; for example, a model built on Watson Analytics cannot be imported and deployed on the Microsoft Azure platform. There are some exceptions to this rule: if a model can be extracted as a programming script, it can be exported to a platform that reads such scripts.
The process of building a model should not end at deployment. There is a need to continuously monitor and improve the quality of the model, because change is the only constant. For instance, in the forex market, price movements change over time based on fundamentals; as a result, the behaviour of the market during the 2008 recession differs from its current behaviour. New patterns emerge, and those on which the model was originally built could become less relevant. Hence the need to constantly update predictive models.
Machine-learning platforms provide features that help users understand the performance of their models and modify them when necessary. Currently, this option is only offered for models operating independently in a cloud environment. Users who have implemented their models off-cloud need to use third-party applications in order to monitor the performance of their models.
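For models running off-cloud, even a very small amount of code can provide basic monitoring. The sketch below keeps a rolling window of recent predictions and flags the model for retraining once accuracy over that window drops below a chosen level. The class name `AccuracyMonitor`, the window size and the threshold are illustrative assumptions, and the prediction history is made up.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions."""

    def __init__(self, window=5, min_accuracy=0.6):
        self.outcomes = deque(maxlen=window)  # True where prediction was right
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the window; call after at least one record."""
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Only raise the flag once a full window of evidence is available.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy


# Made-up history of (predicted, actual): the model starts well, then
# the underlying pattern changes and its predictions begin to miss.
history = [(1, 1), (0, 0), (1, 1), (1, 1), (0, 0),
           (1, 0), (1, 0), (0, 1), (1, 0)]

monitor = AccuracyMonitor(window=5, min_accuracy=0.6)
alerts = []
for predicted, actual in history:
    monitor.record(predicted, actual)
    alerts.append(monitor.needs_retraining())
print(alerts)
```

With this history the flag stays off while the model performs well and is first raised at the eighth observation, when only two of the last five predictions were correct. In practice the window has to be large enough that a short unlucky streak does not trigger unnecessary retraining.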
2. Practical information
Although the above activities are performed on machine-learning platforms, the manner in which they are performed can vary. Azure Machine Learning has a drag-and-drop approach to building models, while Watson Analytics and BigML have a descriptive phase approach to their processes. Platforms serving specific purposes, such as Quantopian and TRAIDE for the trading community, have tailored the user experience to the characteristics of their predefined problems. For instance, TRAIDE, which allows traders to build machine-learning strategies based on technical indicators, does not provide a list of algorithms from which users can choose. Instead, the algorithms are kept in a black box and the user only selects forex parameters such as currencies, timeframes and indicators.
Users can be restricted from accessing certain features of a platform, depending on the type of account to which they have subscribed; e.g. on Watson Analytics, obtaining Twitter hashtag data is limited to professional accounts.
Some platforms have divided their machine-learning operations into two categories. For instance, IBM Watson has a machine-learning platform for advanced developers called IBM Bluemix and another platform for ordinary users called Watson Analytics. Others, such as Azure Machine Learning and BigML, have platforms embedded with features for both advanced developers and ordinary users.
3. Where do we go next?
Bearing in mind that machine-learning platforms are in their early stages, the marketing hype around their supposed ease of use could be exaggerated. However, this trend has similarities with website builder platforms. Just a few years ago, programming and web design skills were needed in order to build a website. Today, an individual with basic computer skills can build an appealing and fully functional website by using a website builder platform such as Squarespace.
This article anticipates a similar shift in the field of machine learning, where data scientists, advanced developers and artificial intelligence specialists will no longer be the only agents in the field. Instead, it will be open to anyone interested in discovering insights and relevant patterns in datasets.