Automated Machine Learning (AutoML) has become a trending topic in industry in recent years, as groups dedicated to academic research on artificial intelligence (AI) have implemented numerous algorithms. AutoML is a tool for streamlining the development of AI solutions in industry by providing explainable and reproducible results in a simple, automated way.
Contributed by Luis Galo, AI Manager at Lantek
Developing traditional Machine Learning (ML) models takes up a lot of resources: generating and comparing dozens of models requires a great deal of knowledge and time. Automated machine learning reduces the time needed to obtain production-ready models and makes the process easier and more efficient.
The standard process flow in data science (the pipeline) consists of data pre-processing, the extraction of the parameters (features) that represent the business to be modeled, and the optimization of the algorithms’ hyperparameters, all of which data science experts have to do manually. By comparison, adopting AutoML allows for a simpler development process in which a few lines of code generate the code needed to begin developing a machine learning model.
For data science teams that work with Python, like Lantek, a large number of open source libraries are freely available. Examples include Lale by IBM, a semi-automatic library that integrates seamlessly with scikit-learn pipelines, as well as auto-sklearn, TPOT and NNI by Microsoft.
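To make the three manual stages concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the records are invented, the "length x thickness" feature is a hypothetical proxy for the amount of material to cut, and the one-feature ridge regression simply stands in for whatever model a team would actually use. It is not Lantek's pipeline.

```python
from statistics import mean, pstdev

# Invented records: (contour_length_m, thickness_mm, cutting_time_s).
# None marks a missing measurement.
raw = [
    (1.0, 2.0, 14.0), (2.0, 2.0, 26.0), (3.0, None, 40.0),
    (1.5, 4.0, 35.0), (2.5, 4.0, 55.0), (4.0, 2.0, 50.0),
    (3.0, 4.0, 66.0), (0.5, 2.0, 8.0),
]


def preprocess(records):
    """Stage 1, pre-processing: drop records with missing values."""
    return [r for r in records if None not in r]


def extract_features(records):
    """Stage 2, feature extraction: build one hypothetical feature
    (length x thickness) and standardize it. Scaling uses all rows
    for brevity; a stricter setup would scale inside each fold."""
    feats = [length * thick for length, thick, _ in records]
    mu, sigma = mean(feats), pstdev(feats)
    X = [(f - mu) / sigma for f in feats]
    y = [t for _, _, t in records]
    return X, y


def fit_ridge(X, y, alpha):
    """One-feature ridge regression in closed form; alpha is the penalty."""
    mx, my = mean(X), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(X, y))
    spread = sum((a - mx) ** 2 for a in X)
    slope = covariance / (spread + alpha)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def tune(X, y, alphas):
    """Stage 3, hyperparameter optimization: grid search over alpha,
    scored by leave-one-out mean squared error."""
    def loo_error(alpha):
        errors = []
        for i in range(len(X)):
            model = fit_ridge(X[:i] + X[i + 1:], y[:i] + y[i + 1:], alpha)
            errors.append((model(X[i]) - y[i]) ** 2)
        return mean(errors)

    scores = {alpha: loo_error(alpha) for alpha in alphas}
    return min(scores, key=scores.get), scores


clean = preprocess(raw)
X, y = extract_features(clean)
best_alpha, scores = tune(X, y, alphas=[0.0, 0.1, 1.0, 10.0])
print("best alpha:", best_alpha)
```

Each stage here is a hand-written decision (which rows to drop, which feature to build, which grid to search). Those are the decisions an AutoML tool tries to search over automatically.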
You could think of AutoML as a brute force modeling concept, with specialized search algorithms to find the optimal solutions for each piece of the data science workflow. AutoML promises a future where democratized machine learning is a reality.
Put like this, AutoML sounds like a panacea in the application of ML that an organization could use to replace data scientists, but in reality using it requires intelligent strategies adapted to the different processes that are carried out in the production of sheet metal parts. So, how can companies use AutoML to optimize their use of time and get value from their models sooner?
The optimal workflow for including AutoML, in sheet metal as elsewhere, consists of parallelizing workloads across different specific algorithms and shortening the time spent on the heaviest tasks. Instead of spending days tuning hyperparameters and selecting the most suitable parameters for the desired objective, a data scientist can automate this process on several types of models simultaneously and then test which one is the most suitable.
That’s why adapted algorithms are added to AutoML. In the case of sheet metal, these enhance the related parameters with scrap generation, cutting time, delivery time, cutting parameters, cutting technologies, material price and other engineering parameters that bring specific knowledge of the sheet metal cutting world into the ML algorithms.
Thus, a data analyst without experience in the sector could take advantage of this adapted AutoML tool to train a predictive model using data that they can extract on their own from the data lake by running a query. With AutoML, a data analyst can preprocess data, create a machine learning pipeline, and produce a fully trained model that can be used to validate their own hypotheses without having to consult a full data science team and experts from the sector. All the knowledge required about sheet metal cutting would be in the algorithms adapted by Lantek.
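The idea of trying several model types at once and then checking which one fits best can be sketched with the Python standard library alone. The data below is synthetic (a made-up linear relation between contour length and cutting time), and the three candidate models and all function names are hypothetical choices for this sketch. A thread pool keeps the example short; real workloads would more likely run on separate processes or machines.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Synthetic data, assumed for illustration only (not real cutting data).
rng = random.Random(0)
X = [rng.uniform(0.0, 10.0) for _ in range(200)]
y = [3.0 * x + 2.0 + rng.gauss(0.0, 0.5) for x in X]
train_X, train_y = X[:150], y[:150]
valid_X, valid_y = X[150:], y[150:]


def fit_mean(_params):
    """Naive baseline: always predict the training mean."""
    m = mean(train_y)
    return lambda x: m


def fit_linear(_params):
    """Ordinary least squares on a single feature."""
    mx, my = mean(train_X), mean(train_y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(train_X, train_y))
    slope = covariance / sum((a - mx) ** 2 for a in train_X)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_knn(params):
    """k-nearest-neighbours regression; k is the hyperparameter."""
    k = params["k"]
    pairs = list(zip(train_X, train_y))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(t for _, t in nearest)

    return predict


def evaluate(candidate):
    """Train one candidate and score it on the held-out validation set."""
    name, fit, params = candidate
    model = fit(params)
    mse = mean((model(x) - t) ** 2 for x, t in zip(valid_X, valid_y))
    return name, params, mse


# Several model types, each with its own hyperparameter settings.
candidates = [("mean", fit_mean, {}), ("linear", fit_linear, {})]
candidates += [("knn", fit_knn, {"k": k}) for k in (1, 5, 15)]

# Evaluate all candidates concurrently, then keep the lowest error.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(evaluate, candidates))

best = min(results, key=lambda r: r[2])
print("best candidate:", best[0], best[1])
```

The data scientist's remaining job is the part the loop cannot do: deciding whether the winning candidate makes sense for the process being modeled.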
That’s the theory, but what is the reality of applying AutoML in a real Lantek case? In our case, AutoML provides what we call the baseline from which we can start developing new models. The capabilities of today’s AutoML automate only a small part of the workload of ML engineers and data scientists. The data science team can shorten the time needed to develop its predictive capabilities by starting with a set of prototype learning models and relying on automated feature engineering, automated hyperparameter optimization and automated model selection.
The team of data scientists and data analysts can evaluate hypotheses and certify the validity of models more quickly, but it would be a lie to say that this automates the whole process. For the time being, it just speeds up model generation and facilitates standardization in the development of models for sheet metal, while reducing the data scientist’s learning curve in the world of sheet metal cutting.