An ML pipeline is most efficient when it runs in a unified environment.
Built on top of Spark, MLlib provides common machine learning algorithms and evaluation metrics through its APIs. Spark MLlib is integrated into the Databricks Runtime, and the library is programmable in Java, Scala, and Python.
A Pipeline is a sequence of stages, each of which is either a Transformer or an Estimator. These stages are run in sequence to transform the input DataFrame.
This ML API uses DataFrames from Spark SQL as ML datasets, which can hold different types of values such as text, feature vectors, labels, and predictions.
A Transformer is an algorithm that transforms one DataFrame into another; for example, a trained model transforms a DataFrame of features into a DataFrame that also contains predictions.
An Estimator is a learning algorithm that fits on a DataFrame to produce an ML model, which is itself a Transformer.
Metrics for Different Classification, Ranking, and Regression Models
Binary classification: Precision (Positive Predictive Value), Recall (True Positive Rate), F-measure, Receiver Operating Characteristic (ROC), Area Under ROC Curve, Area Under Precision-Recall Curve.
Multiclass classification: Confusion Matrix, Accuracy, Weighted Precision, Weighted Recall, Weighted F-measure.
Multilabel classification: Precision, Recall, Accuracy, Precision by Label, Recall by Label, F1-measure by Label, Hamming Loss, Subset Accuracy, F1 Measure, Micro Precision, Micro Recall, Micro F1 Measure.
Ranking: Precision at k, Mean Average Precision, Normalized Discounted Cumulative Gain.
Regression: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Coefficient of Determination, Explained Variance.
Please feel free to schedule a demo to see how Qualdo performs ML model monitoring rapidly.