Databricks replaces siloed ML pipelines with a unified platform

Databricks simplifies ML model management from experimentation through production, including CI/CD

Request Demo

Implement a simple ML pipeline with Apache Spark MLlib on Databricks

An ML pipeline is most efficient when it runs in a unified environment

01
Intro to MLlib

Built on top of Spark, MLlib provides common machine learning algorithms and evaluation metrics through its APIs. Spark MLlib is integrated into the Databricks runtime, and the library can be programmed in Java, Scala, and Python.

02
Pipeline

A Pipeline is a sequence of stages, each of which is either a Transformer or an Estimator. The stages run in order, and the input DataFrame is transformed as it passes through each one.

03
DataFrame

The ML API uses the DataFrame from Spark SQL as its ML dataset. A DataFrame can hold different types of values, such as text, feature vectors, labels, and predictions.

04
Transformer

A Transformer is an algorithm that transforms one DataFrame into another. For example, a trained ML model is a Transformer that turns a DataFrame of features into a DataFrame of predictions.

05
Estimator

An Estimator is a learning algorithm that is fit on a DataFrame to produce an ML model, which is itself a Transformer.
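The relationship between Estimators, Transformers, and a Pipeline can be sketched in plain Python. This is a toy illustration of the fit/transform contract only, not the Spark API: the classes `Scaler`, `ScalerModel`, `Threshold`, and the list-of-dicts "DataFrame" are hypothetical stand-ins. In PySpark the corresponding classes live in the `pyspark.ml` package.

```python
# Toy sketch of the Estimator / Transformer / Pipeline contract.
# All class names here are hypothetical; rows are plain dicts, not Spark DataFrames.

class Scaler:
    """Toy Estimator: learns the maximum of column x and returns a Transformer."""
    def fit(self, rows):
        return ScalerModel(max(r["x"] for r in rows))

class ScalerModel:
    """Toy Transformer: adds a scaled copy of column x."""
    def __init__(self, peak):
        self.peak = peak
    def transform(self, rows):
        return [{**r, "x_scaled": r["x"] / self.peak} for r in rows]

class Threshold:
    """Toy Transformer: turns the scaled feature into a prediction column."""
    def __init__(self, cutoff):
        self.cutoff = cutoff
    def transform(self, rows):
        return [{**r, "prediction": 1 if r["x_scaled"] >= self.cutoff else 0}
                for r in rows]

class Pipeline:
    """Runs stages in order; Estimators are fit first, then used to transform."""
    def __init__(self, stages):
        self.stages = stages
    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):      # Estimator: fit it to get a Transformer
                stage = stage.fit(rows)
            rows = stage.transform(rows)   # pass transformed data to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)

class PipelineModel:
    """The fitted pipeline: a chain of Transformers."""
    def __init__(self, stages):
        self.stages = stages
    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows

train = [{"x": 2.0}, {"x": 4.0}, {"x": 8.0}]
model = Pipeline([Scaler(), Threshold(0.5)]).fit(train)
scored = model.transform([{"x": 1.0}, {"x": 6.0}])
```

Fitting happens once on the training rows; the resulting model can then score new rows without refitting.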

ML monitoring metrics for Databricks MLlib

List of metrics for classification, ranking, and regression models

01
Binary Classification Metrics

Precision (Positive Predictive Value), Recall (True Positive Rate), F-measure, Receiver Operating Characteristic (ROC), Area Under ROC Curve, Area Under Precision-Recall Curve.
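As a rough illustration of how some of these metrics are computed, the sketch below uses plain Python on made-up labels and scores. The data, the 0.5 threshold, and the function names are assumptions for the example, and the AUC is computed with the pairwise-comparison definition rather than any particular library routine.

```python
import math

def binary_metrics(labels, preds):
    """Precision, recall, and F1 from 0/1 labels and 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def area_under_roc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example data
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.05]
preds = [1 if s >= 0.5 else 0 for s in scores]   # assumed threshold of 0.5

precision, recall, f1 = binary_metrics(labels, preds)
auc = area_under_roc(labels, scores)
```

Precision and recall depend on the chosen threshold, while the area under the ROC curve summarizes ranking quality across all thresholds.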

02
Multi-class Classification Metrics

Confusion Matrix, Accuracy, Weighted Precision, Weighted Recall, Weighted F-measure.
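A minimal sketch of three of these metrics, assuming made-up class labels and that "weighted" means each class's score is weighted by its share of the true labels:

```python
import math
from collections import Counter

# Made-up example data
labels = ["a", "a", "a", "b", "b", "c"]
preds = ["a", "a", "b", "b", "c", "c"]

# Confusion matrix as counts keyed by (true label, predicted label)
confusion = Counter(zip(labels, preds))

# Accuracy: share of rows on the diagonal of the confusion matrix
accuracy = sum(n for (y, p), n in confusion.items() if y == p) / len(labels)

# Weighted precision: per-class precision weighted by the class's true frequency
support = Counter(labels)
predicted = Counter(preds)
weighted_precision = sum(
    (confusion[(c, c)] / predicted[c] if predicted[c] else 0.0)
    * support[c] / len(labels)
    for c in support
)
```

Weighted recall and weighted F-measure follow the same pattern, with per-class recall or F-measure in place of precision.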

03
Multi-label Classification Metrics

Precision, Recall, Accuracy, Precision by Label, Recall by Label, F1 Measure by Label, Hamming Loss, Subset Accuracy, F1 Measure, Micro Precision, Micro Recall, Micro F1 Measure.
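For multi-label data, each row carries a set of labels rather than a single one. The sketch below shows one common way to compute Hamming loss, subset accuracy, and the micro-averaged metrics; the label sets and the label count of 3 are made up for the example.

```python
import math

# Made-up example: true and predicted label sets per document
truth = [{0, 1}, {0, 2}, {1}]
pred = [{0, 1}, {0}, {0, 1, 2}]
num_labels = 3   # assumed size of the label universe

# Hamming loss: share of (document, label) slots that are wrong
hamming_loss = sum(len(t ^ p) for t, p in zip(truth, pred)) / (len(truth) * num_labels)

# Subset accuracy: share of documents whose label set matches exactly
subset_accuracy = sum(1 for t, p in zip(truth, pred) if t == p) / len(truth)

# Micro averaging: pool true positives over all documents before dividing
tp = sum(len(t & p) for t, p in zip(truth, pred))
micro_precision = tp / sum(len(p) for p in pred)
micro_recall = tp / sum(len(t) for t in truth)
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)
```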

04
Ranking Systems Metrics

Precision at k, Mean Average Precision, Normalized Discounted Cumulative Gain.
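The ranking metrics can be sketched for a single query as follows, assuming binary relevance (a document is either relevant or not) and made-up document IDs. Mean Average Precision is the mean of the per-query average precision over all queries.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Share of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at each position where a relevant document appears."""
    hits, total = 0, 0.0
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    """Discounted gain of the top-k ranking divided by the gain of an ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

# Made-up example: one query's ranking and its relevant documents
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}

p_at_3 = precision_at_k(ranked, relevant, 3)
ap = average_precision(ranked, relevant)
ndcg_3 = ndcg_at_k(ranked, relevant, 3)
```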

05
Regression Model Metrics

Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Coefficient of Determination, Explained Variance.
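A short sketch of the first four regression metrics on made-up values; explained variance is omitted to keep the example short.

```python
import math

# Made-up example: observed values and model predictions
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 9.0]

n = len(y)
errors = [obs - est for obs, est in zip(y, y_hat)]

mse = sum(e * e for e in errors) / n        # Mean Squared Error
rmse = math.sqrt(mse)                       # Root Mean Squared Error
mae = sum(abs(e) for e in errors) / n       # Mean Absolute Error

# Coefficient of determination: 1 - residual sum of squares / total sum of squares
mean_y = sum(y) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((obs - mean_y) ** 2 for obs in y)
r2 = 1 - ss_res / ss_tot
```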

The widespread adoption of Spark in ML-based applications has made Databricks a preferred environment for ML monitoring

Qualdo™ provides a wide range of analyses and visualizations for monitoring ML performance

Feel free to schedule a demo to see how quickly Qualdo can monitor your ML models

Data Quality Edition
Free trial available
  • Data Quality Metrics
  • Data Profiling
  • Data Anomalies
  • Data Drifts
  • All KQIs
  • Quality Gates
  • Advanced Visualizations
  • APIs
Request a Demo
Model Monitoring Edition
Free trial available
  • Bulk Add Models to Qualdo
  • Data Drifts
  • Feature & Response Decays
  • Data Quality Metrics
  • Data Anomalies
  • Model Failure Metrics
  • Alerts & Notifications
  • Advanced Visualizations
  • APIs
Start Now
Enterprise Edition
Email Us
 
  • Installation in your Infrastructure
  • All Data Quality Metrics
  • All ML Monitoring Metrics
  • Custom DB Integrations
  • Custom ML Integrations
  • Custom Notifications
  • Custom Visualizations
  • APIs
Request a Demo

Qualdo helps you monitor mission-critical ML and data issues, errors, and quality in your favorite modern database management tools.