Data Modeling Component Collection for Pipeline Pilot
The Data Modeling Component Collection offers a comprehensive set of learning and data modeling capabilities, statistical filters, and clustering components optimized for large real-world data sets. This collection of components extends Pipeline Pilot's standard capabilities to include statistics and predictive modeling for data mining applications.
Analyze and model your data using methods such as:
Fast data clustering
Categorical learning using Bayesian statistics
Principal component analysis (PCA)
Linear regression, partial least squares (PLS) regression, and k-nearest neighbor (kNN) regression
ROC plots, enrichment plots, and other visualization techniques for evaluating model quality
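As a rough illustration of what the ROC and enrichment evaluations in this list measure, the following Python sketch computes the area under a ROC curve and the points of an enrichment curve from a list of model scores and known activity labels. It is not Pipeline Pilot code: the function names `roc_auc` and `enrichment_curve` and the sample data are hypothetical and chosen only for this example.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed by comparing every active/inactive pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive sample")
    # A pair counts 1 if the active scores higher, 0.5 on a tie, 0 otherwise.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def enrichment_curve(scores, labels):
    """(fraction screened, fraction of actives found) after each top-ranked sample."""
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("need at least one active sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    found = 0
    curve = []
    for k, (_, y) in enumerate(ranked, start=1):
        found += y
        curve.append((k / len(ranked), found / total_actives))
    return curve


# Hypothetical scores (higher means "more likely active") and true labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
curve = enrichment_curve(scores, labels)
```

An AUC near 1.0 means actives are ranked ahead of inactives almost everywhere, while 0.5 is what random ranking would give. The enrichment curve shows how quickly actives are recovered as more of the ranked list is screened.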
The model-building components provide methods such as cross-validation to ensure the quality of the models built. They also provide model applicability domain (MAD) methods to assess the quality of each prediction as the model is subsequently applied. This is particularly important as models are increasingly deployed to end-users who may not be familiar with the training data or limitations of a particular model. Training data can be saved with any model, allowing the model to be extended as more experimental data becomes available.
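The two ideas in the paragraph above, cross-validation and an applicability domain check, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by the collection: it fits a one-descriptor least-squares line, estimates its error with k-fold cross-validation, and treats a prediction as "in domain" only when the descriptor falls inside the range seen in training. The names `fit_line`, `cross_validated_rmse`, and `in_domain` are hypothetical, and real applicability domain methods are typically more sophisticated than a simple range test.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def cross_validated_rmse(xs, ys, k=5, seed=0):
    """k-fold cross-validation: each sample is predicted by a model that never saw it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors += [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
    return mean(sq_errors) ** 0.5


def in_domain(x, train_xs):
    """Range-based check: is x inside the span of the training descriptor values?"""
    return min(train_xs) <= x <= max(train_xs)


# Hypothetical, noise-free training data following y = 2x + 1.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
rmse = cross_validated_rmse(xs, ys)
a, b = fit_line(xs, ys)
```

Here `in_domain(5.0, xs)` is true and `in_domain(25.0, xs)` is false, so a prediction at 25.0 would be flagged as an extrapolation beyond the training data, which is the kind of warning an end user unfamiliar with the model would need.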
When combined with the separately available Chemistry Collection, you can perform:
Use the Data Modeling components to create predictive models in order to understand the important descriptors in your characterized data or to mine your uncharacterized data for promising leads.
The Data Modeling Component Collection includes numerous viewers tailored for interpreting model results, such as the enrichment plot shown here.