The Pipeline Pilot Advanced Modeling component collection provides methods for Recursive Partitioning (RP) and Multi-Objective Pareto Optimization as well as Genetic Function Approximation (GFA) QSAR analysis.
The Recursive Partitioning components provide a variety of RP methods, including single-tree and forest-of-trees learners. The methods can learn on single or multiple response variables.
The Pareto Optimization components address multi-objective optimization problems, returning solutions that trade off two or more partially conflicting goals.
Genetic Function Approximation (GFA) uses a genetic algorithm to build quantitative structure-activity relationship (QSAR) models. These models identify critical relationships between the properties and the characteristics of a set of molecules.
With the Recursive Partitioning Components you can:
Perform rapid learning and data mining experiments on large data sets with large numbers of descriptors
Learn molecular data sets using fingerprints as descriptors
Visualize trees to understand the relationships between descriptors and responses
Analyze descriptor usage to identify the most discriminating descriptors
Rapidly apply models to predict new data sets
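To make the idea of recursive partitioning concrete, here is a minimal Python sketch of a single-tree learner. It is not the Pipeline Pilot implementation: it assumes a generic Gini-impurity splitting rule, and the descriptor values (a hypothetical logP and molecular weight per molecule) and the "active"/"inactive" labels are invented purely for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (weighted impurity, descriptor index, threshold) or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively partition the data; leaves hold the majority label."""
    split = None
    if depth < max_depth and gini(labels) > 0.0:
        split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (
        f,
        t,
        build_tree([rows[i] for i in left], [labels[i] for i in left],
                   depth + 1, max_depth),
        build_tree([rows[i] for i in right], [labels[i] for i in right],
                   depth + 1, max_depth),
    )

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical training data: (logP, molecular weight) per molecule.
rows = [(1.0, 200.0), (1.5, 450.0), (3.2, 210.0),
        (3.8, 480.0), (4.1, 300.0), (0.7, 320.0)]
labels = ["inactive", "inactive", "active", "active", "active", "inactive"]

tree = build_tree(rows, labels)
prediction = predict(tree, (2.9, 250.0))
print(tree, prediction)
```

The nested tuple returned by `build_tree` records which descriptor each split uses, which is the kind of information a tree visualization or descriptor-usage analysis would surface. A forest learner would train many such trees on resampled data and combine their predictions.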
With the Pareto Optimization Components you can:
Optimize solutions for problems as diverse as combinatorial library design, formulation ingredient optimization or stock portfolio risk management
Find individual samples within a data set that have the best trade-off of desired property values
Find subsets of samples from a larger data set that collectively have the best trade-offs between desired property values
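The notion of a "best trade-off" can be illustrated with a small Python sketch that finds the non-dominated (Pareto-optimal) samples in a data set. This is a generic brute-force filter, not the algorithm used by the Pareto Optimization components. The sample values and the two objectives (a hypothetical toxicity score and synthesis cost, both to be minimized) are assumptions made for the example.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]

# Hypothetical samples: (toxicity score, synthesis cost), lower is better.
samples = [(1.0, 9.0), (2.0, 7.0), (3.0, 8.0),
           (4.0, 3.0), (5.0, 5.0), (6.0, 2.0)]

front = pareto_front(samples)
print(front)  # indices of the samples on the Pareto front
```

Here sample 2 is dominated by sample 1 and sample 4 by sample 3, so neither appears on the front. The remaining samples each represent a different compromise between the two objectives. Selecting whole subsets rather than individual samples applies the same dominance test to properties computed per subset.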
With GFA Analysis you can:
Return multiple models rather than a single "best" model by creating a great number of trial models
Note that creating many trial models increases the likelihood of encountering spurious correlations specific to the training data, which make a model appear better than it is in reality
Leverage this technique to generate hypotheses rather than to test them
Image Gallery
Pareto front progress for simple subset optimization
Visualize trees to understand the relationships between descriptors and responses