Data Functions Reference
Complete reference for FormulaML data handling functions including datasets, data manipulation, and exploration.
Complete reference for all FormulaML functions organized by namespace. Each function includes syntax, parameters, return values, and practical examples.
Functions for loading, exploring, and manipulating data.
ML.DATASETS.* - Built-in datasets
ML.DATA.* - Data manipulation and exploration
Functions for predicting continuous values.
ML.REGRESSION.* - Linear, Ridge, Lasso, Elastic Net, Random Forest
Functions for categorizing data.
ML.CLASSIFICATION.* - Logistic, SVM, Random Forest
Functions for finding groups in data.
ML.CLUSTERING.* - K-Means clustering
Functions for preparing data.
ML.PREPROCESSING.* - Scaling, encoding, train-test split
ML.IMPUTE.* - Handling missing values
Functions for assessing model performance.
ML.EVAL.* - Scoring, cross-validation, grid search
Essential functions for model training and prediction.
ML.FIT - Train models
ML.PREDICT - Make predictions
ML.TRANSFORM - Transform data
ML.PIPELINE - Create workflows
Premium and specialized functions.
ML.DIM_REDUCTION.* - PCA, Kernel PCA
ML.FEATURE_SELECTION.* - Feature selection
ML.COMPOSE.* - Column transformers
ML.INSPECT.* - Model inspection
All FormulaML functions follow a consistent naming pattern:
ML.[NAMESPACE].[FUNCTION_NAME](parameters)
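The pattern is regular enough to be split mechanically. The following is a minimal Python sketch, not part of FormulaML itself; the helper name `parse_function_name` is hypothetical. It assumes, based on the names listed in this reference, that core functions such as ML.FIT have no namespace segment.

```python
from typing import Optional, Tuple


def parse_function_name(name: str) -> Tuple[Optional[str], str]:
    """Split a FormulaML-style name into (namespace, function).

    Hypothetical helper for illustration only. Core functions such as
    ML.FIT carry no namespace, so the namespace is returned as None.
    """
    # Drop a trailing "()" if the name was written as a call.
    parts = name.rstrip("()").split(".")
    if parts[0] != "ML" or len(parts) not in (2, 3):
        raise ValueError(f"not a FormulaML-style name: {name!r}")
    if len(parts) == 2:
        return None, parts[1]
    return parts[1], parts[2]


namespaced = parse_function_name("ML.DATASETS.IRIS()")
core = parse_function_name("ML.FIT")
```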
Examples:
ML.DATASETS.IRIS() - Load Iris dataset
ML.REGRESSION.LINEAR() - Create linear regression model
ML.EVAL.SCORE() - Evaluate model performance
Most core functionality is available in the free tier.
Advanced capabilities require a premium subscription:
ML.EVAL.CV_SCORE
ML.EVAL.GRID_SEARCH
ML.DIM_REDUCTION.KERNEL_PCA
ML.DATASETS.OPENML
Premium functions are marked with a ⭐ icon in the documentation.
Many functions return or accept “object handles” - references to complex data structures:
Cell A1: =ML.DATASETS.IRIS() → Returns: <Dataset>
Cell A2: =ML.REGRESSION.LINEAR() → Returns: <LinearRegression>
Cell A3: =ML.FIT(A2, features, target) → Returns: <LinearRegression> (with 🧠 brain icon)
These handles allow Excel to manage complex ML objects efficiently.
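The handle idea can be pictured as a lookup table: the cell holds a short label while the real object lives elsewhere. The Python sketch below is a conceptual illustration only and is not FormulaML's implementation. The `HandleRegistry` class, the `#1` suffix on the label, and the stand-in `LinearRegression` class are all assumptions made for the example.

```python
import itertools


class HandleRegistry:
    """Hypothetical store mapping short text handles to in-memory objects."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count(1)

    def register(self, obj) -> str:
        # The handle is a label a cell could display; the object stays here.
        handle = f"<{type(obj).__name__}#{next(self._ids)}>"
        self._objects[handle] = obj
        return handle

    def resolve(self, handle: str):
        try:
            return self._objects[handle]
        except KeyError:
            raise KeyError("Object handle not found") from None


class LinearRegression:
    """Stand-in for a real model object (assumption for the example)."""


registry = HandleRegistry()
handle = registry.register(LinearRegression())   # what a cell would show
model = registry.resolve(handle)                 # what a later call would use
```

A function that receives a handle looks the object up before using it, which is one plausible reading of the "Object handle not found" error when the lookup fails.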
Parameters that recur across functions:
random_state - e.g. 42 (any integer works)
fit_intercept
alpha
n_estimators
max_iter
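Why a fixed random_state matters can be shown with a seeded shuffle: the same seed gives the same split every time. This is a minimal Python sketch using only the standard library. The function `seeded_split` and its `test_size` parameter are hypothetical and do not describe how FormulaML splits data internally.

```python
import random


def seeded_split(rows, test_size=0.25, random_state=None):
    """Shuffle rows with a seeded generator and split into (train, test)."""
    rng = random.Random(random_state)  # same seed, same shuffle order
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(20))
train_a, test_a = seeded_split(rows, random_state=42)
train_b, test_b = seeded_split(rows, random_state=42)
# train_a == train_b and test_a == test_b, because both calls used seed 42.
```

The value 42 carries no special meaning; any integer gives a repeatable result, as long as the same one is reused.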
Functions return different types of values:
Object Handles: Complex objects (models, dataframes), e.g. <SVC>
Numeric Values: Single numbers, e.g. 0.95 (accuracy score)
Arrays: Multiple values
DataFrames: Tabular data
Load Data:
ML.DATASETS.IRIS() - Classification dataset
ML.DATASETS.DIABETES() - Regression dataset
ML.DATA.CONVERT_TO_DF() - Excel to DataFrame
Explore Data:
ML.DATA.INFO() - Data structure
ML.DATA.DESCRIBE() - Statistics
ML.DATA.SAMPLE() - View rows
Prepare Data:
ML.DATA.SELECT_COLUMNS() - Choose columns
ML.PREPROCESSING.TRAIN_TEST_SPLIT() - Split data
ML.PREPROCESSING.STANDARD_SCALER() - Scale features
Train Models:
ML.FIT() - Train any model
ML.PREDICT() - Make predictions
ML.TRANSFORM() - Transform data
Evaluate:
ML.EVAL.SCORE() - Basic scoring
ML.EVAL.CV_SCORE() - Cross-validation ⭐
ML.EVAL.GRID_SEARCH() - Hyperparameter tuning ⭐
Common error messages and their meanings:
“Object handle not found”
“Invalid parameter value”
“Dimension mismatch”
“Premium function”
Always use consistent data shapes
Set random_state for reproducibility
Check data types
Handle missing values
Start with simple models
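The advice about consistent data shapes can be made concrete with a small check run before training. The Python sketch below is an illustration under stated assumptions: the helper `check_shapes` is hypothetical, and the exact conditions that trigger FormulaML's "Dimension mismatch" message are not documented here.

```python
def check_shapes(features, target):
    """Return (n_rows, n_columns) if features and target line up.

    Hypothetical helper: raises ValueError when the row counts differ
    or when the feature rows do not all have the same width.
    """
    n_rows = len(features)
    if n_rows != len(target):
        raise ValueError(
            f"features has {n_rows} rows but target has {len(target)}"
        )
    widths = {len(row) for row in features}
    if len(widths) > 1:
        raise ValueError(f"feature rows have differing widths: {sorted(widths)}")
    n_columns = widths.pop() if widths else 0
    return n_rows, n_columns


shape = check_shapes([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9]], [0, 0, 1])
```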