Function Reference

Complete reference for all FormulaML functions, organized by namespace. Each function includes syntax, parameters, return values, and practical examples.

Function Categories

📊 Data Functions

Functions for loading, exploring, and manipulating data.

  • ML.DATASETS.* - Built-in datasets
  • ML.DATA.* - Data manipulation and exploration

📈 Regression Models

Functions for predicting continuous values.

  • ML.REGRESSION.* - Linear, Ridge, Lasso, Elastic Net, Random Forest

🎯 Classification Models

Functions for categorizing data.

  • ML.CLASSIFICATION.* - Logistic, SVM, Random Forest

🔍 Clustering Models

Functions for finding groups in data.

  • ML.CLUSTERING.* - K-Means clustering

⚙️ Preprocessing Functions

Functions for preparing data.

  • ML.PREPROCESSING.* - Scaling, encoding, train-test split
  • ML.IMPUTE.* - Handling missing values

📏 Evaluation Functions

Functions for assessing model performance.

  • ML.EVAL.* - Scoring, cross-validation, grid search

🔧 Core ML Functions

Essential functions for model training and prediction.

  • ML.FIT - Train models
  • ML.PREDICT - Make predictions
  • ML.TRANSFORM - Transform data
  • ML.PIPELINE - Create workflows

🎨 Advanced Functions

Premium and specialized functions.

  • ML.DIM_REDUCTION.* - PCA, Kernel PCA
  • ML.FEATURE_SELECTION.* - Feature selection
  • ML.COMPOSE.* - Column transformers
  • ML.INSPECT.* - Model inspection

Function Naming Convention

All FormulaML functions follow a consistent naming pattern:

ML.[NAMESPACE].[FUNCTION_NAME](parameters)

Examples:

  • ML.DATASETS.IRIS() - Load Iris dataset
  • ML.REGRESSION.LINEAR() - Create linear regression model
  • ML.EVAL.SCORE() - Evaluate model performance
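
The pattern can be checked mechanically. The following Python sketch splits a function name into its namespace and function parts. `split_name` is a hypothetical helper for illustration, not part of FormulaML, and it assumes names use only uppercase letters and underscores, as every name on this page does. Core functions such as ML.FIT have no namespace segment, so the sketch returns None for that part.

```python
import re

# Hypothetical helper, not part of FormulaML.
# Accepts ML.<NAMESPACE>.<FUNCTION> or, for core functions, ML.<FUNCTION>.
_NAME_PATTERN = re.compile(r"^ML(?:\.([A-Z_]+))?\.([A-Z_]+)$")


def split_name(name):
    """Return (namespace, function) for a FormulaML-style name."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a FormulaML-style name: {name!r}")
    return match.group(1), match.group(2)


print(split_name("ML.DATASETS.IRIS"))  # ('DATASETS', 'IRIS')
print(split_name("ML.FIT"))            # (None, 'FIT')
```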

Free vs Premium Functions

✅ Free Functions

Most core functionality is available in the free tier:

  • Basic models (Linear, Logistic, SVM, K-Means)
  • Data handling and exploration
  • Model training and prediction
  • Basic evaluation

⭐ Premium Functions

Advanced capabilities require a premium subscription:

  • Random Forest models
  • Cross-validation (ML.EVAL.CV_SCORE)
  • Grid search (ML.EVAL.GRID_SEARCH)
  • Kernel PCA (ML.DIM_REDUCTION.KERNEL_PCA)
  • OpenML datasets (ML.DATASETS.OPENML)

Premium functions are marked with a ⭐ icon in the documentation.

Understanding Object Handles

Many functions return or accept “object handles” - references to complex data structures:

Cell A1: =ML.DATASETS.IRIS()           → Returns: <Dataset>
Cell A2: =ML.REGRESSION.LINEAR()       → Returns: <LinearRegression>
Cell A3: =ML.FIT(A2, features, target) → Returns: <LinearRegression> (with 🧠 brain icon)

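A handle lets a single cell stand in for an object that cannot be shown as an ordinary cell value, so other formulas can use the object by referencing that cell (as A3 does with A2 above).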
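
One way to picture a handle is as a short label that points at an object kept elsewhere in memory. The Python sketch below is a conceptual model only, not FormulaML's implementation: `HandleStore`, the `#1` suffix in the label, and the stand-in `LinearRegression` class are all assumptions made for illustration.

```python
import itertools


class HandleStore:
    """Conceptual model: maps short text labels to in-memory objects."""

    def __init__(self):
        self._objects = {}
        self._counter = itertools.count(1)

    def put(self, obj):
        # The label is what a cell would display; the object itself stays here.
        handle = f"<{type(obj).__name__}#{next(self._counter)}>"
        self._objects[handle] = obj
        return handle

    def get(self, handle):
        # Mirrors the "Object handle not found" error listed under Error Messages.
        if handle not in self._objects:
            raise LookupError("Object handle not found")
        return self._objects[handle]


class LinearRegression:
    """Stand-in class so the example is self-contained."""


store = HandleStore()
handle = store.put(LinearRegression())
print(handle)  # <LinearRegression#1>
```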

Common Parameters

Frequently Used Parameters

random_state

  • Type: Integer
  • Purpose: Seeds random operations so repeated runs give the same results
  • Example: 42 (any integer works)
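
The effect of a fixed seed can be seen with Python's standard `random` module. This is a generic illustration of seeding, not FormulaML code, and `shuffled_rows` is a hypothetical helper.

```python
import random


def shuffled_rows(n_rows, random_state=None):
    """Return row indices 0..n_rows-1 in shuffled order."""
    # A private generator seeded with random_state; None means "seed from the OS".
    rng = random.Random(random_state)
    rows = list(range(n_rows))
    rng.shuffle(rows)
    return rows


# The same seed gives the same order on every call.
assert shuffled_rows(10, random_state=42) == shuffled_rows(10, random_state=42)

print(shuffled_rows(10, random_state=42))
print(shuffled_rows(10, random_state=7))
```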

fit_intercept

  • Type: Boolean (TRUE/FALSE)
  • Purpose: Whether to calculate the intercept
  • Default: TRUE

alpha

  • Type: Float
  • Purpose: Regularization strength
  • Range: > 0 (higher = more regularization)
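
To see why a larger alpha means more regularization, consider ridge regression with a single feature and no intercept, where the fitted slope has the closed form w = Σxy / (Σx² + alpha). The Python sketch below uses that textbook formula. It illustrates the idea only and does not describe how FormulaML computes its models; `ridge_slope` is a hypothetical name.

```python
def ridge_slope(x, y, alpha):
    """Slope of one-feature ridge regression without an intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # exactly y = 2x

# alpha = 0.0 is plain least squares, shown only for comparison.
for alpha in (0.0, 1.0, 10.0):
    print(alpha, ridge_slope(x, y, alpha))
```

With this data the unregularized slope is 2.0, and each increase in alpha pulls the slope closer to zero.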

n_estimators

  • Type: Integer
  • Purpose: Number of trees in the ensemble
  • Default: 100

max_iter

  • Type: Integer
  • Purpose: Maximum number of iterations
  • Default: Varies by algorithm

Return Value Types

Functions return different types of values:

  1. Object Handles: Complex objects (models, dataframes)
    • Example: <SVC>
  2. Numeric Values: Single numbers
    • Example: 0.95 (accuracy score)
  3. Arrays: Multiple values
    • Example: Cross-validation scores
  4. DataFrames: Tabular data
    • Example: Sample data, parameters

Quick Function Lookup

By Task

Load Data:

  • ML.DATASETS.IRIS() - Classification dataset
  • ML.DATASETS.DIABETES() - Regression dataset
  • ML.DATA.CONVERT_TO_DF() - Excel to DataFrame

Explore Data:

  • ML.DATA.INFO() - Data structure
  • ML.DATA.DESCRIBE() - Statistics
  • ML.DATA.SAMPLE() - View rows

Prepare Data:

  • ML.DATA.SELECT_COLUMNS() - Choose columns
  • ML.PREPROCESSING.TRAIN_TEST_SPLIT() - Split data
  • ML.PREPROCESSING.STANDARD_SCALER() - Scale features

Train Models:

  • ML.FIT() - Train any model
  • ML.PREDICT() - Make predictions
  • ML.TRANSFORM() - Transform data

Evaluate:

  • ML.EVAL.SCORE() - Basic scoring
  • ML.EVAL.CV_SCORE() - Cross-validation ⭐
  • ML.EVAL.GRID_SEARCH() - Hyperparameter tuning ⭐

Error Messages

Common error messages and their meanings:

“Object handle not found”

  • The referenced cell doesn’t contain a valid object
  • Solution: Check that the cell reference is correct

“Invalid parameter value”

  • A parameter is outside its acceptable range
  • Solution: Check parameter constraints

“Dimension mismatch”

  • Data shapes don’t match
  • Solution: Ensure X and y have the same number of rows

“Premium function”

  • The function requires a premium subscription
  • Solution: Upgrade or use free alternative

Best Practices

  1. Always use consistent data shapes
    • Features (X) and target (y) must have the same number of rows
  2. Set random_state for reproducibility
    • Use the same seed value across related operations
  3. Check data types
    • Ensure numerical data isn’t stored as text
  4. Handle missing values
    • Use ML.IMPUTE.* or clean data before analysis
  5. Start with simple models
    • Use them as a baseline before moving to complex models
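
Practices 1 and 3 lend themselves to an automated check before training. The Python sketch below is a hypothetical pre-flight check, not a FormulaML function: it confirms that features and target have the same number of rows and converts numbers stored as text to floats.

```python
def check_inputs(X, y):
    """Validate shapes and coerce text-encoded numbers to float.

    X is a list of rows (each a list of values); y is a list of targets.
    """
    if len(X) != len(y):
        raise ValueError(
            f"Dimension mismatch: X has {len(X)} rows, y has {len(y)}"
        )
    cleaned = []
    for row in X:
        # float("2.5") succeeds; float("abc") raises ValueError,
        # which surfaces values that are not numeric at all.
        cleaned.append([float(value) for value in row])
    return cleaned


X = [[1, "2.5"], ["3", 4.0]]
y = [0, 1]
print(check_inputs(X, y))  # [[1.0, 2.5], [3.0, 4.0]]
```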
