Getting Started with FormulaML

Learn the basics of FormulaML and create your first machine learning model in Excel. Perfect for beginners with no programming experience.

Getting Started with FormulaML

Welcome! This guide will help you get up and running with FormulaML in just a few minutes. By the end, you’ll have created your first machine learning model entirely within Excel.

Installation

Prerequisites

  • Microsoft Excel 2016 or later (Windows or Mac)
  • Active internet connection for initial setup
  • Excel add-ins enabled in your Office settings

Installing FormulaML

  1. Open Excel
  2. Go to Insert → Get Add-ins (or Office Add-ins)
  3. Search for “FormulaML” in the store
  4. Click Add to install the add-in
  5. Once installed, you’ll see FormulaML functions available with the ML. prefix

Verifying Installation

Test that FormulaML is working by typing this formula in any cell:

=ML.DATASETS.IRIS()

If you see <Dataset>, FormulaML is installed correctly!

Key Concepts

Before diving in, let’s understand three essential concepts:

1. Object Handles

FormulaML uses “object handles” to manage complex data structures in Excel. When you see something like <Dataset> or <LinearRegression>, that’s an object handle. Think of it as Excel’s way of storing a complete dataset or model in a single cell.

Example:

  • Cell A1: =ML.DATASETS.IRIS() returns <Dataset>
  • This handle represents the entire Iris dataset (150 rows, 5 columns)

2. The ML Workflow in Excel

The typical machine learning workflow in FormulaML follows these steps:

  1. Load Data → Creates a DataFrame handle
  2. Split Data → Separates features (X) and target (y)
  3. Train Model → Uses ML.FIT to create a model handle
  4. Make Predictions → Uses ML.PREDICT with new data
  5. Evaluate → Check model performance with ML.EVAL.SCORE

3. Function Namespaces

All FormulaML functions are organized into logical namespaces:

  • ML.DATASETS.* - Built-in datasets for learning
  • ML.DATA.* - Data manipulation and exploration
  • ML.REGRESSION.* - Regression algorithms
  • ML.CLASSIFICATION.* - Classification algorithms
  • ML.CLUSTERING.* - Clustering algorithms
  • ML.EVAL.* - Model evaluation functions
  • ML.PREPROCESSING.* - Data preparation tools

Your First Model

Let’s create a simple classification model using the famous Iris dataset. This dataset contains measurements of iris flowers and classifies them into three species.

Step 1: Load the Data

In cell A1, enter:

=ML.DATASETS.IRIS()

Result: <Dataset>

Step 2: Explore the Data

Let’s see what’s in our dataset. In cell A3, enter:

=ML.DATA.INFO(A1)

This shows the structure of your data: 150 samples with 4 features plus 1 target column.

To see a sample of actual data, in cell A5:

=ML.DATA.SAMPLE(A1, 5)

This displays 5 random rows from your dataset.

Step 3: Prepare Features and Target

We need to separate our features (flower measurements) from our target (species).

In cell B1 (select features - columns 0-3):

=ML.DATA.SELECT_COLUMNS(A1, {0,1,2,3})

In cell C1 (select target - column 4):

=ML.DATA.SELECT_COLUMNS(A1, 4)

Step 4: Split for Training and Testing

Good practice requires splitting data into training and testing sets.

In cell A10:

=ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, C1, 0.2, 42)

This splits 80% for training, 20% for testing. The result is an array of 4 handles:

  • X_train (training features)
  • X_test (test features)
  • y_train (training target)
  • y_test (test target)

Step 5: Create and Train a Model

Let’s use a Support Vector Machine (SVM) classifier. In cell A15:

=ML.CLASSIFICATION.SVM()

Result: <SVC> (untrained model)

Now train it with your data. In cell A16:

=ML.FIT(A15, INDEX(A10,1), INDEX(A10,3))

Result: <SVC> with 🧠 icon (trained model) This trains the SVM model using X_train and y_train from the split.

Step 6: Make Predictions

Use your trained model to predict on test data. In cell A18:

=ML.PREDICT(A16, INDEX(A10,2))

Result: Predictions for your test set

Step 7: Evaluate Performance

Check how well your model performs. In cell A20:

=ML.EVAL.SCORE(A16, INDEX(A10,2), INDEX(A10,4))

Result: A score between 0 and 1 (1 being perfect accuracy)

Congratulations! You’ve just built and evaluated your first machine learning model in Excel!

Understanding Your Results

What the Score Means

  • Classification models: Score represents accuracy (0.95 = 95% correct predictions)
  • Regression models: Score represents R² (how well the model fits the data)
  • Scores closer to 1 indicate better performance

Object Handle Management

  • Object handles persist during your Excel session
  • Save your workbook to preserve handles between sessions
  • Use descriptive cell labels to track what each handle represents

Next Steps

Now that you understand the basics, explore these areas:

📊 Work with Your Own Data

Learn to convert Excel data to DataFrames and apply ML to your datasets.

🤖 Try Different Algorithms

Explore other models like Linear Regression for numerical predictions or K-Means for clustering.

📈 Advanced Evaluation

Learn about cross-validation and hyperparameter tuning (Premium features).

💡 Complete Examples

Follow our step-by-step tutorials for real-world scenarios:

Common Questions

“What if I get an error?”

Check our troubleshooting guide for common issues and solutions.

“Can I use my own data?”

Yes! Learn how in our Working with Data guide.

“Which algorithm should I use?”

See our Models and Algorithms guide for choosing the right model.

“How do I improve accuracy?”

Explore model evaluation techniques and advanced features.

Quick Reference Card

Task Formula Example
Load Iris dataset =ML.DATASETS.IRIS()
View data info =ML.DATA.INFO(dataframe)
Select columns =ML.DATA.SELECT_COLUMNS(dataframe, {0,1,2})
Split data =ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2)
Create SVM model =ML.CLASSIFICATION.SVM()
Train model =ML.FIT(model, X_train, y_train)
Make predictions =ML.PREDICT(trained_model, X_test)
Evaluate model =ML.EVAL.SCORE(trained_model, X_test, y_test)

Ready to dive deeper? Continue to Working with Data →