Table of Contents
Tips and Troubleshooting
This guide helps you resolve common issues, optimize performance, and use FormulaML effectively.
Common Error Messages
#NAME? Error
Problem: Excel doesn’t recognize the FormulaML function.
Solutions:
-
Check installation:
- Go to Insert → Get Add-ins
- Search for FormulaML
- Ensure it’s installed and enabled
-
Verify function syntax:
Wrong: =MLDATASETS.IRIS() Right: =ML.DATASETS.IRIS()
-
Restart Excel:
- Save your work
- Close and reopen Excel
- Try the function again
“Object handle not found”
Problem: Referenced cell doesn’t contain a valid FormulaML object.
Common Causes:
# Wrong cell reference
Cell A1: =ML.DATASETS.IRIS()
Cell A2: =ML.DATA.INFO(B1) # Should be A1, not B1
Solutions:
- Check you’re referencing the correct cell
- Ensure the referenced cell contains an object handle
- Verify the object hasn’t expired (restart Excel can clear handles)
“Dimension mismatch”
Problem: Data shapes don’t match between operations.
Example:
X_train has 100 rows
y_train has 80 rows # Mismatch!
Solutions:
- Ensure X and y have same number of rows
- Check train/test split consistency
- Verify column selection is correct
“Invalid parameter value”
Problem: Function parameter is outside acceptable range.
Examples:
=ML.CLUSTERING.KMEANS("five") # n_clusters must be integer
=ML.REGRESSION.RIDGE(-1.0) # alpha must be positive
=ML.DATA.SAMPLE(data, 0) # n must be positive or -1
Solutions:
- Check parameter documentation
- Use correct data types (numbers vs text)
- Verify ranges (e.g., percentages 0-1)
“Premium function requires subscription”
Problem: Trying to use premium feature without subscription.
Premium Functions:
- ML.REGRESSION.RANDOM_FOREST_REG
- ML.CLASSIFICATION.RANDOM_FOREST_CLF
- ML.EVAL.CV_SCORE
- ML.EVAL.GRID_SEARCH
- ML.DIM_REDUCTION.KERNEL_PCA
- ML.DATASETS.OPENML
Solutions:
- Use free alternatives (e.g., Linear instead of Random Forest)
- Upgrade to premium subscription
- Check documentation for free equivalent workflows
Performance Optimization
Large Dataset Issues
Problem: Slow performance with big datasets (>10,000 rows).
Solutions:
-
Sample your data:
# Work with subset first Cell A1: =ML.DATA.SAMPLE(large_data, 5000, , , 42)
-
Reduce features:
# Select only important columns Cell A1: =ML.DATA.SELECT_COLUMNS(data, {1,3,5,7})
-
Use efficient algorithms:
- Linear models are fastest
- Avoid deep trees
- Limit iterations
-
Progressive workflow:
# Start small 1. Test with 100 rows 2. Validate with 1,000 rows 3. Train with 10,000 rows 4. Deploy on full dataset
Memory Management
Problem: Excel becomes unresponsive or crashes.
Solutions:
-
Clear unused objects:
- Delete cells with old object handles
- Save and reopen workbook periodically
-
Optimize data types:
# Remove unnecessary precision Round numerical data to needed decimal places Convert text to categories where possible
-
Batch processing:
# Process in chunks Chunk 1: Rows 1-10000 Chunk 2: Rows 10001-20000 Combine results after
Calculation Speed
Problem: Formulas take too long to calculate.
Solutions:
-
Disable automatic calculation:
- File → Options → Formulas
- Set to Manual calculation
- Press F9 to calculate when ready
-
Minimize volatile functions:
- Avoid unnecessary SAMPLE calls
- Cache results in cells
-
Use simpler models for exploration:
# Development: Use Linear Cell A1: =ML.REGRESSION.LINEAR() # Production: Use Random Forest Cell A1: =ML.REGRESSION.RANDOM_FOREST_REG()
Best Practices
Data Preparation
1. Always Clean Your Data First
# Check for issues
Cell A1: =ML.DATA.INFO(data) # Look for nulls
Cell A2: =ML.DATA.DESCRIBE(data) # Check for outliers
# Handle missing values
Cell B1: =ML.DATA.DROP_MISSING_ROWS(data)
# OR
Cell B1: =ML.IMPUTE.SIMPLE_IMPUTER("mean")
2. Consistent Data Types
- Don’t mix text and numbers in columns
- Format dates consistently
- Remove special characters
3. Feature Engineering in Excel
# Create new features before ML
Column F: =D2/E2 # Ratio feature
Column G: =MONTH(A2) # Extract month from date
Column H: =IF(B2>100,1,0) # Binary feature
Model Development
1. Start Simple, Add Complexity
Step 1: =ML.REGRESSION.LINEAR() # Baseline
Step 2: =ML.REGRESSION.RIDGE(1.0) # Add regularization
Step 3: =ML.REGRESSION.RANDOM_FOREST() # Complex model
2. Always Split Your Data
# Never evaluate on training data
Cell A1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2, 42)
↑
Always use seed
3. Compare Multiple Models
# Same data, different models
Linear Score: 0.72
Ridge Score: 0.74
Lasso Score: 0.73
Best: Ridge (free and performs well)
Production Deployment
1. Document Your Workflow
# Add comments to cells
Cell A1: Data loading - 10,000 customer records
Cell B1: Feature selection - top 5 predictors
Cell C1: Model - Ridge regression with alpha=0.5
Cell D1: Performance - R²=0.82 on test set
2. Version Control
# Save versions of your workbook
CustomerModel_v1.xlsx - Initial model
CustomerModel_v2.xlsx - Added features
CustomerModel_v3.xlsx - Production version
3. Monitor Performance
# Track model performance over time
Week 1 Score: 0.85
Week 2 Score: 0.84
Week 3 Score: 0.81 # Degrading - retrain needed
Common Pitfalls
1. Data Leakage
Problem: Test data information influences training.
Wrong:
# Scaling before splitting - WRONG!
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.FIT_TRANSFORM(A1, all_data)
Cell A3: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(A2, y, 0.2)
Right:
# Split first, then scale - RIGHT!
Cell A1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2)
Cell B1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell B2: =ML.FIT(B1, INDEX(A1,1)) # Fit on train only
Cell B3: =ML.TRANSFORM(B2, INDEX(A1,2)) # Transform test
2. Overfitting
Signs:
- Training score: 0.99
- Test score: 0.65
- Perfect memorization, poor generalization
Solutions:
# Add regularization
=ML.REGRESSION.RIDGE(10.0) # Higher alpha
# Simplify model
=ML.DIM_REDUCTION.PCA(5) # Reduce features
# Get more data
Combine multiple data sources
3. Wrong Metric
Problem: Using accuracy for imbalanced data.
Example:
95% customers don't churn
Model predicts "no churn" always
Accuracy: 95% (misleading!)
Catches 0% of actual churners
Solution: Consider business impact, not just accuracy.
Excel-Specific Tips
1. Cell Naming
# Name your cells for clarity
Name cell A1: "TrainingData"
Name cell B1: "Model"
Name cell C1: "Predictions"
# Use in formulas
=ML.FIT(Model, TrainingData, Labels)
2. Array Formulas
# INDEX for accessing split results
Cell A1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2)
Cell B1: =INDEX(A1, 1) # X_train
Cell B2: =INDEX(A1, 2) # X_test
Cell B3: =INDEX(A1, 3) # y_train
Cell B4: =INDEX(A1, 4) # y_test
3. Conditional Formatting
# Highlight predictions
Conditional Format:
- Green if prediction > 0.8
- Yellow if 0.5-0.8
- Red if < 0.5
4. Data Validation
# Ensure valid inputs
Data Validation on input cells:
- List for categorical inputs
- Number ranges for parameters
- Prevent invalid entries
Debugging Techniques
1. Check Intermediate Results
# Debug step by step
Cell A1: =ML.DATASETS.IRIS()
Cell A2: =ML.DATA.INFO(A1) # Check loaded correctly
Cell A3: =ML.DATA.SAMPLE(A1, 5) # View sample
Cell A4: =ML.DATA.DESCRIBE(A1) # Check statistics
2. Simplify to Isolate Issues
# Start with minimal example
1. Use built-in dataset (not your data)
2. Use simple model (Linear)
3. Skip preprocessing
4. If works, add complexity gradually
3. Verify Data Shapes
# Check dimensions match
Cell A1: =ML.DATA.INFO(X_train) # Should have n rows
Cell A2: =ML.DATA.INFO(y_train) # Should have n rows
Cell A3: =ML.DATA.INFO(X_test) # Should have m rows
Cell A4: =ML.DATA.INFO(y_test) # Should have m rows
FAQ
Q: How do I save trained models?
A: Save the Excel workbook. Object handles are preserved with the file.
Q: Can I use FormulaML offline?
A: Yes, once installed. But you need internet for initial installation and updates.
Q: What’s the maximum dataset size?
A: Depends on your system RAM. Generally handles up to 1 million rows efficiently.
Q: Can I export models to Python?
A: Not directly. FormulaML models live in Excel. Consider using the same algorithms in scikit-learn for Python.
Q: How do I update FormulaML?
A: Updates are automatic through Excel’s add-in system.
Q: Can multiple users share models?
A: Yes, share the Excel file. Each user needs FormulaML installed.
Getting Help
Resources
- Documentation: You’re here!
- Examples: See Examples and Tutorials
- Function Reference: Check Function Reference
- Support: Email support@formulaml.com
Before Contacting Support
- Check error message carefully
- Try with built-in dataset
- Verify FormulaML version is current
- Document steps to reproduce issue
- Include Excel version and OS
Community
- Share tips and tricks
- Report bugs
- Request features
- Help other users
Quick Reference Card
Essential Functions
# Data
=ML.DATASETS.IRIS()
=ML.DATA.CONVERT_TO_DF(range, TRUE)
=ML.DATA.SELECT_COLUMNS(df, {0,1,2})
# Models
=ML.REGRESSION.LINEAR()
=ML.CLASSIFICATION.SVM()
=ML.CLUSTERING.KMEANS(4)
# Training
=ML.FIT(model, X, y)
=ML.PREDICT(model, X)
=ML.TRANSFORM(transformer, X)
# Evaluation
=ML.EVAL.SCORE(model, X, y)
=ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2, 42)
Parameter Defaults
test_size: 0.2 (20% test)
random_state: 42 (or any integer)
alpha: 1.0 (regularization)
n_clusters: 8 (K-means)
n_estimators: 100 (Random Forest)
Performance Tips
- Sample large data first
- Use simple models for testing
- Add complexity gradually
- Monitor memory usage
- Save work frequently
Next Steps
- Return to Documentation Home
- Review Function Reference
- Try Examples
- Explore Advanced Features
Happy machine learning in Excel with FormulaML!