## What is a (Mathematical) Model?

By Bob Carpenter

I was shocked and dismayed when I heard from a reader that I’d used the term “model” over 200 times in the LingPipe book without ever saying what it meant. This kind of thing is why it’s so hard to write introductory material.

Perhaps I shouldn’t have been surprised at this comment, because other people had expressed some uncertainty about the term “model” to me in the past.

### What is a (Mathematical) Model?

In short, when I say “model”, I mean it in the bog-standard scientific sense, as explained on:

- Wikipedia: Mathematical Model

Quite simply, it’s just a bunch of math used to describe a phenomenon. Nothing interesting here either philosophically or conceptually, just the usual scientific method.

For instance, Newton’s equation (force equals mass times acceleration) is a mathematical model that may be used to describe the motions of the planets, among other things. Newton built his model on Kepler’s observation that the planets swept out equal areas in equal times in their orbits. Newton realized that by introducing the notion of “gravity”, he could model the orbits of the planets. Of course, he had to invent calculus to do so, but that’s another story.
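To make this concrete, here is a minimal numerical sketch, not Newton's own derivation. It assumes a single planet orbiting a fixed sun under an inverse-square gravitational force, plugs that force into force equals mass times acceleration, and checks Kepler's equal-areas observation. The function name `simulate_orbit`, the units, and the starting position and velocity are all made up for illustration.

```python
def simulate_orbit(steps=4000, dt=0.001, gm=1.0):
    """Integrate a planet's motion around a fixed sun at the origin.

    Illustrative only: units are arbitrary and the starting state is made up.
    Returns the areal velocity (area swept per unit time) after each step.
    """
    x, y = 1.0, 0.0      # starting position
    vx, vy = 0.0, 0.8    # starting velocity, below circular speed, so the orbit is an ellipse

    def acceleration(px, py):
        # F = m * a with F given by inverse-square gravity; the planet's mass cancels.
        r_cubed = (px * px + py * py) ** 1.5
        return -gm * px / r_cubed, -gm * py / r_cubed

    areal_rates = []
    ax, ay = acceleration(x, y)
    for _ in range(steps):
        # Velocity Verlet: half kick, drift, half kick.
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        x += dt * vx
        y += dt * vy
        ax, ay = acceleration(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        # Area swept per unit time is half the magnitude of position cross velocity.
        areal_rates.append(0.5 * (x * vy - y * vx))
    return areal_rates


rates = simulate_orbit()
print(min(rates), max(rates))
```

The two printed values should agree to many decimal places (both close to 0.4 for this starting state), even though the planet's distance from the sun and its speed change a lot over the orbit. That is the sense in which the model describes the phenomenon Kepler observed.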

### Prediction vs. Postdiction

Typically models are used for predicting future events, but sometimes they’re used to retroactively try to understand past events. (“Backcasting”, or “aftcasting” if you’re nautical, is the time opposite of “forecasting”, and “postdiction” is the opposite of “prediction”.) For instance, climate scientists attempt to postdict/backcast earth temperatures from data such as tree rings; we’re working on fitting models of such data with Matt Schofield as part of our Bayesian inference project at Columbia.

### All Models are Wrong, but …

As the statistician George E. P. Box said, “Essentially, all models are wrong, but some are useful.” For instance, Newton’s model is wrong in that it doesn’t correct for relativistic effects at very high velocities. But it proved useful at predicting everything from eclipses to the tides.

The models we’ve used in LingPipe, such as the hidden Markov model (HMM) for part-of-speech tagging, are also clearly wrong. Language just isn’t Markovian (meaning that the n-th word depends only on a fixed window of the few words before it). But we can still do pretty well at predicting part-of-speech tags with the simplified model.
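As a rough illustration of the Markov assumption, and not of LingPipe's actual implementation or API, here is a toy sketch in which each word depends only on the single word before it (a window of size one). The function name `bigram_model` and the example sentence are invented for this sketch.

```python
from collections import Counter, defaultdict


def bigram_model(tokens):
    """Estimate P(next word | previous word) by relative frequency."""
    pair_counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        pair_counts[prev][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        prev: {nxt: count / sum(counts.values()) for nxt, count in counts.items()}
        for prev, counts in pair_counts.items()
    }


# Made-up toy text, purely for illustration.
tokens = "the dog saw the cat and the dog ran".split()
model = bigram_model(tokens)
print(model["the"])
```

Under this model, the word after "the" is predicted the same way no matter what came earlier in the sentence. That is exactly the simplification that makes the model wrong about real language and still easy to estimate and use.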

# What Is a Model?

In its most common usage, a model is an abstract representation of an item or a concept (a car, a plane, or a building) or of a part of something, such as a tire, a wing, or a room. Models are created in order to view, manipulate, or test the thing they represent without having to build the real thing. People use models and modeling every day to improve their work and their world.

# What is an Algorithm?

In its most general sense, an algorithm is any set of detailed instructions that results in a predictable end-state from a known beginning. Algorithms are only as good as the instructions given, however, and the result will be incorrect if the algorithm is not properly defined.

A common example of an algorithm is the set of instructions for assembling a model airplane. Starting from a set of marked pieces, one can follow the instructions to reach a predictable end-state: the completed airplane. Misprints in the instructions, or a failure to follow a step properly, will result in a faulty end product.
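The same idea can be sketched in code. The example below uses Euclid's algorithm for the greatest common divisor, chosen here only as a familiar illustration: the known beginning is a pair of positive integers, and the predictable end-state is their greatest common divisor. The function name `euclid_gcd` is just a label for this sketch.

```python
def euclid_gcd(a, b):
    """Return the greatest common divisor of two positive integers."""
    while b != 0:
        # Replace the pair (a, b) with (b, a mod b) until the remainder is zero.
        a, b = b, a % b
    return a


print(euclid_gcd(48, 18))  # prints 6
```

As with the airplane instructions, a single faulty step changes the result: replacing `a % b` with `a // b`, for example, makes the function return 18 for the same inputs, which is not their greatest common divisor.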