In progress … (but feel free to read anyway!)
In this post I’ll tell you about some progress I’ve made in automatically detecting morphological features in fly embryo images. But first, a little background. Why would one want to automatically detect features in fly embryo images? The short story is this: embryos develop from a single cell to a full-fledged organism. This amazing process is governed by some complex genetic machinery. Imagine if we could observe the process and figure out how this machinery makes it happen. This is what biologists studying development do – they find meaningful patterns and try to understand this complex system.
The problem is, there is actually so much going on that it’s hard to figure out what data to look at. There are thousands of genes in a fruit fly. As the embryo develops, one can track where each one of these genes is expressed, or active, over time. Invariably, biologists look at only a small subset of the available data, picked out carefully using the knowledge and judgement that comes from years of experience. But what if we could train an algorithm to recognize the same kinds of patterns biologists do, but at a much larger scale? It might even surface surprising new kinds of patterns that come from synthesizing tons of data.
This is where computer vision comes in. We can take pictures of fly embryos as they develop. Analyzing these pictures can give us insight into the patterns of development. So what does feature identification have to do with it? When analyzing how an object changes over time, it is useful to identify and refer to landmarks. Think of a map of the world and the continents drifting apart. If we could reliably recognize Greenland and India, then we could talk about the continental drift with reference to these landmarks. A similar idea applies to analyzing how an embryo develops over time.
It’s not hard to see how the solution to such a problem applies to other problems of detecting objects in images.
Okay! Onward to the computer vision.
What do we want to recognize?
We’ll be looking at the “germ band” feature that occurs in fly embryos at a certain stage of development. It’s that semicircular thing in the upper right of the embryo (which is the big oval thing):
Great! So how do we recognize it? One approach is to look at each pixel and determine if it’s part of the germ band or not. This is a classification problem, and I used supervised learning to do it.
Supervised learning means I feed the algorithm a collection of labeled data, and it spits out a model of how the data maps to labels. I can then use that model to predict the labels of previously unseen data points.
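To make that loop concrete, here is a toy sketch (the data and the classifier are made up for illustration, not from my actual pipeline): labeled 2-D points go in, a "model" comes out, and the model predicts labels for points it has never seen. The model here is a nearest-centroid classifier, which just remembers the mean feature vector of each class.

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))  # class 0 samples
X1 = rng.normal(loc=[3, 3], scale=0.5, size=(50, 2))  # class 1 samples

# "Training": the model is just the per-class centroids.
centroids = np.stack([X0.mean(axis=0), X1.mean(axis=0)])

def predict(point):
    """Label a previously unseen point by its nearest class centroid."""
    dists = np.linalg.norm(centroids - np.asarray(point), axis=1)
    return int(np.argmin(dists))

print(predict([0.2, -0.1]))  # near class 0's cluster -> 0
print(predict([2.8, 3.1]))   # near class 1's cluster -> 1
```

Real pixel classifiers use much richer features and models, but the shape of the problem is the same: labeled examples in, predictions on new data out.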
What are these data points exactly? Suppose I had a photo of a person and I wanted to classify each pixel in the photo as being part of the person or not. What information should I store about each pixel? I could store its location and each of its color channels. But that doesn’t tell me anything about whether it’s part of a person or not. A more fruitful approach is to look at what’s going on in some neighborhood of the pixel.

Let’s take an even simpler problem to make the point clearer. Suppose you have a blank white image with a black filled-in circle in the middle, and you want to classify each pixel as being part of the circle or not. (Of course, in this simple case you could just go by whether the pixel is black or white, but nothing analogous works if you want to recognize a person in an image.) If a pixel is near the boundary of the circle, for example, you can tell by the fact that there are edges in its neighborhood.
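Here is a small sketch of that circle example (my own illustration, not the code I use on embryo images): for each pixel we build a feature vector from its neighborhood, namely the mean intensity and the average gradient magnitude (edge strength) in a patch around it. A pixel on the circle’s boundary has strong edges nearby; a pixel deep in the white background has none.

```python
import numpy as np

# Blank white image with a black filled-in circle in the middle.
size = 64
radius = 16
yy, xx = np.mgrid[0:size, 0:size]
circle = (yy - size // 2) ** 2 + (xx - size // 2) ** 2 <= radius ** 2
img = np.where(circle, 0.0, 1.0)  # 0 = black circle, 1 = white background

def neighborhood_features(img, y, x, r=2):
    """Feature vector for pixel (y, x): mean intensity and average
    edge strength (gradient magnitude) in the (2r+1)x(2r+1) patch."""
    patch = img[y - r : y + r + 1, x - r : x + r + 1]
    gy, gx = np.gradient(patch)
    edge_strength = np.sqrt(gy ** 2 + gx ** 2).mean()
    return np.array([patch.mean(), edge_strength])

on_edge = neighborhood_features(img, size // 2, size // 2 + radius)
in_background = neighborhood_features(img, 2, 2)
print(on_edge[1] > in_background[1])  # True: boundary pixel sees edges
```

Feature vectors like these, paired with labels, are exactly the data points the supervised learner consumes.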