A few weeks ago I was playing with scientific figures and wondering how I might extract insights from them. One idea I had was to find all the colors in scientific images and rank them.
Different cells, cell parts, and tissues often show up in different colors, especially when they are stained for that purpose, so this could be a productive path. For example, if a stain colors cancer cells differently from healthy cells, the share of the image taken up by the cancer color gives a rough measure of how much of the sample is cancerous.
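Turns out this isn’t an easy problem to solve. Why? Because while we see only a few colors in an image, each of them is actually made up of many slightly different shades that we can’t tell apart. Counting exact color values can therefore return far more entries than the handful of colors we perceive.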
The solution is to cluster the pixels of an image by color. For example, group all the reddish pixels together, then all the bluish ones, and so on. Then you can count the pixels in each group to get the relative amount of each color.
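To make the idea concrete, here is a minimal sketch of that grouping using a basic k-means loop in pure Python. It is an illustration rather than the exact code I ended up with: the function name `color_shares`, its parameters, and the tiny hand-written pixel list are all made up for this example, and it assumes the pixels are already available as `(R, G, B)` tuples. For a real image you would likely load the pixels with an imaging library and use a library clustering routine instead.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(pixel, centers):
    """Index of the cluster center closest to the pixel."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(pixel, centers[i]))


def color_shares(pixels, k, iterations=20, seed=0):
    """Group RGB pixels into k color clusters with a basic k-means loop.

    Returns a list of (center_color, share_of_pixels), largest share first.
    """
    distinct = sorted(set(pixels))
    if len(distinct) < k:
        raise ValueError("need at least k distinct colors")
    # Start from k distinct colors picked at random (seeded for repeatability).
    centers = random.Random(seed).sample(distinct, k)

    for _ in range(iterations):
        # Assign every pixel to its closest center.
        groups = [[] for _ in range(k)]
        for p in pixels:
            groups[nearest_center(p, centers)].append(p)
        # Move each center to the mean color of its group
        # (an empty group keeps its old center).
        updated = [
            tuple(sum(channel) / len(group) for channel in zip(*group))
            if group else centers[i]
            for i, group in enumerate(groups)
        ]
        if updated == centers:
            break  # assignments have stopped changing
        centers = updated

    # Final pass: count how many pixels fall into each cluster.
    counts = [0] * k
    for p in pixels:
        counts[nearest_center(p, centers)] += 1
    ranked = sorted(zip(centers, counts), key=lambda pair: pair[1], reverse=True)
    return [(center, count / len(pixels)) for center, count in ranked]


# Hypothetical data: six reddish and three bluish pixels.
sample_pixels = [
    (200, 10, 10), (210, 20, 15), (205, 15, 12),
    (198, 12, 8), (207, 18, 14), (202, 11, 9),
    (10, 10, 200), (15, 20, 210), (12, 14, 205),
]

for center, share in color_shares(sample_pixels, k=2):
    print([round(c) for c in center], f"{share:.0%}")
```

With this toy data the reddish group should come out at roughly two thirds of the pixels and the bluish group at one third. The number of clusters `k` is something you have to choose yourself, and a poor choice will merge or split colors in ways that don't match what you see.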
PS: I owe a debt to various places where I learned how to do this and copied some code snippets. I don’t remember them all but now wish I did. If you recognize this as something you’ve worked on, and want credit, please let me know as I’m happy to give it!