Family Photo Management

This post is still a work in progress.

Our family pictures were scattered over several computers, each with its own photo management application. In an effort to get a good backup in place, I moved all my pictures to one computer, where I accidentally deleted them. (Long story.) I was able to recover them all, but I ended up with numerous duplicates and huge amounts of other image junk. To make matters much more complicated, I accidentally moved all the files into one directory, with super-long file names that represented their original paths. (Another long story.) Yes, I should have kept backups. Lesson learned. In any case, while scripting superpowers can sometimes get you into trouble, the only way out is with some good scripts.

We have decided to use Lightroom on Windows for photo management. Our Windows machines have a huge amount of storage that we can expand quickly with cheap hard drives. As you can imagine, one problem I have to solve is eliminating a huge number of duplicate images at different sizes and getting rid of junk images.

Removing duplicates

I wrote some Matlab code to find duplicates and weed out very dark images. It scans the directory for all images, reduces their size, and computes an image histogram, which it then groups into 16 bins that are summed and normalized. I then compute the 2-D correlation coefficient for each possible pair of images:

$$
r = \frac{
\sum_m \sum_n \left(A_{mn} - \bar A\right) \left(B_{mn} - \bar B\right)
}{
\sqrt{
\left( \sum_m \sum_n \left(A_{mn} - \bar A\right)^2 \right)
\left( \sum_m \sum_n \left(B_{mn} - \bar B\right)^2 \right)
}
}
$$

where $\bar A$ and $\bar B$ are the mean values of $A$ and $B$.
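The Matlab code itself is not shown here. As a rough stand-in, here is a minimal pure-Python sketch of the same steps, under several assumptions: images are already loaded and shrunk into lists of (r, g, b) tuples (loading and resizing are left out); each color channel gets its own 16-bin histogram, giving a 3 × 16 matrix to feed the 2-D correlation; and `DARK_THRESHOLD` is a made-up cutoff. All function names are hypothetical; `corr2` just borrows the name of the MATLAB Image Processing Toolbox function that computes this same coefficient.

```python
from itertools import combinations
from math import nan, sqrt

NUM_BINS = 16        # 256 intensity levels folded into 16 bins of 16 levels each
DARK_THRESHOLD = 20  # hypothetical cutoff on mean brightness (0-255)


def is_too_dark(pixels):
    """True if the average brightness over all channels is below DARK_THRESHOLD."""
    mean = sum(sum(pixel) for pixel in pixels) / (3 * len(pixels))
    return mean < DARK_THRESHOLD


def binned_histogram(pixels):
    """Return a 3 x 16 matrix: one normalized 16-bin histogram per RGB channel.

    `pixels` is a list of (r, g, b) tuples with values 0-255, taken from an
    image that has already been loaded and shrunk (assumption).
    """
    hist = [[0] * NUM_BINS for _ in range(3)]
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Each run of 16 adjacent intensity levels is summed into one bin.
            hist[channel][value * NUM_BINS // 256] += 1
    total = len(pixels)
    # Dividing by the pixel count makes images of different sizes comparable.
    return [[count / total for count in row] for row in hist]


def corr2(a, b):
    """2-D correlation coefficient r of two equal-sized matrices."""
    a_vals = [x for row in a for x in row]
    b_vals = [y for row in b for y in row]
    a_mean = sum(a_vals) / len(a_vals)
    b_mean = sum(b_vals) / len(b_vals)
    num = sum((x - a_mean) * (y - b_mean) for x, y in zip(a_vals, b_vals))
    den = sqrt(sum((x - a_mean) ** 2 for x in a_vals)
               * sum((y - b_mean) ** 2 for y in b_vals))
    # r is undefined when either matrix is constant; report NaN in that case.
    return num / den if den else nan


def pairwise_correlations(features):
    """Compute r for every possible pair; `features` maps a file name to its histogram."""
    return {
        (name_a, name_b): corr2(features[name_a], features[name_b])
        for name_a, name_b in combinations(sorted(features), 2)
    }


if __name__ == "__main__":
    photo = [(200, 30, 30), (10, 10, 10), (120, 120, 250), (200, 30, 30)]
    bigger = photo * 2  # same colors, twice as many pixels
    features = {"a_small.jpg": binned_histogram(photo),
                "a.jpg": binned_histogram(bigger)}
    print(pairwise_correlations(features))  # r close to 1.0 for the pair
```

Because each histogram is divided by its pixel count, a photo and a resized copy of it produce nearly the same matrix, which is what lets the correlation catch duplicates at different sizes. The trade-off is that only color distributions are compared, so two different photos with similar colors can also score high.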

The results are comparisons such as this one:

[Figure: an example comparison]

And a histogram of the correlation coefficients shows a high degree of correlation in general.

[Figure: histogram of the correlation coefficients]

My goal is to use this to keep the biggest copy of each photo and move the others into a directory of the same name. More to come, once I get some help.
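One possible way to carry out that plan is sketched below; it is not the author's method. It assumes a correlation dictionary keyed by file-name pairs, a hypothetical `DUPLICATE_THRESHOLD` that would need tuning, "biggest" meaning largest pixel count (file size would work the same way), and a directory named after the kept file without its extension. All function names are made up for illustration.

```python
import shutil
from pathlib import Path

DUPLICATE_THRESHOLD = 0.99  # hypothetical cutoff; tune against known duplicate pairs


def group_duplicates(names, correlations, threshold=DUPLICATE_THRESHOLD):
    """Union-find: link any two images whose correlation meets the threshold.

    Returns only the groups with more than one member. NaN correlations never
    meet the threshold, so undefined pairs are not linked.
    """
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving keeps trees shallow
            name = parent[name]
        return name

    for (a, b), r in correlations.items():
        if r >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


def plan_moves(groups, pixel_counts):
    """Keep the largest image of each group; map every other file to a path
    inside a directory named after the kept file (extension removed)."""
    moves = {}
    for group in groups:
        keeper = max(group, key=lambda name: pixel_counts[name])
        target_dir = Path(keeper).with_suffix("")
        for name in group:
            if name != keeper:
                moves[name] = target_dir / Path(name).name
    return moves


def apply_moves(moves):
    """Actually move the files; run only after checking the plan."""
    for src, dst in moves.items():
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
```

Separating `plan_moves` from `apply_moves` makes it possible to review the plan before touching any files. Note that union-find links groups transitively: if A matches B and B matches C, all three end up together even when A and C do not match directly.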