I take the archiving of my digital photos seriously. My photos are the archived memories of my family. Over the years I have developed a system for sorting, storing, and archiving them. I’ve been asked repeatedly how I do this, so I thought I would write it up once and for all.
I use a Canon 1Ds Mark II, a 16.7-megapixel DSLR, and I shoot exclusively in raw with it, which yields files of between 13 and 22 megabytes each. Each of these .CR2 raw files must be “developed” using special software. Each developed .jpg adds another file of about two to four megabytes. Then I may crop or alter the image, making a new copy of the full-sized .jpg. Then there are the web-sized versions and the thumbnails, which are only 100k or so. After all my editing, a single capture from my camera might consume a total of 25 megabytes of disk space with all copies considered, and more if there are many versions.
My first rule is that I let the camera name the files according to whatever scheme it uses. I may configure it once, but I do not rename my image files. Thus I might get an image named P1040730.jpg from my Panasonic point-and-shoot camera, or _B0Z6573.CR2 from my DSLR. Back in the days of my 1.2MP Kodak DC120 I would rename the photos, but now I take in excess of 10,000 images a year, and I just don’t have the time. Honestly, I just don’t care about the image names anyway; I use directory names to identify each event instead.
I copy the images from the camera’s card straight onto a mirrored RAID pair of drives. I once had a drive fail during the transfer and I lost 6 gigs’ worth of pics. That was not a happy day. The next day I set up RAID so that a drive failure wouldn’t hurt me. Once the images are on the RAID pair, I delete them from the card. I have two sets of drives for photography:
- Current – two 250G (now 3TB) mirrored drives
- Archive – One 1TB (now 3TB) drive
The Archive drive gets upgraded every year because it fills up. Luckily my disk space needs seem to run right behind what $100 will buy me that year, so it works out. Every couple of years I have to upgrade the mirror pair as well.
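To see why the drives fill up on that schedule, here is a back-of-the-envelope calculation in Python using the two figures already mentioned: roughly 25 megabytes per capture and a little over 10,000 captures a year. It assumes every capture gets the full set of copies, so treat the result as an upper-end estimate; the function name is made up for illustration.

```python
# Rough yearly storage estimate from the figures quoted in this post.
MB_PER_CAPTURE = 25         # raw + developed .jpg + edits + web copies
CAPTURES_PER_YEAR = 10_000  # "in excess of 10,000 images a year"


def yearly_storage_gb(captures, mb_per_capture):
    """Return the estimated storage for one year, in gigabytes (1 GB = 1000 MB)."""
    return captures * mb_per_capture / 1000


if __name__ == "__main__":
    print(f"{yearly_storage_gb(CAPTURES_PER_YEAR, MB_PER_CAPTURE):.0f} GB per year")
    # prints "250 GB per year"
```

That works out to about 250 GB for a full year, which lines up with the size of the original mirrored pair.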
All of my photography web pages have an underlying hierarchy on the server. The top level will be a master archival index called archive-index.html. The next level will contain all of the years. Within each year are all of the event folders for that year. Each event folder contains the images and HTML for that event. The hierarchy can be thought of like this:
    Archive-Index
    \-YYYY
       \- YY-MM-DD_EventNameWithNoSpaces
       \- YY-MM-DD_EventNameWithNoSpaces
    \-YYYY
       \- YY-MM-DD_EventNameWithNoSpaces
       \- YY-MM-DD_EventNameWithNoSpaces
On my computer, there is more complexity than on the web server, but the basic format is the same. The folder hierarchy for both the Archive and Current drives is the same. The root contains only year folders. The year folders contain event folders. The event folders contain that event’s raw files. The event folder might contain a Develops folder for processed jpgs, but will not if a point-and-shoot camera was used, since those images are already .jpgs. The event folder will also contain an identically named folder that holds the web-sized copies, thumbnails, and HTML code. This folder will eventually be copied or moved to my HTML drive for inclusion in my Family page.
YY=Year, MM=Month and DD=Day. Thus:
    YYYY                                      # There is nothing here but folders
    \- YY-MM-DD_EventNameWithNoSpaces         # This folder contains all the Raw files
       \- Develops                            # .jpgs developed from raw
       \- YY-MM-DD_EventNameWithNoSpaces      # Web sized versions of .jpgs
In practice, it might look like this:
    2009
    \- 09-08-16_AnnieAndTheTrash
       \- Develops
       \- 09-08-16_AnnieAndTheTrash
    \- 09-08-17_GibsonLesPaulR8
       \- Develops
       \- 09-08-17_GibsonLesPaulR8
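If you wanted to create that skeleton automatically, a minimal Python sketch might look like the following. This is an illustration of the layout described above, not the script I actually use; the function name `make_event_folders` and its `with_develops` flag are invented for the example.

```python
from datetime import date
from pathlib import Path


def make_event_folders(root, event_date, event_name, with_develops=True):
    """Create YYYY/YY-MM-DD_EventName with its sub-folders and return the event folder.

    with_develops=False skips the Develops folder, as for a point-and-shoot
    camera whose files are already .jpgs.
    """
    folder_name = f"{event_date:%y-%m-%d}_{event_name}"
    event_dir = Path(root) / f"{event_date:%Y}" / folder_name

    # The web-sized copies live in a folder with the same name as the event folder.
    (event_dir / folder_name).mkdir(parents=True, exist_ok=True)

    if with_develops:
        (event_dir / "Develops").mkdir(exist_ok=True)
    return event_dir


if __name__ == "__main__":
    # "photos" is a placeholder root; point it at the real drive instead.
    created = make_event_folders("photos", date(2009, 8, 16), "AnnieAndTheTrash")
    print(created)
```

Because every folder is created with `exist_ok=True`, running it twice for the same event is harmless.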
You might have noticed the odd way in which I write the date in my folder names. The year is listed first, then the month, then the day. In this way a plain alphabetical sort puts the folders in chronological order. If I were to use a normal American date format like 08-16-09_Event, then August 2009 would sort next to August 2008, which makes me twitch. This is less of an issue with the event folders separated into year folders, but being this detailed always pays off in the long run.
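A quick Python demonstration of the difference. Only the first folder name comes from my real archive; the other two events are made up so that the list spans two years, and `to_american` is a helper invented for this example.

```python
def to_american(folder_name):
    """Rewrite YY-MM-DD_Event as MM-DD-YY_Event, for comparison only."""
    date_part, event = folder_name.split("_", 1)
    yy, mm, dd = date_part.split("-")
    return f"{mm}-{dd}-{yy}_{event}"


folders = [
    "09-08-16_AnnieAndTheTrash",
    "08-08-20_SummerPicnic",   # hypothetical event
    "09-01-05_FirstSnow",      # hypothetical event
]

# Year-first names: a plain string sort is also a chronological sort.
sorted_ymd = sorted(folders)

# Month-first names: August 2009 lands in front of August 2008.
sorted_mdy = sorted(to_american(name) for name in folders)

if __name__ == "__main__":
    print(sorted_ymd)
    print(sorted_mdy)
```

The first list comes out as August 2008, January 2009, August 2009. The second comes out as January 2009, August 2009, August 2008.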
There are some very specific aspects to the folder names. They have saved me countless hours of coding and have let me do some pretty cool things over the years.
- Each section of the date: YY-MM-DD is separated by a hyphen
- The date is separated from the event name by an underscore
- There are never spaces or any non-alphanumeric characters in the event name
- Each word in the event name is capitalized
These may look like the random rantings of a crazed old programmer, and they are, but there is logic, and logic is our friend. By separating the date from the event name with an underscore, I can write scripts and trust that everything to the left of the underscore is the date, and everything to the right is the event name. By using hyphens, I can always parse the date. By never using spaces, I can guarantee that the folder name will work in all operating systems, and be understood by all browsers. Similarly, by not allowing characters like apostrophes, I can ensure that my scripts will work on multiple operating systems.
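Here is a small Python sketch of the kind of parsing these rules make possible. It is an illustration rather than one of my actual scripts: the names `parse_event_folder` and `split_words` are invented, and the two-digit year is assumed to fall in the 2000s.

```python
import re
from datetime import date
from typing import NamedTuple

# YY-MM-DD, an underscore, then an alphanumeric event name with no spaces.
FOLDER_RE = re.compile(r"(\d{2})-(\d{2})-(\d{2})_([A-Za-z0-9]+)")


class Event(NamedTuple):
    when: date
    name: str


def parse_event_folder(folder_name, century=2000):
    """Split a folder name into its date and event name.

    Raises ValueError if the name breaks any of the naming rules.
    century=2000 is an assumption: "09" is read as 2009.
    """
    match = FOLDER_RE.fullmatch(folder_name)
    if match is None:
        raise ValueError(f"not a valid event folder name: {folder_name!r}")
    yy, mm, dd, name = match.groups()
    return Event(date(century + int(yy), int(mm), int(dd)), name)


def split_words(event_name):
    """Break a capitalized event name back into words, e.g. for a page title."""
    return re.findall(r"[A-Z][a-z0-9]*|[a-z0-9]+", event_name)


if __name__ == "__main__":
    event = parse_event_folder("09-08-17_GibsonLesPaulR8")
    print(event.when.isoformat(), " ".join(split_words(event.name)))
    # prints "2009-08-17 Gibson Les Paul R8"
```

A name with a space or an apostrophe in it fails the pattern and raises `ValueError`, which is a cheap way to catch a mistyped folder before it reaches the web server.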
The folder with the web-sized versions of the images has the same name as the original because it will be copied in whole to my HTML drive where I will add it to my web page. The same logic works there as well. Once this folder is copied to the HTML drive, the folder names will be the same, and will be sorted the same way as the originals. Additionally, I’ll easily be able to tell from the folder name – which is included in the thumbnail HTML page – where to find the image on my drive.
Once I have all the pics processed and settled, I make two copies to DVD-DL. One DVD-DL stays home in the safe, and the other one goes to a safety deposit box at the bank. Seriously. I have a box at the bank that has nothing but hundreds of DVD disks in it. I could rebuild both my home PC and my servers in the event of a catastrophe.
At the end of each year, I copy the entire year over to the Archive drive and create a new folder on the Current drive for the new year. I then delete the finished year from the RAID pair, which frees up space for the new year. Since the archived files are also backed up to multiple DVDs, they no longer need RAID.
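The year-end move could be scripted along these lines. This is a sketch under assumptions, not a description of how I actually do it: the function names are invented, and the file-size comparison before deleting is an extra safety check added for the example.

```python
import shutil
from pathlib import Path


def file_sizes(folder):
    """Map each file's path (relative to folder) to its size in bytes."""
    return {
        path.relative_to(folder): path.stat().st_size
        for path in folder.rglob("*")
        if path.is_file()
    }


def archive_year(current_root, archive_root, year):
    """Copy one year to the archive drive, then clear it from the current drive."""
    source = Path(current_root) / str(year)
    target = Path(archive_root) / str(year)

    # copytree refuses to overwrite an existing target, so a year
    # cannot be archived on top of an earlier copy by accident.
    shutil.copytree(source, target)

    # Only delete the originals once the copy has the same files and sizes.
    if file_sizes(source) != file_sizes(target):
        raise RuntimeError(f"copy of {source} does not match; nothing deleted")
    shutil.rmtree(source)

    # Start the new year with an empty folder on the current drive.
    (Path(current_root) / str(year + 1)).mkdir(exist_ok=True)
    return target
```

Comparing sizes is a quick sanity check rather than proof of a perfect copy; a checksum comparison would be stricter but much slower on a few hundred gigabytes.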