Using Tags in GThumb
I guess most people have their own way of organizing large collections of digital images. Most of them probably use a database for this, even though there are standardized ways to embed metadata in the image files themselves. This blog post details my approach, which manipulates only the image files rather than putting a database "on top" or relying on other abstractions external to the files.
The GThumb application from the GNOME Project is my preferred tool for organizing digital images. Until I wanted to work on subsets of the original images, the standard functionality was good enough for my very amateurish needs. In previous years we used to manually copy individual files into separate directories for further processing. Such a procedure leaves a lot of duplicate files behind and does not record the selection in the original files. After cleaning up the leftovers from those years, I set out to establish a better procedure this time. The aim was an easy process for selecting pictures relevant to different groups of people, while recording the selection in the original image files. The tagged files are then collected with a short script into individual directories for further processing, e.g. for creating a picture book out of them.
Keywords In Exif Data
All digital images from cameras or mobile phones already carry a great deal of metadata in the Exif section of the files. This data includes fields for the creation date of the picture, the camera make, camera settings, etc. The metadata also already includes a Keywords field, as can be seen with the exiftool command line tool:
dzu@krikkit:/tmp/exif$ exiftool -G -D -keywords DSC02657.JPG
[IPTC] 25 Keywords : Favorit
dzu@krikkit:/tmp/exif$
The "-G" option adds the group prefix for the attribute name and the
"-D" adds the ID of the field from the spec. As I know now, there are
many such groups in the standard and as we can check, the IPTC Group
defines the ID 25 as Keywords
with the data type string[0,64]+
,
i.e. a list. This allows us to add many such keywords to a single
file. For our use case I would like to see keywords represent the
target audience group.
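GThumb will do the tagging for us later on, but for completeness, keywords can also be written from the command line. A minimal sketch using exiftool's += syntax for appending to list-type tags:
# Append another keyword to the IPTC Keywords list of a file
exiftool -keywords+=Urlaub DSC02657.JPG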
The configuration file .config/gthumb/tags.xml simply contains a list of strings that GThumb will present in the tag menu:
<?xml version="1.0" encoding="UTF-8"?>
<tags version="1.0">
  <tag value="Bildschirmfotos"/>
  <tag value="Familie"/>
  <tag value="Favorit"/>
  <tag value="Geburtstag"/>
  <tag value="Party"/>
  <tag value="Radfahren"/>
  <tag value="Spiele"/>
  <tag value="Temporär"/>
  <tag value="Urlaub"/>
  <tag value="Wichtig"/>
  <tag value="Wissenschaftlich"/>
</tags>
With this setup, we can use GThumb's tag menu to add one or more of these tags to a file.
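GThumb updates tags.xml when new tags are created in its UI, but files tagged by other tools can introduce keywords the menu does not know about. As a sketch, the distinct keywords actually in use below a directory can be listed like this (assuming exiftool's default ", " separator for list values):
# -s3 prints bare values; split the ", "-separated lists and deduplicate
exiftool -r -keywords -s3 2023 | sed 's/, /\n/g' | sort -u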
Reading Keywords From Files
Now that we have the tags in the files, we can use our tools to process them. To scan all files and extract the keywords, we can again use exiftool and its JSON output format. We use exiftool's -if '$keywords' construct to limit the output to files that really carry keywords. Leaving this option out would create an entry for every file in the JSON output, including empty ones where no keywords were found, but I prefer to see only the relevant entries.
Here is an example with a made-up directory hierarchy /tmp/pictures. The subdirectory 2023 contains 9 pictures from 2023, and two of them are tagged. One of them carries only a single keyword, and the other carries two. Note that in the single-keyword case, the value of Keywords is a plain string and not a one-element array. I would actually have liked to see arrays in all cases, and it seems that -strict should work this way, but it does not work for me as advertised. Any help would be appreciated, but for now we need to remember this for the script in the next section.
dzu@krikkit:/tmp/pictures$ exiftool -r -if '$keywords' -keywords -json 2023 > 2023-keywords.json
    1 directories scanned
    7 files failed condition
    2 image files read
dzu@krikkit:/tmp/pictures$ jq . 2023-keywords.json
[
  {
    "SourceFile": "2023/IMG_20231029_160718.jpg",
    "Keywords": [
      "Radfahren",
      "Urlaub"
    ]
  },
  {
    "SourceFile": "2023/IMG_20231029_154527.jpg",
    "Keywords": "Radfahren"
  }
]
dzu@krikkit:/tmp/pictures$
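Until the array question is solved, one workaround is to normalize on the JSON side. A small jq sketch that coerces every Keywords value into an array:
jq 'map(.Keywords |= if type == "string" then [.] else . end)' 2023-keywords.json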
Mass Editing Keywords
Remember, the JSON file we created in the previous step is not really a database, but only a cached version of the data that allows quicker processing. If something changes in the folder, we need to recreate the file with exiftool, but until then it is much faster to work on the cached version.
Here is an example where we decided to rename the keyword Radfahren to Radeln with plain sed:
dzu@krikkit:/tmp/pictures$ sed 's/"Radfahren"/"Radeln"/g' < 2023-keywords.json > new.json
dzu@krikkit:/tmp/pictures$ jq . new.json
[
  {
    "SourceFile": "2023/IMG_20231029_160718.jpg",
    "Keywords": [
      "Radeln",
      "Urlaub"
    ]
  },
  {
    "SourceFile": "2023/IMG_20231029_154527.jpg",
    "Keywords": "Radeln"
  }
]
dzu@krikkit:/tmp/pictures$
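Plain sed works here because the keyword shows up as a quoted string, but jq's walk offers a structure-aware alternative that replaces matching string values anywhere in the document (note it would also rewrite a SourceFile that happened to carry exactly this value):
jq 'walk(if . == "Radfahren" then "Radeln" else . end)' 2023-keywords.json > new.json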
Flushing The JSON Database To The Files
However, this change did not yet touch the tags in the original files. In order to do that, we can use exiftool again, but this time in writing mode:
dzu@krikkit:/tmp/pictures$ exiftool -json=new.json 2023
No SourceFile '2023/IMG_20231029_162416.jpg' in imported JSON database (full path: '/tmp/pictures/2023/IMG_20231029_162416.jpg')
No SourceFile '2023/IMG_20231029_153602.jpg' in imported JSON database (full path: '/tmp/pictures/2023/IMG_20231029_153602.jpg')
No SourceFile '2023/IMG_20231029_155812.jpg' in imported JSON database (full path: '/tmp/pictures/2023/IMG_20231029_155812.jpg')
No SourceFile '2023/IMG_20231029_162414.jpg' in imported JSON database (full path: '/tmp/pictures/2023/IMG_20231029_162414.jpg')
No SourceFile '2023/IMG_20231029_162422.jpg' in imported JSON database (full path: '/tmp/pictures/2023/IMG_20231029_162422.jpg')
No SourceFile '2023/IMG_20231029_155814.jpg' in imported JSON database (full path: '/tmp/pictures/2023/IMG_20231029_155814.jpg')
No SourceFile '2023/IMG_20231029_162413.jpg' in imported JSON database (full path: '/tmp/pictures/2023/IMG_20231029_162413.jpg')
    1 directories scanned
    2 image files updated
dzu@krikkit:/tmp/pictures$
Note that exiftool found only two entries in new.json and complained about all the other files that it encountered. As we intentionally created the JSON file without empty entries, this was to be expected. We can now recreate our JSON database and see that the contents of the original picture files have indeed changed:
dzu@krikkit:/tmp/pictures$ exiftool -r -if '$keywords' -keywords -json 2023 > 2023-keywords.json
    1 directories scanned
    7 files failed condition
    2 image files read
dzu@krikkit:/tmp/pictures$ jq . 2023-keywords.json
[
  {
    "SourceFile": "2023/IMG_20231029_160718.jpg",
    "Keywords": [
      "Radeln",
      "Urlaub"
    ]
  },
  {
    "SourceFile": "2023/IMG_20231029_154527.jpg",
    "Keywords": "Radeln"
  }
]
dzu@krikkit:/tmp/pictures$
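One detail to be aware of: when writing, exiftool by default keeps each modified file around as a _original backup. Once you trust the round trip, the backups can be suppressed:
# Write tags in place instead of keeping FILE_original backup copies
exiftool -overwrite_original -json=new.json 2023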
Summary Of Data Structures
This setup keeps the tags in the images themselves, so they will not be lost if images are moved, renamed or copied to other locations. Our <dir>-keywords.json file is only a cached representation of a directory hierarchy and needs to be recreated whenever the images have been manipulated in whatever way, be it with GThumb, the command line or any other means.
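Checking whether the cache has gone stale is cheap. Here is a sketch in plain shell using find's -newer test; the elvish script in the next section does the equivalent check:
# Any file below 2023 newer than the cache? -print -quit stops at the first hit.
if [ -n "$(find 2023 -type f -newer 2023-keywords.json -print -quit)" ]; then
    echo "2023-keywords.json is outdated"
fi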
Collecting Images For Tags Into Directories
Keeping this in mind, we can now implement a small script to collect all files for each keyword into their own directory. As we use JSON for our data structures, it is convenient to use a scripting language with explicit support for JSON data. I will use the elvish shell, which I have come to like a lot lately. This shell allows very concise scripts that are immune to the "special characters in file names" and "unexpected error conditions in pipelines" problems of traditional shells. elvish is so nice that it warrants its own blog post, but today you can get a glimpse of it from a real use case. Option parsing, command line help and colored output are included in this pretty terse implementation.
Without further ado, here is the script:
#!/usr/bin/elvish
#
# populate-tag-dirs.elv: Populate directories with links to tagged
# files.
#
# Extracting the tags into a JSON data structure is done with exiftool
# and so it needs to be installed for this script to run. The data
# structure is cached in "<dir>-keywords.json" to speed up further
# processing. There is an automatic check to detect that the cache is
# invalid because there were changes to files more recent than the cache.
#
# Sample call:
#   populate-tag-dirs.elv -v 2022

use flag
use str
use path
use re

fn usage {
  echo "usage: "(path:base (src)[name])" [-h] [-n] [-v] <dir>" >&2
  echo " Populates category directories according to database." >&2
  echo " The database needs to be in '<dir>-keywords.json' and the" >&2
  echo " directories will be named '<dir>-<keyword>'" >&2
  exit 1
}

# Command argument parsing
var specs = [
  [&short=h &long=help]
  [&short=v &long=verbose]
  [&short=n &long=dry-run]
]
var flags args = (flag:parse-getopt $args $specs)
set flags = (each {|f| put [ $f[spec][long] $f[arg] ] } $flags | make-map)

if (or (has-key $flags help) (== (count $args) 0) (> (count $args) 1)) {
  usage
}

var verbose = (has-key $flags verbose)
var dry-run = (has-key $flags dry-run)
var path = $args[0]
var dbfile = $path""-keywords.json

# UI functions
fn error {|@msg|
  echo (path:base (src)[name]): (styled error: red) $@msg >&2
}

fn info {|@msg|
  echo (styled info: yellow) $@msg >&2
}

fn verbose {|@msg|
  if $verbose {
    echo (styled info: yellow) $@msg >&2
  }
}

fn update_db {|path|
  info Creating database for $path - this may take a while
  exiftool -r -if '$keywords' -keywords -json $path > $path""-keywords.json
}

# Check argument
if ?(test ! -d $path) {
  error $path missing or is not a directory
  exit 1
}

# Check for exiftool
if (not ?(exiftool >/dev/null 2>&1)) {
  error exiftool not installed
  exit 1
}

# If the DB is missing, simply create it
if ?(test ! -f $dbfile) {
  error $dbfile "not found"
  update_db $path
} else {
  # Check if DB needs update. Use a tab separator and cut so that
  # file names containing spaces survive the pipeline.
  var recent_file = (find $path -type f -printf "%T@\t%p\n" | sort -nr | head -1 | cut -f 2-)
  if ?(test $recent_file -nt $dbfile) {
    info $dbfile is outdated
    update_db $path
  } else {
    verbose $dbfile is still current, no rescan needed
  }
}

from-json < $dbfile | each {|entry|
  # The entries can be a string or an array of strings. Normalize to
  # an array to map over.
  if (is (kind-of $entry[Keywords]) "string") {
    put [ $entry[Keywords] ]
  } else {
    put $entry[Keywords]
  } | each {|kw|
    var dir = $path"-"$kw
    if $dry-run {
      echo "ln "$entry[SourceFile]" "$dir
      continue
    }
    if ?(test ! -d $dir) {
      info Creating directory $dir
      mkdir $dir
    }
    var target = $dir/(path:base $entry[SourceFile])
    if ?(test ! -f $target) {
      verbose Linking $entry[SourceFile] to $dir
      ln $entry[SourceFile] $dir
    } else {
      verbose File $entry[SourceFile] already linked to $dir
    }
  } (one)
} (one)
This is how the script works in practice:
dzu@krikkit:/tmp/pictures$ populate-tag-dirs.elv 2023
info: Creating directory 2023-Radeln
info: Creating directory 2023-Urlaub
dzu@krikkit:/tmp/pictures$ ls -l
insgesamt 20
drwxr-xr-x 2 dzu dzu 4096 23. Nov 17:12 2023/
-rw-r--r-- 1 dzu dzu  164 23. Nov 17:15 2023-keywords.json
drwxr-xr-x 2 dzu dzu 4096 23. Nov 17:15 2023-Radeln/
drwxr-xr-x 2 dzu dzu 4096 23. Nov 17:15 2023-Urlaub/
-rw-r--r-- 1 dzu dzu  164 23. Nov 17:11 new.json
dzu@krikkit:/tmp/pictures$
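Before the first real run, the -n flag shows what would happen without touching the file system. With the database from above, the dry run should print something like:
dzu@krikkit:/tmp/pictures$ populate-tag-dirs.elv -n 2023
ln 2023/IMG_20231029_160718.jpg 2023-Radeln
ln 2023/IMG_20231029_160718.jpg 2023-Urlaub
ln 2023/IMG_20231029_154527.jpg 2023-Radeln
dzu@krikkit:/tmp/pictures$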
Note that the files are not copied but hard linked into the tag directories. This way the copies do not take up any additional space, as they point to the same storage location as the original images, and we can freely create and "rm -r" the tag directories. At the same time it is much easier to work with the selected files as members of a real directory, since we can apply all the standard Unix tools to the set.
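That both directory entries really share the same storage can be verified with stat: the two names should report an identical inode number and a hard link count of 2 (paths taken from the example above):
# %i = inode number, %h = number of hard links, %n = file name
stat -c '%i %h %n' 2023/IMG_20231029_154527.jpg 2023-Radeln/IMG_20231029_154527.jpg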
Summary
I like the Unix ecosystem so much because it allows me to adapt the tools to my workflows instead of the other way round. As I always strive to keep things where they belong, the idea of storing metadata in the images themselves rather than in an external database is very natural to me, but may seem completely freakish to others. But equipped with powerful tools like exiftool, sed, jq and elvish, it is easy to build an efficient workflow on top of it.