David Faure
2013-04-14 21:48:40 UTC
To implement a maximum size for the trash directory, one needs to check the
size every time a new item is being trashed. With the current spec, the only
solution is to do a recursive traversal, which is pretty expensive.
To make this efficient, we need a cache.
My initial idea of a global "total size" cache doesn't work well with older
implementations which don't update that value, so it gets out of date quickly.
Instead, Ryan Lortie and I came up with the following idea, which we would
like to standardize into the trash spec:
For files, we get the size from stat. For dirs, we use a cache:
in every trash directory, a metadata file is created, with one entry per
directory (that was trashed by the user).
That entry contains the total size in bytes of the directory, and the
modification time of the trashinfo file [*].
The metadata file uses desktop file syntax, where the key is the directory
name, and the value is a pair: size, and mtime.
However the desktop file standard restricts the available characters for keys,
so instead of just writing out the directory name, we write the sha1 of the
directory name (a bit like the thumbnail spec, which uses hashes of URIs as
file names).
In summary, it would look like this:
[Directories]
# One entry per sub-directory of the "files" directory
# key = sha1 of the directory name
# value = size in bytes, timestamp of the trashinfo file, in UTC
cb58e5c11a6802db43fd82ca8d3c7393353c0eab=25383,2009-07-11T20:18:30
f1d2d2f924e986ac86fdf7b36c94bcdf32beec15=2315,2012-04-12T10:05:20
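To make the entry format concrete, here is a minimal Python sketch of how one such line could be written and read back. The helper names (cache_key, format_entry, parse_entry) are made up for illustration, and hashing the UTF-8 bytes of the directory name is an assumption, since the proposal above does not say which encoding is hashed.

```python
import hashlib
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"  # timestamp layout used in the example entries


def cache_key(dir_name):
    """Return the sha1 hex digest used as the key for a trashed directory.

    Assumption: the name is hashed as UTF-8 bytes.
    """
    return hashlib.sha1(dir_name.encode("utf-8")).hexdigest()


def format_entry(dir_name, size, trashinfo_mtime):
    """Build one 'key=size,timestamp' line for the [Directories] group."""
    return "%s=%d,%s" % (cache_key(dir_name), size,
                         trashinfo_mtime.strftime(STAMP_FORMAT))


def parse_entry(line):
    """Split one entry line into (key, size in bytes, trashinfo mtime)."""
    key, _, value = line.strip().partition("=")
    size_text, _, stamp = value.partition(",")
    return key, int(size_text), datetime.strptime(stamp, STAMP_FORMAT)
```

Since the key is a hex digest, it only contains characters that the desktop file syntax allows in keys, whatever the directory is called.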
To determine size of the trash directory, this leads to the following
algorithm:
totalsize = 0
prepare empty set of sha1s
list "files" directory, and for each entry:
    stat the entry
    if a file, totalsize += file size
    if a directory,
        stat the trashinfo file to get its mtime
        calculate sha1 of the directory name
        read entry from metadata file
        if entry found
            extract cached_size and cached_mtime
            if mtime != cached_mtime
                re-calculate directory size
                update entry (size of directory, mtime of trashinfo file)
            else
                directory size = cached_size
        else
            calculate directory size
            write entry (size of directory, mtime of trashinfo file)
        totalsize += directory size
        add sha1 to set of seen sha1s
done
for each entry in the metadata file,
    if entry key is not in the set of seen sha1s
        remove entry
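The algorithm above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: the cache is an in-memory dict (reading and writing the metadata file is left out), mtimes are compared at one-second resolution, directory names are hashed as UTF-8, and every trashed directory has a matching .trashinfo file in the "info" directory. The names trash_size and directory_size are hypothetical.

```python
import hashlib
import os


def directory_size(path):
    """Recursively sum the sizes of the files below path (the expensive step)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def trash_size(trash_dir, cache):
    """Return the total size of trash_dir, using and updating cache.

    cache maps sha1(directory name) -> (size in bytes, trashinfo mtime).
    """
    files_dir = os.path.join(trash_dir, "files")
    info_dir = os.path.join(trash_dir, "info")
    total = 0
    seen = set()
    for entry in os.scandir(files_dir):
        if not entry.is_dir(follow_symlinks=False):
            # Plain files (and symlinks): the size comes straight from stat.
            total += entry.stat(follow_symlinks=False).st_size
            continue
        info_path = os.path.join(info_dir, entry.name + ".trashinfo")
        mtime = int(os.stat(info_path).st_mtime)
        key = hashlib.sha1(entry.name.encode("utf-8")).hexdigest()
        cached = cache.get(key)
        if cached is not None and cached[1] == mtime:
            size = cached[0]  # cache entry is still valid
        else:
            # Missing or stale entry: walk the directory and refresh the cache.
            size = directory_size(entry.path)
            cache[key] = (size, mtime)
        total += size
        seen.add(key)
    # Drop entries for directories that are no longer in the trash.
    for key in list(cache):
        if key not in seen:
            del cache[key]
    return total
```

With a warm cache, only the top level of "files" is listed and one trashinfo file is stat'ed per trashed directory; the recursive walk happens only for directories whose entry is missing or stale.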
[*] This way, if an older trash implementation deletes and recreates this
entry, we can detect that the cache entry is stale. This works even if the
directory got restored and trashed again, in which case the mtime of the
directory itself didn't change; this is why we use the mtime of the trashinfo
file instead.
If there is no objection, I will make a patch for the trash spec.
--
David Faure, ***@kde.org, http://www.davidfaure.fr
Working on KDE, in particular KDE Frameworks 5