Bug 293832

Summary: Digitaglinktree - Multi tag level combination - find any image through directory browsing [patch]
Product: [Applications] digikam Reporter: cyril.raphanel
Component: Plugin-Generic-WishForNewTools Assignee: Digikam Developers <digikam-bugs-null>
Status: RESOLVED FIXED    
Severity: wishlist CC: caulier.gilles, krienke
Priority: NOR    
Version: 2.1.1   
Target Milestone: ---   
Platform: Unlisted Binaries   
OS: Linux   
Latest Commit: Version Fixed In: 4.0.0
Sentry Crash Report:
Attachments: First dirty test version to illustrate how result could look like in your image database
Add -e option to exclude some tags
"-Y" working now in with this multilevel option
"Date" root added
HELP -- PERL SUPPORT NEEDED
Recursion buggy but some hash & array issue fixed
Good base for recursion version - Performance issue still
Recursion version working with all flexibility to control performance.
Version 1.8.2
Version 1.8.2
minor correction in 1.8.2 for usage
man page updated for 1.8.2 version
digitaglinktree version 1.8.3
digitaglinktree 1.8.3 man page

Description cyril.raphanel 2012-02-11 11:17:27 UTC
Created attachment 68698 [details]
First dirty test version to illustrate how result could look like in your image database

Version:           2.1.1
OS:                Linux

Hi,

the idea is: if you have a picture with Peter & Paul in France, you should be able to find the same image whether you think of it as:
Peter-> Paul-> France
or
France->Paul->Peter
or 
Paul->France->Peter

Reproducible: Didn't try

Steps to Reproduce:
test script
digitaglinktree -l my_directory -d my_database -M


Expected Results:  
Any directory path should lead to the picture you have in mind, independently of the tag order you follow when browsing directories.
Comment 1 caulier.gilles 2012-02-11 13:12:31 UTC
Another entry for you Krienke...

Gilles Caulier
Comment 2 cyril.raphanel 2012-02-12 03:05:27 UTC
Hi,

it should also be possible to:
- exclude a list of keywords/tags (e.g. the _Digikam tags)
- include as tags some other information from either:
-- EXIF
-- any information managed by digikam at album or image level
- manage "Date" like any other tag within this multi-level logic.
Comment 3 cyril.raphanel 2012-02-12 03:20:31 UTC
Or, to keep it simple, information that is not tagged could be handled up front by adding it to the image's tag list and then to the digikam database.

In my case it should be possible to turn the year from the EXIF data into a tag using some tool.
Comment 4 cyril.raphanel 2012-02-12 03:30:54 UTC
Are those tags the ones managed in digikam?
Subject                         : 
Tags List                       : 
Last Keyword XMP                : 
Hierarchical Subject            : 

Which ones should I update if I want to create a tag category based on the year? For example:
Tags List: Year/YYYY
Comment 5 cyril.raphanel 2012-02-12 17:03:30 UTC
Created attachment 68731 [details]
Add -e option to exclude some tags

-e option added to enable tag exclusion
"-e Digikam,People" would exclude all images which have the string "Digikam" or "People" in their tag path.
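The -e test boils down to a substring match on the tag path. A minimal sketch, assuming the option value is a comma-separated list as in the example above; `is_excluded` and the sample tag paths are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the tag path contains any of the
# comma-separated strings given to -e (plain, case-sensitive substring match).
sub is_excluded {
    my ($tagpath, $exclude_opt) = @_;
    for my $needle (grep { length } split /,/, $exclude_opt) {
        return 1 if index($tagpath, $needle) >= 0;
    }
    return 0;
}

my @tagpaths = ('People/Peter', 'Places/France', 'Digikam/internal');
my @kept = grep { !is_excluded($_, 'Digikam,People') } @tagpaths;
print "$_\n" for @kept;    # prints: Places/France
```

Being a plain substring test, "-e People" also matches a tag path that merely contains "People", e.g. a tag named "PeopleAtWork".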
Comment 6 cyril.raphanel 2012-02-12 18:40:56 UTC
Created attachment 68734 [details]
"-Y" working now in with this multilevel option 

It still needs some code cleanup/optimization.

Hardlink, flat structure & Archive options have not been implemented with the multilevel option.

Documentation (usage & man page) not updated.
Comment 7 krienke 2012-02-13 08:12:28 UTC
Hello cyril.raphanel@gmail.com,
I still do not understand the need for a fix:
Suppose you have a photo tagged France, Peter and Paul, where Peter and Paul are subtags of e.g. people and France is a subtag of, say, countries.

Then digitaglinktree will create a directory structure like

---countries
   --- France
---people
   ---Paul
   ---Peter

In each of the directories Peter, Paul and France you will find a link to the photo in question. So this is exactly what you described in your initial request. The photo is accessible via all the tags that have been assigned to it. What do you think is missing?
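For reference, each link in that default layout is a relative symlink from the tag directory back to the photo. A sketch of how the link name and target can be computed, mirroring the `File::Spec->abs2rel` call and the `<imgid>_<image>` naming in the script code posted in comment 20; `link_for_tag` and all paths below are hypothetical:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: compute (link target, link path) for one image in one
# tag directory. A relative target keeps the tree valid if the photo root and
# the link tree are moved together.
sub link_for_tag {
    my ($photoRootDir, $albumPath, $image, $imgid, $tagDir) = @_;
    my $relPath = File::Spec->abs2rel("$photoRootDir$albumPath", $tagDir);
    my $target  = "$relPath/$image";
    my $link    = "$tagDir/${imgid}_$image";    # imgid prefix avoids name clashes
    return ($target, $link);
}

my ($target, $link) = link_for_tag('/data/photos', '/2004/France', 'img1.jpg', 42,
                                   '/data/tags/people/Peter');
print "$link -> $target\n";
# /data/tags/people/Peter/42_img1.jpg -> ../../../photos/2004/France/img1.jpg
# symlink($target, $link) would then create the actual link.
```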
Comment 8 cyril.raphanel 2012-02-13 09:56:54 UTC
hi,

That is more a new feature than a fix.

The original purpose of the script is to mirror the tag structure strictly. What I propose here is to reflect the way users think, which typically combines several tag structures.

Typically the structure is organized as you describe it:
Place
-- France

People
-- Peter
-- Paul

Date (even if most of the time it is not managed as a tag structure but as EXIF data)

If you stick to this logic, it is very difficult to find all pictures of Peter & Paul taken in France in 2004 just by browsing, regardless of whether you think of Paul, Peter or France first.

This is what my patch proposes:
People
-- Peter
--- _all => All pictures of Peter
--- Date => All pictures of Peter classified by dates
---- 2004
----- _all => All pictures of Peter in 2004
----- People
------ _all => All pictures of Peter in 2004 with some other People
------ Paul
------- _all => All pictures of Peter in 2004 with Paul
------- Places
------- _all => All pictures of Peter in 2004 with Paul within a specific Place
------- France
-------- _all => All pictures of Peter in 2004 with Paul in France

This way it is straightforward to generate a jalbum on a specific topic from the hierarchy (here, all pictures of Peter and Paul together taken in France in 2004).

In addition, it is then straightforward to export the structure to a UPnP server that supports symlinks on its file system, without having to fiddle with a tag structure compatible with the installed UPnP server.
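The layout proposed above comes down to this: for one ordering of an image's tags, every prefix of that ordering is a directory with an `_all` subdirectory holding the link. A minimal sketch under that reading; the helper name is hypothetical, and the category-level `_all` directories of the example (such as `People/_all`) are left out for brevity:

```perl
use strict;
use warnings;

# Hypothetical helper: the "_all" directories one image is linked into for a
# single ordering of its tags, one per prefix of that ordering.
# Simplification: category-level "_all" directories are not generated.
sub all_dirs_for_ordering {
    my ($root, @ordered_tags) = @_;
    my $prefix = $root;
    my @dirs;
    for my $tag (@ordered_tags) {
        $prefix .= "/$tag";
        push @dirs, "$prefix/_all";
    }
    return @dirs;
}

print "$_\n" for all_dirs_for_ordering('tree',
    'People/Peter', 'Date/2004', 'People/Paul', 'Places/France');
# tree/People/Peter/_all
# tree/People/Peter/Date/2004/_all
# tree/People/Peter/Date/2004/People/Paul/_all
# tree/People/Peter/Date/2004/People/Paul/Places/France/_all
```

Calling this once per ordering of the image's tags yields the full set of directories the image appears in.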
Comment 9 cyril.raphanel 2012-02-13 10:00:28 UTC
Just to be consistent in my example:
People
- Peter
-- _all => All pictures of Peter
-- Date => All pictures of Peter classified by dates
--- 2004
---- _all => All pictures of Peter in 2004
---- People
----- _all => All pictures of Peter in 2004 with some other People
----- Paul
------ _all => All pictures of Peter in 2004 with Paul
------ Places
------- _all => All pictures of Peter in 2004 with Paul within a specific Place
------- France
-------- _all => All pictures of Peter in 2004 with Paul in France
Comment 10 cyril.raphanel 2012-02-13 10:36:10 UTC
And if the user first thinks about his/her trip to France in 2004 with Paul, where they met Peter, the script also translates this way of thinking into the folder hierarchy and ends up with the same set of files:
Place
- France
-- _all => All pictures taken in France
-- Date => All pictures taken in France classified by dates
--- 2004
---- _all => All pictures taken in France in 2004
---- People
----- _all => All pictures taken in France in 2004 with some other People
----- Paul
------ _all => All pictures taken in France in 2004 with Paul
------ People
------- _all => All pictures taken in France in 2004 with Paul and other people
------- Peter
-------- _all => All pictures taken in France in 2004 with Paul and Peter
Comment 11 cyril.raphanel 2012-02-13 15:56:42 UTC
(In reply to comment #10)

And the "pwd" would look like:
Place/France/Date/2004/People/Paul/People/Peter

You can even combine standard commands like "find" and "grep" to find anything from the command line, and even display your images with a console-based image viewer.
Comment 12 cyril.raphanel 2012-02-13 17:31:45 UTC
Created attachment 68765 [details]
"Date" root added

- Add "Date" root
- usage updated with "-e" and "-M" option
Comment 13 cyril.raphanel 2012-02-13 17:42:04 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > And if the user thinks first about his/her trip in France in 2004 with Paul
> > where they met Peter the script will also translate this way of thinking in
> > folder hierarchy and ends up to the same set of files
> > Place
> > - France
> > -- _all => All pictures taken in France
> > -- Date => All pictures taken in France classified by dates
> > --- 2004
> > ---- _all => All pictures taken in France in 2004
> > ---- People
> > ----- _all => All pictures taken in France in 2004 with some other People
> > ----- Paul
> > ------ _all => All pictures taken in France in 2004 with Paul
> > ------ People
> > ------- _all => All pictures taken in France in 2004 with Paul and other people
> > ------- Peter
> > -------- _all => All pictures taken in France in 2004 with Paul and Peter
> 
> And the "pwd" would look like 
> Place/France/Date/2004/People/Paul/People/Peter
> 
> you can play even with standard command like "find" and "grep" together to find
> anything from command line and even display your image with console base image
> viewer.

Example of a find command:
find my_directory -path "*Paul*Peter*France*2004*"

and voila :-)
Comment 14 cyril.raphanel 2012-02-13 17:52:28 UTC
> example of find command:
> find my_directory -path "*Paul*Peter*France*2004*"
> 
> and voila :-)

Erratum 
find my_directory -path "*Paul*Peter*France*2004*_all"
Comment 15 krienke 2012-02-14 13:49:47 UTC
@cyril.raphanel@gmail.com
Some feedback:
I just tried to run your patched version of digitaglinktree to see what the results look like (because I still do not fully understand the semantics of -M for an arbitrary tag). However, the script would not terminate. After half an hour I killed it (the default digitaglinktree version took 7 minutes to terminate).

I looked into one of the tag directories already created, where the tag is named "000" and has subtags like blue, blau, colored, coloured and many more. For example, I found this directory path that digitaglinktree had created:

<tagbasedir>/000/blau/000/blue/000/colored/000/coloured/000/

To me this does not seem to make sense.
Another problem is runtime which is too long at the moment.
Comment 16 cyril.raphanel 2012-02-14 17:56:54 UTC
OK, so volume might be a challenge. For your example I guess the tag structure is:
000->blau
000->blue
000->colored
000->coloured

and you have pictures tagged with all of them?

If so, the result is as expected.

I will do some performance test and introduce a level limitation parameter.
Comment 17 cyril.raphanel 2012-02-14 18:24:02 UTC
The results:
In total I have:
- 44 different tags
- 13 088 tag/image assignments
- 15123 images with at least one tag
  -- 10676 have one tag + Year
  -- 1042 have 2 tags + Year
  -- 82 have 3 tags + Year
  -- 15 have 4 tags + Year
  -- 6 have 5 tags + Year

It took 1 minute 9 seconds to build the complete tree.

My test scenario is probably not representative enough to test performance.
Comment 18 cyril.raphanel 2012-02-14 20:22:08 UTC
Ouch ... I managed to generate a test scenario with 10000 images with 10 tags each and this is a disaster :-(

I am sure there is a way to make it efficient using recursion...
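A quick count shows why. Assuming one `_all` directory, and thus one link per image, for every ordered path of 1 to k distinct tags, an image with n tags needs the sum over k of n!/(n-k)! paths. A small Perl sketch of that count; `path_count` is a hypothetical helper, and its optional cap corresponds to the level limitation mentioned in comment 16:

```perl
use strict;
use warnings;

# Number of distinct ordered tag paths of length 1..$max_level that can be
# built from $n tags of one image: sum over k of n!/(n-k)! (partial permutations).
sub path_count {
    my ($n, $max_level) = @_;
    $max_level = $n if !defined $max_level || $max_level > $n;
    my ($total, $term) = (0, 1);
    for my $k (1 .. $max_level) {
        $term  *= $n - $k + 1;    # n * (n-1) * ... * (n-k+1)
        $total += $term;
    }
    return $total;
}

printf "10 tags, no limit: %d paths\n", path_count(10);      # 9864100
printf "10 tags, 3 levels: %d paths\n", path_count(10, 3);   # 820
printf " 5 tags, no limit: %d paths\n", path_count(5);       # 325
```

So a single image carrying 10 tags already needs close to ten million links without a level limit, while a limit of 3 levels brings that down to 820.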
Comment 19 cyril.raphanel 2012-02-14 20:34:59 UTC
And I do not know how I did my math, but my results are nonsense!
Comment 20 cyril.raphanel 2012-02-16 18:39:54 UTC
I need help 

I call the function like this:
# Manage tags
createLinkTreeMultiBranch($refImages->{$i},${linktreeRootDir},$photoRootDir,0);


sub createLinkTreeMultiBranch{

	my(@tag_list,$linktreeRootDir,$photoRootDir,$counttag)=@_;
	$path=$tag_list->{"path"};
	$image=$tag_list->{"image"};
	$imgid=$tag_list->{"imgid"};
	$imgYear=$tag_list->{"imgYear"};
	$counttag++;
	$toto=$tag_list->{"tagpath"};
	print "IMAGEID - $tag_list - $path - $image - $imgid - $imgYear\n";
	for( $j=0;  $j<=$#{${tag_list}->{"tag"}}; $j++){
        	$tag=$tag_list->{"tagpath"}->[$j];
	        print "NB -- COUNTTAG -- $counttag -- TAG -- $tag\n";
        	# Check if we have not tried the path before
        	if(${linktreeRootDir} !~ /($tag)/) {
        		# print "$item -- $image --$item/$tag $tagcount $j\n";
        		# Create subdirectories needed
        		if( ! -d "${linktreeRootDir}/$tag/_all" ){
        			$ret=mkpath("${linktreeRootDir}/$tag/_all", 0, 0755);
        			
        			if( !$ret ){
        				die "Cannot mkdir \"${linktreeRootDir}/$tag/_all\"\n";
        			}
        		}
        		# Avoid conflicts for images with the same name in one tag directory
        		$linkName=$imgid."_".$image;
        		
        		# Get relative path from absolute one
        		$relPath="";
        		$relPath=File::Spec->abs2rel("${photoRootDir}$path", "${linktreeRootDir}/$tag/_all" ) ;
        		if( !length($relPath)){
        			die "Cannot create relative path from \"${linktreeRootDir}/$tag/_all\" to \"${photoRootDir}$path\" . Abort.\n";
        		}
        		# Create link
        		$sourceDir="$relPath/$image"; 
        		$destDir="${linktreeRootDir}/$tag/_all/$linkName";
        		
        		if (!-e $destDir){
        			$ret=symlink($sourceDir, $destDir);
        			if( !$ret ){
        				$ret="$!";
        				warn "Warning: Failed symbolic linking \"$sourceDir\" to \"$destDir\": ",
        				"\"$ret\"\n";
        			}
        		}
        		
        		if ($counttag < $#{$refImages->{$i}->{"tag"}}){
        			# create subdir
        			createLinkTreeMultiBranch(\%refImages,${linktreeRootDir}/$tag,$photoRootDir,$counttag);
        		}
        		
        	}
	}
}

I get
IMAGEID - HASH(0xaa06d0c) -  -  -  -
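For the record, a few things in this snippet help explain the empty fields. `my(@tag_list, $linktreeRootDir, ...) = @_` assigns every argument to `@tag_list` and leaves the other variables undefined, while `$tag_list->{...}` reads an unrelated scalar `$tag_list`. The unquoted `${linktreeRootDir}/$tag` in the recursive call is a division, not a path, and the recursion passes `\%refImages` instead of the current image. A corrected sketch, assuming each image is a hash ref with the path, image, imgid and tagpath keys used above; the `$maxLevel` argument (standing in for the -M level) and the `$used` hash ref replacing the regex check are illustrative additions, not part of the attached versions:

```perl
use strict;
use warnings;
use File::Path qw(mkpath);
use File::Spec;

# Corrected sketch. $img is one image hash ref; $maxLevel is required.
sub createLinkTreeMultiBranch {
    # Unpack into scalars: an array in the list would swallow every argument.
    my ($img, $linktreeRootDir, $photoRootDir, $level, $maxLevel, $used) = @_;
    $used ||= {};
    return if $level >= $maxLevel;

    for my $tag (@{ $img->{tagpath} }) {    # lexical loop variable per call
        next if $used->{$tag};              # tag already on the current path

        my $dir = "$linktreeRootDir/$tag";  # must be quoted: a bare "/" divides
        mkpath("$dir/_all", 0, 0755) unless -d "$dir/_all";
        die "Cannot mkdir \"$dir/_all\"\n" unless -d "$dir/_all";

        # Relative link target, link named <imgid>_<image> to avoid clashes
        my $relPath = File::Spec->abs2rel("$photoRootDir$img->{path}", "$dir/_all");
        my $source  = "$relPath/$img->{image}";
        my $dest    = "$dir/_all/$img->{imgid}_$img->{image}";
        if (!-l $dest && !symlink($source, $dest)) {
            warn "Warning: Failed symbolic linking \"$source\" to \"$dest\": \"$!\"\n";
        }

        # Recurse with the SAME image (not \%refImages) and this tag marked used
        createLinkTreeMultiBranch($img, $dir, $photoRootDir, $level + 1,
                                  $maxLevel, { %$used, $tag => 1 });
    }
}

# Top-level call for one image, e.g. with at most 3 levels:
# createLinkTreeMultiBranch($refImages->{$i}, $linktreeRootDir, $photoRootDir, 0, 3);
```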
Comment 21 cyril.raphanel 2012-02-16 18:41:22 UTC
Created attachment 68853 [details]
HELP -- PERL SUPPORT NEEDED

I think I am lost with hash & array references...
Comment 22 cyril.raphanel 2012-02-16 19:14:00 UTC
Created attachment 68855 [details]
Recursion buggy but some hash & array issue fixed
Comment 23 cyril.raphanel 2012-02-16 19:59:01 UTC
My test case is
tag->tag1
tag->tag2
tag->tag3
tag->tag4
tag->tag5
tag->tag6
tag->tag7
tag->tag8
tag->tag9
tag->tag10

All images have all 10 tags, but it only creates the following directories:
/tag/tag1/tag/tag2/tag/tag3/tag/tag4/tag/tag5/.../tag/tag10/

whereas it should also create all alternative combinations, such as
tag/tag2/tag/tag1/..../tag/tag10/
up to
tag/tag10/tag/tag9/..../tag/tag1/

Any idea what I am doing wrong in my recursion logic?
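One likely cause, assuming the loop still uses an undeclared (global) `$j` as in the code from comment 20: every recursion level shares that one counter, so the innermost call runs it past the end and all outer loops stop after their first pass, producing exactly one path. A small self-contained demonstration with hypothetical function names; the regex skip mimics the original "already on the path" check, anchored on "/":

```perl
use strict;
use warnings;

our $j;    # package variable, like the undeclared $j in the script

# Buggy: all recursion levels share $j.
sub paths_shared {
    my ($tags, $prefix, $out) = @_;
    for ($j = 0; $j <= $#$tags; $j++) {
        my $tag = $tags->[$j];
        next if $prefix =~ m{(?:^|/)\Q$tag\E(?:/|$)};    # tag already used
        push @$out, "$prefix/$tag";
        paths_shared($tags, "$prefix/$tag", $out);
    }
}

# Fixed: each call gets its own lexical index.
sub paths_lexical {
    my ($tags, $prefix, $out) = @_;
    for my $i (0 .. $#$tags) {
        my $tag = $tags->[$i];
        next if $prefix =~ m{(?:^|/)\Q$tag\E(?:/|$)};
        push @$out, "$prefix/$tag";
        paths_lexical($tags, "$prefix/$tag", $out);
    }
}

my (@shared, @lexical);
paths_shared([qw(a b c)], '', \@shared);
paths_lexical([qw(a b c)], '', \@lexical);
print scalar(@shared), " vs ", scalar(@lexical), " paths\n";    # 3 vs 15 paths
```

Declaring the counter with `my` (or iterating with `for my $tag (...)`) gives every recursion level its own position in the tag list.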
Comment 24 cyril.raphanel 2012-02-17 05:49:04 UTC
Created attachment 68863 [details]
Good base for recursion version - Performance issue still

The bottleneck seems to be directory creation (and reading?).
Comment 25 cyril.raphanel 2012-02-17 05:59:02 UTC
Comment on attachment 68863 [details]
Good base for recursion version - Performance issue still

Performance improvement possibilities:
- use mkdir instead of mkpath
- limit the number of levels handled by the directory logic
- store created directories in memory to avoid a disk read when checking whether a directory exists
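The first and third ideas can be combined: create directories with plain `mkdir`, one level at a time, and remember in a hash which ones already exist so that later checks never touch the disk. A minimal sketch; `ensure_dir` and `%created` are hypothetical names:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

my %created;    # directories known to exist; avoids repeated stat()/mkdir calls

# Hypothetical helper: create $dir and any missing parents with plain mkdir,
# remembering each path so later calls for it cost only a hash lookup.
sub ensure_dir {
    my ($dir) = @_;
    return if $created{$dir};
    if (!-d $dir) {
        my $parent = dirname($dir);
        ensure_dir($parent) if $parent ne $dir;    # stop at "/" or "."
        mkdir($dir, 0755) or -d $dir or die "Cannot mkdir \"$dir\": $!\n";
    }
    $created{$dir} = 1;
}
```

The trade-off is memory: one hash entry per directory, which itself becomes noticeable once the tree reaches millions of directories.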
Comment 26 cyril.raphanel 2012-02-17 06:50:21 UTC
Comment on attachment 68863 [details]
Good base for recursion version - Performance issue still

Performance:
- the number of link write operations also has an impact
Comment 27 krienke 2012-02-17 12:21:18 UTC
Just to give some feedback. Today I tried the latest patch. This time I was away for an hour but the script was still busy. Afterwards I wanted to remove the base directory and this also took more than 20 min (on a Core i5 with 8 GB and an ext4 fs). It seems that the number of directory entries created by your algorithm is simply too big.

Perhaps it would be a good idea to go the other way round: make it mandatory that -M takes a list of tags to be included, with all tags excluded by default. This would at least ensure a positive user experience, in that the script terminates after a short while and the user can inspect whether the result is what he intended. The other way round, depending on the number of photos and tags, the script runs virtually forever and the user might decide not to use it any longer.


This
Comment 28 cyril.raphanel 2012-02-17 14:59:54 UTC
Thanks,

I agree the volume of links & directories it creates is not manageable. It is actually not a processor or memory issue (both stay low in terms of resources during the whole process) but a disk access challenge.
I also noticed that deleting the tree is then a pain...

I will look into the possibility you mention; I am also testing a level option to limit the volume (only x levels...).
I am also thinking about another trade-off, as for now the logic creates a lot of duplicate links and directories in order to follow every possible human line of thought:
tag1->tag2->tag3
tag1->tag3->tag2
tag2->tag1->tag3
tag2->tag3->tag1
tag3->tag2->tag1
tag3->tag1->tag2
all contain the same set of links... There is some potential here if you limit the thinking paths you allow; something like People->Places->People->Places might be an overkill human thinking path :-)

Combining keyword filtering (positive or negative) with the level setting should then give enough flexibility.
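The observation that all orderings of tag1, tag2, tag3 contain the same links can be exploited by collecting images once per *set* of tags, keyed by the sorted tag names, instead of once per ordering. A sketch with made-up helper names and sample data:

```perl
use strict;
use warnings;

# Hypothetical helper: all subsets (as array refs) of @tags with 1..$max
# elements. Subsets keep the order of @tags, so sorted input gives sorted keys.
sub subsets_upto {
    my ($max, @tags) = @_;
    my @out = ([]);
    for my $tag (@tags) {
        # extend every subset that is still below the size limit
        push @out, map { [@$_, $tag] } grep { @$_ < $max } @out;
    }
    shift @out;    # drop the empty set
    return @out;
}

# Made-up sample data: image id => tags
my %image_tags = (1 => [qw(Peter Paul France)], 2 => [qw(Paul Peter)], 3 => [qw(France)]);

# Images per tag set: "Paul,Peter" serves both Paul->Peter and Peter->Paul.
my %images_by_set;
for my $id (sort keys %image_tags) {
    for my $set (subsets_upto(3, sort @{ $image_tags{$id} })) {
        push @{ $images_by_set{ join ',', @$set } }, $id;
    }
}
print "$_: @{ $images_by_set{$_} }\n" for sort keys %images_by_set;
```

With ten tags and no level limit that is 2^10 - 1 = 1023 sets to compute instead of one computation per ordering (9,864,100 of them); each ordering's `_all` directory can then be filled from the entry for its set.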
Comment 29 cyril.raphanel 2012-02-19 11:26:23 UTC
Created attachment 68923 [details]
Recursion version working with all flexibility to control performance.

-M             Make a multi-level directory tree combining all tag classifications
               included in each image.
               The number specifies the maximum tree level.
-i             Include tags having a specific string in their tag path.
               Use all if all tags are needed.
-e             Exclude tags having a specific string in their tag path.
               Use all if all tags need to be excluded. Default: all with -M and none without -M.
               Example: Digikam,People would exclude all tags having Digikam or People in their path.
-V             Verbose mode
Comment 30 krienke 2012-02-21 13:57:36 UTC
Hi, looks better now. I guess that -M should always be used with a number as argument defining the number of levels? At the moment one can call digitaglinktree with -M and nothing happens. In this case an error should be printed that gives a hint.  

The same is true for -i. Without -i nothing happened, even when I called the script with -M 100. This is correct and safe for the user, but the script should print an error message saying that -M was used without -i.

By the way, the last time I called the previous version of digitaglinktree with -M, when it did not terminate even after an hour, I noticed that at least 8,000,000 inodes in my filesystem had been eaten up by the time I stopped the script. So the defaults as they are now (do nothing without an explicit option from the user) are good, but some error handling is missing, as are the manual pages.
Comment 31 cyril.raphanel 2012-02-26 19:59:44 UTC
Thanks for the feedback

(In reply to comment #30)
> Hi, looks better now. I guess that -M should always be used with a number as
> argument defining the number of levels? At the moment one can call
> digitaglinktree with -M and nothing happens. In this case an error should be
> printed that gives a hint.  

Good point!
This was what I had in mind, but I did not have time to find out how to do the check in Perl... I am fairly new to parameter handling in this language.


> The same is true for -i . Without using -i nothing happened even when I called
> the script with  -M 100. This is correct and safe for the user, but the script
> should print an error message that -M was used without -i 

I wanted to use a default value, but an error message is a better solution.

A bit overloaded right now, but I will try to make the change in the coming days... No idea how to update the man page but willing to learn :-)
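A minimal sketch of such checks, assuming the options are parsed with the core `Getopt::Std` module (the parser the script actually uses is not shown in this thread) and keeping only the -l, -d, -M, -i, -e and -V letters mentioned here; the real option string is longer. `check_multilevel_opts` is a hypothetical name:

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical check for the cases discussed above: -M needs a positive
# level number, and -M without -i would silently do nothing.
sub check_multilevel_opts {
    my ($opts) = @_;
    return unless exists $opts->{M};
    return "-M needs the maximum tree level as a positive number\n"
        unless defined $opts->{M} && $opts->{M} =~ /^\d+$/ && $opts->{M} > 0;
    return "-M requires -i <taglist> to select the tags to combine\n"
        unless defined $opts->{i} && length $opts->{i};
    return;
}

# Only the options relevant here; the script's real option string is longer.
my %opts;
getopts('l:d:M:i:e:V', \%opts) or die "Invalid options\n";
if (my $err = check_multilevel_opts(\%opts)) {
    die "Error: $err";
}
```

Keeping the check in a separate function makes it easy to call right after option parsing and to print the usage text along with the error.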
Comment 32 krienke 2012-02-29 06:15:02 UTC
The man page is simply a text file in roff format (it contains special commands for indentation, boldface etc). You can edit it with any text editor. Just look at the existing file to see how it works and add a description of the new options and how your extension to digitaglinktree works and what it can do for the user. 

If you want to preview your writings on linux you can do this by:
nroff -man digitaglinktree.1|more

Rainer
Comment 33 caulier.gilles 2012-02-29 06:21:39 UTC
Under KDE desktop, you can use this url into Dolphin or Konqueror :

man:/digitaglinktree/

Gilles Caulier
Comment 34 cyril.raphanel 2012-03-28 17:20:37 UTC
Created attachment 69972 [details]
Version 1.8.2

# 1.8.0 -> 1.8.1 2012/03/28 Cyril Raphanel, cyril.raphanel@gmail.com
# Added (-i) option to include only some tags - default 'all'
# Added (-e) option to exclude some tags - default ''
# Added (-M) option to create directory hierarchy based on tags selection, number of hierarchy levels to be specified
# Added (-V) option to enable verbose mode
Comment 35 cyril.raphanel 2012-03-28 17:23:00 UTC
Created attachment 69973 [details]
Version 1.8.2

# 1.8.1 -> 1.8.2 2012/03/28 Cyril Raphanel, cyril.raphanel@gmail.com
# Added (-i) option to include only some tags - default 'all'
# Added (-e) option to exclude some tags - default ''
# Added (-M) option to create directory hierarchy based on tags selection, number of hierarchy levels to be specified
# Added (-V) option to enable verbose mode
Comment 36 cyril.raphanel 2012-03-28 17:26:20 UTC
Probably some stuff missing to edit the man page
****
nroff -man digitaglinktree.1|more
troff: fatal error: can't open `digitaglinktree.1': No such file or directory
****
man:/digitaglinktree/
KDE Man Viewer Error
No man page matching to digitaglinktree/ found.

Check that you have not mistyped the name of the page that you want.
Check that you have typed the name using the correct upper and lower case characters.
If everything looks correct, then you may need to improve the search path for man pages; either using the environment variable MANPATH or using a matching file in the /etc directory.
****
Comment 37 cyril.raphanel 2012-03-28 17:34:46 UTC
Created attachment 69975 [details]
minor correction in 1.8.2 for usage
Comment 38 krienke 2012-03-30 06:28:57 UTC
Is nroff installed? Try "which nroff" to see if it exists. On my openSuSE system it's in /usr/bin/nroff and part of the groff RPM package:
$ which nroff
/usr/bin/nroff
$ rpm -qf /usr/bin/nroff
groff-1.21-5.1.2.x86_64

When you call nroff -man digitaglinktree.1, the digitaglinktree.1 file must of course reside in your current directory.
Comment 39 cyril.raphanel 2012-04-03 16:24:09 UTC
Created attachment 70122 [details]
man page updated for 1.8.2 version
Comment 40 caulier.gilles 2013-11-26 09:13:22 UTC
krienke,

What about this entry?

Gilles Caulier
Comment 41 caulier.gilles 2013-11-26 13:28:07 UTC
krienke,

digitaglinktree version 1.8.2 is committed to git/master through bug #293297

Please review files attached here and look how to merge these contents with git/master

Thanks in advance

Gilles Caulier
Comment 42 krienke 2013-11-27 14:01:32 UTC
I once again tried the multi-level feature added by cyril.raphanel@gmail.com. I started it with a maximum level of 3. After an hour I stopped the script, which still had not terminated. I have about 14000 photos managed by digikam and about 2400 tags. My system has now been deleting the link tree from disk for about 30 min.

Because I am afraid that users might complain that digitaglinktree locks up their systems, I changed the option logic so that multi-level uses no tags by default instead of all tags. So users have to use -i <taglist> to include tags in order to get a real multi-level result. Because the default is no longer to include all tags, this should reduce run time.

These changes are reflected in the man pages as well as in the online help of the script.

I attached the new versions of the script (1.8.3) as well as the man page.

Rainer
Comment 43 krienke 2013-11-27 14:05:05 UTC
Created attachment 83792 [details]
digitaglinktree version 1.8.3

Fixed bugs concerning filesystem uuid recognition, which could make the script do nothing when called. Changed default tag inclusion (-i) from "all" to "none" for multi-level linktree (-M). User has to give a list with -i now when using -M.
Comment 44 krienke 2013-11-27 14:06:04 UTC
Created attachment 83793 [details]
digitaglinktree 1.8.3  man page
Comment 45 caulier.gilles 2013-11-27 14:21:09 UTC
Git commit 0a648e3051c4cbe807ced1ef9c6e3fd9c1127a59 by Gilles Caulier.
Committed on 27/11/2013 at 14:19.
Pushed by cgilles into branch 'master'.

update digikamlinktree to 1.8.3

M  +324  -46   utilities/scripts/digitaglinktree/digitaglinktree
M  +58   -8    utilities/scripts/digitaglinktree/digitaglinktree.1

http://commits.kde.org/digikam/0a648e3051c4cbe807ced1ef9c6e3fd9c1127a59