Bug 435120 - Data protection: Removing or editing annotation timestamp should be possible
Summary: Data protection: Removing or editing annotation timestamp should be possible
Status: REOPENED
Alias: None
Product: okular
Classification: Applications
Component: general
Version: 20.12.3
Platform: Other Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-29 15:50 UTC by Rainer Klute
Modified: 2021-04-24 13:52 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Attachments

Description Rainer Klute 2021-03-29 15:50:53 UTC
Annotation metadata makes it possible to track who annotated what and when. For data protection reasons, it should be possible to record these highly personal data on a voluntary basis only, i. e., the default should be not to record these data. For existing annotations, it should be possible to remove the author and/or timestamp information from annotations or to set them to arbitrary values.
Comment 1 Yuri Chornoivan 2021-03-29 16:05:13 UTC
Actually, the first statement is misleading. You can change the date before annotating and change the name using the Okular configuration window.

https://docs.kde.org/trunk5/en/kdegraphics/okular/configannotations.html

The other points are ambiguous as well. Is it something like the "Sanitize" mode of PDF Studio 2020?

https://kbpdfstudio.qoppa.com/anonymizing-annotations-comments-in-pdfs/

Thanks in advance for your answer.
Comment 2 Albert Astals Cid 2021-03-29 16:56:43 UTC
I'm going to close this as out of scope.

Okular is not a PDF editor, so "set things to arbitrary values" is not what it is supposed to do. If you want to edit a PDF, get a PDF editor.
Comment 3 Rainer Klute 2021-03-29 17:08:41 UTC
Ah, thanks, setting the author before editing is a viable way to go. Changing it afterwards on a per annotation basis might be tedious, but much better than not being able to change it at all. So I am fine with authors.

However, browsing the okular manual, I haven’t found any option to “change the date before annotating”. If you meant changing the date on the system level, I don’t think this is a user-friendly option. For many, if not most, users it might not be an option at all.

Thanks for pointing me to the description of PDF Studio 2020’s “sanitize” functionality! That indeed seems to be what I meant. Being able to remove all dates from all (or from selected) annotations would really save my day. (I am not so much concerned about removing author information, but the mileage of others might vary.)

Thanks for consideration!
Comment 4 Yuri Chornoivan 2021-03-29 17:10:56 UTC
(In reply to Rainer Klute from comment #3)
> Ah, thanks, setting the author before editing is a viable way to go.
> Changing it afterwards on a per annotation basis might be tedious, but much
> better than not being able to change it at all. So I am fine with authors.
> 
> However, browsing the okular manual, I haven’t found any option to “change
> the date before annotating”. If you meant changing the date on the system
> level, I don’t think this is a user-friendly option. For many, if not most,
> users it might not be an option at all.
> 
> Thanks for pointing me to the description of PDF Studio 2020’s “sanitize”
> functionality! That indeed seems to be what I meant. Being able to remove
> all dates from all (or from selected) annotations would really save my day.
> (I am not so much concerned about removing author information, but the
> mileage of others might vary.)
> 
> Thanks for consideration!

There are methods to batch remove all metadata:

https://sc015020.medium.com/removing-metadata-from-pdf-files-using-exiftool-and-qpdf-20090b75d7f0
Comment 5 Rainer Klute 2021-03-29 17:11:39 UTC
Albert Astals Cid, if okular is not a PDF editor, you should consider removing the annotation capability completely and restricting it to pure viewing.
Comment 6 Albert Astals Cid 2021-03-29 17:16:44 UTC
(In reply to Rainer Klute from comment #5)
> Albert Astals Cid, if okular is not a PDF editor, you should consider
> removing the annotation capability completely and restricting it to pure viewing.

No
Comment 7 2wxsy58236r3 2021-03-30 02:01:25 UTC
(In reply to Rainer Klute from comment #5)

Then would you expect Adobe Reader to be a pure viewer with no annotation capability?
Comment 8 Rainer Klute 2021-03-30 06:50:40 UTC
(In reply to 2wxsy58236r3 from comment #7)
> (In reply to Rainer Klute from comment #5)
> 
> Then would you expect Adobe Reader to be a pure viewer with no annotation
> capability?

As a Linux user, I don’t know or care what an “Adobe Reader” might be or not and what to expect from it or not.

However, what I expect from today’s software is that it adheres to data protection principles, namely those laid out in Article 5 GDPR (https://gdpr-info.eu/art-5-gdpr/), and to data minimization in particular.

I could well understand if developers don't have enough resources or face other difficulties in implementing privacy features quickly. But I don't understand how someone can lightly dismiss privacy concerns and pretend that everything is just fine the way it is.
Comment 9 Rainer Klute 2021-03-30 07:34:41 UTC
(In reply to Yuri Chornoivan from comment #4)
> There are methods to batch remove all metadata:
> 
> https://sc015020.medium.com/removing-metadata-from-pdf-files-using-exiftool-
> and-qpdf-20090b75d7f0

Thanks for the link! However, while exiftool can manipulate metadata of the PDF file itself, it seems not to be able to access the metadata of annotations within the PDF file.
Comment 10 Oliver Sander 2021-03-30 07:36:44 UTC
To a large extent this is a resource problem -- Okular is not very well staffed. If somebody came up with a patch, it would certainly receive consideration.
Comment 11 2wxsy58236r3 2021-03-30 10:37:14 UTC
I believe "FIXED" is used only when Okular is patched (this feature is added to Okular)...

Author name is personal information, but is the timestamp also considered personal information?

Can you please share your use case / workflow? (So that developers can understand the importance of the requested feature)
Comment 12 Rainer Klute 2021-03-30 11:45:27 UTC
(In reply to 2wxsy58236r3 from comment #11)
> Author name is personal information, but is the timestamp also considered
> personal information?

According to Article 4 (1) GDPR, “‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly”.

Whether a timestamp is personal information depends on the circumstances. If the author’s name is given and it relates to a natural person (i. e. not “Anonymous”, “Donald Duck”, or the like), the timestamp is without doubt personal data, because it leaves a data trace showing which comments that individual created or edited, and when. Timestamps are even more revealing if the person creates or modifies multiple annotations, because you can deduce information about the data subject’s working speed, when he or she took breaks and for how long, and possibly more. I would say the timestamp can be highly sensitive personal data, even though it does not fall into the special categories defined by Article 9 paragraph 1 GDPR.
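The inference described above can be illustrated with a minimal Python sketch. It assumes the modification timestamps of several annotations have already been extracted from a PDF as date strings of the form `D:YYYYMMDDHHmmSS` with an optional UTC offset (the format used for the `/M` entry of PDF annotation dictionaries); the extraction itself is not shown. The helper names `parse_pdf_date` and `work_pattern`, the 30-minute pause threshold and the sample timestamps are hypothetical, chosen only for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date: D:YYYYMMDDHHmmSS followed by an optional UTC offset such as
# Z, +02'00 or +02'00'. This sketch requires all fields down to seconds.
_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})'?)?"
)


def parse_pdf_date(s: str) -> datetime:
    """Parse a full-precision PDF date string into an aware datetime."""
    m = _PDF_DATE.fullmatch(s)
    if m is None:
        raise ValueError(f"unsupported PDF date string: {s!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    tz = timezone.utc  # assumption for this sketch: a missing offset means UTC
    if m.group(8):  # explicit +HH'mm or -HH'mm offset
        sign = 1 if m.group(8) == "+" else -1
        offset = timedelta(hours=int(m.group(9)), minutes=int(m.group(10)))
        tz = timezone(sign * offset)
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)


def work_pattern(dates, pause=timedelta(minutes=30)):
    """Return (sum of gaps no longer than `pause`, list of longer gaps).

    The first value is a rough proxy for active working time, the second
    lists the breaks between consecutive annotations.
    """
    times = sorted(parse_pdf_date(s) for s in dates)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    active = sum((g for g in gaps if g <= pause), timedelta())
    pauses = [g for g in gaps if g > pause]
    return active, pauses


stamps = [
    "D:20210329090000+02'00'",
    "D:20210329091000+02'00'",
    "D:20210329101500+02'00'",
    "D:20210329102000+02'00'",
]
active, pauses = work_pattern(stamps)
print(active)  # 0:15:00
print(pauses)  # [datetime.timedelta(seconds=3900)]
```

With only four annotations, the sketch already reports about 15 minutes of active annotating interrupted by a single 65-minute break, information the annotator never explicitly chose to disclose.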

But even the timestamp alone, regardless of the author name field, can be personal data, namely if the author can be identified from the context or from other information, which need not be in the document itself. For example, if you send me a PDF, I create an annotation, and I send the document back to you, you know it was me (and when), even if I left the author field blank. This is what the GDPR means when it talks about identifying a person “directly or indirectly”.

To make a long story short, the timestamp is not always personal data. But as a first approximation, I would treat it as such unless proven otherwise.


> Can you please share your use case / workflow? (So that developers can
> understand the importance of the requested feature)

The use case is that an individual making annotations does not want to reveal when he or she created/modified them – for whatever reasons or without any reason at all.

I would even say that, according to Article 25 paragraph 2 GDPR, omitting the timestamp should be the default. (“The controller shall implement appropriate technical and organisational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed. That obligation applies to the amount of personal data collected, the extent of their processing, the period of their storage and their accessibility. In particular, such measures shall ensure that by default personal data are not made accessible without the individual’s intervention to an indefinite number of natural persons.”)
Comment 13 2wxsy58236r3 2021-03-30 12:37:28 UTC
Do you publish / upload PDFs publicly which contain annotations?

What is the purpose of the annotation tool in your use case / workflow?

-----

I think many people use the annotation tool to indicate the parts that should be edited in the source file. So an identifier (not necessarily the real name) and a timestamp are useful, so that the person in charge knows who left the comment and when it was made.

In many situations, users will edit the source file (instead of directly editing the PDF) and re-generate a new PDF. The final PDF product should contain no annotation then.

-----

Image files (e.g. JPG) can contain GPS coordinates, timestamps, camera model, etc. in their metadata, so I understand that users will want to remove the metadata before publishing images.

But in the case of PDF annotation, can you please explain your situation in more detail? Thank you very much!
Comment 14 Rainer Klute 2021-03-30 14:40:27 UTC
(In reply to 2wxsy58236r3 from comment #13)
> I think many people use the annotation tool to indicate the parts that should
> be edited in the source file. So an identifier (not necessarily the real
> name) and a timestamp are useful, so that the person in charge knows who left
> the comment and when it was made.

This is a valid scenario and there are certainly use cases for author names and timestamps, so the option to maintain these fields must be there. I don’t question that.


> But in the case of PDF annotation, can you please explain your situation in
> more detail? Thank you very much!

You are asking me to explain my particular situation and elaborate on why I need more privacy. However, the data protection mindset is exactly the other way round: you should never have to justify why you want privacy. It is a fundamental right. Instead, one should have to justify why privacy should be compromised, here: why timestamps must be maintained and at what granularity, i. e., the date might be sufficient without going down to hours and minutes.

Having said that, I simply do not want to reveal to my colleagues, to my boss, to the other department, to our customers, or even to the world when I did the annotations. If I want to do so, fine, but I would like to have a choice!
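As an illustration of the coarser granularity suggested above, the following minimal Python sketch truncates PDF date strings to the calendar day. It assumes the annotation has already been read into a plain dictionary keyed by PDF names (a hypothetical stand-in for a real PDF library's annotation object) and that the timestamps live in the `/M` and `/CreationDate` entries; reading and writing the actual PDF is out of scope. The function names and sample values are hypothetical.

```python
import re

# A PDF date string looks like D:YYYYMMDDHHmmSSOHH'mm, where the fields after
# the year are optional. Group 1 keeps at most D:YYYYMMDD; the remaining
# digits (hours, minutes, seconds) and the UTC offset are dropped.
_PDF_DATE = re.compile(r"(D:\d{4}(?:\d{2}){0,2})\d*(?:[Z+-].*)?")

# Annotation dictionary keys that carry timestamps: /M (all annotations)
# and /CreationDate (markup annotations).
DATE_KEYS = ("/M", "/CreationDate")


def coarsen_to_day(pdf_date: str) -> str:
    """Reduce a PDF date string to day granularity (or leave it coarser)."""
    m = _PDF_DATE.fullmatch(pdf_date)
    if m is None:
        raise ValueError(f"not a PDF date string: {pdf_date!r}")
    return m.group(1)


def coarsen_annotation(annot: dict) -> dict:
    """Return a copy of a dict-shaped annotation with day-level timestamps."""
    out = dict(annot)
    for key in DATE_KEYS:
        if key in out:
            out[key] = coarsen_to_day(out[key])
    return out


annot = {
    "/T": "Reviewer",
    "/M": "D:20210330144027+02'00'",
    "/CreationDate": "D:20210330143512+02'00'",
    "/Contents": "typo",
}
print(coarsen_annotation(annot))
# {'/T': 'Reviewer', '/M': 'D:20210330', '/CreationDate': 'D:20210330', '/Contents': 'typo'}
```

Because the UTC offset is dropped together with the time of day, the remaining date is the calendar day as recorded by the annotating software, not necessarily the UTC day.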
Comment 15 Albert Astals Cid 2021-03-30 17:12:06 UTC
I honestly don't see how GDPR applies here.
Comment 16 Rainer Klute 2021-03-31 16:53:32 UTC
(In reply to Albert Astals Cid from comment #15)
> I honestly don't see how GDPR applies here.

Well, I do. Maybe this somehow relates to the fact that I am a certified data protection officer.
Comment 17 Albert Astals Cid 2021-03-31 19:33:07 UTC
https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/application-regulation/who-does-data-protection-law-apply_en

The GDPR applies to:
    a company or entity which processes personal data as part of the activities of one of its branches established in the EU, regardless of where the data is processed; or
    a company established outside the EU and is offering goods/services (paid or for free) or is monitoring the behaviour of individuals in the EU.

Okular is neither, so please enlighten me with your certified data protection officer knowledge.
Comment 18 Rainer Klute 2021-04-01 05:57:58 UTC
(In reply to Albert Astals Cid from comment #17)

You are right, the GDPR does not apply to a certain piece of software as such, but rather to a “controller”, i. e., “the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data” (Article 4 (7) GDPR).

Okular is one such possible “means of the processing of personal data”. A controller who has to stick to the GDPR may only deploy software that allows him to process personal data in a GDPR-compliant way. I’d appreciate it if Okular gave him that possibility.
Comment 19 2wxsy58236r3 2021-04-01 10:26:13 UTC
(In reply to Rainer Klute from comment #18)
> A controller who has to stick to the GDPR may only deploy software that
> allows him to process personal data in a GDPR-compliant way.

> setting the author before editing is a viable way to go

> I haven’t found any option to “change the date before annotating”

Are you able to find any software or method that allows you to remove or edit the annotation metadata?
If not, does that mean you cannot process PDF files (or cannot annotate at all)?

Also, if "a controller who has to stick to the GDPR may only deploy software that allows him to process personal data in a GDPR-compliant way", does that mean they cannot use closed-source software?
How can they ensure that the closed-source software they are using has no backdoors or telemetry?
Comment 20 Rainer Klute 2021-04-22 05:57:26 UTC
Well, I don't think this is the right place to discuss GDPR principles. The question is whether you think it is a feature or a bug that Okular records sensitive data without giving the user the possibility to prevent it.
Comment 21 Albert Astals Cid 2021-04-22 17:21:06 UTC
You brought the GDPR up, and now you're complaining we're talking about it?

There is no sensitive data recorded: the name is optional, and the date can be faked just by moving the clock on your system if you feel that having a timestamp is somehow "sensitive".
Comment 22 Rainer Klute 2021-04-23 05:33:21 UTC
(In reply to Albert Astals Cid from comment #21)
> You brought the GDPR up, and now you're complaining we're talking about it?

Oh no, I don’t mind discussing the GDPR, but I don’t see any point in challenging its principles. These principles are what they are, and there is no point in inventing your own interpretations that deviate from established legal practice.


> There is no sensitive data recorded: the name is optional, and the date can
> be faked just by moving the clock on your system if you feel that having a
> timestamp is somehow "sensitive".

By the way, we are not just talking about “sensitive” data, which requires specific protection according to recital 51 of the GDPR, but about personal data in general. The name of the user is without question personal data. And adjusting the system time: are you really serious? Even in cases where the user is also the admin, this is not likely to be a practical option.

I also don't understand why you are resisting tooth and nail to consider the extensions I suggested. What is your motivation? Would it be that difficult to implement?
Comment 23 2wxsy58236r3 2021-04-24 02:43:36 UTC
I read [1] and I still do not understand why the timestamp is "sensitive".

If timestamp is "sensitive", then how can you use a computer?

Filesystem logs the creation time, modification time and access time of files, so how do you prevent that?

> Would it be that difficult to implement?
Please refer to Comment 10. Feel free to submit patches.

[1] https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en
Comment 24 Rainer Klute 2021-04-24 08:28:36 UTC
(In reply to 2wxsy58236r3 from comment #23)
> I read [1] and I still do not understand why the timestamp is "sensitive".

You can find the answer in the first two sentences of the source you quoted.


> If timestamp is "sensitive", then how can you use a computer?
> 
> Filesystem logs the creation time, modification time and access time of
> files, so how do you prevent that?

The GDPR is not about preventing the processing of personal data. It is about having to have a lawful basis for doing so.

By the way, I found the article “What Should Software Engineers Know about GDPR?” https://www.infoq.com/articles/gdpr-for-software-devs/ well worth reading. It explains the GDPR and its philosophy in a readily understandable way. Perhaps it succeeds where I am obviously failing here.


> > Would it be that difficult to implement?
> Please refer to Comment 10. Feel free to submit patches.

Why would I even consider that when even the very idea is so vehemently rejected here?
Comment 25 Laura David Hurka 2021-04-24 13:52:03 UTC
What should the user interface for defining the timestamp look like?

Okular annotations (except text markup) are bound to the pixel grid of your screen. If you use a zoom level like 75% or 200%, which is often the case in Okular, the recipient of your review can easily calculate your screen size and resolution just by looking at the coordinates of one or more annotations. -> What should the user interface for defining an emulated screen resolution look like?

Freehand annotations store the order (and to some extent the speed) in which strokes were made. See https://en.wikipedia.org/wiki/Graphology (article has multiple issues ;) ). -> What should the user interface for obfuscating stroke order and speed look like?

The same applies to your typography in popup notes.

The words you use in popup annotations, the number of spelling mistakes you spot, or other features of your review may depend on your mood and environment. -> What should the user interface for rephrasing your review look like?

These examples may seem a bit ridiculous. But sorry: I don’t think we should care about protecting the timestamp.

I removed “author” from the summary line. Okular allows you to change the name, so there is nothing more to do.