Bug 85134

Summary:	[patch] Support for ReplayGain information
Product:	[Frameworks and Libraries] taglib	Reporter:	Martijn Pieters <bugs.kde.org>
Component:	general	Assignee:	Scott Wheeler <wheeler>
Status:	CONFIRMED ---
Severity:	wishlist	CC:	moritz-kdebugs
Priority:	NOR
Version:	unspecified
Target Milestone:	---
Platform:	Compiled Sources
OS:	Linux
Latest Commit:		Version Fixed In:
Attachments:	Proposed patch adding ReplayGain support; Updated patch with doxygen problems fixed; Updated patch with replaygain data format changes and dummied-out MPC APE support.

Description Martijn Pieters 2004-07-14 00:51:27 UTC

Version: (using KDE Devel)
Installed from: Compiled sources

There have been several requests for normalisation and ReplayGain support in applications that use taglib as the music-file-format metadata library.

I am in the process of adding support for ReplayGain to taglib, and code has progressed far enough along to propose a patch. The patch (to be attached to this bug), makes the following changes:

- Extends ByteVector and String to support floats. ReplayGain values are
single-precision floats and the various music formats store these either
as strings or IEEE 754 binary representations.

- Tests for the new ByteVector and String functionality.

- Extends the Tag interface with ReplayGain getters and setters. The base
Tag implementation takes these new methods into account when testing for
emptiness and duplication.

- Adds implementations for the new interface methods to the various Tag
implementations:

- Ogg::XiphComment reads and writes the VorbisGain tags (including the
old style tags).

- MPEG::Tag defers to ID3v2 for now (see below)

- Flac::Tag tries the Xiph information first, then ID3v2.

- Adds a dummy (NOOP) implementation of the new methods to ID3v1::Tag;
ID3v1 doesn't support ReplayGain information.

- Adds a new ID3v2 frame class, ReplayGainFrame, which implements the
proposed frame from the ReplayGain proposal. The ID3v2::Tag class refers
to this frame when getting/setting ReplayGain values. Its ID3v2 frame ID
is "RGAD".

- Extends the ID3v2 FrameFactory to create ReplayGainFrames.

- Adds minimal flags support to Frame::Header; headers now preserve the flags
in ID3v2.3 and 2.4 frame headers as an opaque 2-byte ByteVector, and one
can get and set this value. This allows the ReplayGainFrame class to set
the correct flags for the frame.

- Updates the C binding to the new Tag interface.

- Extends the examples to support ReplayGain information.
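
To illustrate the first item above: the two storage forms it mentions, a text value and an IEEE 754 binary value, could be converted roughly as follows. This is only a sketch and not the code from the patch; the function names are hypothetical, the big-endian byte order is an assumption, and it assumes `float` is a 32-bit IEEE 754 type.

```cpp
#include <array>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helpers, not TagLib API. Assumes float is 32-bit IEEE 754.

// Pack a float into 4 bytes, most significant byte first (assumed order).
std::array<unsigned char, 4> floatToBigEndian(float value)
{
  static_assert(sizeof(float) == 4, "expects a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  return {{ static_cast<unsigned char>(bits >> 24),
            static_cast<unsigned char>(bits >> 16),
            static_cast<unsigned char>(bits >> 8),
            static_cast<unsigned char>(bits) }};
}

// Inverse of floatToBigEndian().
float floatFromBigEndian(const std::array<unsigned char, 4> &bytes)
{
  const std::uint32_t bits = (std::uint32_t(bytes[0]) << 24) |
                             (std::uint32_t(bytes[1]) << 16) |
                             (std::uint32_t(bytes[2]) << 8) |
                              std::uint32_t(bytes[3]);
  float value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}

// Parse the leading number of a text value such as "-6.54 dB".
// Returns false if the text does not start with a number.
bool floatFromText(const std::string &text, float &value)
{
  char *end = nullptr;
  const float parsed = std::strtof(text.c_str(), &end);
  if(end == text.c_str())
    return false;
  value = parsed;
  return true;
}
```

Note that `std::strtof` follows the current C locale, so a real implementation would probably want a locale-independent parser.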

What remains to be done? Ogg and FLAC support are complete. Only the MP3 support needs more work:

- The RGAD frame is only described in the (fairly old) ReplayGain
proposal; I have not found any tools generating these frames so far.
Another disadvantage is that these frames don't store the album peak
amplitude.

- I'd prefer to use the ID3v2 RVA2 frames instead; but again, no tools
generate these frames yet. RVA2 frames can support the exact same
information. There is a bug in the XMMS bugzilla that proposes the exact
same frame for the information, for example.

See http://bugs.xmms.org/show_bug.cgi?id=1023

I am trying to get the opinion of the ReplayGain tool community on this.

- Lame puts replaygain information in the lame header (an extension to the
Xing header). This is read-only information, but I'd like to support
this anyway as a fall-back. Lame inserts this info unless told not to;
there are MP3 players that support this information. This header does
not include the album peak either.

- MP3Gain puts the information in an APE2 header (goody, another MP3
metadata format). This'd require solving bug #84666 first; I may have a
stab at this myself; possibly read-only.

Once these issues are resolved, the searching order would be:

1. RVA2 frames
2. APE2 frames
3. RGAD frames
4. Lame header

Writing would be to RVA2, possibly to APE2 and RGAD (latter is implemented already).

Related bugs: #56383, #78444 and #72558

Comment 1 Martijn Pieters 2004-07-14 00:52:59 UTC

Created attachment 6669 [details]
Proposed patch adding ReplayGain support

Comment 2 Martijn Pieters 2004-07-14 00:54:31 UTC

Hmmm, forgot to say the proper magic word for the related bugs.

Related bugs: bug #56383, bug #78444 and bug #72558.

Comment 3 Martijn Pieters 2004-07-14 12:32:35 UTC

Comment on attachment 6669 [details]
Proposed patch adding ReplayGain support

Oops, this patch contains small doxygen errors (how the hell did I ever think
the | pipe symbol was supposed to work?)

Comment 4 Martijn Pieters 2004-07-14 12:34:20 UTC

Created attachment 6674 [details]
Updated patch with doxygen problems fixed.

Only changes are doxygen (comment) changes; use backslash instead of a pipe
symbol to mark arguments, and remove an improper \overload argument.

Comment 5 Martijn Pieters 2004-07-18 22:58:11 UTC

Comment on attachment 6674 [details]
Updated patch with doxygen problems fixed.

The replaygain data format doesn't handle negatives correctly; no support for
the new MPC code. New patch underway.

Comment 6 Martijn Pieters 2004-07-18 23:06:37 UTC

Created attachment 6721 [details]
Updated patch with replaygain data format changes and dummied-out MPC APE support.

Now the ReplayGain data format is split out to a separate class, which can be
reused by the Lame MP3 info tag support (data format is the same). 

Also, since MPC has just been added, the patch now includes dummied out support
for the APE::Tag class, which I am looking at to reuse for MP3. (see also my
comments on that code at http://lists.kde.org/?l=kde-cvs&m=109018198919187).

TODO: Lame support (now much easier), APEv2 support (possibly easier), MPC
support (I believe the format has its own ReplayGain data format in the
header).

Comment 7 Samuel Krempp 2004-07-20 15:26:59 UTC

Great work !

I'm also lobbying for replaygain (including mp3 replaygain) support by linux tools, and TagLib is where it can start to happen.
As wheeler said (http://bugs.kde.org/show_bug.cgi?id=85551), it seems any change to the global Tag interface will have to wait till TagLib 2.0.

It leaves some time to refine the implementation of replaygain support.

I have a few comments on your patch :

1. why floats instead of double ? I have learnt that double should be used *unless* you have a reason for float, not the other way round.
In fact, float might not be precise enough for conveying a 5-digit replaygain string, and it is not inconceivable to find such tags (I once read some people want super precise gain tags)

2. Returning 0.0 in the absence of replaygain tag is not a good solution.
There is a small, but non-zero chance that a track's replaygain could be exactly "0.00 dB" (with 2 decimal precision, it can happen).
Players should handle tracks with no replaygain by applying a (possibly user-supplied) default gain, and good default values are more like "-10.00 dB" than 0.00. (Most tracks' replaygains lie in [-11, -3], and I'd rather hear the non-replaygained tracks muffled than deafening-loud, so the default value should be picked in the upper range of usual replaygain values.)

So it is important to tell the difference between "no replaygain tag" and "0.00 dB" replaygain tag.

Either pick a really impossible replaygain value (-infty or such..), or supply additional functions to query the presence of each replaygain tag (IMO the best solution)
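
A minimal sketch of the second option (separate presence queries), to make the distinction concrete. The class and function names are hypothetical and are not part of TagLib or of the patch; the -10 dB default is simply the value suggested above.

```cpp
// Hypothetical holder for a track gain; not TagLib API.
// "No tag" and "0.00 dB" are kept apart by an explicit presence flag.
class TrackGain
{
public:
  TrackGain() : m_present(false), m_gain(0.0f) {}

  bool hasGain() const { return m_present; }
  float gain() const { return m_gain; }

  void setGain(float dB)
  {
    m_gain = dB;
    m_present = true;
  }

private:
  bool m_present;
  float m_gain;
};

// Gain a player would apply: the stored value if there is one,
// otherwise a (possibly user-supplied) default.
float effectiveGain(const TrackGain &tag, float defaultGain = -10.0f)
{
  return tag.hasGain() ? tag.gain() : defaultGain;
}
```

With this shape, a file tagged "0.00 dB" yields 0.0 while an untagged file falls back to the default.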

3. RVA2 or RGAD or APEv2 : I've followed several discussions about replaygain on hydrogenaudio.org, where authors of mp3gain, of the replaygain system, of lame, of foobar2000, and some vorbis developers come to chat, and it seems they all agree on APEv2 tags. IMO, that's because it's more flexible, and homogeneous with vorbis comments, but they might have different reasons for their opinions.
Anyway, you can ask for their advice on the subject, they're almost the only replaygain-with-mp3 tools authors, so their consensus is somewhat a de-facto standard.

Comment 8 Martijn Pieters 2004-07-20 22:34:54 UTC

> 1. why floats instead of double ?  I have learnt that double should be
> used *unless* you have a reason for float, not the other way round.  In
> fact, float might not be precise enough for conveying a 5-digit replaygain
> string, and it is not inconceivable to find such tags (I once read some
> people want super precise gain tags)

A single-precision float mantissa can handle up to 23 bits precision,
which is more than plenty of precision for gain adjustments; the human ear
can only discern volume differences of about 0.5dB, as I understand. Also
note that:

- VorbisGain and MP3gain both use single-precision floats to calculate the
  gain, so they certainly won't produce more precision. VorbisGain then only
  gain, so they certainly wont produce more precision. VorbisGain then only
  stores two digits precision, probably because of the 0.5dB human
  resolution.

- The RVA2 tag only stores 16 bits of information (signed integer, to be
  divided by 512). The RGAD frame (original ReplayGain specification) and
  Lame headers store only 1 digit precision; 9 bits of unsigned integer to
  be divided by 10 when reading.
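
As a sketch of the arithmetic in the second item: decoding both fixed-point forms takes only a few lines. The function names are hypothetical. The big-endian byte order for RVA2 and the position of the sign bit in the 16-bit RGAD/Lame field (bit 9, directly above the 9-bit magnitude) are my reading of the respective specifications and should be treated as assumptions here.

```cpp
#include <cstdint>

// Hypothetical helpers, not TagLib API.

// RVA2 volume adjustment: signed 16-bit value, assumed big-endian,
// in units of 1/512 dB.
float rva2AdjustmentToDb(unsigned char high, unsigned char low)
{
  const std::uint16_t raw = static_cast<std::uint16_t>((high << 8) | low);
  // Reinterpret the 16 bits as a two's-complement signed value.
  const int value = raw >= 0x8000 ? static_cast<int>(raw) - 0x10000
                                  : static_cast<int>(raw);
  return value / 512.0f;
}

// RGAD / Lame gain field: the low 9 bits hold the magnitude in 0.1 dB
// steps; bit 9 is assumed to be the sign bit (set means negative).
float rgadGainToDb(std::uint16_t field)
{
  const int magnitude = field & 0x01FF;
  const bool negative = (field & 0x0200) != 0;
  const float dB = magnitude / 10.0f;
  return negative ? -dB : dB;
}
```

The largest magnitude the 9-bit field can carry is 511, i.e. 51.1 dB, while RVA2 covers roughly -64 dB to +64 dB in steps of 1/512 dB.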

> 2. Returning 0.0 in the absence of replaygain tag is not a good solution.
> there is a small, but non-zero chance that a track's replaygain could be
> exactly "0.00 dB" (with 2 decimal precision, it can happen) Players should
> handle tracks with no replaygain by applying (possibly user-supplied)
> default gain, and good default values are more like "-10.00 dB" than 0.00.
> (because most tracks' replaygains lie in [-11, -3], and I'd rather hear the
> non-replaygained tracks muffled than deafening-loud, so the default value
> should be picked in the upper range of usual replaygain values)
>
> So it is important to tell the difference between "no replaygain tag" and
> "0.00 dB" replaygain tag.
>
> Either pick a really impossible replaygain value (-infty or such..), or
> supply additional functions to query the presence of each replaygain tag
> (IMO the best solution)

Good point; VorbisGain uses -1000. for 'no gain' and -1. for 'no peak'. I
may have to adopt these, or use -inf or NaN. I see no elegant way of using
anything but a floating point number. I must say that using NaN has my
preference, if I can figure out how to state it as a literal.
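
For what it's worth, standard C++ has no NaN literal, but `std::numeric_limits<float>::quiet_NaN()` returns one. A minimal sketch of the sentinel approach, with hypothetical function names that are not in the patch:

```cpp
#include <cmath>
#include <limits>

// Hypothetical sentinel for "no ReplayGain value stored"; not TagLib API.
inline float noReplayGain()
{
  return std::numeric_limits<float>::quiet_NaN();
}

// NaN never compares equal to anything, itself included, so the check
// has to go through std::isnan() rather than operator==.
inline bool hasReplayGain(float gain)
{
  return !std::isnan(gain);
}
```

One caveat of this design is that callers who forget the check and apply the value directly will propagate NaN into their sample arithmetic.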

> 3. RVA2 or RGAD or APEv2  :  I've followed several discussions about
> replaygain on hydrogenaudio.org, where authors of mp3gain, of the
> replaygain system, of lame, of foobar2000, and some vorbis developers
> come to chat, and it seems they all agree on APEv2 tags. IMO, that's
> because it's more flexible, and homogeneous with vorbis comments, but they
> might have different reasons for their opinions.  Anyway, you can ask for
> their advice on the subject, they're almost the only replaygain-with-mp3
> tools authors, so their consensus is somewhat a de-facto standard.

I was planning on asking them. My current plans are:

- Read from APEv2, ID3v2 RVA2, ID3v2 RGAD, Lame (in that order of preference)

- Write to: APEv2, ID3v2 RVA2, ID3v2 RGAD, if any of those are present. If
  none of these are present, add a RVA2 frame.

Implementors can influence that policy by ensuring that certain tag formats
are present in the MPEG file.
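
The policy above can be sketched as two small functions. All names here are hypothetical and the presence flags stand in for whatever the MPEG file class would actually expose; this is an illustration of the plan, not the branch code.

```cpp
#include <vector>

// Hypothetical identifiers for the places ReplayGain data can live.
enum Source { APEv2, RVA2, RGAD, LameHeader };

// Which of those sources exist in a given MPEG file.
struct Presence
{
  bool apev2;
  bool rva2;
  bool rgad;
  bool lame;
};

// Read policy: the first source present, in order of preference.
// Returns false if the file carries no ReplayGain data at all.
bool readSource(const Presence &p, Source &out)
{
  if(p.apev2) { out = APEv2;      return true; }
  if(p.rva2)  { out = RVA2;       return true; }
  if(p.rgad)  { out = RGAD;       return true; }
  if(p.lame)  { out = LameHeader; return true; }
  return false;
}

// Write policy: update every writable source already present; if none
// is present, add an RVA2 frame. The Lame header is read-only.
std::vector<Source> writeTargets(const Presence &p)
{
  std::vector<Source> targets;
  if(p.apev2) targets.push_back(APEv2);
  if(p.rva2)  targets.push_back(RVA2);
  if(p.rgad)  targets.push_back(RGAD);
  if(targets.empty())
    targets.push_back(RVA2);
  return targets;
}
```

An application that wants RGAD output, say, would ensure an RGAD frame exists before writing, which is the "influence the policy" hook described above.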

Comment 9 Martijn Pieters 2004-07-20 22:49:44 UTC

Hmm, with my tired head I hadn't actually finished my 'Why a float' argument: if the various formats don't support the precision, and the human ear doesn't need the precision, why use a double? The extra information will get lost when written out to a file, and will make no difference anyway.

Comment 10 Martijn Pieters 2004-07-21 14:07:36 UTC

CVS commit by mpieters: 

First cut at ReplayGain support in taglib. THIS IS NOT YET COMPLETE.
See bug #85134; development will now continue in this branch instead of
through patches attached to the bug.

CCMAIL: 85134@bugs.kde.org


  A            mpeg/replaygaindataformat.cpp   1.1.2.1 [LGPL (v2.1)]
  A            mpeg/replaygaindataformat.h   1.1.2.1 [LGPL (v2.1)]
  A            mpeg/id3v2/frames/replaygainframe.cpp   1.1.2.1 [LGPL (v2.1)]
  A            mpeg/id3v2/frames/replaygainframe.h   1.1.2.1 [LGPL (v2.1)]
  M +17 -1     tag.cpp   1.9.6.1
  M +48 -0     tag.h   1.10.6.1
  M +48 -0     bindings/c/tag_c.cpp   1.8.2.1
  M +45 -0     bindings/c/tag_c.h   1.7.2.1
  M +6 -0      examples/tagreader.cpp   1.7.6.1
  M +4 -0      examples/tagreader_c.c   1.3.6.1 [POSSIBLY UNSAFE: printf]
  M +24 -0     examples/tagwriter.cpp   1.1.6.1
  M +78 -0     flac/flactag.h   1.1.6.1
  M +44 -0     mpc/apetag.cpp   1.4.2.1
  M +8 -0      mpc/apetag.h   1.3.2.1
  M +2 -2      mpeg/Makefile.am   1.6.6.1
  M +78 -0     mpeg/mpegfile.cpp   1.43.2.1
  M +44 -0     mpeg/id3v1/id3v1tag.cpp   1.23.4.1
  M +8 -0      mpeg/id3v1/id3v1tag.h   1.14.4.1
  M +16 -5     mpeg/id3v2/id3v2frame.cpp   1.24.4.1
  M +10 -0     mpeg/id3v2/id3v2frame.h   1.22.4.1
  M +4 -0      mpeg/id3v2/id3v2framefactory.cpp   1.24.2.1
  M +89 -0     mpeg/id3v2/id3v2tag.cpp   1.45.2.1
  M +8 -0      mpeg/id3v2/id3v2tag.h   1.29.2.1
  M +3 -0      mpeg/id3v2/frames/Makefile.am   1.11.2.1
  M +83 -0     ogg/xiphcomment.cpp   1.14.2.1
  M +8 -0      ogg/xiphcomment.h   1.14.2.1
  M +49 -1     tests/toolkit-test.cpp   1.13.2.1
  M +111 -0    toolkit/tbytevector.cpp   1.45.2.1
  M +22 -0     toolkit/tbytevector.h   1.33.2.1
  M +62 -1     toolkit/tstring.cpp   1.46.2.1
  M +14 -0     toolkit/tstring.h   1.36.2.1

Comment 11 Samuel Krempp 2004-09-05 12:38:24 UTC

--------
Hmm, with my tired head I hadn't actually finished my 'Why a float' argument: if the various formats don't support the precision, and the human ear doesn't need the precision, why use a double? The extra information will get lost when written out to a file, and will make no difference anyway.
---------

the string formats do support more precision, eg "-0.12345dB".
Granted, mp3gain only writes 2 digits, but foobar writes more, and a user can very well store his own value if he wants to.

So the situation is :
- use double : no benefit, no drawback
- use float : no benefit, no drawback in usages we can think of.

In such a situation I pick double, as I said it's the usual choice when both can be used with no apparent preference.

Well, the way I see it, the question is not "what's the benefit of double in this case ?" but rather "what's the benefit of float ?"
None. If speed is a matter, getting a double RG value doesn't mean you can't then multiply the PCM samples with float precision.

BTW, in case of 24bit audio (eg hi-quality FLAC), you can't express bit-precise rescale factors with a float. It might not make any audible differences, but it's still better if we can provide it, with no drawback.
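
For reference, the limits being discussed can be checked directly. The sketch below assumes IEEE 754 `float` and `double`; the helper name is hypothetical. It parses a gain string into a float and formats it back with five decimals, and the standard `std::numeric_limits` constants give the significand sizes of both types.

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper: parse the leading number of a gain string into a
// float, then format it back with five decimals.
std::string roundTripThroughFloat(const std::string &text)
{
  const float value = std::strtof(text.c_str(), nullptr);
  char buffer[32];
  std::snprintf(buffer, sizeof buffer, "%.5f", value);
  return buffer;
}

// Significand sizes in bits, including the implicit leading bit.
const int floatSignificandBits  = std::numeric_limits<float>::digits;
const int doubleSignificandBits = std::numeric_limits<double>::digits;

// Decimal digits guaranteed to survive a text -> float -> text round trip.
const int floatDecimalDigits = std::numeric_limits<float>::digits10;
```

On such a platform a value like "-0.12345dB" comes back unchanged at five decimals; whether 24 significand bits are enough for scale factors applied to 24-bit samples is the open question in this comment.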

Comment 12 Martijn Pieters 2004-09-06 14:30:45 UTC

But there *is* a drawback for using double: not all TagLib formats support that precision. So, upon writing out the value to the files, precision will be lost.

Comment 13 Samuel Krempp 2004-09-12 18:15:24 UTC

I don't see how it's a drawback that double beats the precision possible in some formats. If the internal values are more precise than what will eventually be stored in file, fine, where's the problem ?
If the API uses less precision than needed, it introduces an unjustified limit. If it uses more : it's only a waste of 4 bytes in a function call.
What would be a drawback is using less precision in the API than possible on some formats, not the other way round.

Though, I agree float or double doesn't make any difference in the expected usage.
I just think it would be sad to arbitrarily limit precision at the API level, while someone might need more (the only examples I can see are reaching bit-precise rescaling of 24bits files.. and I don't see why anyone would use that. but there's a lot of audio fanatics out there, with some weird requirements sometimes)
And even if we can't really consider choosing double brings any advantage, it's the type to choose unless there are reasons for single precision.

Since there is still time to change the interface to double, I just want to make sure float won't be set in stone without even considering double.

Comment 14 Felix Berger 2004-12-20 22:31:31 UTC

In the original bug report you said:

> I'd prefer to use the ID3v2 RVA2 frames instead; but again, no tools
> generate these frames yet.  RVA2 frames can support the exact same
> information. There is a bug in the XMMS bugzilla that proposes the exact
> same frame for the information, for example.
 
This is not true:

normalize (http://www1.cs.columbia.edu/~cvaill/normalize/) writes RVA2 frames
and even provides an xmms plugin that honors these frames and normalizes the files while played.

I'm in the process of providing such a plugin for noatun.

Sincerely,
Felix Berger

Comment 15 Nick Brown 2005-09-21 17:46:34 UTC

*** This bug has been confirmed by popular vote. ***

Comment 16 Robert Feldhoff 2007-02-12 10:20:04 UTC

Is TagLib actually still supported?

Comment 17 Scott Wheeler 2007-02-12 10:33:53 UTC

You mean maintained?  Yes, though somewhat less actively than would be ideal.  ;-)

Comment 18 Jim Martin 2008-06-30 09:24:55 UTC

Whatever happened with ReplayGain support?  Was it abandoned?  Or is there somewhere in the SVN tree that I can find the last work that was done on it?

Comment 19 Moritz Moeller-Herrmann 2009-07-27 18:47:27 UTC

Amarok 2.1 now has built-in replaygain support - I do not know whether that implies that taglib has been updated? If not, maybe the code can be taken from there?