Bug 501831 - Histogram cuts off last sample when using automatic bin limits
Summary: Histogram cuts off last sample when using automatic bin limits
Status: RESOLVED INTENTIONAL
Alias: None
Product: LabPlot2
Classification: Applications
Component: general
Version: 2.11.1
Platform: Flatpak Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Alexander Semke
URL:
Keywords:
Depends on:
Blocks:
Reported: 2025-03-21 15:00 UTC by realkpavel
Modified: 2025-03-23 12:46 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
File where the histogram problem occurs (462.12 KB, application/x-xz)
2025-03-23 09:11 UTC, realkpavel

Description realkpavel 2025-03-21 15:00:01 UTC
Histogram cuts off last sample when using automatic bin limits


STEPS TO REPRODUCE
1. Create a histogram.
2. Turn on automatic bin limits.
3. Turn off automatic bin limits and increase the upper limit slightly.
4. An extra sample is now displayed.

Operating System: TUXEDO OS 
KDE Plasma Version: 6.3.2
KDE Frameworks Version: 6.11.0
Qt Version: 6.8.2
Kernel Version: 6.11.0-109019-tuxedo (64-bit)
Graphics Platform: Wayland
Processors: 8 × AMD Ryzen 7 PRO 3700U w/ Radeon Vega Mobile Gfx
Memory: 13.5 GiB of RAM
Graphics Processor: AMD Radeon Vega 10 Graphics
Manufacturer: LENOVO
Product Name: 20NKS52L02
System Version: ThinkPad T495
Comment 1 Alexander Semke 2025-03-23 08:54:38 UTC
(In reply to realkpavel from comment #0)
> Histogram cuts off last sample when using automatic bin limits
> 
> 
> STEPS TO REPRODUCE
> 1. Create histogram
> 2. turn on automatic binning
> 3. turn off automatic binning, increase the upper limit
> 4. an extra sample is displayed

Can you please attach the project file or the data that you used for this histogram?
Comment 2 realkpavel 2025-03-23 09:11:46 UTC
Created attachment 179661
File where the histogram problem occurs

Hello,
Attaching the file where I noticed this. I hope this helps. If I can help in any other way, please let me know.
Comment 3 realkpavel 2025-03-23 09:12:09 UTC
(In reply to Alexander Semke from comment #1)
> Can you please attach the project file or the data that you used for this
> histogram?

Attached above.
Comment 4 Alexander Semke 2025-03-23 09:37:26 UTC
(In reply to realkpavel from comment #2)
> Created attachment 179661 [details]
> File where the histogram problem occurs
> 
> Hello,
> attaching the file where I noticed this. Hope this helps. If I can help any
> other way, please let me know.

Thank you! 

Which data did you use to reproduce this problem? I'm now looking at Rsheet from the LongStructures/Resistance spreadsheet; the maximum here is 30.6849. To what value did you increase the upper limit to see this problem?
Comment 5 realkpavel 2025-03-23 09:52:32 UTC
(In reply to Alexander Semke from comment #4)
> Thank you! 
> 
> Which data did you use to reproduce this problem? I'm looking now at Rsheet
> from the LongStructures/Resistance spreadsheet the maximum is at 30.6849
> here. To what value did you increase the maximum to see this problem?

All three histograms show this. In the one labeled "Resistance", the transition is at the high end (28.972 -> 28.973); in Rsheet it is 30.6849 -> 30.6850; in Rcontact it is 1.08622 -> 1.08623. In addition, changing the bin limits of Rcontact in LongStructures/Worksheet breaks the binning-change behavior of Rcontact (if you change the limits of Rsheet and then change the limits of Rcontact, the graph of Rcontact sometimes disappears).
Comment 6 realkpavel 2025-03-23 09:53:17 UTC
(In reply to Alexander Semke from comment #4)
> Thank you! 
> 
> Which data did you use to reproduce this problem? I'm looking now at Rsheet
> from the LongStructures/Resistance spreadsheet the maximum is at 30.6849
> here. To what value did you increase the maximum to see this problem?

All histograms are supposed to have 8 samples.
Comment 7 Alexander Semke 2025-03-23 11:08:26 UTC
(In reply to realkpavel from comment #5)
> All three histograms show this. In the one labeled "Resistance", the
> transition is at the high end (28.972 -> 28.973), In Rsheet it is 30.6849 ->
> 30.6850,  in Rcontact it is 1.08622 -> 1.08623. In addition, changing the
> bin limits of Rcontact LongStructures/Worksheet breaks the binning change
> behavior of Rcontact (if you change the limits of Rsheet, then change the
> limits of Rcontact, the graph of Rcontact sometimes disappears)

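OK, I see it now. The behavior is correct: the convention used in GSL, and also in LabPlot, is that every bin, including the last one, is half-open (lower edge <= x < upper edge), so the last bin is bounded with a strict < rather than <=. A sample lying exactly on the upper limit is therefore not counted.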
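A minimal Python sketch of the half-open convention described above, assuming equal-width bins between a lower and an upper limit. It is not LabPlot's or GSL's code; the function name `half_open_histogram` and the eight sample values are hypothetical. With automatic limits the upper limit equals the largest sample, so that sample is dropped; raising the limit slightly brings it back, which matches the kind of transition reported in comment 5.

```python
def half_open_histogram(samples, lower, upper, n_bins):
    """Count samples into n_bins equal-width bins between lower and upper.

    Every bin, including the last one, is half-open: edge_i <= x < edge_{i+1}.
    A sample equal to `upper` is therefore not counted.
    """
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if lower <= x < upper:
            i = int((x - lower) / width)
            # Floating-point rounding can yield i == n_bins for x just below upper.
            counts[min(i, n_bins - 1)] += 1
    return counts


# Hypothetical data set with 8 samples.
samples = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

# "Automatic" limits: lower = min, upper = max.
auto = half_open_histogram(samples, min(samples), max(samples), 4)
print(auto, sum(auto))  # [1, 2, 2, 2] 7  (the sample 5.0 is excluded)

# Upper limit raised slightly above the maximum.
manual = half_open_histogram(samples, 1.0, 5.001, 4)
print(sum(manual))  # 8
```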

From GSL's documentation at https://www.gnu.org/software/gsl/doc/html/histogram.html:

"Thus any samples which fall on the upper end of the histogram are excluded. If you want to include this value for the last bin you will need to add an extra bin to your histogram."

We'll add more tooltip text in LabPlot in this area to explain this behavior, and we'll also document it properly at https://docs.labplot.org/.
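GSL's advice to add an extra bin can be sketched in Python as follows, assuming equal-width bins: widen the range by one bin width so that the old upper limit becomes an interior edge. The bin lookup uses `bisect.bisect_right` to reproduce the rule range[i] <= x < range[i+1] that the GSL documentation describes for its histograms. The helper names `uniform_edges`, `count_half_open` and `add_extra_bin`, as well as the sample values, are hypothetical and not part of LabPlot or GSL.

```python
import bisect


def uniform_edges(lower, upper, n_bins):
    """Return n_bins + 1 equally spaced bin edges from lower to upper."""
    width = (upper - lower) / n_bins
    return [lower + i * width for i in range(n_bins)] + [upper]


def count_half_open(samples, edges):
    """Count samples into bins edges[i] <= x < edges[i + 1]; out-of-range samples are skipped."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        # bisect_right maps x == edges[i] to bin i and x == edges[-1] past the last bin.
        i = bisect.bisect_right(edges, x) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def add_extra_bin(lower, upper, n_bins):
    """Hypothetical workaround: extend the range by one bin width and add one bin."""
    width = (upper - lower) / n_bins
    return lower, upper + width, n_bins + 1


samples = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

print(count_half_open(samples, uniform_edges(1.0, 5.0, 4)))  # [1, 2, 2, 2]
lo, hi, n = add_extra_bin(1.0, 5.0, 4)
print(count_half_open(samples, uniform_edges(lo, hi, n)))    # [1, 2, 2, 2, 1]
```

The bin width stays the same; the cost is one more bin whose range lies mostly above the largest sample.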

Can you please check and confirm this behavior on your side, too?
Comment 8 realkpavel 2025-03-23 11:26:24 UTC
(In reply to Alexander Semke from comment #7)
> ok, I see it now. The behavior is correct since the convention used in GSL
> and also in LabPlot is to define the last bin with strict < and not with <=.
> 
> From GSL's documentation in
> https://www.gnu.org/software/gsl/doc/html/histogram.html:
> 
> "Thus any samples which fall on the upper end of the histogram are excluded.
> If you want to include this value for the last bin you will need to add an
> extra bin to your histogram."
> 
> We'll add more tooltip texts in LabPlot in this area to explain this
> behavior and to also properly document it in the documentation on
> https://docs.labplot.org/
> 
> Can you please check and confirm this behavior on your side, too?

Yes, that is the behavior I am seeing. A tooltip would certainly help.

I understand the convention (it makes all bin widths mathematically equal), but I would disagree with it if I were designing the system, since with automatic limits the default behavior guarantees that one sample (the maximum) will be hidden, which skews the data a lot when the number of samples is low, as it is here. But that is of course not my choice to make. Thank you for the work you're putting into this.
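For comparison, a minimal Python sketch of the alternative the reporter argues for: all bins stay half-open except the last one, which is closed on the right, so a sample exactly at the upper limit is counted. NumPy's `numpy.histogram` documents this convention (its last bin includes the right edge). This is not what LabPlot or GSL do; the function name `count_closed_last_bin` and the data are hypothetical.

```python
import bisect


def count_closed_last_bin(samples, edges):
    """Count samples into bins edges[i] <= x < edges[i + 1], except that the
    last bin is closed on the right: edges[-2] <= x <= edges[-1]."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for x in samples:
        if x == edges[-1]:
            counts[-1] += 1  # the upper limit belongs to the last bin
            continue
        i = bisect.bisect_right(edges, x) - 1
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


samples = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
edges = [1.0, 2.0, 3.0, 4.0, 5.0]  # automatic limits: min and max of the data

counts = count_closed_last_bin(samples, edges)
print(counts, sum(counts))  # [1, 2, 2, 3] 8
```

With automatic limits this keeps every sample without changing the bin width, at the price of treating the last bin differently from the others.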
Comment 9 Alexander Semke 2025-03-23 12:46:54 UTC
(In reply to realkpavel from comment #8)
> Yes, that is the behavior I am seeing. A tooltip would certainly help.
> 
> I understand the convention, it makes the bin widths the same
> mathematically, but I would disagree with it if I was designing the system 
> (since the default behavior guarantees one sample will be hidden, which will
> skew the data a lot in case of low sample numbers like here). But that is of
> course not my choice to make. Thank you for the work you're putting into
> this.

Yes, there are reasons for this convention, and it's also used in other applications. We documented this topic, together with ideas for how to improve the situation in the future, in https://invent.kde.org/education/labplot/-/issues/815, in case you're interested.

I have now added more explanation to the tooltip texts in https://invent.kde.org/education/labplot/-/commit/bad05b36358dcf287e19528eaf3b7c6647766049. This will be part of the next release, 2.12, which I hope we can publish soon.

Thank you again for the confirmation and for bringing this topic to our attention again!