SUMMARY
***
Import from a CSV file to a spreadsheet produces mis-aligned columns when the CSV file contains double-quoted text containing commas, and quotes are stripped on import, and commas are also used as column delimiters.

It appears that double quotes are stripped *before* parsing each row into columns, instead of *after*.
***

STEPS TO REPRODUCE
1. Create a CSV file containing double-quoted text columns where the text includes a comma, and commas are also used as column delimiters.
2. Use File/Import to import that CSV file to a spreadsheet, using a custom data format and checking the "Remove quotes" checkbox.

OBSERVED RESULT
Tokens separated by commas within double-quoted text are put in separate columns, pushing all subsequent columns to the right.

EXPECTED RESULT
Tokens separated by commas within double-quoted text should all be in the same column.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: (available in About System)
System: Pop!_OS_22.04 LTS
Installed from Ubuntu repository, pulling in all KDE dependencies
KDE Plasma Version:
KDE Frameworks Version: 5.92.0
Qt Version: 5.15.3

ADDITIONAL INFORMATION
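The suspected ordering problem can be illustrated with a minimal Python sketch. This is not LabPlot's import code; the sample row and both function names are made up for illustration. It only contrasts stripping quotes before splitting with a quote-aware parse that removes the quotes as part of splitting.

```python
import csv
import io

# Hypothetical sample row: the second field is quoted and contains a comma.
SAMPLE = 'id,"Smith, John",42\n'


def strip_then_split(line: str) -> list[str]:
    """Suspected buggy order: remove all quotes first, then split on commas."""
    return line.rstrip("\n").replace('"', "").split(",")


def split_then_strip(line: str) -> list[str]:
    """Quote-aware parse: csv.reader splits only on commas outside quotes
    and drops the enclosing quotes from each field."""
    return next(csv.reader(io.StringIO(line)))


print(strip_then_split(SAMPLE))   # four fields: the quoted text is split in two
print(split_then_strip(SAMPLE))   # three fields: the quoted text stays together
```

With the first function the row gains an extra column and everything after the quoted field shifts right, which matches the observed result described above.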
(In reply to Dreas Nielsen from comment #0)
> SUMMARY
> ***
> Import from a CSV file to a spreadsheet produces mis-aligned columns when
> the CSV file contains double-quoted text containing commas, and quotes are
> stripped on import, and commas are also used as column delimiters.
>
> It appears that double quotes are stripped *before* parsing each row into
> columns, instead of *after*.
> ***
This was fixed in 2.9. Is it possible for you to update to this release and to try again?
Created attachment 163325 [details]
attachment-3918188-0.html

v. 2.9 is not in the Ubuntu repository. Is there another place to get a .deb package?

By the way, another bug I've encountered occurs when the first column of a CSV file contains a text value, but the column is interpreted as integers--but that's more easily worked around.
(In reply to Dreas Nielsen from comment #2)
> Created attachment 163325 [details]
> attachment-3918188-0.html
>
> v. 2.9 is not in the Ubuntu repository. Is there another place to get
> a .deb package?
You can use Flatpak to get the new version of LabPlot. Also, there are Ubuntu packages (as well as Flatpaks) available for the current development version. Please check the information on https://labplot.kde.org/download/ to see what is more feasible for you.
Created attachment 163370 [details]
attachment-4151682-0.html

Thanks. I closed that bug report.

I installed the Windows version and evaluated that. The purpose of my evaluation was to determine whether LabPlot2 is a good tool to provide to a group of scientists and engineers who want to easily visualize data (primarily spatially-explicit data). This group includes some who are comfortable wielding R or Python for data analysis, but also a group who are technically oriented, but will not devote a lot of time to learning new software, so ease of use is extremely important. Tools that they currently have available include Orange (<https://orangedatamining.com/>), GeoDa (<https://geodacenter.github.io/>), KNIME (<https://www.knime.com/>), and mapdata.py (<https://mapdata.readthedocs.io/en/latest/>). LabPlot2 looks promising because it can display data in a spreadsheet (familiar to all potential users) and can potentially assemble multiple plots and text in something that looks like a dashboard.

After evaluation, I won't be adding LabPlot2 to our toolbox, for the reasons illustrated by the following comments that I made during my evaluation. These don't necessarily qualify as bugs, but they are usability issues that perhaps you would want to consider.

1. Import of data from a CSV file or spreadsheet should be simpler: a project should be created by default if one does not exist, an import option should be on the right-click context menu for a project, and import to a new spreadsheet should be the default.
2. Data selection is carried out by masking values of particular columns, rather than selecting them. Users may be interested in only a subset of a large data set, and masking everything that is *not* of interest requires them to know everything about the data set that they do not care about. Masking inverts the importance of elements of the data set.
3. When values of a column are masked, they still show up in column statistics.
4. The right-click menu for a column header does not include a 'Mask' (or 'Select') option, which it reasonably ought to--this functionality is on a 'Manipulate data' submenu. Most of the other options on this submenu are greyed out because they cannot be applied to imported data. Data selection options should be more obvious and require fewer clicks.
5. Plotting functionality should be more accessible. When a spreadsheet has been imported and is displayed, there is no 'Plot' menu or button bar option to easily plot data from the spreadsheet.
6. If 'Plot data' is selected from the right-click menu for a column, only that column can be selected as the X and Y variable. It is necessary to click on one column and control-click on a second column to allow an X-Y plot of different variables. This is not obvious and is not documented.
7. When an X-Y plot is made, the points are connected by a line, by default, but the X values are not sorted in order. There is no property or option to allow sorting of X values after the plot is created. If the column of X values in the spreadsheet is sorted after the plot is displayed, the plot is not automatically updated, and there is no option associated with the plot that allows it to be refreshed.
8. When plots are created in a new worksheet, both the plot and the worksheet are very small and need to be manually resized to a reasonable size.
9. The axis title can be resized, but the axis labels (i.e., data values) evidently cannot.
10. If multiple columns are selected in the spreadsheet, and a plot then produced, on returning to the spreadsheet the columns are not selected--i.e., selections are not preserved.
11. When a box plot of multiple variables is produced, there are no X-axis labels identifying the data column corresponding to each box-and-whisker figure.
12. When a histogram of a single variable is produced, there is no way to modify the number of bins used.
13. When the 'Sort' option is selected from the right-click menu for a column, or the sorting buttons on the button bar are used, only that column is sorted rather than the entire spreadsheet. This can (almost certainly will) lead to a loss of data integrity.
14. The relationships between folders, workbooks, spreadsheets, matrixes, and worksheets are not obvious from the UI. It is not clear which of them may be required, and how they optionally can be used together for different purposes. The documentation describes them individually but does not illustrate alternative workflows.
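The data-integrity concern in point 13 can be shown with a small Python sketch. The two columns and the helper names are hypothetical and have nothing to do with LabPlot's internals; the sketch only contrasts sorting one column in isolation with reordering whole rows by a key column.

```python
# Hypothetical two-column table; each (site, value) pair belongs to one row.
site = ["C", "A", "B"]
value = [30, 10, 20]


def sort_column_only(col):
    """Sort a single column in isolation; other columns keep their old order."""
    return sorted(col)


def sort_rows_by(key_col, *other_cols):
    """Reorder every column with the same permutation so rows stay together."""
    order = sorted(range(len(key_col)), key=lambda i: key_col[i])
    return [[col[i] for i in order] for col in (key_col, *other_cols)]


# Sorting only `site` pairs each site with the wrong value.
broken_pairs = list(zip(sort_column_only(site), value))

# Sorting whole rows keeps every original pair intact.
sorted_site, sorted_value = sort_rows_by(site, value)
intact_pairs = list(zip(sorted_site, sorted_value))

print(broken_pairs)
print(intact_pairs)
```

In the first result site "A" ends up next to the value that belonged to "C", which is the kind of silent mismatch the point describes.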
(In reply to Dreas Nielsen from comment #4)
> Created attachment 163370 [details]
> attachment-4151682-0.html
>
> Thanks. I closed that bug report.
>
> I installed the Windows version and evaluated that. The purpose of my
> evaluation was to determine whether LabPlot2 is a good tool to provide
> to a group of scientists and engineers who want to easily visualize
> data (primarily spatially-explicit data). This group includes some who
> are comfortable wielding R or Python for data analysis, but also a
> group who are technically oriented, but will not devote a lot of time
> to learning new software, so ease of use is extremely important. Tools
> that they currently have available include Orange
> (<https://orangedatamining.com/>), GeoDa
> (<https://geodacenter.github.io/>), KNIME (<https://www.knime.com/>),
> and mapdata.py (<https://mapdata.readthedocs.io/en/latest/>). LabPlot2
> looks promising because it can display data in a spreadsheet (familiar
> to all potential users) and can potentially assemble multiple plots and
> text in something that looks like a dashboard.
>
> After evaluation, I won't be adding LabPlot2 to our toolbox, for the
> reasons illustrated by the following comments that I made during my
> evaluation. These don't necessarily qualify as bugs, but they are
> usability issues that perhaps you would want to consider.
Thank you for your feedback. Let me quickly comment on the points you raised.

> 1. Import of data from a CSV file or spreadsheet should be simpler: a
> project should be created by default if one does not exist, an import
> option should be on the right-click context menu for a project, and
> import to a new spreadsheet should be the default.
In the application settings you can choose what should happen on application start: do nothing, create a new project, create a new project with a spreadsheet, etc. In 2.11 we additionally implemented an option for which notebook to create on startup (Python, R, etc.).

> 2. Data selection is carried out by masking values of particular
> columns, rather than selecting them. Users may be interested in only a
> subset of a large data set, and masking everything that is *not* of
> interest requires them to know everything about the data set that they
> do not care about. Masking inverts the importance of elements of the
> data set.
Plotting only parts of the data is something that needs more elaboration on our side for the UX part; the technical implementation is straightforward. Masking is the only solution right now, with the problems you mentioned. We'll definitely address this in future releases.

> 3. When values of a column are masked, they still show up in column
> statistics.
This is a bug that we'll fix for the upcoming release 2.11.

> 4. The right-click menu for a column header does not include a 'Mask'
> (or 'Select') option, which it reasonably ought to--this functionality
> is on a 'Manipulate data' submenu. Most of the other options on this
> submenu are greyed out because they cannot be applied to imported data.
> Data selection options should be more obvious and require fewer clicks.
Manipulation of data is also possible for imported data, of course. The entries are most probably greyed out because you don't have any numeric data in this column. Did you pay attention to the decimal separator settings during the import step, so that the data is properly imported as numeric and not as text?

> 5. Plotting functionality should be more accessible. When a spreadsheet
> has been imported and is displayed, there is no 'Plot' menu or button
> bar option to easily plot data from the spreadsheet.
In the context menu of the spreadsheet columns there is a "Plot Data" menu to quickly plot the data. You can check our first tutorial video to see how it works (https://labplot.kde.org/video-tutorials/).

> 6. If 'Plot data' is selected from the right-click menu for a column,
> only that column can be selected as the X and Y variable. It is
> necessary to click on one column and control-click on a second column
> to allow an X-Y plot of different variables. This is not obvious and is
> not documented.
Lack of good and detailed documentation is an issue, yes. If you select a single column and want to plot it on y, LabPlot tries to find another column in the spreadsheet that has the "plot designation" x and uses it as the column for x. If the plot designation is not properly set, this logic fails and you need to select two columns.

> 7. When an X-Y plot is made, the points are connected by a line, by
> default, but the X values are not sorted in order. There is no property
> or option to allow sorting of X values after the plot is created. If
> the column of X values in the spreadsheet is sorted after the plot is
> displayed, the plot is not automatically updated, and there is no option
> associated with the plot that allows it to be refreshed.
I cannot reproduce this issue - the plot is properly updated if the data is sorted in the spreadsheet. It would be great if you could report this issue in a separate bug ticket and provide the steps to reproduce it.

> 8. When plots are created in a new worksheet, both the plot and the
> worksheet are very small and need to be manually resized to a
> reasonable size.
With the help of "templates" you can define your own default values for any object properties that you need to modify.

> 9. The axis title can be resized, but the axis labels (i.e., data
> values) evidently cannot.
You can change the font size of the axis value labels by selecting the axis and modifying the size in the tab "Labels" in the properties explorer.

> 10. If multiple columns are selected in the spreadsheet, and a plot
> then produced, on returning to the spreadsheet the columns are not
> selected--i.e., selections are not preserved.
This is the correct behavior UX-wise since the previous object doesn't have the focus anymore and there is a new selection in the application.

> 11. When a box plot of multiple variables is produced, there are no
> X-axis labels identifying the data column corresponding to each
> box-and-whisker figure.
This is possible by using a custom column for the axis label type, with arbitrary text values that are then used for the labels.

> 12. When a histogram of a single variable is produced, there is no way
> to modify the number of bins used.
For this you need to select "By Number" for the binning method and specify the number of bins you need.

> 13. When the 'Sort' option is selected from the right-click menu for a
> column, or the sorting buttons on the button bar are used, only that
> column is sorted rather than the entire spreadsheet. This can (almost
> certainly will) lead to a loss of data integrity.
In 2.11 we re-worked the UX so this part is clearer and more consistent. Prior to 2.11 you need to use the sort option from the context menu of the spreadsheet to sort multiple columns together.

> 14. The relationships between folders, workbooks, spreadsheets,
> matrixes, and worksheets are not obvious from the UI. It is not clear
> which of them may be required, and how they optionally can be used
> together for different purposes. The documentation describes them
> individually but does not illustrate alternative workflows.
Spreadsheet and Matrix are used for different data layouts. A Workbook can be seen as a parent object that can hold multiple such "sheets" together, similar to Excel, etc. The documentation should be improved, yes. Feel free to reach out to us via the support email if more clarification or help is needed.
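For readers following the answer to point 6, the x-column lookup described there can be sketched roughly as below. This is only an illustration under assumptions: the `Column` class, its `designation` field, and `find_x_for` are hypothetical names and do not reflect LabPlot's actual code or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    name: str
    designation: str  # hypothetical field: "x", "y", or "" when not set


def find_x_for(selected: Column, columns: list[Column]) -> Optional[Column]:
    """Return the first other column designated as x, or None if there is none."""
    for col in columns:
        if col is not selected and col.designation == "x":
            return col
    return None


# Designations set: a single selected y column can be paired with "time".
time = Column("time", "x")
temp = Column("temperature", "y")
print(find_x_for(temp, [time, temp]))

# Designations missing: the lookup finds nothing, so two columns must be
# selected by hand, as described in the answer to point 6.
a = Column("a", "")
b = Column("b", "")
print(find_x_for(b, [a, b]))
```

The second case corresponds to the situation from the report, where imported columns carry no usable designation and the single-column shortcut has nothing to pair with.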