
#2042 NA not being ignored with label style

None
closed-fixed
nobody
None
2018-07-02
2018-04-24
No

There seems to be a problem with ignoring NA values in a data file with the newest version of gnuplot. I run the following script (pitimes.plt):

labelstyle = "font \"Arial,10\" center tc 'black'"

set key autotitle columnhead
set xrange[0:11]
set boxwidth 0.9
set style fill solid 0.3 border lc 'red'
set title "Time Comparisons For Implementations"

formatlab(x) = x<60?sprintf("%d",x):(x<3600)?sprintf("%d:%02d",x/60,x%60):sprintf("%d:%02d:%02d",x/3600,(x%3600)/60,x%60)

set xtics rotate by -45 1,1,9
set yrange[0:*]
set ylab "Time (seconds)"
set ytics nomirror
unset my2tics

set y2range[0:5]
set xrange[0:10]
unset key
set y2tics format "%g"
set y2lab "log(Time) (seconds)"
plot "pitime.txt" u ($0+1):3:xtic(1) with boxes lc 'red',\
"pitime.txt" u ($0+1):(log10($3)) with impulses axes x1y2 lw 2 lc 'black',\
"pitime.txt" u ($0+1):(log10($3)+0.5):(formatlab(int($3))) with labels @labelstyle axes x1y2

on the following data file (pitime.txt):

Language    Time20k Time100k
Java        2   45
Kotlin      2   45
Scala       3   71
C       4   103
Javascript  17  412
Groovy      42  1030
Ruby        115 3306    
Python      151 3718
Perl        189 4858
Php     372 9231
R       1322    NA

This works correctly with version 5.0 and produces the following graph:

[image: result (pitimes.png)]

However, version 5.2 produces no output and displays the error message '"pitimes.plt", line 24: non-integer operand for %' (line 24 is the last line of the script). It seems that version 5.0 ignores the last line of the data file when it encounters the NA value, whereas version 5.2 attempts to pass the NA value to the formatlab function, which triggers the error.

I am using 5.0 patchlevel 6 and 5.2 patchlevel 2 on Windows 7.
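
For reference, the operator named in the error message can be exercised on its own. The sketch below reuses formatlab from the script above with sample values taken from the data file. The commented-out call is an assumption about how a non-integer argument ends up at %, not a trace of what 5.2 does internally.

```gnuplot
formatlab(x) = x<60?sprintf("%d",x):(x<3600)?sprintf("%d:%02d",x/60,x%60):sprintf("%d:%02d:%02d",x/3600,(x%3600)/60,x%60)

print formatlab(45)      # 45
print formatlab(412)     # 6:52
print formatlab(3718)    # 1:01:58

# % accepts only integer operands, so a real-valued argument is expected
# to fail with "non-integer operand for %":
# print formatlab(412.0)
```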

3 Attachments

Discussion

  • Matthew Halverson

    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -49,3 +49,5 @@
     [[img src=pitimes.png alt=result]]
    
     However, in version 5.2, it does not produce any output and causes the error message **"pitimes.plt", line 24: non-integer operand for %** to be displayed.  It seems that in version 5.0, when encontering the NA value in the last line of the data file, the line is ignored.  However, in version 5.2, it attempts to feed the NA value to the formatlab function creating the error message.
    +
    +I am using 5.0 patchlevel 6 and 5.2 patchlevel 2 on Windows 7.
    
    • Group: -->
    • Priority: -->
     
  • Matthew Halverson

    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -48,6 +48,6 @@
    
     [[img src=pitimes.png alt=result]]
    
    -However, in version 5.2, it does not produce any output and causes the error message **"pitimes.plt", line 24: non-integer operand for %** to be displayed.  It seems that in version 5.0, when encontering the NA value in the last line of the data file, the line is ignored.  However, in version 5.2, it attempts to feed the NA value to the formatlab function creating the error message.
    +However, in version 5.2, it does not produce any output and causes the error message **"pitimes.plt", line 24: non-integer operand for %** to be displayed (line 24 is the last line of the script).  It seems that in version 5.0, when encontering the NA value in the last line of the data file, the line is ignored.  However, in version 5.2, it attempts to feed the NA value to the formatlab function creating the error message.
    
     I am using 5.0 patchlevel 6 and 5.2 patchlevel 2 on Windows 7.
    
     
  • Ethan Merritt

    Ethan Merritt - 2018-04-25

    You are seeing a side effect of an intentional change.
    In older versions of gnuplot, processing of the "using" specifier for a line of data would stop as soon as it hit an undefined value. Now the program attempts to evaluate all of the "using" clauses even if one or more of them evaluates to NaN. Unfortunately your command manages to trigger a fatal error while evaluating the label format.

    I will look into how it might be mitigated, but for now I can offer a work-around.

    Since the quantity that is undefined is in column 3, you can explicitly skip the format evaluation in this case:

    plot "" u ($0+1):(log10($3)+0.5):($3==$3 ? formatlab(int($3)) : NaN) \
           with labels @labelstyle axes x1y2
    

    The test ($3 == $3) fails if and only if $3 is not-a-number or undefined, because NaN never compares equal to itself.
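
    Dropped into the full plot command from the report, the guard might look like the sketch below. It assumes labelstyle and formatlab are defined as in the original script, and it spells out the file name rather than using "".

    ```gnuplot
    plot "pitime.txt" u ($0+1):3:xtic(1) with boxes lc 'red',\
         "pitime.txt" u ($0+1):(log10($3)) with impulses axes x1y2 lw 2 lc 'black',\
         "pitime.txt" u ($0+1):(log10($3)+0.5):($3==$3 ? formatlab(int($3)) : NaN) \
             with labels @labelstyle axes x1y2
    ```

    Only the third plot clause changes; the boxes and impulses clauses never call formatlab, so they need no guard.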

     
    • Matthew Halverson

      Thanks for the suggested workaround; I will apply it to the script for now.

       
  • Ethan Merritt

    Ethan Merritt - 2018-04-25

    The intended mechanism is
    set datafile missing "NA"
    but this does not work in the example provided because the check for missing data is too shallow. This comment at datafile.c:2105 is relevant:

    /* If column N contains the "missing" flag and is referenced by */
    /* using N then we caught it already.  Here we are checking for */
    /* indirect references like using ($N) or using "header_of_N".  */
    /* It does not catch deeper evaluations like using (2*f($N)).   */
    

    The problem is that by the time the program notices that the data field it is processing contains the "missing" flag it is deep inside evaluation of an expression. There is not currently a mechanism for it to bail out of the evaluation early and report the missing field cleanly. Instead it continues evaluation until it either finishes or it hits some more fatal error.
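
    The cases listed in that source comment can be sketched against the data file from the report. The annotations only restate the comment; which forms are actually screened depends on the gnuplot version, and f is a hypothetical function introduced for illustration.

    ```gnuplot
    set datafile missing "NA"
    set key autotitle columnhead
    f(x) = log10(x)                           # hypothetical helper

    plot "pitime.txt" using 0:3               # direct reference: caught
    plot "pitime.txt" using 0:($3)            # indirect reference: caught
    plot "pitime.txt" using 0:"Time100k"      # reference by column header: caught
    plot "pitime.txt" using 0:(2*f($3))       # deeper evaluation: not caught by this check
    ```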

     
  • Ethan Merritt

    Ethan Merritt - 2018-04-29
    • status: open --> pending-wont-fix
     
  • Ethan Merritt

    Ethan Merritt - 2018-04-29

    I see no path to fixing this in the general case that doesn't impose major penalties on the process of expression evaluation. The best I can think of is to document the limitation more clearly and give an example of the workaround.

     
  • Ethan Merritt

    Ethan Merritt - 2018-04-30
    • status: pending-wont-fix --> open
     
  • Ethan Merritt

    Ethan Merritt - 2018-04-30

    ... And then enlightenment hits. I eventually came up with a much more powerful way to screen for missing values, so long as the using spec accesses the column as (func($N)) rather than something complicated like (column(func(whatever))).
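
    As an illustration of the two access patterns being contrasted, here is a sketch only. It assumes formatlab and the other settings from the report's script are already in effect. The column(2+1) form is a made-up example of a computed column reference, and this thread does not say which releases contain the change.

    ```gnuplot
    set datafile missing "NA"

    # Column named directly as $N inside the expression:
    # the form the new screening is said to handle.
    plot "pitime.txt" u ($0+1):(log10($3)+0.5):(formatlab(int($3))) with labels

    # Column number computed inside column():
    # the more complicated form that falls outside it.
    plot "pitime.txt" u ($0+1):(log10($3)+0.5):(formatlab(int(column(2+1)))) with labels
    ```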

     
  • Ethan Merritt

    Ethan Merritt - 2018-04-30
    • status: open --> pending-fixed
     
  • Ethan Merritt

    Ethan Merritt - 2018-07-02
    • Status: pending-fixed --> closed-fixed
     
