Discussion:
Help needed - "Storage_Error stack overflow or erroneous memory access"
reinert
2017-06-18 10:34:07 UTC
Hi there,

I just installed Debian 9 ("stretch") on a computer and tried to compile my fairly large Ada program system on it. I then got this error message:

"6.3.0 20170516 (x86_64-linux-gnu) Storage_Error stack overflow or erroneous memory access"

The program compiles without problems on other computers (one with Debian 8.8 "jessie", and a Raspberry Pi).

Does anybody have a preliminary hint that could help me sort out my problem on "stretch"? Since the program is large, it may take a while to produce a short test program that reproduces the error. More details below.

reinert


----------------- Here is a copy of the output from the command "gprbuild":

using project file c0.gpr
gnatgcc -c -O c0.adb
+===========================GNAT BUG DETECTED==============================+
| 6.3.0 20170516 (x86_64-linux-gnu) Storage_Error stack overflow or erroneous memory access|
| Error detected at c0.adb:268:7 |
| Please submit a bug report; see http://gcc.gnu.org/bugs.html. |
| Use a subject line meaningful to you and us to track the bug. |
| Include the entire contents of this bug box in the report. |
| Include the exact command that you entered. |
| Also include sources listed below. |
+==========================================================================+

Please include these source files with error report
Note that list may not be accurate in some cases,
so please double check that the problem can still
be reproduced with the set of files listed.
Consider also -gnatd.n switch (see debug.adb).
Simon Clubley
2017-06-18 10:41:48 UTC
Post by reinert
Hi there,
I just installed Debian 9 ("stretch") on a computer, and I tried to compile
my pretty large (ada) program system on this computer. Then I got the error
"6.3.0 20170516 (x86_64-linux-gnu) Storage_Error stack overflow or erroneous memory access"
The program compiles without problems on another computer (with debian 8.8,
"jessie" and on raspberry pi).
Has anybody a preliminary hint that can help me sorting out my problem on
"stretch"? Since the program is large, it may take a while to make a short
test program to reproduce the error. More details below.
How do the ulimit values compare on both systems ?

(Use "ulimit -a" to see all the limits.)
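
One way to make that comparison less error-prone, as a sketch only: save the output of "ulimit -a" to a file on each machine and diff the two files afterwards. The file naming below is just an illustration, not something from this thread.

```shell
# Run on each machine: save that machine's limits under its host name.
# "uname -n" prints the host name; the file name pattern is only an example.
limits_file="limits-$(uname -n).txt"
ulimit -a > "$limits_file"
echo "saved limits to $limits_file"

# After copying both files to one machine, compare them line by line.
# The file names here are placeholders for the two saved files:
#   diff limits-stretch.txt limits-jessie.txt
```

Only the lines that differ between the two machines are then shown, which makes a differing limit easy to spot.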

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
reinert
2017-06-18 12:00:58 UTC
Below are the results of "ulimit -a" on both computers:

On the (problematic) "stretch computer":

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 28417
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 28417
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

On the "jessie" computer (where things work):

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31579
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31579
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Egil H H
2017-06-18 12:24:51 UTC
Post by reinert
open files (-n) 1024
open files (-n) 65536
I guess this is your culprit
Simon Clubley
2017-06-18 14:58:17 UTC
Post by Egil H H
Post by reinert
open files (-n) 1024
open files (-n) 65536
I guess this is your culprit
I would have personally expected to see a different error message if
the process ran out of available channels, however it is certainly
worth trying to increase the limit to see if it fixes the problem.

We can also see from the ulimit output that there is no specific
stack limit set.

To the OP, can you post the section of code this is failing on and
then identify which line in that specific section of code is line 268 ?

Does changing (for example) the optimisation options cause the failing
line number to change ?

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
reinert
2017-06-18 15:44:57 UTC
I tried different optimization levels, but this did not affect the line number given in the error message. However, if I comment out this part of the code, the same error message appears, pointing to a different line number.

But: on the "jessie" machine (without the problem), I can execute
"ulimit -n 65536", but trying the same on the "stretch machine" (with the problem), I get this response:
------------------------------------------------------------------------
ulimit -n 65536
bash: ulimit: open files: cannot modify limit: Operation not permitted
-------------------------------------------------------------------------

"ulimit -n 100" works (gives no error message), but even "ulimit -n 200"
then gives the same error message as above: "bash: ulimit: open files: cannot modify limit: Operation not permitted".

Strange?
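
A likely explanation, assuming bash's documented behaviour: when "ulimit" is given neither -S nor -H, it sets both the soft and the hard limit. So "ulimit -n 100" also lowers the hard limit to 100, and an unprivileged shell cannot raise a hard limit again, which is why "ulimit -n 200" then fails. The earlier failure with 65536 would fit a hard limit below 65536 on the "stretch" machine, but that is an assumption, since the hard limit was not shown. A minimal sketch that prints both limits and repeats the experiment in subshells, so the calling shell keeps its own limits (the variable names are illustrative):

```shell
# Soft limit: what the process currently gets. Hard limit: the ceiling
# up to which an unprivileged user may raise the soft limit.
soft=$(ulimit -S -n)
hard=$(ulimit -H -n)
echo "open files: soft=$soft hard=$hard"

# Command substitution runs in a subshell, so this shell is not affected.
# With neither -S nor -H, both limits are set, so "ulimit -n 100"
# also lowers the hard limit to 100.
lowered_hard=$(ulimit -n 100; ulimit -H -n)
echo "hard limit after 'ulimit -n 100': $lowered_hard"

# Lowering only the soft limit keeps the hard ceiling in place, so the
# soft limit can be raised again later (up to the hard limit).
restored_soft=$(ulimit -S -n 100; ulimit -S -n 200; ulimit -S -n)
echo "soft limit after '-S -n 100' then '-S -n 200': $restored_soft"
```

To experiment without losing the ceiling, change only the soft limit with -S. Raising the hard limit itself needs privileges or a change in the system configuration.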


The actual code looks like this (but the details here do not seem to be significant):

procedure leap_to_cell1 (id : target_t; n : Positive) is
   cos : cell_observation_set.set := to_set ((id => id, others => <>));
begin
   cos := csv (n).cell_observations1 and cos;
   if cos.is_empty then
      return;
   end if;
   cell_scene_cursor := csv.to_cursor (n);
   cell_scene_ptr := csv (cell_scene_cursor);
   c0.co := cos.first_element;
   r_cursor0 := c0.co.r;
   cursor0 := True;
   ds_prev0 := element (cell_scene_cursor).su;
   leap_time1 := Clock;
   df.center0 (r_cursor0 - csv (cell_scene_cursor).su); -- <-- line 268
   df.center0 (True);
   return;
end leap_to_cell1;

reinert
reinert
2017-06-18 18:04:26 UTC
I forgot to say that I changed the hard and soft open-file limits on the "stretch" machine to match the "jessie" machine, so now "ulimit -a" gives this (on the "stretch" machine):

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 28417
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 28417
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

But the problem persists... :-(
reinert
2017-06-19 04:53:27 UTC
OK, for those having interest:

I have somehow sorted out the problem.
The lesson learned is that I may have done something wrong when using "with invariant .." and the compiler error message could be more helpful for programmers like me :-)

Here is a test program illustrating the problem. It compiles without error on my Debian 8.8 (jessie) machine, but on Debian 9 (stretch) the compiler gives a cryptic error message.
Could anybody confirm this? And maybe elaborate?

with frame1; use frame1;
procedure c0 is
   df : df_t;
begin
   df.center0 (True);
end c0;

-- package:

with Ada.Numerics.Generic_Real_Arrays;

package frame1 is

   subtype real is float;

   package gra is new Ada.Numerics.Generic_Real_Arrays (real);
   use gra;

   type frame_t is tagged private;

   procedure center0 (frame : in out frame_t; b : Boolean);

   type df_t is new frame_t with private;

private

   type frame_t is tagged record
      a, b : Boolean := False;
      r_a, r_b : real_vector (1 .. 2);
      set_modus : Boolean := False;
      dynamic_frame1, dynamic_frame2 : real_vector (1 .. 2);
      center0 : Boolean := False;
      active1 : Boolean := False;
      w1, w2 : real_vector (1 .. 2);
   end record with
     type_invariant =>
       ((if
           frame_t.set_modus
         then
           frame_t.dynamic_frame1 (1) <= frame_t.dynamic_frame2 (1) and
           frame_t.dynamic_frame1 (2) >= frame_t.dynamic_frame2 (2)));

   type df_t is new frame_t with record
      focus1 : Boolean := False;
   end record;

end frame1;

package body frame1 is

   procedure center0 (frame : in out frame_t; b : Boolean) is
   begin
      frame.center0 := b;
   end center0;

end frame1;

Simon Wright
2017-06-19 07:03:40 UTC
Here, on x86_64-apple-darwin16 (macOS Sierra), I get

FSF GCC 7.1.0: compiles OK.
FSF GCC 8.0.0: compiles OK.
GNAT GPL 2016: compiles OK.
GNAT GPL 2017: compiles OK.

But, with FSF GCC 6.1.0,

1. with frame1; use frame1;
2. procedure c0 is
3. df : df_t;
4. begin
5. df.center0 (True);
|
"" is undefined
6. end c0;

which is a symptom of an internal compiler error. Perhaps your 6.3.0
problem is related to that.
Simon Clubley
2017-06-19 13:01:11 UTC
I certainly do, but ran out of additional ideas to suggest.
Post by reinert
I have somehow sorted out the problem.
The lesson learned is that I may have done something wrong when using "with
invariant .." and the compiler error message could be more helpful for
programmers like me :-)
Actually, what you appear to have found is a compiler bug; compilers
should never crash because of problems in the code they are compiling.

It might be worthwhile reporting it to AdaCore.

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
reinert
2017-06-19 14:58:11 UTC
Post by Simon Clubley
Actually, what you appear to have found is a compiler bug; compilers
should never crash because of problems in the code they are compiling.
It might be worthwhile reporting it to AdaCore.
But it seems to be an old (now resolved) bug?

reinert
Simon Clubley
2017-06-19 17:33:21 UTC
Post by reinert
Post by Simon Clubley
Actually, what you appear to have found is a compiler bug; compilers
should never crash because of problems in the code they are compiling.
It might be worthwhile reporting it to AdaCore.
But it seems to be an old (now resolved) bug?
Yes, you are correct. I had forgotten about Simon's message which
mentioned gcc version numbers when I wrote that. Sorry. Glad you
have an answer however.

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Dmitry A. Kazakov
2017-06-18 11:59:44 UTC
Post by reinert
I just installed Debian 9 ("stretch") on a computer, and I tried to
compile my pretty large (ada) program system on this computer. Then I got the error message:
"6.3.0 20170516 (x86_64-linux-gnu) Storage_Error stack overflow or erroneous memory access"
The program compiles without problems on another computer (with debian 8.8, "jessie" and on raspberry pi).
Has anybody a preliminary hint that can help me sorting out my
problem on "stretch"? Since the program is large, it may take a while to make a
short test program to reproduce the error. More details below.
Strange. Normally this happens on 32-bit systems when GNAT runs out of
virtual memory. I get this quite frequently and reorganize packages to
get through. But I never had this on a 64-bit system.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de