Skip to content

Sketch halts when using std::string and map #8575

New issue

Have a question about this project? No Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “No Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? No Sign in to your account

Open
fabianoriccardi opened this issue May 18, 2022 · 18 comments
Open

Sketch halts when using std::string and map #8575

fabianoriccardi opened this issue May 18, 2022 · 18 comments

Comments

@fabianoriccardi
Copy link

fabianoriccardi commented May 18, 2022

Basic Infos

  • [ x] This issue complies with the issue POLICY doc.
  • [ x] I have read the documentation at readthedocs and the issue is not addressed there.
  • [x ] I have tested that the issue is present in current master branch (aka latest git).
  • [ x] I have searched the issue tracker for a similar issue.
  • [x ] If there is a stack dump, I have decoded it.

Platform

  • Hardware: ESP-12
  • Core Version: 8f71d2c
  • Development Env: Arduino IDE
  • Operating System: Windows

Description

I was migrating a project to esp8266-core 3.0.x when I started to hit weird execution issues (i.e. the execution suddenly halts until the watchdog triggers to restart the esp8266).

After very frustrating moments, I isolated the issue to the following MCVE, which basically it doesn't print the message ciao on the serial port. The same behavior can be observed on all the versions 3.0.x and on the latest git version (8f71d2c).

After looking at possible related issues, I found #8314, but I don't know if they have a solution for that issue...

This code works with v2.7.4 and previous versions.

If you reduce the size of the hashmap by removing, for example, 15 pairs, the Serial.print() works.

#include <unordered_map>
#include <string>
#include <Arduino.h>

void setup() {
  Serial.begin(115200);
  Serial.println("Ciao");
}

# note that it is never used
std::unordered_map<std::string, unsigned short> noteMapping{
  {"SILENCE", 0   },
  { "B0",     31  },
  { "C1",     33  },
  { "CS1",    35  },
  { "D1",     37  },
  { "DS1",    39  },
  { "D1",     37  },
  { "DS1",    39  },
  { "E1",     41  },
  { "F1",     44  },
  { "FS1",    46  },
  { "G1",     49  },
  { "GS1",    52  },
  { "A1",     55  },
  { "AS1",    58  },
  { "B1",     62  },
  { "C2",     65  },
  { "CS2",    69  },
  { "D2",     73  },
  { "DS2",    78  },
  { "E2",     82  },
  { "F2",     87  },
  { "FS2",    93  },
  { "G2",     98  },
  { "GS2",    104 },
  { "A2",     110 },
  { "AS2",    117 },
  { "B2",     123 },
  { "C3",     131 },
  { "CS3",    139 },
  { "D3",     147 },
  { "DS3",    156 },
  { "E3",     165 },
  { "F3",     175 },
  { "FS3",    185 },
  { "G3",     196 },
  { "GS3",    208 },
  { "A3",     220 },
  { "AS3",    233 },
  { "B3",     247 },
  { "C4",     262 },
  { "CS4",    277 },
  { "D4",     294 },
  { "DS4",    311 },
  { "E4",     330 },
  { "F4",     349 },
  { "FS4",    370 },
  { "G4",     392 },
  { "GS4",    415 },
  { "A4",     440 },
  { "AS4",    466 },
  { "B4",     494 },
  { "C5",     523 },
  { "CS5",    554 },
  { "D5",     587 },
  { "DS5",    622 },
  { "E5",     659 },
  { "F5",     698 },
  { "FS5",    740 },
  { "G5",     784 },
  { "GS5",    831 },
  { "A5",     880 },
  { "AS5",    932 },
  { "B5",     988 },
  { "C6",     1047},
  { "CS6",    1109},
  { "D6",     1175},
  { "DS6",    1245},
  { "E6",     1319},
  { "F6",     1397},
  { "FS6",    1480},
  { "G6",     1568},
  { "GS6",    1661},
  { "A6",     1760},
  { "AS6",    1865},
  { "B6",     1976},
  { "C7",     2093},
  { "CS7",    2217},
  { "D7",     2349},
  { "DS7",    2489},
  { "E7",     2637},
  { "F7",     2794},
  { "FS7",    2960},
  { "G7",     3136},
  { "GS7",    3322},
  { "A7",     3520},
  { "AS7",    3729},
  { "B7",     3951},
  { "C8",     4186},
  { "CS8",    4435},
  { "D8",     4699},
  { "DS8",    4978}
};

void loop() {}
@mcspr
Copy link
Collaborator

mcspr commented May 18, 2022

Does it work after adding disable_extra4k_at_link_time(); in the setup();?

@fabianoriccardi
Copy link
Author

Yes it works! I didn't hear about this function!

@fabianoriccardi
Copy link
Author

fabianoriccardi commented May 18, 2022

I see that disable_extra4k_at_link_time() was introduced after the release of v3.0.2. Is it a workaround or will it be remain in public api?

Is it related to the linked issue? May you tell me which is the cause of this issue?

@mcspr
Copy link
Collaborator

mcspr commented May 18, 2022

cc @mhightower83 may have a better explanation

it is a pretty old api, not something new to the 3.x.x release.
edit: we allocate 4kib structure for our 'user task' stack, either in system stack (default) or via static variable (no-extra4k). so something goes wrong there, i'd guess, since the issue only manifests when the map grows

only fact I am sure of atm, we grab 5KiB 6872 bytes of heap to store strings and storage used by the unordered map when we init global ctors.
(which btw may be suboptimal container(s) for what you are trying to do)

@fabianoriccardi
Copy link
Author

it is a pretty old api, not something new to the 3.x.x release.

Yes, I was wrong, there where few modifications since v3.0.2 around this function but it is pretty old, sorry.

However, this piece of code is part of a library, I don't like to oblige the user to add this function to their setup(), so I would exclude the usage of this function.

may be suboptimal container(s) for what you are trying to do

I know, but it works fine enough, but if it leads to this behavior I will change for the better.

@mcspr
Copy link
Collaborator

mcspr commented May 18, 2022

Adding it to setup() simply references it for the linker, which is enough to trigger the no-extra4k mode. Your library might as well do that somewhere in it's code, setup() is not requirement for this workaround.

@mcspr
Copy link
Collaborator

mcspr commented May 19, 2022

I know, but it works fine enough, but if it leads to this behavior I will change for the better.

Just consider that you execute a lot of useless initialization code doing it that way. Be it a desktop app, perhaps that would 've been fine :) e.g. compare building this unordered_map example with a local g++ -Os -S and see the resulting .s file

Keeping the existing strings + u16, at least one way is to do this
(pretty sure esp32 also supports c++17 features)

#include <string_view>
#include <array>

using Note = std::pair<std::string_view, unsigned short>;
static constexpr std::array<Note, 92> noteMapping {{
...
}};

Just 92 entries with a linear search instead of hashing should be a reasonable trade-off?
With a further opportunity to use flash to store both strings and the structure itself, reducing RAM impact with a trade-off of writing code to access it in a special way

@mhightower83
Copy link
Contributor

mhightower83 commented May 19, 2022

The origin of disable_extra4k_at_link_time() goes back a while. This is the quickest description I can find:

/*
A bit of explanation for this entry point:
SYS is the SDK task/context used by the upperlying system to run its
administrative tasks (at least WLAN and lwip's receive callbacks and
Ticker). NONOS-SDK is designed to run user's non-threaded code in
another specific task/context with its own stack in BSS.
Some clever fellows found that the SYS stack was a large and quite unused
piece of ram that we could use for the user's stack instead of using user's
main memory, thus saving around 4KB on ram/heap.
A problem arose later, which is that this stack can heavily be used by
the SDK for some features. One of these features is WPS. We still don't
know if other features are using this, or if this memory is going to be
used in future SDK releases.
WPS being flawed by its poor security, or not being used by lots of
users, it has been decided that we are still going to use that memory for
user's stack and disable the use of WPS.
app_entry() jumps to app_entry_custom() defined as "weakref" calling
itself a weak customizable function, allowing to use another one when
this is required (see core_esp8266_app_entry_noextra4k.cpp, used by WPS).
(note: setting app_entry() itself as "weak" is not sufficient and always
ends up with the other "noextra4k" one linked, maybe because it has a
default ENTRY(app_entry) value in linker scripts).
References:
https://github.com/esp8266/Arduino/pull/4553
https://github.com/esp8266/Arduino/pull/4622
https://github.com/esp8266/Arduino/issues/4779
https://github.com/esp8266/Arduino/pull/4889
*/

It is not apparent how reducing the heap and increasing the system stack would help this problem. WPS and WiFi Enterprise are the only features I know of, so far, that need disable_extra4k_at_link_time().

Edit: To be clear, I do see the problem occurring.

@mcspr
Copy link
Collaborator

mcspr commented May 19, 2022

I have not tried anything originally, just a guess about the nature of the used structure and how much memory it grabs from the get-go. Perhaps it is nothing, and totally unrelated to the SYS stack grab.

The question is mainly, what happens on initialization?
With the above, there is a proxy initializer-list with a bunch of pairs that are moved into the mapping.
(ref. disassemble _Z41__static_initialization_and_destruction_0ii$constprop$0, if I understood correctly at what place that hashtable aka unordered_map ctor happens)

However, using a separate const-array and passing begin(), end() to map ctor still works just fine.

@fabianoriccardi
Copy link
Author

Thanks both for the detailed answers and suggestions.

@mcspr about string_view, I would preserve the compatibility with esp8266 <v3.0.0, which use an old version of gcc and it to not support c++17, so no support to string_view. I will use std::array, and maybe, arduino String instead of std::string. Are there any performance advantages between using Arduino String vs std::string?

@mcspr
Copy link
Collaborator

mcspr commented May 20, 2022

@fabianoriccardi probably not. Same problem as before, though. It allocates some structure, somewhere, and it there for the duration of the program copied from char data already loaded in RAM. Simple const char* might suffice.

btw even gcc4.8.2 (from our previous version 2.7.4) will still support a simple wrapper around string literal
https://godbolt.org/z/8dav8P1rc
(although older standard also means there cannot be more than one expression on return)

or, using the same approach as in the linked issue with a template based on size. pretty common as well
https://github.com/ETLCPP/etl/blob/752d9adb5c2978e7e21c39a19d7297dc88f24e5c/include/etl/string_view.h#L811

@jjsuwa-sys3175
Copy link
Contributor

just curious, quick n' dirty trial... the threshold is between "B7" and "C8" (binary-search by hand), and including "C8" goes WDT timeout.
(using x86_64-w64-mingw32.xtensa-lx106-elf-aff7f1b.211127.zip)

@mhightower83
Copy link
Contributor

mhightower83 commented May 20, 2022

My current understanding of stacks sys and cont and neighbors

ETS system data RAM starts at 0x3FFFC000 and is 16K bytes (0x4000). The first 8K, 0x3FFFC000 - 0x3FFFE000, is used by the Boot ROM for data and BSS. This 8K block continues to be used after the system has finished booting. Some Boot ROM Services continue to be used by the NONOS SDK and some user libraries. The remaining 8K, 0x3FFFE000 - 0x40000000 is mostly used for the stacks.

The Boot ROM functions that support WPS and aes_unwrap() have buffers toward the beginning of this address space, 0x3FFFE000. By not using WPS and replacing aes_unwrap() with a version that uses malloc() we have a full 8K byte space to support the system (sys) and user (cont) stack, approximately 4K for each. The system stack is roughly in the 0x3FFFE000 - 0x3FFFF000 address space and the user stack in 0x3FFFF000 - 0x40000000.

To use WPS, the user stack needs to be moved out of this 8K space, allowing the system stack to move up closer to 0x40000000 out of the way of the WPS hardcoded buffers. Placing a call to disable_extra4k_at_link_time() anywhere in your sketch does this. In this configuration, the system stack is about 6K bytes. If you are not using WPS, you get the whole 8K for the system stack. This comes at the expense of 4K of heap space. The user stack is defined in lower user data memory. The heap size is the resulting unused memory below 0x3FFFC000.

When the system stack grows down below 0x3FFFE000, you may see an HWDT or other weird crashes.

When WPS is used without disable_extra4k_at_link_time(), you may see an HWDT or other weird crashes.

Using the 8K system stack model to take stack painting and check measurements without crashing

Assuming a 4K system stack, it looks like when do_global_ctors() is called from init_done(), it has 1376 bytes of the system stack already in use. That leaves only 2720 to handle do_global_ctors(). The full MCVE needs 3324 to initialize. A total of 4700 bytes.

Edited: add clarification

@jjsuwa-sys3175
Copy link
Contributor

interesting information... it feels initializations via global ctors imply potentially dangerous.
can't do_global_ctors()'s stack be assigned to cont one, or to another large enough block?

@mcspr
Copy link
Collaborator

mcspr commented May 21, 2022

Or, if not that, could we implement something similar to stack protection check from gcc applied to global ctors function? At least for error reporting in more apparent manner than just triggering hwdt.

@fabianoriccardi
Copy link
Author

Or, if not that, could we implement something similar to stack protection check from gcc applied to global ctors function? At least for error reporting in more apparent manner than just triggering hwdt.

I think it would be super-useful for any other unlucky guy hitting this sneaky issue!

btw even gcc4.8.2 (from our previous version 2.7.4) will still support a simple wrapper around string literal
https://godbolt.org/z/8dav8P1rc

Thanks for the suggestion, I tested it and it works perfectly on older and newer versions. I will go for it!

@jjsuwa-sys3175
Copy link
Contributor

can't do_global_ctors()'s stack be assigned to cont one, or to another large enough block?

--- cores/esp8266/core_esp8266_main.cpp
+++ cores/esp8266/core_esp8266_main.cpp
@@ -305,9 +305,12 @@ void init_done() {
     system_set_os_print(1);
     gdb_init();
     std::set_terminate(__unhandled_exception_cpp);
-    do_global_ctors();
-    esp_schedule();
+    ESP.resetHeap();
+    cont_run(g_pcont, &do_global_ctors);
     ESP.setDramHeap();
+    if (cont_check(g_pcont) != 0)
+        panic();
+    esp_schedule();
 }
 
 /* This is the entry point of the application.

seems to resolve the problem.
(but other side effects are unknown, any ideas?)

@mhightower83
Copy link
Contributor

mhightower83 commented May 26, 2022

I think there are a few bumps.

At init_done() there is no need for ESP.resetHeap(). There is nothing to restore and would panic when multiple heaps are available. Also, with the call to ESP.resetHeap(), you are calling a class that has not (yet) been initialized.

It is a little strange. When cont_run() returns, the stack we are using may have been corrupted by the call to do_global_ctors(); however, (if?) we do not need any of our local stack content, a direct call to postmortem can be made with enough stack space to process the panic.

Unfortunately, nothing will print. All the GPIO pins were reset during user_init(). The TX pin was turned off.

One workaround would be to add a replacement resetPins()

extern "C" void resetPins() {
  for (int pin = 0; pin <= 16; ++pin) {
    if (1 == pin) {
      continue;
    }

    if (!isFlashInterfacePin(pin))
      pinMode(pin, INPUT);
  }
}

to the sketch to keep the TX pin alive. Note, that the UART speed will default to 115200bps.

As long as the overflow doesn't exceed 8K, a configuration could be defined to detect that initializing global ctors overflowed the stack.

Edited to supply an alternative to get printf working and apply suggested changes (edited again to correct == to != )

void init_done() {
    system_set_os_print(1);
    gdb_init();
    std::set_terminate(__unhandled_exception_cpp);
    cont_run(g_pcont, &do_global_ctors);
    if (cont_check(g_pcont) != 0) {
        pinMode(1, SPECIAL);
        panic();
    }
    esp_schedule();
    ESP.setDramHeap();
}

I am not sure about the adverse side effects either. It is not clear to me how to create a proper solution. I assume it is inappropriate Arduino behavior to force serial on when that is not part of the sketch. Pin 1 could be wired up for some other function.

No Sign up for free to join this conversation on GitHub. Already have an account? No Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants