Speech Note 4.8.0 Beta 3 #223

Open
mkiol opened this issue Feb 24, 2025 · 19 comments

@mkiol
Owner

mkiol commented Feb 24, 2025

If you want to test the upcoming release, Speech Note 4.8.0 Beta is available in the "flathub-beta" repository.

This version is perfectly usable, but may contain more bugs than a stable release.

To enable "flathub-beta" on your system, follow these instructions or simply run the following:

flatpak remote-add --if-not-exists flathub-beta https://flathub.org/beta-repo/flathub-beta.flatpakrepo

Changes between 4.7.1 and 4.8.0 Beta 3

  • User Interface
    • Speech Note has been translated into Arabic, Catalan, Spanish and Canadian French.
  • Speech to Text
    • New CrisperWhisper model for FasterWhisper engine. CrisperWhisper is designed for fast, precise, and verbatim speech recognition with accurate word-level timestamps. Unlike the original Whisper, which tends to omit disfluencies and follows more of an intended transcription style, CrisperWhisper aims to transcribe every spoken word exactly as it is, including fillers, pauses, stutters and false starts. The CrisperWhisper model is enabled only for English and German.
    • New KBLab Whisper models for Swedish. The National Library of Sweden has released fine-tuned STT models trained on its library collections. The models have significantly improved accuracy compared to regular Whisper models.
    • Option to pause listening while processing. This option can be useful when Listening mode is Always on. By default, listening continues even when a piece of audio data is being processed. Using this option, you can temporarily pause listening for the duration of processing.
    • Option to play an audible tone when starting and stopping listening
  • Text to Speech
    • Kokoro TTS engine. Kokoro is a compact yet powerful open-source multilingual TTS engine. Despite its modest size (trained on less than 100 hours of audio), it delivers impressive results. Kokoro voices are enabled for: English, Chinese, Japanese, Hindi, Italian, French, Spanish and Portuguese.
    • F5-TTS engine. F5-TTS provides exceptional voice cloning capabilities. The currently enabled model works with English and Chinese. F5-TTS works best with CUDA acceleration. CPU-only processing can be very slow.
    • Parler-TTS engine. Parler-TTS can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.). The speaker's characteristics are defined by a text description (prompt). To use Parler-TTS models, you need to configure a Text voice profile. This can be done in the Voice profiles menu. Parler-TTS primarily supports English, but a multilingual model for French, Spanish, Portuguese, Polish, German, Dutch and Italian is also included. Currently, the multilingual model produces rather poor-quality speech that is not entirely usable. Parler-TTS works best with CUDA acceleration. CPU-only processing can be very slow.
    • S.A.M. TTS engine. S.A.M. is a small speech synthesizer designed for the Commodore 64. It features a robotic voice that evokes a strong sense of nostalgia. The S.A.M. voice is available in English only.
    • Normalize audio setting option. Use this option to enable/disable audio volume normalization. The volume is normalized independently for each sentence, which can lead to unstable volume levels in different sentences. Disable this option if you observe this problem.
    • New Piper voices for Dutch, Finnish, German and Luxembourgish
    • New RHVoice voice for Spanish
  • Accessibility (Wayland)
    • Support for Insert into active window under Wayland. Using the start-listening-active-window or start-listening-translate-active-window actions, you can insert the decoded text directly into whichever window is currently in focus. This feature previously worked under X11 only, but it is now also supported under Wayland. For the actions to work, the ydotool daemon must be installed and running. If you are using Flatpak, also make sure that the application has permission to access the ydotool daemon socket file.
    • Support for Global keyboard shortcuts under Wayland. Global keyboard shortcuts allow you to start or stop listening and reading using the keyboard even when the application is not active (e.g. minimized or in the background). Until now, this capability was only available under X11. Integration with XDG Desktop Portal has now been added, making global keyboard shortcuts possible under Wayland as well. For shortcuts to work, your desktop environment has to support the GlobalShortcuts interface of the XDG Desktop Portal service. Right now, GlobalShortcuts is only supported in KDE Plasma.
  • Flatpak
    • Python support enabled in Tiny and ARM packages. Python libraries are not included in Tiny or ARM packages, but using the Location of Python libraries option, you can set an external directory that contains the libraries. Make sure that the Flatpak application has permissions to access this directory.
@phirsch

phirsch commented Feb 25, 2025

For me, 4.8.0 Beta 1 breaks global keyboard shortcuts on X11 (Ubuntu 24.04/gnome) - even with ydotoold running.

@mkiol
Owner Author

mkiol commented Feb 28, 2025

@phirsch

Thanks for reporting!

X11

For me, 4.8.0 Beta 1 breaks global keyboard shortcuts on X11

I just tested this in an Ubuntu 24.04 GNOME X11 session, and both "Global Shortcuts" and "Insert into active window" worked without any problem. On X11, ydotool is not used by default; instead, "Insert into active window" should work out of the box without any additional program.

Could you attach the Speech Note log after running with the --verbose option?

flatpak run net.mkiol.SpeechNote --verbose

Wayland

In a GNOME Wayland session (the default in Ubuntu 24.04), you can't use "Global Shortcuts" because GNOME implemented support for this just 2 days ago! :) "Global Shortcuts" on Wayland currently only work in the latest version of KDE Plasma.

"Insert into active window" on Wayland requires ydotoold, but unfortunately Ubuntu 24.04 provides a version that is too old. For Speech Note to work, you need to build ydotoold from source or use the binaries from GitHub.

In addition, ydotoold must be run with the GID and UID of a regular user, otherwise Speech Note will not be able to connect due to lack of permissions.

sudo ./ydotoold-release-ubuntu-latest --socket-own="$(id -u):$(id -g)"

Finally, the Flatpak package must be granted permission to access /tmp, because the ydotool socket file is /tmp/.ydotool_socket. For example, this can be done with the FlatSeal app:

Image
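
As a command-line alternative to FlatSeal, the same filesystem permission can probably be granted with flatpak override. The sketch below only assembles and prints the command so it can be reviewed before it is run. The helper name override_cmd is made up for this example; /tmp is the socket directory mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: build the `flatpak override` command that grants Speech Note
# access to the directory holding the ydotool socket.
# `override_cmd` is a hypothetical helper, not part of Speech Note or Flatpak.

APP_ID="net.mkiol.SpeechNote"

override_cmd() {
    local socket_dir="$1"
    printf 'flatpak override --user --filesystem=%s %s\n' "$socket_dir" "$APP_ID"
}

# Print the command for /tmp (where /tmp/.ydotool_socket lives).
override_cmd /tmp
```

Running the printed command applies the override for the current user only. `flatpak override --user --show net.mkiol.SpeechNote` lists the overrides currently in effect.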

@nnbyte

nnbyte commented Mar 4, 2025

Hi, I successfully tested the following:

Text to Speech

  • S.A.M. TTS engine

Accessibility (Wayland)

  • Support for Insert into active window under Wayland.
  • Support for Global keyboard shortcuts under Wayland.

Great Job!

FYI, tested on
OpenSuse Tumbleweed 20250302
KDE Plasma: 6.3.2
Kernel: 6.13.5-1
Graphics: Wayland

@mkiol
Owner Author

mkiol commented Mar 5, 2025

@nnbyte Cool! Thanks for the feedback.

@phirsch

phirsch commented Mar 13, 2025

Another observation in this context: In 4.8.0 Beta 1, pressing a modifier key like 'Ctrl' on its own immediately aborts the 'shortcut recording mode' (note: that's just the XCB_KEY_PRESS(2) event on its own, before releasing the key again).

@phirsch

phirsch commented Mar 13, 2025

Sorry for the delay - only just managed to test this again. My shortcut still works with 4.7.0 and fails with 4.8.0 Beta 1.

This snippet of the log looks like it might be relevant:

4.7.0:

[D] 05:07:32.134254881.134 0x782e3df9cd00 () - Event | XCB_KEY_PRESS(2) | sequence: 1799
[D] 05:07:32.134381082.134 0x782e3df9cd00 () - Event | XCB_XKB_* event(85) | sequence: 1799
[D] 05:07:32.134678107.134 0x782e3df9cd00 () - Event | XCB_PROPERTY_NOTIFY(28) | sequence: 1801
[D] 05:07:32.229723613.229 0x782e3df9cd00 () - Event | XCB_KEY_PRESS(2) | sequence: 1805
[D] 05:07:32.229797965.229 0x782e3df9cd00 () - Event | XCB_XKB_* event(85) | sequence: 1805
[D] 05:07:32.254977133.254 0x782e3df9cd00 () - Event | XCB_KEY_PRESS(2) | sequence: 1806
[D] 05:07:32.255023559.255 0x782e3df9cd00 () - Event | XCB_XKB_* event(85) | sequence: 1806
[D] 05:07:32.979062930.979 0x782e3df9cd00 () - Event | XCB_FOCUS_OUT(10) | sequence: 1807
[D] 05:07:32.979096661.979 0x782e3df9cd00 () - Event | XCB_KEY_PRESS(2) | sequence: 1807
[D] 05:07:32.979132194.979 0x782e3df9cd00 () - hot key activated: start-reading-clipboard
[D] 05:07:32.979142827.979 0x782e3df9cd00 () - executing action: start-reading-clipboard extra = ""
[D] 05:07:32.979805037.979 0x782e3df9cd00 () - tts play speech
[D] 05:07:32.980543624.980 0x782e3df9cd00 () - choosing model for id: "en_piper_us_lessac_high" "en"
[D] 05:07:32.980604936.980 0x782e3df9cd00 () - restart tts engine config: "lang=en, speaker=, model-files=[model-path=/home/user/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/en_piper_us_lessac_high, vocoder-path=, diacritizer=, hub-path=/home/user/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models], speaker=, ref_voice_file=, text-format=raw, sync_subs=on-fit-only-if-longer, tag_mode=support, options=, lang_code=, share-dir=/app/share, cache-dir=/home/user/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote, data-dir=, speech-speed=12, split-into-sentences=1, use-engine-speed-control=1, use-gpu=0, gpu-device=[id=-1, api=opencl, name=, platform-name=], audio-format=ogg-opus"
[D] 05:07:32.980607645.980 0x782e3df9cd00 () - new tts engine required
[D] 05:07:32.980613335.980 0x782e3df9cd00 start:235 - tts start
[D] 05:07:32.980653383.980 0x782e3df9cd00 start:245 - tts start completed
[D] 05:07:32.980658030.980 0x782e3df9cd00 encode_speech:329 - tts encode speech
[D] 05:07:32.980824891.980 0x782c28400680 process:923 - tts prosessing started

4.8.1 Beta:

[D] 05:07:51.953262818.953 0x7b68f7782d00 () - Event | XCB_KEY_PRESS(2) | sequence: 1854
[D] 05:07:51.953354418.953 0x7b68f7782d00 () - Event | XCB_XKB_* event(85) | sequence: 1854
[D] 05:07:52.026767774.26 0x7b68f7782d00 () - Event | XCB_KEY_PRESS(2) | sequence: 1855
[D] 05:07:52.026826592.26 0x7b68f7782d00 () - Event | XCB_XKB_* event(85) | sequence: 1855
[D] 05:07:52.069326510.69 0x7b68f7782d00 () - Event | XCB_KEY_PRESS(2) | sequence: 1856
[D] 05:07:52.069422382.69 0x7b68f7782d00 () - Event | XCB_XKB_* event(85) | sequence: 1856
[D] 05:07:52.460911365.460 0x7b68f7782d00 () - Event | XCB_KEY_PRESS(2) | sequence: 1857
[D] 05:07:52.461047689.461 0x7b68f7782d00 () - createNewSequences(QKeyEvent(ShortcutOverride, Key_E, ShiftModifier|ControlModifier|AltModifier, text="\u0005"), ignoredModifiers=QFlags<Qt::KeyboardModifier>(NoModifier)), possibleKeys=(
[D] 05:07:52.461061403.461 0x7b68f7782d00 () - QKeySequence("Ctrl+Alt+Shift+E")
[D] 05:07:52.461065553.461 0x7b68f7782d00 () - )
[D] 05:07:52.461071883.461 0x7b68f7782d00 () - Possible shortcut key sequences: QVector(QKeySequence("Ctrl+Alt+Shift+E"))
[D] 05:07:52.461078482.461 0x7b68f7782d00 () - Returning shortcut match ==  0
[D] 05:07:52.461085129.461 0x7b68f7782d00 () - QShortcutMap::nextState(QKeyEvent(ShortcutOverride, Key_E, ShiftModifier|ControlModifier|AltModifier, text="\u0005")) = 0
[D] 05:07:52.608621020.608 0x7b68f7782d00 () - Event | XCB_KEY_RELEASE(3) | sequence: 1858
[D] 05:07:52.980278902.980 0x7b68f7782d00 () - Event | XCB_KEY_RELEASE(3) | sequence: 1858
[D] 05:07:52.980332142.980 0x7b68f7782d00 () - Event | XCB_XKB_* event(85) | sequence: 1858
[D] 05:07:52.990208969.990 0x7b68f7782d00 () - Event | XCB_KEY_RELEASE(3) | sequence: 1858
[D] 05:07:52.990240721.990 0x7b68f7782d00 () - Event | XCB_XKB_* event(85) | sequence: 1858
[D] 05:07:53.004393929.4 0x7b68f7782d00 () - Event | XCB_KEY_RELEASE(3) | sequence: 1858
[D] 05:07:53.004455400.4 0x7b68f7782d00 () - Event | XCB_XKB_* event(85) | sequence: 1858
[D] 05:07:53.493337540.493 0x7b68f7782d00 () - Event | XInput Event(XCB_INPUT_MOTION) | sequence: 1858
[D] 05:07:53.493369740.493 0x7b68f7782d00 () - XI2 mouse motion 855,515, time 433579051, source MouseEventNotSynthesized
[D] 05:07:53.493417995.493 0x7b68f7782d00 () - QQuickWindow::handleMouseEvent() QEvent::MouseMove QPointF(855,515) Qt::NoButton QFlags<Qt::MouseButton>(NoButton)

Although 4.8.1 Beta does detect QKeySequence("Ctrl+Alt+Shift+E"), the action never gets triggered for some reason.
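
One quick way to compare two saved logs is to count how often the shortcut actually fired. This is only a sketch: the marker text is taken from the 4.7.0 log above, while the helper name and the log file names are placeholders for this example.

```shell
#!/usr/bin/env bash
# Sketch: check whether a saved log shows a global shortcut firing.
# The marker text comes from the 4.7.0 log above; the helper and the
# file names are hypothetical.

count_hotkey_activations() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c -F 'hot key activated:' "$1"
}

for log in speechnote-4.7.0.log speechnote-beta.log; do
    if [ -f "$log" ]; then
        echo "$log: $(count_hotkey_activations "$log") activation(s)"
    fi
done
```

A count of 0 for the beta log would match the observation that the key sequence is detected but the action is never executed.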

@mkiol
Owner Author

mkiol commented Mar 15, 2025

@phirsch Sorry, but I can't reproduce this problem.

Also, I do not see these log lines Event | XCB_. They are not coming from Speech Note, at least not directly. Can you say something more about your system? Do you have any custom environment variables, etc.?

The whole "4.8.1 Beta" log does not look like the Speech Note log. I have no idea what it is.

@Kentoseth

  • If you are using Flatpak, also make sure that the application has permission to access ydotool daemon socket file.

How do we do this?

I also read in one of the other issues that ydotool needs to have elevated/root privileges. Maybe some quick instructions on how to enable ydotool to function fully so that it works with this app can be shared?

@mkiol
Owner Author

mkiol commented Mar 17, 2025

@Kentoseth

I also read in one of the other issues that ydotool needs to have elevated/root privileges.

I'm not an expert, but it doesn't always have to be run with root privileges. It might depend on the distro and the version of the ydotool daemon.

  • In Arch, the ydotool package provides a regular systemd user service. It can be run without sudo, and the daemon creates a socket at /run/user/$UID/.ydotool_socket owned by the user's UID/GID. To use it with Speech Note, just grant access to /run/user/$UID (in FlatSeal).
  • In Fedora, you need root to start the systemd ydotool service. The socket is created at /tmp/.ydotool_socket, but with permissions for everyone. To use it with Speech Note, grant access to /tmp (in FlatSeal).
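
To see which of the two cases applies on a given system, a small check like the following may help. It is a sketch only: find_ydotool_socket is a hypothetical helper, and the two candidate paths are simply the ones named in the list above.

```shell
#!/usr/bin/env bash
# Sketch: report which of the candidate ydotool socket paths exists.
# `find_ydotool_socket` is a hypothetical helper for illustration.

find_ydotool_socket() {
    local candidate
    for candidate in "$@"; do
        # -S is true only for an existing socket file.
        if [ -S "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate paths taken from the Arch and Fedora cases above.
if sock="$(find_ydotool_socket "/run/user/$(id -u)/.ydotool_socket" /tmp/.ydotool_socket)"; then
    echo "ydotool socket: $sock (grant Flatpak access to $(dirname "$sock"))"
    ls -l "$sock"   # owner and mode show whether a regular user can connect
else
    echo "no ydotool socket found; is ydotoold running?"
fi
```

The directory printed in the first message is the one to grant to the Flatpak package.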

@DivineMK

In Arch, the ydotool package provides regular systemd user service.

The Arch package actually provides a uinput rule that allows access to /dev/uinput without elevated privileges. It also comes with a systemd service, so ydotoold can be managed as a service through systemd.

@tprotopopescu

First, thank you for making this! Great app.

Text to speech in the main window and to clipboard work, but text to active window is not working for me. I have ydotool installed and the configuration seems to be correct - I don't get the error message in 'Settings -> Advanced' - but there is no output. In the log, this line appears repeatedly:

key_from_character:197 - xkb_keymap not initialized

I am guessing something about my X keyboard configuration is not right, but I don't know enough about it to work out what is going wrong. setxkbmap -query -verbose gives the following:

keycodes:   evdev+aliases(qwerty)
types:      complete
compat:     complete
symbols:    pc+us+inet(evdev)
geometry:   pc(pc105)
rules:      evdev
model:      pc105
layout:     us

System details:

Operating System: openSUSE Leap 15.6
KDE Plasma Version: 6.3.3
KDE Frameworks Version: 6.12.0
Qt Version: 6.8.2
Kernel Version: 6.4.0-150600.23.42-default (64-bit)
Graphics Platform: Wayland

@mkiol
Owner Author

mkiol commented Mar 28, 2025

@tprotopopescu
Thanks for reporting. On Wayland, xkb_keymap is retrieved directly from the Wayland compositor. I need more information to track down the cause. Would you be able to paste here all the log lines between "using ydo fake-keyboard" and "xkb_keymap not initialized"?

Please make sure to start the app with the --verbose option:

flatpak run net.mkiol.SpeechNote --verbose
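
If the verbose output is saved to a file, the requested range can be cut out with sed. This is a sketch under assumptions: the log file name and the helper extract_keymap_trace are placeholders, and redirecting with 2>&1 is used because it is not stated which stream the log is written to.

```shell
#!/usr/bin/env bash
# Sketch: print the part of a saved Speech Note log between two markers.
# Assumes the verbose output was saved first, for example with:
#   flatpak run net.mkiol.SpeechNote --verbose 2>&1 | tee speechnote.log
# `extract_keymap_trace` and the file name are hypothetical.

extract_keymap_trace() {
    local log_file="$1"
    # Print from each start-marker line through the next end-marker line.
    sed -n '/using ydo fake-keyboard/,/xkb_keymap not initialized/p' "$log_file"
}

if [ -f speechnote.log ]; then
    extract_keymap_trace speechnote.log
fi
```

If the start marker appears again later in the log without a following end marker, sed prints from there to the end of the file, so the output may need trimming by hand.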

@tprotopopescu

Yes, no problem. Here is the output:

[D] 19:11:40.048272248.48 0x7f2c1c3c8d00 init_ydo:393 - using ydo fake-keyboard
[D] 19:11:40.048320369.48 0x7f2c1c3c8d00 operator():348 - connected ydo socket: /run/user/1000/.ydotool_socket
[D] 19:11:40.048426301.48 0x7f2c1c3c8d00 connect_wayland:750 - connect wayland
[D] 19:11:40.048801140.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_compositor version=6
[D] 19:11:40.048811498.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_tablet_manager_v2 version=1
[D] 19:11:40.048817842.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_keyboard_shortcuts_inhibit_manager_v1 version=1
[D] 19:11:40.048823971.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zxdg_decoration_manager_v1 version=1
[D] 19:11:40.048829646.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_viewporter version=1
[D] 19:11:40.048835188.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_fractional_scale_manager_v1 version=1
[D] 19:11:40.048840717.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_shm version=1
[D] 19:11:40.048846683.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_seat version=9
[D] 19:11:40.048855757.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_pointer_gestures_v1 version=3
[D] 19:11:40.048862309.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_pointer_constraints_v1 version=1
[D] 19:11:40.048868327.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_relative_pointer_manager_v1 version=1
[D] 19:11:40.048874396.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_data_device_manager version=3
[D] 19:11:40.048880644.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwlr_data_control_manager_v1 version=2
[D] 19:11:40.048886653.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_cursor_shape_manager_v1 version=1
[D] 19:11:40.048892598.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_idle version=1
[D] 19:11:40.048900162.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_idle_inhibit_manager_v1 version=1
[D] 19:11:40.048906194.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=ext_idle_notifier_v1 version=1
[D] 19:11:40.048914079.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_plasma_shell version=8
[D] 19:11:40.048919774.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_appmenu_manager version=2
[D] 19:11:40.048925331.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_server_decoration_palette_manager version=1
[D] 19:11:40.048930917.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_plasma_virtual_desktop_management version=2
[D] 19:11:40.048936395.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_shadow_manager version=2
[D] 19:11:40.048941704.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_dpms_manager version=1
[D] 19:11:40.048946936.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_server_decoration_manager version=1
[D] 19:11:40.048952310.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_output_management_v2 version=12
[D] 19:11:40.048958448.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zxdg_output_manager_v1 version=3
[D] 19:11:40.048964089.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_subcompositor version=1
[D] 19:11:40.048969524.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zxdg_exporter_v2 version=1
[D] 19:11:40.048974753.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zxdg_importer_v2 version=1
[D] 19:11:40.048980193.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=xdg_activation_v1 version=1
[D] 19:11:40.048986234.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_content_type_manager_v1 version=1
[D] 19:11:40.048992347.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_tearing_control_manager_v1 version=1
[D] 19:11:40.048998385.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=xdg_toplevel_drag_manager_v1 version=1
[D] 19:11:40.049004683.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=xdg_toplevel_icon_manager_v1 version=1
[D] 19:11:40.049010652.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_screen_edge_manager_v1 version=1
[D] 19:11:40.049016190.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=frog_color_management_factory_v1 version=1
[D] 19:11:40.049021686.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_presentation version=2
[D] 19:11:40.049027074.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_color_manager_v1 version=1
[D] 19:11:40.049032618.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=xdg_wm_dialog_v1 version=1
[D] 19:11:40.049038056.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_external_brightness_v1 version=2
[D] 19:11:40.049043263.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_alpha_modifier_v1 version=1
[D] 19:11:40.049048948.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_drm version=2
[D] 19:11:40.049054420.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_linux_dmabuf_v1 version=4
[D] 19:11:40.049059899.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_linux_drm_syncobj_manager_v1 version=1
[D] 19:11:40.049065364.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=xdg_wm_base version=6
[D] 19:11:40.049070965.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwlr_layer_shell_v1 version=5
[D] 19:11:40.049077247.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_output_device_v2 version=11
[D] 19:11:40.049083413.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_drm_lease_device_v1 version=1
[D] 19:11:40.049089507.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_output_order_v1 version=1
[D] 19:11:40.049095801.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_text_input_manager_v1 version=1
[D] 19:11:40.049102055.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_text_input_manager_v2 version=1
[D] 19:11:40.049108111.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_text_input_manager_v3 version=1
[D] 19:11:40.049114189.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_blur_manager version=1
[D] 19:11:40.049120554.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_contrast_manager version=2
[D] 19:11:40.049126579.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_slide_manager version=1
[D] 19:11:40.049132679.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=xdg_system_bell_v1 version=1
[D] 19:11:40.049138667.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_output_device_v2 version=11
[D] 19:11:40.049145066.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_output version=4
[D] 19:11:40.049151039.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_output_device_v2 version=11
[D] 19:11:40.049157184.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_output version=4
[W] 19:11:40.049410243.49 0x7f2c1c3c8d00 wly_keyboard_keymap:854 - map shm failed
[D] 19:11:40.049434051.49 0x7f2c1c3c8d00 connect_wayland:775 - wayland roundtrip done
[D] 19:11:40.049448141.49 0x7f2c1c3c8d00 make_compose_table:380 - trying compose file: /usr/share/X11/locale/en_US.UTF-8/Compose
[D] 19:11:40.052686130.52 0x7f2c1c3c8d00 () - stt intermediate text decoded: *** "en" 0
[D] 19:11:40.052847238.52 0x7f2c1c3c8d00 () - stt engine eof: 0
[D] 19:11:40.052857993.52 0x7f2c1c3c8d00 () - cancel: 0
[D] 19:11:40.052901761.52 0x7f2c1c3c8d00 request_stop:283 - stt stop requested
[D] 19:11:40.052907399.52 0x7f2c1c3c8d00 stop_processing_impl:389 - whisper cancel
[D] 19:11:40.052923145.52 0x7f2ab8dff680 flush:517 - flush: exit
[D] 19:11:40.052938073.52 0x7f2ab8dff680 reset_in_processing:424 - reset in processing
[D] 19:11:40.052941661.52 0x7f2ab8dff680 process:345 - stt processing ended
[D] 19:11:40.053128101.53 0x7f2c1c3c8d00 () - service refresh status, new state: listening-auto
[D] 19:11:40.062929431.62 0x7f2c1c3c8d00 () - app task state: processing => idle
[D] 19:11:40.064008208.64 0x7f2c1c3c8d00 () - stt engine stopping
[D] 19:11:40.064300198.64 0x7f2c1c3c8d00 () - service refresh status, new state: listening-auto
[D] 19:11:40.064312392.64 0x7f2c1c3c8d00 () - task state changed: 0 => 6
[D] 19:11:40.064329542.64 0x7f2c1c3c8d00 () - stt engine stopped: 0
[D] 19:11:40.064337054.64 0x7f2c1c3c8d00 () - stop stt engine
[D] 19:11:40.064343072.64 0x7f2c1c3c8d00 request_stop:279 - stt stop already requested
[D] 19:11:40.064350698.64 0x7f2c1c3c8d00 stop:308 - stt stop completed
[D] 19:11:40.064357670.64 0x7f2c1c3c8d00 () - mic source dtor
[D] 19:11:40.064641159.64 0x7f2c1c3c8d00 () - [dbus app] TaskState called
[D] 19:11:40.064691824.64 0x7f2c1c3c8d00 () - app task state: idle => cancelling
[D] 19:11:40.066049876.66 0x7f2c1c3c8d00 () - service refresh status, new state: idle
[D] 19:11:40.066066190.66 0x7f2c1c3c8d00 () - service state changed: listening-auto => idle
[D] 19:11:40.066079400.66 0x7f2c1c3c8d00 () - task state changed: 6 => 0
[D] 19:11:40.067355580.67 0x7f2c1c3c8d00 () - [dbus app] TaskState called
[E] 19:11:40.067405741.67 0x7f2c1c3c8d00 key_from_character:197 - xkb_keymap not initialized
[D] 19:11:40.067753591.67 0x7f2c1c3c8d00 () - app current task: 0 => -1
[W] 19:11:40.067764000.67 0x7f2c1c3c8d00 () - invalid task, reseting task state
[D] 19:11:40.067770554.67 0x7f2c1c3c8d00 () - app task state: cancelling => idle
[D] 19:11:40.068578732.68 0x7f2c1c3c8d00 () - app service state: listening-auto => idle
[D] 19:11:40.068614417.68 0x7f2c1c3c8d00 operator():348 - connected ydo socket: /run/user/1000/.ydotool_socket
[D] 19:11:40.068660492.68 0x7f2c1c3c8d00 operator():348 - connected ydo socket: /run/user/1000/.ydotool_socket
[W] 19:11:40.071242017.71 0x7f2c1c3c8d00 () - no available mnt langs
[W] 19:11:40.071254590.71 0x7f2c1c3c8d00 () - no available mnt out langs
[W] 19:11:40.071258343.71 0x7f2c1c3c8d00 () - no available tts models for in mnt
[W] 19:11:40.071260927.71 0x7f2c1c3c8d00 () - no available tts models for out mnt
[W] 19:11:40.071263313.71 0x7f2c1c3c8d00 () - invalid task, reseting task state
[W] 19:11:40.071624172.71 0x7f2c1c3c8d00 () - ignore TaskStatePropertyChanged signal
[D] 19:11:40.072294376.72 0x7f2c1c3c8d00 () - [dbus app] TaskState called
[D] 19:11:40.072480067.72 0x7f2c1c3c8d00 () - [dbus app] State called
[E] 19:11:40.072539565.72 0x7f2c1c3c8d00 key_from_character:197 - xkb_keymap not initialized

@mkiol
Owner Author

mkiol commented Mar 30, 2025

@tprotopopescu Thanks.
The problem is here:

[W] 19:11:40.049410243.49 0x7f2c1c3c8d00 wly_keyboard_keymap:854 - map shm failed

auto *map_shm =
    static_cast<char *>(mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0));
if (map_shm == MAP_FAILED) {
    LOGW("map shm failed");
    return;
}

At the moment, I have no idea why mmap is failing on your system, but I'm trying to figure it out.

@phirsch

phirsch commented Apr 5, 2025

My log above was obtained on an Ubuntu 24.04 system via the following command: flatpak run --branch=beta -v net.mkiol.SpeechNote --verbose

mkiol changed the title from "Speech Note 4.8.0 Beta 1" to "Speech Note 4.8.0 Beta 2" on Apr 16, 2025
@mkiol
Owner Author

mkiol commented Apr 16, 2025

@tprotopopescu

At the moment, I have no idea why mmap is failing on your system, but I'm trying to figure it out.

I have released "Beta 2" with the following significant changes:

  • better logging has been added to investigate the cause of the problem with mmap
  • if the "keymap" (mapping between keyboard keys and key codes) cannot be retrieved from Wayland, the application will revert to the standard US keyboard layout.

I would appreciate it if you could test this and see what the map shm failed trace contains.

@phirsch
Copy link

phirsch commented Apr 17, 2025

FYI: With 4.8.0 Beta 2, I noticed the line Flash attention 2 is not installed in the console log.

In case this is an oversight, adding Flash attention would be great as it can speed up inference (particularly Parler is still pretty slow even on an A4500).

However, Flash Attention can be a bit tricky to install, so YMMV. (E.g. with uv (in a different context), I had to use UV_FIND_LINKS="https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp313-cp313-linux_x86_64.whl" to get it to work, where system, CUDA and Python versions all have to match the installation).

mkiol changed the title from "Speech Note 4.8.0 Beta 2" to "Speech Note 4.8.0 Beta 3" on Apr 21, 2025
@tprotopopescu

@tprotopopescu

At the moment, I have no idea why mmap is failing on your system, but I'm trying to figure it out.

I have released "Beta 2" with the following significant changes:

  • better logging has been added to investigate the cause of the problem with mmap
  • if the "keymap" (mapping between keyboard keys and key codes) cannot be retrieved from Wayland, the application will revert to the standard US keyboard layout.

I would appreciate it if you could test this and see what the map shm failed trace contains.

Thanks again for looking into this. While trying to figure out how to update, I started the Beta 1 version and found that text to active window now works. The only change I can think of is that I did a regular system update, which may have fixed whatever the problem was.

@phirsch

phirsch commented Apr 24, 2025

I'm happy to report that the global keyboard shortcuts now work again with 4.8.0 Beta 3. I'm not certain what caused the change to the previous state, but I suspect that maybe one of the following actions might have played a role:

  1. I disabled and re-enabled the 'Use global keyboard shortcuts' setting.

  2. After that, I triggered the shortcut action once while the SpeechNote window was in the foreground and focused.

Anyway, thanks again for this great app!
