Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
PyMuPDF is a high-performance Python library for data extraction, analysis, conversion, and manipulation of PDF (and other) documents.
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Python library for creating PEG parsers
Text Classification Algorithms: A Survey
Persian NLP Toolkit
Thai natural language processing in Python
All-in-one text de-duplication
A simple Python module for parsing human names into their individual components
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
Text Normalization & Inverse Text Normalization
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Mor…
Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. It aspires to be a user-friendly app with a GUI, an installer and all-in-one packages.
Automatic Korean word spacing with Python
Simple SQL-like syntax on top of Perl text processing.
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
Textpipe: clean and extract metadata from text
Preprocessing module for short-text clustering