text-processing

Here are 605 public repositories matching this topic...

google / diff-match-patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

diff match patch text-processing difference

Updated May 22, 2024
Python

pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

python pdf font data-science ocr tesseract epub mupdf text-processing pdf-documents extract-data table-extraction text-shaping xps pymupdf

Updated May 1, 2025
Python

fastnlp / fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

natural-language-processing deep-learning text-classification chinese-nlp text-processing nlp-parsing nlp-library

Updated Jun 5, 2023
Python

pyparsing / pyparsing

Python library for creating PEG parsers

python parsing parser-combinators python3 parsing-expression-grammar python-3 text-processing python-2 python2 parsing-library peg-parsers

Updated Apr 5, 2025
Python

kk7nc / Text_Classification

Text Classification Algorithms: A Survey

Updated Apr 1, 2025
Python

roshan-research / hazm

Persian NLP Toolkit

python nlp natural-language-processing tokenizer embeddings persian text-processing dependency-parser farsi pos-tagging persian-nlp normalization lemmatization

Updated Jul 16, 2024
Python

PyThaiNLP / pythainlp

Thai natural language processing in Python

python natural-language-processing thai-language thai computational-linguistics text-processing soundex nlp-library word-segmentation thai-nlp hacktoberfest thai-nlp-library thai-soundex

Updated Apr 30, 2025
Python

ChenghaoMou / text-dedup

All-in-one text de-duplication

nlp text-processing data-processing de-duplication

Updated May 21, 2024
Python

derek73 / python-nameparser

A simple Python module for parsing human names into their individual components

python text-processing text-parser python-module

Updated May 28, 2024
Python

Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and a Twitter corpus of 330 million English tweets).

nlp tokenizer text-processing semeval nlp-library word-segmentation spelling-correction tokenization text-segmentation spell-corrector word-normalization

Updated Feb 27, 2024
Python
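
The hashtag-splitting idea in the Ekphrasis description (word segmentation driven by word statistics) can be illustrated with a minimal, standard-library sketch. This is not Ekphrasis's API or implementation: the `WORD_COUNTS` table, the `segment` function, and the unknown-word penalty are all hypothetical, and a real tool would take its counts from a large corpus.

```python
import math
from functools import lru_cache

# Hypothetical toy unigram counts (assumption); real counts come from a corpus.
WORD_COUNTS = {"text": 50, "processing": 30, "is": 100, "fun": 20,
               "process": 25, "in": 90}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = 20  # longest candidate word considered


def word_logprob(word):
    """Log-probability of a word under the toy unigram model."""
    count = WORD_COUNTS.get(word)
    if count is None:
        # Unknown words are penalized by a factor of 10 per character.
        return -math.log(TOTAL) - len(word) * math.log(10)
    return math.log(count / TOTAL)


def segment(text):
    """Split text into the most probable word sequence (dynamic programming)."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def best(start):
        # Returns (score, words) for the best segmentation of text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
            word = text[start:end]
            tail_score, tail_words = best(end)
            candidates.append((word_logprob(word) + tail_score,
                               (word,) + tail_words))
        return max(candidates)

    return list(best(0)[1])


print(segment("TextProcessingIsFun"))
```

With these toy counts, the known words win over any split into unknown fragments, and an entirely unknown string is left as a single token.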

wenet-e2e / WeTextProcessing

Text Normalization & Inverse Text Normalization

text-processing production-ready normalization

Updated Nov 11, 2024
Python

proycon / pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Mor…

python nlp machine-learning natural-language-processing library linguistics computational-linguistics text-processing nlp-library search-algorithms evaluation-metrics folia language-modelling

Updated Sep 14, 2023
Python
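
The basic tasks named in the PyNLPl description, n-gram extraction and frequency lists, can be sketched with the Python standard library alone. This does not use or reproduce PyNLPl's own classes; the helper names `ngrams` and `frequency_list` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_list(tokens, n=1):
    """Return (n-gram, count) pairs, most frequent first."""
    return Counter(ngrams(tokens, n)).most_common()


tokens = "the cat sat on the mat the cat".split()
for gram, count in frequency_list(tokens, n=2)[:3]:
    print(" ".join(gram), count)
```

A frequency list like this is also the starting point for a simple count-based language model, since n-gram probabilities can be estimated from the relative counts.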

lukaszliniewicz / Pandrator

Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. It aspires to be a user-friendly app with a GUI, an installer and all-in-one packages.