Found 347 repositories (showing 30)
reader-dict
The most comprehensive universal, multilingual, and monolingual dictionaries—perfect for e-readers, phones, tablets, and desktop apps. Powered by Wiktionary.
open-dict-data
Monolingual wordlists with pronunciation information in IPA
IngmarStein
Remove unnecessary language resources from macOS.
facebookresearch
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
ajinkyakulkarni14
TED parallel Corpora is a growing collection of bilingual parallel corpora, multilingual parallel corpora, and monolingual corpora extracted from TED talks (www.ted.com) for 109 world languages.
wietsedv
BERTje is a Dutch pre-trained BERT model developed at the University of Groningen. (EMNLP Findings 2020) "What’s so special about BERT’s layers? A closer look at the NLP pipeline in monolingual and multilingual models"
lena-voita
This is a repository with the data and code for the ACL 2019 paper "When a Good Translation is Wrong in Context: ..." and the EMNLP 2019 paper "Context-Aware Monolingual Repair for Neural Machine Translation"
CPJKU
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
jcyk
Code for our ACL 2021 paper "Neural Machine Translation with Monolingual Translation Memory"
ma-sultan
No description available
ikergarcia1996
A monolingual and cross-lingual meta-embedding generation and evaluation framework
lmthang
Train bilingual embeddings as described in our NAACL 2015 workshop paper "Bilingual Word Representations with Monolingual Quality in Mind". It also provides all the functionality of word2vec, with added features and clearer code. See README for more info.
IlyaGusev
Code inspired by Unsupervised Machine Translation Using Monolingual Corpora Only
cbaziotis
This repository contains source code for the paper "Language Model Prior for Low-Resource Neural Machine Translation"
M4t1ss
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
cjerry1243
Transfer Learning from Monolingual ASR to Transcription-free Cross-lingual Voice Conversion
Caucasus-Rosetta
Multilingual and monolingual corpora focused on Caucasus languages for Natural Language Processing (NLP)
konstantinjdobler
[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"
Improving Indonesian text classification using a multilingual language model
lgessler
A tiny BERT for low-resource monolingual models
wxjiao
Implementation of our paper "Self-training Sampling with Monolingual Data Uncertainty for Neural Machine Translation" to appear in ACL-2021.
adapter-hub
Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"
kulupu-lapo
A library / monolingual corpus of Toki Pona texts.
tylerachang
Goldfish: Monolingual language models for 350 languages.
AmrHendy
An easy way to use TransCoder, released by Facebook AI Research, to convert code from one programming language to another. TransCoder applies unsupervised neural machine translation (NMT), a deep-learning approach originally used to translate text from one natural language to another, and is trained only on monolingual source data.
Sanskrit monolingual corpus
yhcc
This is the code for the EMNLP 2020 Findings paper "BERT for Monolingual and Cross-Lingual Reverse Dictionary"
facebookresearch
Bilingual lexicons map words in one language to their translations in another, and are typically induced by learning linear projections to align monolingual word embedding spaces. In this paper, we show it is possible to produce much higher quality lexicons with methods that combine (1) unsupervised bitext mining and (2) unsupervised word alignment. Directly applying a pipeline that uses recent algorithms for both subproblems significantly improves induced lexicon quality and further gains are possible by learning to filter the resulting lexical entries, with both unsupervised and semi-supervised schemes. Our final approach outperforms the state of the art on the BUCC 2020 shared task by 14 F1 points averaged over 12 language pairs, while also providing a more interpretable approach that allows for rich reasoning of word meaning in context.
Monolingual Finetuning for Chatterbox Multilingual
mrinaldhar
Repository for the English-Hindi Codemixed to Monolingual English Parallel Corpus