A Python API library for aligning linguistic units (for example, word alignment) from parallel data, using libraries such as awesome-align. It is also meant to serve as a base for working with such aligned data.
Stars: 0, Forks: 0, Watchers: 0, Open issues: 0
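The description above speaks of an API for aligned linguistic units. The following is a minimal sketch of what such a container could look like, assuming Pharaoh-style `i-j` links (the 0-based, space-separated format that awesome-align and fast_align emit) and whitespace-tokenized sentences. The class `AlignedPair` and its methods are hypothetical names for illustration, not this library's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AlignedPair:
    """Hypothetical container for one word-aligned sentence pair (illustrative only)."""

    source: list[str]
    target: list[str]
    # 0-based (source_index, target_index) links
    links: set[tuple[int, int]] = field(default_factory=set)

    @classmethod
    def from_pharaoh(cls, source: str, target: str, alignment: str) -> AlignedPair:
        """Build a pair from whitespace-tokenized sentences and a Pharaoh "i-j" string."""
        links = set()
        for link in alignment.split():
            i, j = (int(x) for x in link.split("-"))
            links.add((i, j))
        return cls(source.split(), target.split(), links)

    def targets_of(self, i: int) -> list[int]:
        """Target positions aligned to source position i, in ascending order."""
        return sorted(j for s, j in self.links if s == i)

    def unaligned_source(self) -> list[int]:
        """Source positions with no link at all (the NULL-aligned words)."""
        aligned = {s for s, _ in self.links}
        return [i for i in range(len(self.source)) if i not in aligned]
```

For example, `AlignedPair.from_pharaoh("the black cat", "le chat noir", "0-0 1-2 2-1").targets_of(1)` returns `[2]`. Storing links as a set of index pairs keeps many-to-many alignments representable, which a plain source-to-target dictionary would not.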
Commits (newest first):
9a38370 Added heuristics-based transfer of linguistic knowledge based on word alignment. Before the transfer, the word alignment itself is also improved using source-language linguistic knowledge (morphological features, POS tags and dependency parses), for example to reduce NULL alignments and so correct obvious omissions from the alignments. Updated the requirements files, YAML files, docs, CLI and usage for this extension. This effectively merges the SyntheticWrdAlignedUDTB project into this project, so that it not only does word alignment and provides a data-structure API (with SSF and CoNLL-U support), but also aligns and creates treebanks (not necessarily synthetic).
9b13e35 Extended the docs with details about the new options for scoring, etc.
d16da96 Added alignment scoring using several common methods. Added these options to logging, the CLI and the usage help.
9863bb8 Integrated the ability to use any BERT-style model, such as xlm-roberta-large or indic-bert-v2. Improved the CLI and usage help in the main run script.
6a8c26c Version 3.0 working. Added word alignment with awesome-align and fast_align, and a converter from their output to GIZA++ .A3.final files. Some docs and tests added.
6b09295 Version 1.0: fully functional API for aligned linguistic units, as well as awesome-align-based word alignment. Includes two notebooks for each of them.
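The Version 3.0 commit mentions converting awesome-align and fast_align output to GIZA++ .A3.final files. The sketch below shows one way such a conversion could work, assuming the usual A3 record layout (a `# Sentence pair` header line, the target sentence, then `NULL` plus each source word followed by the 1-based target positions it aligns to in `({ })`). The function names `pharaoh_to_a3` and `_bucket` are hypothetical, and `score` is a placeholder because Pharaoh output carries no alignment probability; this is not the project's converter.

```python
def _bucket(word: str, positions: list[int]) -> str:
    """Render one GIZA++-style token, e.g. 'cat ({ 2 })' or 'NULL ({ })'."""
    inner = " ".join(str(p) for p in sorted(positions))
    return f"{word} ({{ {inner} }})" if inner else f"{word} ({{ }})"


def pharaoh_to_a3(source: str, target: str, alignment: str,
                  pair_id: int, score: float = 0.0) -> str:
    """Convert one Pharaoh-aligned sentence pair to an A3.final-style record.

    `score` is a placeholder value written into the header (hypothetical default).
    """
    src, tgt = source.split(), target.split()
    # buckets[k] holds 1-based target positions aligned to source word k (k = 0 is NULL)
    buckets: list[list[int]] = [[] for _ in range(len(src) + 1)]
    aligned_tgt = set()
    for link in alignment.split():
        i, j = (int(x) for x in link.split("-"))
        buckets[i + 1].append(j + 1)
        aligned_tgt.add(j)
    # Every target word without a link is attached to NULL
    buckets[0] = [j + 1 for j in range(len(tgt)) if j not in aligned_tgt]
    header = (f"# Sentence pair ({pair_id}) source length {len(src)} "
              f"target length {len(tgt)} alignment score : {score}")
    aligned_line = " ".join(_bucket(w, b) for w, b in zip(["NULL"] + src, buckets))
    return "\n".join([header, " ".join(tgt), aligned_line])
```

One design caveat: GIZA++'s IBM-style models align each target word to at most one source word, whereas awesome-align or symmetrized fast_align links can be many-to-many, so a target position may appear in more than one bucket. Downstream tools that expect strict GIZA++ output may need those links pruned first.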
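The alignment-scoring commit does not name its methods. Commonly used ones are precision, recall, F1 and the alignment error rate (AER) of Och and Ney, computed against gold "sure" (S) and "possible" (P) links, with S a subset of P. The sketch below assumes exactly those four metrics; `score_alignment` is a hypothetical name, and the repository may compute its scores differently.

```python
def score_alignment(predicted, sure, possible=()):
    """Precision, recall, F1 and AER for a set of predicted links.

    Links are hashable tuples such as (src_idx, tgt_idx); using
    (sent_id, src_idx, tgt_idx) scores a whole corpus in one call.
    Possible links are taken to always include the sure ones.
    """
    a, s = set(predicted), set(sure)
    p = set(possible) | s
    a_s, a_p = len(a & s), len(a & p)
    # precision = |A ∩ P| / |A|, recall = |A ∩ S| / |S|
    precision = a_p / len(a) if a else 0.0
    recall = a_s / len(s) if s else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)
    aer = 1.0 - (a_s + a_p) / (len(a) + len(s)) if a or s else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "aer": aer}
```

Scoring at the corpus level by pooling links (rather than averaging per-sentence scores) is the usual convention for AER, which is why the docstring suggests prefixing links with a sentence id.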
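The newest commit describes two heuristic steps: reducing NULL alignments to correct obvious omissions, and transferring source-language annotation to the target through the alignment. The sketch below illustrates both under stated assumptions. The NULL-reduction rule (linking still-unaligned, string-identical tokens such as punctuation or numbers) and the projection scheme (a simple annotation projection in the spirit of the direct correspondence assumption, taking the leftmost aligned source word) are swapped-in illustrations, not the project's documented heuristics. Tags and heads follow CoNLL-U conventions (UPOS strings; 1-based HEAD with 0 for the root); both function names are hypothetical.

```python
def fill_identical_tokens(src, tgt, links):
    """Illustrative NULL reduction: link unaligned tokens that are string-identical."""
    links = set(links)
    aligned_s = {i for i, _ in links}
    aligned_t = {j for _, j in links}
    for i, word in enumerate(src):
        if i in aligned_s:
            continue
        for j, other in enumerate(tgt):
            if j not in aligned_t and other == word:
                links.add((i, j))
                aligned_t.add(j)
                break  # at most one new link per source token
    return links


def project_upos_and_heads(src_upos, src_heads, n_tgt, links):
    """Project UPOS tags and 1-based HEAD indices from source to target
    through 0-based (i, j) links. Unprojectable heads are left as None."""
    src_to_tgt, tgt_to_src = {}, {}
    for i, j in sorted(links):
        src_to_tgt.setdefault(i, []).append(j)
        tgt_to_src.setdefault(j, []).append(i)
    upos = ["X"] * n_tgt    # X for target words with no aligned source word
    heads = [None] * n_tgt  # None = head could not be projected
    for j, srcs in tgt_to_src.items():
        i = srcs[0]  # heuristic: take the leftmost aligned source word
        upos[j] = src_upos[i]
        h = src_heads[i]
        if h == 0:
            heads[j] = 0  # source root stays root
        elif h - 1 in src_to_tgt:
            head_j = src_to_tgt[h - 1][0] + 1
            heads[j] = head_j if head_j != j + 1 else None  # avoid self-loops
    return upos, heads
```

Target words that receive no head (None) would still need a fallback, for example attaching them to the projected root, before the result is a valid CoNLL-U tree; that step is left out of this sketch.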