natural-language-processing
Natural language processing (NLP) is a field of computer science that studies how computers and humans interact. In the 1950s, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
Change tensor.data to tensor.detach() due to
pytorch/pytorch#6990 (comment)
tensor.detach() is more robust than tensor.data: both return a tensor detached from the computation graph, but .detach() lets autograd catch unsafe in-place modifications, while .data bypasses those checks silently.
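For context, here is a minimal sketch of the difference in plain PyTorch; the tensors are illustrative, not taken from gensim:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()

# Old pattern: .data gives a graph-free alias, but in-place edits made
# through it silently bypass autograd's correctness checks.
y_old = y.data

# Preferred pattern: .detach() also gives a graph-free alias, but autograd
# still tracks the shared storage, so an unsafe in-place modification is
# reported at backward time instead of corrupting gradients.
y_new = y.detach()
```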
Not a high priority at all, but it'd be more sensible for such a tutorial/testing utility corpus to be implemented elsewhere – maybe under /test/ or some other data- or doc-related module – rather than in gensim.models.word2vec.
Originally posted by @gojomo in RaRe-Technologies/gensim#2939 (comment)
When setting train_parameters to False, we may often also want to disable dropout/batchnorm, in other words, to run the pretrained model in eval mode.
We've made a small modification to PretrainedTransformerEmbedder that allows specifying whether the token embedder should be forced into eval mode during the training phase.
Do you think this feature might be handy? Should I open a PR?
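This isn't the actual AllenNLP API, just a generic PyTorch sketch of the mechanism; the class name and force_eval flag are hypothetical:

```python
import torch
from torch import nn

class FrozenEvalEmbedder(nn.Module):
    """Hypothetical wrapper: keep a pretrained embedder in eval mode
    (dropout/batchnorm disabled) even while the surrounding model trains."""

    def __init__(self, embedder: nn.Module, force_eval: bool = True):
        super().__init__()
        self.embedder = embedder
        self.force_eval = force_eval

    def train(self, mode: bool = True):
        # Apply the normal recursive mode switch first...
        super().train(mode)
        # ...then pin the wrapped embedder back to eval mode if requested.
        if self.force_eval:
            self.embedder.eval()
        return self

    def forward(self, *args, **kwargs):
        return self.embedder(*args, **kwargs)
```

Overriding train() keeps the behavior in effect even as the parent model toggles repeatedly between training and validation.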
This happens after a map operation when num_proc is set to a value greater than 1. I tested this by cleaning up the JSON before running the map op on the dataset, so it's unlikely to be coming from an earlier concatenation.
Example result:
"citation": "@ONLINE {wikidump,\n author = {Wikimedia Foundation},\n title = {Wikimedia Downloads},\n url = {https://dumps.wikimedia.org}\n}\n\n@ONLINE
Hello spoooopyyy hackers
This is a Hacktoberfest-only issue!
This is also data-sciency!
The Problem
Our English dictionary contains words that aren't really English, while missing some common English words.
Examples of non-common words in the dictionary:
"hlithskjalf",
"hlorrithi",
"hlqn",
"hm",
"hny",
"ho",
"hoactzin",
"hoactzine
Recently the HF Trainer was extended to support full fp16 eval via
--fp16_full_eval. I'd have expected it to be either equal to or faster than eval with the fp32 model, but surprisingly I noticed a 25% slowdown when using it. This may or may not impact DeepSpeed as well, which also runs eval in fp16, but we can't compare it to a baseline, since it only runs fp16.
I wonder if someone would like to investigate this.
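A minimal way to sanity-check the fp16-vs-fp32 comparison outside the Trainer might look like this (stand-in model and batch shapes, assuming a CUDA device; not the HF eval path):

```python
import time
import torch

def timed_eval(model, batch, steps=100):
    """Time `steps` forward passes in eval mode, synchronizing the GPU."""
    model.eval()
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(steps):
            model(batch)
    torch.cuda.synchronize()
    return time.perf_counter() - start

# Illustrative stand-in model; shapes are arbitrary.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).cuda()
batch = torch.randn(64, 1024, device="cuda")

t_fp32 = timed_eval(model, batch)
t_fp16 = timed_eval(model.half(), batch.half())
print(f"fp32: {t_fp32:.3f}s  fp16: {t_fp16:.3f}s")
```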