natural-language-processing
Natural language processing (NLP) is a field of computer science that studies how computers and humans interact. In the 1950s, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
Change tensor.data to tensor.detach() due to
pytorch/pytorch#6990 (comment)
tensor.detach() is more robust than tensor.data: both return a tensor detached from the computation graph, but .detach() lets autograd catch unsafe in-place modifications, while .data bypasses those checks silently.
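For context, here is a minimal sketch of the difference in plain PyTorch; the tensors are illustrative, not taken from gensim:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()

# Old pattern: .data gives a graph-free alias, but in-place edits made
# through it silently bypass autograd's correctness checks.
y_old = y.data

# Preferred pattern: .detach() also gives a graph-free alias, but autograd
# still tracks the shared storage, so an unsafe in-place modification is
# reported at backward time instead of corrupting gradients.
y_new = y.detach()
```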
Not a high priority at all, but it'd be more sensible for such a tutorial/testing utility corpus to be implemented elsewhere – maybe under /test/ or some other data- or doc-related module – rather than in gensim.models.word2vec.
Originally posted by @gojomo in RaRe-Technologies/gensim#2939 (comment)
When setting train_parameters to False, we may often also want to disable dropout/batchnorm, in other words, to run the pretrained model in eval mode.
We've made a small modification to PretrainedTransformerEmbedder that allows specifying whether the token embedder should be forced into eval mode during the training phase.
Do you think this feature might be handy? Should I open a PR?
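This isn't the actual AllenNLP API, just a generic PyTorch sketch of the mechanism; the class name and force_eval flag are hypothetical:

```python
import torch
from torch import nn

class FrozenEvalEmbedder(nn.Module):
    """Hypothetical wrapper: keep a pretrained embedder in eval mode
    (dropout/batchnorm disabled) even while the surrounding model trains."""

    def __init__(self, embedder: nn.Module, force_eval: bool = True):
        super().__init__()
        self.embedder = embedder
        self.force_eval = force_eval

    def train(self, mode: bool = True):
        # Apply the normal recursive mode switch first...
        super().train(mode)
        # ...then pin the wrapped embedder back to eval mode if requested.
        if self.force_eval:
            self.embedder.eval()
        return self

    def forward(self, *args, **kwargs):
        return self.embedder(*args, **kwargs)
```

Overriding train() keeps the behavior in effect even as the parent model toggles repeatedly between training and validation.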
This happens after a map operation when num_proc is set to a value greater than 1. I tested this by cleaning up the JSON before running the map op on the dataset, so it's unlikely to be coming from an earlier concatenation.
Example result:
"citation": "@ONLINE {wikidump,\n author = {Wikimedia Foundation},\n title = {Wikimedia Downloads},\n url = {https://dumps.wikimedia.org}\n}\n\n@ONLINE
Hello spoooopyyy hackers
This is a Hacktoberfest-only issue!
This is also data-sciency!
The Problem
Our English dictionary contains words that aren't really English, while missing some common English words.
Examples of non-common words in the dictionary:
"hlithskjalf",
"hlorrithi",
"hlqn",
"hm",
"hny",
"ho",
"hoactzin",
"hoactzine
Recently the HF Trainer was extended to support full fp16 eval via
--fp16_full_eval. I'd have expected it to be either equal to or faster than eval with the fp32 model, but surprisingly I noticed a 25% slowdown when using it. This may or may not impact DeepSpeed as well, which also runs eval in fp16, but we can't compare it to a baseline, since it only runs fp16.
I wonder if someone would like to investigate this.
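A minimal way to sanity-check the fp16-vs-fp32 comparison outside the Trainer might look like this (stand-in model and batch shapes, assuming a CUDA device; not the HF eval path):

```python
import time
import torch

def timed_eval(model, batch, steps=100):
    """Time `steps` forward passes in eval mode, synchronizing the GPU."""
    model.eval()
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(steps):
            model(batch)
    torch.cuda.synchronize()
    return time.perf_counter() - start

# Illustrative stand-in model; shapes are arbitrary.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).cuda()
batch = torch.randn(64, 1024, device="cuda")

t_fp32 = timed_eval(model, batch)
t_fp16 = timed_eval(model.half(), batch.half())
print(f"fp32: {t_fp32:.3f}s  fp16: {t_fp16:.3f}s")
```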