
Tag: huggingface-tokenizers

Can BERT output be fixed in shape, irrespective of string size?

I am confused about how to make Hugging Face BERT models yield a fixed-shape prediction, regardless of input size (i.e., input string length). I tried calling the tokenizer with the parameters padding=True, truncation=True, max_length=15, but the prediction output dimensions for inputs = ["a", "a"*20, "a"*100, "abcede"*20000] are not fixed. What am I doing wrong?
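A minimal sketch of the usual fix, assuming the bert-base-uncased checkpoint: padding=True only pads to the longest string in the batch, so the sequence dimension still varies; padding="max_length" pins every sequence to exactly max_length.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = ["a", "a" * 20, "a" * 100, "abcede" * 20000]
batch = tokenizer(
    inputs,
    padding="max_length",  # pad every sequence up to max_length, not just to the batch max
    truncation=True,       # cut longer sequences down to max_length
    max_length=15,
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**batch)

# Both shapes are now independent of the input string lengths:
print(batch["input_ids"].shape)         # torch.Size([4, 15])
print(outputs.last_hidden_state.shape)  # torch.Size([4, 15, 768])
```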

Hugging Face: NameError: name 'sentences' is not defined

I am following this tutorial: https://huggingface.co/transformers/training.html. However, I am running into an error, and I think the tutorial is missing an import, but I do not know which one. Answer: The error states that there is no variable called sentences in scope. I believe the tutorial presumes that you define sentences yourself as a list of input strings.
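A minimal sketch of what the tutorial expects you to supply, assuming its bert-base-cased tokenizer; the strings in the sentences list below are placeholders for your own texts, not part of the tutorial.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

# Define `sentences` yourself before the tokenizer call that raises the NameError.
sentences = [
    "Hello, my dog is cute.",
    "The quick brown fox jumps over the lazy dog.",
]

encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
print(encoded["input_ids"].shape)
```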

Transformers v4.x: Convert slow tokenizer to fast tokenizer

I'm following the Transformers pretrained model xlm-roberta-large-xnli example and I get the following error. I'm using Transformers version 4.1.1. Answer: According to the Transformers v4.0.0 release notes, sentencepiece was removed as a required dependency. This means that "the tokenizers that depend on the SentencePiece library will not be available with a standard transformers installation", including XLMRobertaTokenizer. However, sentencepiece can be installed separately with pip install sentencepiece.
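A minimal sketch, assuming you have run pip install sentencepiece first (and restarted the runtime if you are in a notebook); the joeddav/xlm-roberta-large-xnli Hub path is an assumption based on the model name in the question.

```python
from transformers import AutoTokenizer

# With sentencepiece installed, the slow tokenizer can be loaded directly ...
slow_tokenizer = AutoTokenizer.from_pretrained(
    "joeddav/xlm-roberta-large-xnli", use_fast=False
)

# ... and dropping use_fast=False lets transformers convert it to the fast version,
# which is the step that fails when sentencepiece is missing.
fast_tokenizer = AutoTokenizer.from_pretrained("joeddav/xlm-roberta-large-xnli")

print(fast_tokenizer("Ein Beispielsatz."))
```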
