I generate text with transformer models and I am looking for a way to measure the grammatical quality of that text.
Like the text: "Today is a good day. I slept well and got up good in the morning."
should be rated higher than: "Yesterday I went into bed and. got Breakfast son."
Are there any models that can do this job which I haven't found yet, or is there another way of measuring the grammatical quality of generated text?
What I found out is that spaCy has an option to show whether a text contains a grammatical error, but I am more interested in a score that takes into account the length of the text and the number of errors it contains. I also looked into NLTK readability, but that aims at how well the text can be understood, which depends on more than grammar alone.
Thank you!
Answer
So I found what I was looking for: in this paper the researchers tested different measures for their ability to check grammar mistakes in text without references (which is what the GLEU score can be used for). They also tested language_tool_python, a Python wrapper around LanguageTool, which is also used for spell checking in OpenOffice. This tool can measure the number of grammar mistakes in a text. For my purpose, I will simply divide the number of errors by the number of words in the text, which gives me an error metric.
Maybe this helps someone who has the same issue. Here is the example code, based on the PyPI documentation:
import language_tool_python

# load the English LanguageTool checker
tool = language_tool_python.LanguageTool('en-US')

# sample text with deliberate mistakes ("tsentence", "erors")
text = "this is a test tsentence, to check if all erors are found"

# check() returns one Match object per detected problem
matches = tool.check(text)
print(len(matches))
>>> 3
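To turn this into the length-normalized score described above, the match count can be divided by the word count. A minimal sketch; the helper name grammar_error_rate and the naive whitespace tokenization are my own choices for illustration, not part of language_tool_python:

import language_tool_python

def grammar_error_rate(text, tool):
    # number of LanguageTool matches divided by a naive whitespace word count
    words = text.split()
    if not words:
        return 0.0
    return len(tool.check(text)) / len(words)

tool = language_tool_python.LanguageTool('en-US')
good = "Today is a good day. I slept well and got up good in the morning."
bad = "Yesterday I went into bed and. got Breakfast son."

# the second sentence should yield a noticeably higher error rate
print(grammar_error_rate(good, tool))
print(grammar_error_rate(bad, tool))

A lower value then means fewer detected errors per word, so longer texts are not penalized simply for containing more words.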