I am new to Prodigy and spaCy, as well as to working on the command line. I'd like to use Prodigy to label my data for an NER model, and then use spaCy in Python to train models.
Prodigy stores its annotations in a SQLite database. spaCy takes a different format, which I'm not sure what to call:
TRAIN_DATA = [
    (
        "Horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("Do they bite?", {"entities": []}),
    (
        "horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("horses pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    (
        "they pretend to care about your feelings, those horses",
        {"entities": [(48, 54, LABEL)]},
    ),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]
How can I convert from one to the other? It seems like this should be easy, but I cannot find it anywhere.
I have no problem loading in the dataset, just converting.
Answer
As of version 1.9, Prodigy should be able to export this training format with the data-to-spacy recipe: https://prodi.gy/docs/recipes#data-to-spacy
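If you would rather do the conversion by hand, here is a minimal sketch in plain Python. It assumes your annotations are available as JSONL records (for example, exported with Prodigy's db-out command) in which each record has a "text" field, an optional "spans" list with "start", "end" and "label" keys, and an "answer" field. The function name prodigy_to_spacy and the sample records are made up for illustration; check your own export to confirm the field names before relying on it.

```python
import json


def prodigy_to_spacy(records):
    """Convert Prodigy-style NER records to spaCy (text, {"entities": ...}) tuples.

    Assumption: each record is a dict with "text", optional "spans"
    (each span has "start", "end", "label") and "answer".
    Only records whose answer is "accept" are kept.
    """
    train_data = []
    for record in records:
        if record.get("answer") != "accept":
            continue  # skip rejected or ignored annotations
        entities = [
            (span["start"], span["end"], span["label"])
            for span in record.get("spans", [])
        ]
        train_data.append((record["text"], {"entities": entities}))
    return train_data


# Hypothetical sample lines, shaped like an exported JSONL file.
sample_jsonl = """\
{"text": "horses?", "spans": [{"start": 0, "end": 6, "label": "ANIMAL"}], "answer": "accept"}
{"text": "Do they bite?", "spans": [], "answer": "accept"}
{"text": "horses pretend to care", "spans": [], "answer": "reject"}
"""

records = [json.loads(line) for line in sample_jsonl.splitlines() if line.strip()]
TRAIN_DATA = prodigy_to_spacy(records)
```

With the sample records above, TRAIN_DATA contains the two accepted examples in the same tuple layout as in the question. To read a real export, replace sample_jsonl with the contents of your own file.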