I am performing an in silico digestion of the human proteome, meaning that I am trying to chop the amino acid sequence of every protein at certain positions. I am using the Pyteomics parser module (pyteomics.parser) within a bigger function that I have created.
I am getting this error: PyteomicsError: Pyteomics error, message: "Not a valid modX sequence: {'sequence': 'AKDEVQKN'}"
However, I am unsure why AKDEVQKN doesn't match the compiled _modX_sequence pattern:
_modX_sequence = re.compile(r'^([^-]+-)?((?:[^A-Z-]*[A-Z])+)(-[^-]+)?$')
From my understanding of this regex, it should match any string that doesn't start with a hyphen (-) and is followed by a series of alphabetical characters.
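One way to check that reading is to run the pattern on its own with the standard re module, outside of Pyteomics. The following is a minimal sketch: the pattern is copied from the line above, while is_valid_modx is a hypothetical helper (not part of Pyteomics), and treating non-string input as invalid is an assumption made for illustration.

```python
import re

# Pattern copied verbatim from the question.
_modX_sequence = re.compile(r'^([^-]+-)?((?:[^A-Z-]*[A-Z])+)(-[^-]+)?$')

def is_valid_modx(value):
    """Hypothetical helper: True only for a string that matches the pattern."""
    # A dict (or any non-string) cannot be matched by a regex at all,
    # so it is reported as invalid here instead of raising TypeError.
    return isinstance(value, str) and _modX_sequence.match(value) is not None

print(is_valid_modx('AKDEVQKN'))                # True
print(is_valid_modx({'sequence': 'AKDEVQKN'}))  # False
```

If the plain string passes and the dictionary does not, the pattern is probably fine and the thing to inspect is the type of the value being passed in.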
This is the function in which I am using it:
import re
import pyteomics
from pyteomics import fasta, parser

def ButcherShop(df, target, rule,min_length=7,exception=None,max_legnth=100, pH=2.0):
    raw = df[target]
    unique_peptides = set()
    for peptide in raw:
        new_peptides = parser.cleave(peptide, rule=rule,min_length=min_length,exception=exception)
        unique_peptides.update(new_peptides)
    print(f'Done,{len(unique_peptides)} sequences of >= 7 amino acids!')
    pep_dic = [{'sequence': i} for i in unique_peptides]
    for peptides in pep_dic:
        pep_dic['parsed_sequence'] = parser.parse(peptides,show_unmodified_termini=False)
        pep_dic['xlength'] = len(peptides)
        pep_dic['charge'] = int(round(electrochem.charge(peptides, pH=pH)))
        pep_dic['mass']=int(round(Peptide_mass(peptides)))
    pep_dic = [peptide for peptide in pep_dic if peptide['length'] <= int(max_length)]
    pep_df = pd.DataFrame.from_dict(pep_dic)
    return unique_peptides,pep_dic,pep_df
Thank you for any insight on how to address this.
Update: If I run it on a different data set, I get the same error, which may suggest that the problem is in the library itself.
Answer
Pyteomics maintainer here.
The error message actually tells you the source of the problem: PyteomicsError: Pyteomics error, message: "Not a valid modX sequence: {'sequence': 'AKDEVQKN'}"
It means that instead of a string 'AKDEVQKN' you passed a dictionary {'sequence': 'AKDEVQKN'}. This actually happens here:
pep_dic = [{'sequence': i} for i in unique_peptides]
for peptides in pep_dic:
    pep_dic['parsed_sequence'] = parser.parse(peptides,show_unmodified_termini=False)
You should pass the sequence itself to parse, not the dict:
pep_dic['parsed_sequence'] = parser.parse(peptides['sequence'], show_unmodified_termini=False)
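One further thing that may be worth checking: pep_dic is a list, so assigning to pep_dic['parsed_sequence'] would itself raise a TypeError, and the result probably belongs on the individual dict instead. A minimal sketch of that loop shape follows. The names annotate_peptides and parse are hypothetical, and the built-in list is used as a stand-in parser so the snippet runs without Pyteomics installed.

```python
def annotate_peptides(sequences, parse):
    """Build one record per peptide and store results on the record itself."""
    records = [{'sequence': seq} for seq in sequences]
    for record in records:
        seq = record['sequence']                # the plain string, not the dict
        record['parsed_sequence'] = parse(seq)  # write to the dict, not the list
        record['length'] = len(record['parsed_sequence'])
    return records

# Stand-in parser for illustration only: splits the string into characters.
records = annotate_peptides(['AKDEVQKN', 'PEPTIDEK'], parse=list)
print(records[0])
```

With Pyteomics available, the stand-in could presumably be replaced by something like lambda s: parser.parse(s, show_unmodified_termini=False), using the same call as in the fix above.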