I have some Python code that reads a certificate file, and I want to match only the root certificate. For example, my certificate file looks like this:
--------begin certificate--------
CZImiZPyLGQBGRYFbG9jYWwxGjAYBgoJkiaJk/IasdasdassZAEZFgp2aXJ0dWFsdnB4MSEw
HwYDVQQDExh2aXJ0dWFsdnB4LVZJUlRVQUxEQzEtQ0EwHhfdgdgdgfcNMTUwOTE2MTg1MTMx
WhcNMTcwOTE2MTkwMTMxWjBaMQswCQYDVQQGEwJVUzEXMBUGCgmSJoaeqasadsmT8ixkARkW
B3ZzcGhlcmUxFTATBgoJkiaJk/IsZAEZFgVsb2NhbDEOMAwGA1UEChMFdmNlcnfrrfgfdvQx
CzAJBgNVBAMTAkNBMIIBIjANBgkqhkiG9w
--------end certificate----------
--------begin certificate--------
ZGFwOi8vL0NOPXZpcnR1YWx2cHgtcvxcvxvVklSVFVBTERDMS1DQSxDTj1BSUEsQ049UHVi
bGljJTIwS2V5JTIwU2VydmldfsfhjZXMsQ049U2VydmfffljZXMsQ049Q29uZmlndXJhdGlv
bixEQz12aXJ0dWFsdnB4LERDPWxvY2FsP2NxvxcvxcvBQ2VydGlmaWNhdGU/YmFzZT9vYmpl
Y3RDbGFzcz1jZXJ0aWZpY2F0aW9uQXV0dsfsdffraG9yaXR5MD0GCSsGAQQBgjcVBwQwMC4G
--------end certificate----------
I want to fetch only the root certificate, which starts with CZImiZPy. I read the file into the variable data and applied the regex below:
re.sub('-----.*?-----', '', data)
But it returned both certificate bodies, not just the first one. How can I tweak the regular expression to get only the first?
Answer
You want to search for text, not substitute it with something else.
>>> import re
>>> s = """--------begin certificate--------
... <certificate encrypted>
... --------end certificate----------
... --------begin certificate--------
... <certificate encrypted>
... --------end certificate----------"""
>>> re.search(r"-+begin certificate-+\s+(.*?)\s+-+end certificate-+", s, flags=re.DOTALL).group(1)
'<certificate encrypted>'
Explanation:
-+begin certificate-+   # Match the starting label
\s+                     # Match whitespace (including linebreaks)
(.*?)                   # Match any number of any character, as few as possible. Capture the result in group 1
\s+                     # Match whitespace (including linebreaks)
-+end certificate-+     # Match the ending label
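The annotated pattern above can also be written directly in Python with re.VERBOSE, so the comments live inside the regex itself. This is only a minimal sketch: the names CERT_RE and sample are made up for illustration, and the sample bodies are placeholders rather than real certificate data. Because verbose mode ignores unescaped whitespace in the pattern, the space inside each label is written as [ ].

```python
import re

# CERT_RE and sample are illustrative names, not part of the original answer.
CERT_RE = re.compile(
    r"""
    -+begin[ ]certificate-+   # Match the starting label
    \s+                       # Match whitespace (including linebreaks)
    (.*?)                     # Lazily capture the certificate body in group 1
    \s+                       # Match whitespace (including linebreaks)
    -+end[ ]certificate-+     # Match the ending label
    """,
    flags=re.DOTALL | re.VERBOSE,
)

# Placeholder input with two certificate blocks.
sample = (
    "--------begin certificate--------\n"
    "AAAA\nBBBB\n"
    "--------end certificate----------\n"
    "--------begin certificate--------\n"
    "CCCC\n"
    "--------end certificate----------"
)

# search() stops at the first block, so only its body is captured.
first_body = CERT_RE.search(sample).group(1)
print(first_body)
```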
re.search() will always return the first match.
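Here is a small sketch of how that could be wrapped up for reuse. It assumes the root certificate is always the first block in the file, as in the question; the helper names first_certificate and all_certificates and the placeholder data are hypothetical. Since re.search() returns None when nothing matches, the helper checks for that before calling group(1).

```python
import re

# Same pattern as in the answer; re.DOTALL lets "." match across line breaks.
PATTERN = re.compile(
    r"-+begin certificate-+\s+(.*?)\s+-+end certificate-+",
    flags=re.DOTALL,
)


def first_certificate(data):
    """Return the body of the first certificate block, or None if there is none."""
    match = PATTERN.search(data)
    return match.group(1) if match else None


def all_certificates(data):
    """Return the bodies of all certificate blocks, in order of appearance."""
    return PATTERN.findall(data)


# Placeholder input; real data would come from the certificate file.
data = (
    "--------begin certificate--------\n"
    "ROOTBODY\n"
    "--------end certificate----------\n"
    "--------begin certificate--------\n"
    "OTHERBODY\n"
    "--------end certificate----------"
)

root = first_certificate(data)
everything = all_certificates(data)
print(root)
print(everything)
```

If the labels in the real file are upper case (for example BEGIN CERTIFICATE), adding re.IGNORECASE to the flags should let the same pattern match.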