Tag: parsing

Finding identical numbers in large files python

I have two data files in Python, each containing two-column data as below: There are about 10M entries in each file (~400 MB). I have to sort through each file and check whether any number in the first column of one file matches any number in the first column of the other file. The code I currently have converted the files to
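A common way to approach this kind of first-column matching is a set intersection. The sketch below is only an illustration under stated assumptions: the excerpt does not show the file layout, so it assumes whitespace-separated columns with integer values in the first column, and the function names and sample data are hypothetical.

```python
import io


def first_column_values(lines):
    """Collect the first field of every non-empty line as an int."""
    values = set()
    for line in lines:
        fields = line.split()
        if fields:
            values.add(int(fields[0]))
    return values


def common_first_column(lines_a, lines_b):
    """Return the numbers found in the first column of both inputs."""
    wanted = first_column_values(lines_a)
    common = set()
    # Stream the second input so only one full set is held in memory.
    for line in lines_b:
        fields = line.split()
        if fields and int(fields[0]) in wanted:
            common.add(int(fields[0]))
    return common


# Hypothetical sample data standing in for the two large files.
file_a = io.StringIO("101 0.5\n205 1.2\n309 3.4\n")
file_b = io.StringIO("205 9.9\n400 0.1\n101 7.7\n")
matches = common_first_column(file_a, file_b)
print(sorted(matches))
```

With real files, the `io.StringIO` objects would be replaced by open file handles. Set lookups take constant time on average, so each file is read once instead of every pair of rows being compared.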

How to parse and match with multiple regexes

I have input data of the form: I need to parse this data and match the IN: / OUT: / INOUT: entries depending on three regexes given as: My output should be: The code I tried: The problem I face is that it does not parse correctly and does not match each sub-block of data beginning with [2] Answer Though

Substring any kind of HTML String

I need to divide any kind of HTML code (string) into a list of tokens. For example: or or What I tried to do: My output: So I tried to split at “/>”, which works for the first case. Then I tried several things. I tried to identify the “name”, that is, the first identifier of the HTML string, like

Split a nested XML string to get a string using parser

I have this string: My goal is to extract Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour unnm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1. That is, the text between <run> and </run>. I did it with a regular expression, but it doesn’t work with some XML strings, so I tried with
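As a rough illustration of the parser-based approach the question is heading toward, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The original XML string is not shown in the excerpt, so the `sample` document and the `run_texts` helper are hypothetical, and the sketch assumes well-formed XML with no namespaces.

```python
import xml.etree.ElementTree as ET


def run_texts(xml_string):
    """Return the text content of every <run> element, in document order."""
    root = ET.fromstring(xml_string)
    # itertext() also gathers text from tags nested inside a <run>.
    return ["".join(run.itertext()) for run in root.iter("run")]


# Hypothetical input; the real string from the question is not available.
sample = (
    "<doc><para>"
    "<run>first piece</run>"
    "<other>skip</other>"
    "<run>second <b>piece</b></run>"
    "</para></doc>"
)
texts = run_texts(sample)
print(texts)
```

If the real document declares XML namespaces, the tag passed to `iter` would need to be the namespace-qualified name instead of plain `run`.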

How to look for specific values in a dictionary and return them properly

So I have a list of dictionaries (lst) that I’m trying to iterate through, compare values, and return the appropriate values. I have the following code to retrieve 2 arguments given on the command line, compare them against the dictionary entries, and return the appropriate value: get_attribute_number(cmd1=sys.argv[1], cmd2=sys.argv[2], lst=data_list) However, my program is not returning anything. It is supposed

Regex for AlphaNumeric words with special characters [closed]

I am trying to make a regex for capturing alphanumeric words with special characters. The search will be done on small

The python parser does not read information from the site, but returns None

I’m making a Python parser for the site: https://www.kinopoisk.ru/lists/series-top250/ The task is to pick the genres of the films (displayed on the page as: ‘span’, class_=‘selection-film-item-meta__meta-additional-item’). I can’t understand why it gives the result: [{‘title’: None}, {‘title’: None}, {‘title’: None}, … {‘title’: None}] Answer I’m definitely getting some captcha blocks from my local machine https://www.kinopoisk.ru/showcaptcha?cc=1&retpath=https%3A//www.kinopoisk.ru/lists/series-top250%3F_ea4584… but running from

Python: Convert markdown table to json with

I am trying to figure out the easiest way to convert some markdown table text into JSON using only Python. For example, consider this as input string: The wanted output should be this: Note: Ideally, the output should be RFC 8259 compliant, i.e. use double quotes ” instead of single quotes ‘ around the key-value pairs. I’ve
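One possible standard-library approach is sketched below. The input and wanted output from the question are not shown in the excerpt, so the sample table and the `markdown_table_to_json` helper are hypothetical. The sketch assumes a simple pipe table with a header row, a separator row, and no escaped pipes inside cells. Serializing with `json.dumps` yields double-quoted keys and values, which matches the RFC 8259 note in the question.

```python
import json


def split_cells(line):
    """Split one table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def markdown_table_to_json(table):
    """Convert a simple markdown pipe table into a JSON array of objects."""
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header = split_cells(lines[0])
    # lines[1] is the |---|---| separator row, so data starts at index 2.
    rows = [dict(zip(header, split_cells(ln))) for ln in lines[2:]]
    return json.dumps(rows, indent=2)


# Hypothetical sample table.
sample_table = """
| name | age |
|------|-----|
| Ann  | 31  |
| Bob  | 27  |
"""
result = markdown_table_to_json(sample_table)
print(result)
```

All cell values stay strings in this sketch. Converting numeric-looking cells to numbers would be a separate design decision.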
