I am trying to concatenate multiple CSVs that live in subfolders of my parent directory into a data frame, while also adding a new filename column. I can do something like this to concatenate all the CSVs into a single data frame. But is there a way to also add the filename of each file as a column to the
Tag: glob
Wrong Snakemake glob_wildcards and wildcard_constraints
Within my Snakemake pipeline I’m trying to retrieve the correct wildcards. I’ve looked into wildcard_constraints and this post and this post, however I can’t figure out the exact solution. Here’s an example of file names within 2 datasets. One dataset contains paired mouse RNAseq read files and the other contains paired human RNAseq read files. “Mus_musculus” dataset is “PRJNA362883_GSE93946_SRP097621” with
How to read most recent file with Pandas? Output path is undefined?
I’m trying to read the two latest sheets in my folder, READ1 and READ2, with pandas. Usually when I read files, the file name has to be formatted as ‘File.xlsx’, but the method I’m using prints it in the terminal as File.xlsx. I tried changing the format with: Which outputs as [“‘None'”] My Code: If I run my code as
Summing the values row-wise
I have three columns of data, arranged as below: Input file: In the above input file, the values in the first column are repeated, so I want to take each value only once and sum the third-column values row-wise, ignoring the second column. I also want to append a third column
My code is confusing an input file name for a regular expression
My regular expression does not explicitly include a dash in a character range, but my code fails when the input file name is like this: Here is my code: It seems obvious that this part of the filename is the issue: [Maxi-Single] How do I handle filenames similar to that so that they are treated as fixed strings, not part
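If the failure comes from glob treating the square brackets as a character class, one option is glob.escape, which makes every special character in a string literal. A minimal sketch, assuming that is the cause; the file name below is hypothetical apart from the [Maxi-Single] fragment quoted in the question:

```python
import glob
import os
import tempfile

# Hypothetical file name; only the "[Maxi-Single]" fragment comes from the question.
name = "Some Track [Maxi-Single].txt"

with tempfile.TemporaryDirectory() as folder:
    # Create an empty file with the awkward name.
    open(os.path.join(folder, name), "w").close()

    # glob.escape() wraps *, ? and [ so that they are matched literally
    # instead of being read as wildcard or character-class syntax.
    pattern = os.path.join(glob.escape(folder), glob.escape(name))
    matches = glob.glob(pattern)

print([os.path.basename(m) for m in matches])
```

For patterns handed to the re module instead, re.escape plays the same role.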
How can I delete multiple images from a folder using wildcards and a list of unique ids in a .txt file?
I am trying to delete about 3000 images from a folder with 5000 images. The image names look like this, for example: 03_38_25_006892_2.jpg I have a .txt file that has the unique digits that follow the final underscore in the image name, for the images that I want to delete. So the text file contents look maybe like this: 2
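One way to approach this is to load the IDs into a set and compare them with the text after the final underscore of each file name. This is only a sketch: it assumes the .txt file holds whitespace-separated IDs, and the helper name delete_images_by_id, the file names, and ids.txt are all made up for illustration.

```python
import tempfile
from pathlib import Path


def delete_images_by_id(folder, ids):
    """Delete .jpg files whose name ends in _<id>.jpg for an id in `ids`."""
    removed = []
    for path in sorted(Path(folder).glob("*_*.jpg")):
        # "03_38_25_006892_2" -> "2": the text after the final underscore.
        if path.stem.rsplit("_", 1)[-1] in ids:
            path.unlink()
            removed.append(path.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Hypothetical file names modelled on the example in the question.
    for name in ("03_38_25_006892_2.jpg",
                 "03_38_25_006893_7.jpg",
                 "03_38_25_006894_12.jpg"):
        (folder / name).touch()

    # Hypothetical ID list; assumed to be whitespace-separated.
    id_file = folder / "ids.txt"
    id_file.write_text("2\n12\n")

    ids = set(id_file.read_text().split())
    removed = delete_images_by_id(folder, ids)
    remaining = sorted(p.name for p in folder.glob("*.jpg"))

print(removed)
print(remaining)
```

Comparing the whole trailing field against a set keeps the work to a single pass over the folder and avoids partial matches such as ID 2 hitting a file that ends in _12.jpg.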
Python/pandas/os: get the files in this folder and iterate over those that fit this naming convention
I want to perform some data cleaning on all the files in the same folder as my script that fit a naming convention. The data cleaning I am fine with, but it’s just the same folder that I am struggling with. Previous working code: Current code: I get the error code No such file or directory: ‘C’ Do I need
Copy files from multiple specific subfolders
Under the file path D:/src, I have image folders and their subfolders, which have a regular structure as follows: I want to copy all .jpg files in Subfolder b from Folder A, B, C and so on to a new folder Subfolder b in D:/dst. How can I do it in Python? Thanks. Here is what I have found from
glob exclude pattern
I have a directory with a bunch of files inside: eee2314, asd3442 … and eph. I want to exclude all files that start with eph with the glob function. How can I do it? Answer The pattern rules for glob are not regular expressions. Instead, they follow standard Unix path expansion rules. There are only a few special characters: two
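Because glob patterns can only negate a single character (with [!...]), there is no direct way to say "everything except names starting with eph". One common workaround, sketched here rather than taken from the answer above, is to glob twice and subtract the results. The helper name glob_excluding_prefix and the extra file eph001 are assumptions for the demo.

```python
import glob
import os
import tempfile


def glob_excluding_prefix(directory, prefix):
    """Return sorted file names in `directory` that do not start with `prefix`."""
    base = glob.escape(directory)  # keep odd characters in the path literal
    everything = set(glob.glob(os.path.join(base, "*")))
    excluded = set(glob.glob(os.path.join(base, glob.escape(prefix) + "*")))
    return sorted(os.path.basename(p) for p in everything - excluded)


with tempfile.TemporaryDirectory() as tmp:
    # File names from the question, plus one hypothetical "eph001".
    for name in ("eee2314", "asd3442", "eph", "eph001"):
        open(os.path.join(tmp, name), "w").close()

    kept = glob_excluding_prefix(tmp, "eph")

print(kept)
```

Filtering the result of a single glob call with str.startswith would work just as well; the set difference simply keeps both halves expressed as glob patterns.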
How to use glob() to find files recursively?
This is what I have: but I want to search the subfolders of src. Something like this would work: But this is obviously limited and clunky. Answer pathlib.Path.rglob Use pathlib.Path.rglob from the pathlib module, which was introduced in Python 3.4. If you don’t want to use pathlib, you can use glob.glob('**/*.c'), but don’t forget to pass in the recursive keyword
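A short sketch comparing the two approaches on a throwaway directory tree. The tree layout and file names are made up for the demo; only src and the .c extension come from the question.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical tree: .c files at three depths plus one unrelated file.
    (root / "src" / "sub" / "deeper").mkdir(parents=True)
    for rel in ("src/a.c", "src/sub/b.c", "src/sub/deeper/c.c", "src/sub/notes.txt"):
        (root / rel).touch()

    # pathlib: rglob("*.c") searches src itself and every directory below it.
    via_rglob = sorted(
        p.relative_to(root).as_posix() for p in (root / "src").rglob("*.c")
    )

    # glob: "**" only spans directories when recursive=True is passed.
    pattern = os.path.join(glob.escape(str(root)), "src", "**", "*.c")
    via_glob = sorted(
        Path(p).relative_to(root).as_posix()
        for p in glob.glob(pattern, recursive=True)
    )

print(via_rglob)
print(via_glob)
```

Both calls find the same three files; without recursive=True, the "**" in the glob pattern behaves like a plain "*" and matches only one directory level.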