
Tag: awk

Parse multiple line CSV using PySpark, Python or Shell

Input (2 columns): Note: Harry and Prof. do not have starting quotes. Output (2 columns): What I tried (PySpark): Issue: The above code worked fine where the multiline value had both start and end double quotes (e.g. the row starting with Ronald), but it didn't work with rows where we only have end quotes and no start quotes (like Harry

Remove duplicates from each cell

I have a file like this and need to remove duplicates in each cell without changing the order or format. Missing data are noted as . (dot). So far I have tried with awk, but it is killing the format. Is there any other way to do this? Expected output Answer with sed
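A minimal Python sketch of the per-cell deduplication asked about here. The sample data is not shown in this excerpt, so the layout is an assumption: tab-separated columns, comma-separated values inside each cell, and a lone `.` for missing data. The function name `dedupe_cells` is hypothetical.

```python
def dedupe_cells(line, col_sep="\t", val_sep=","):
    """Remove repeated values inside each cell, keeping first occurrences in order.

    Assumed layout (not confirmed by the question): columns separated by
    col_sep, values within a cell separated by val_sep, "." for missing data.
    """
    cells = line.rstrip("\n").split(col_sep)
    cleaned = []
    for cell in cells:
        # dict.fromkeys preserves insertion order, so the first occurrence wins.
        unique = dict.fromkeys(cell.split(val_sep))
        cleaned.append(val_sep.join(unique))
    return col_sep.join(cleaned)
```

Because each cell is rebuilt with the same separators it was split on, the column layout and the `.` placeholders stay as they were.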

Building matrix with values from multiple files

I have multiple files from which I need to create a matrix of matching values. File_1, the primary file, contains all the numbers, tab-delimited, on one row. There are multiple other files; where a number matches, add 1, or else add 0, to the file above. File_2 File_3 Output Answer awk to the rescue!
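The matching logic can be sketched in Python as a presence/absence matrix. The file contents and the exact output layout are not shown in this excerpt, so the sketch assumes one row per secondary file and one column per number in File_1; `build_matrix` and its arguments are hypothetical names, and the inputs are plain lists rather than files.

```python
def build_matrix(primary, others):
    """Return {file name: [1 or 0 per primary number]}.

    primary: numbers from the primary file, in order (assumed already parsed).
    others:  mapping of file name -> numbers found in that file (assumed layout).
    """
    matrix = {}
    for name, values in others.items():
        present = set(values)  # set membership keeps each lookup cheap
        matrix[name] = [1 if n in present else 0 for n in primary]
    return matrix
```

For example, `build_matrix([1, 2, 3], {"File_2": [2, 5]})` marks only the second column for File_2, since 2 is the only number it shares with the primary list.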

Split CSV values on single row into individual rows

I have a Python script that outputs a text file with thousands of random filenames in a comma-separated list, all on a single row. I want to take each value in the list and put it into its own row in a new CSV file. I’ve tried some variations of awk with no success. What’s the best way to
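Since the list already comes from a Python script, one option is to do the split in Python as well. This is a sketch under assumptions: the input is a single comma-separated line, surrounding whitespace is not significant, and the helper name `split_row_to_rows` is made up for illustration. It works on strings so it can be tried without any files.

```python
import csv
import io


def split_row_to_rows(text):
    """Turn one comma-separated row into CSV text with one value per row."""
    # csv.reader copes with quoted values that themselves contain commas.
    reader = csv.reader(io.StringIO(text))
    values = [v.strip() for row in reader for v in row if v.strip()]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for value in values:
        writer.writerow([value])  # each value becomes its own single-column row
    return out.getvalue()
```

To use it on real files, read the source file into a string, pass it to the function, and write the returned text to the new CSV file.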

How to completely erase the duplicated lines by linux tools?

This question is not the same as How to print only the unique lines in BASH? because that one suggests removing all copies of the duplicated lines, while this one is about eliminating only their duplicates, i.e., changing 1, 2, 3, 3 into 1, 2, 3 instead of just 1, 2. This question is really hard to write because I
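The difference between the two behaviours can be illustrated with a short Python sketch. Both function names are hypothetical, and the input is assumed to be a list of lines already read into memory.

```python
from collections import Counter


def drop_repeats(lines):
    """Keep the first copy of every line: 1, 2, 3, 3 becomes 1, 2, 3."""
    # dict.fromkeys preserves first-seen order while discarding later copies.
    return list(dict.fromkeys(lines))


def unique_only(lines):
    """Keep only lines that occur exactly once: 1, 2, 3, 3 becomes 1, 2."""
    counts = Counter(lines)
    return [line for line in lines if counts[line] == 1]
```

`drop_repeats` is the behaviour this question asks for; `unique_only` is the behaviour of the other question it is being contrasted with.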
