
Function to read csv string

This is a repost of an earlier question, now deleted, edited to incorporate the comments on the original.

I am trying to create a function that takes a string in CSV format as input. For example: "id,name,age,score\n1,Jack,NULL,12\n17,Betty,28,11".

It should return the following table:

id name age score
1 Jack NULL 12
17 Betty 28 11

It should also remove defective rows. A row is defective if any of its values is NULL, in all capital letters; any other characters (0 to 9, a to z, or A to Z) are acceptable.
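
A minimal sketch of that rule, assuming a value counts as defective only when it is exactly the string NULL (the helper name is_defective is hypothetical, and exact matching is one reading of the description above):

```python
def is_defective(fields):
    """Return True if any field is exactly the string 'NULL' (case-sensitive)."""
    return any(field == 'NULL' for field in fields)


print(is_defective(['1', 'Jack', 'NULL', '12']))   # True
print(is_defective(['17', 'Betty', '28', '11']))   # False
print(is_defective(['2', 'null', '30', '9']))      # False: lowercase does not match
```

Under this reading a value such as NULLABLE would be kept, whereas a substring check such as str.contains('NULL') would reject it.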

The final output from the above input string should be:

id name age score
17 Betty 28 11

Here is my code using the pandas and csv packages. I need to do the same thing without using either of them.

import csv

import pandas as pd


def test(S):
    result = pd.DataFrame(csv.reader(S.splitlines()))
    new_header = result.iloc[0]  # First row holds the column names.
    result = result[1:]
    result.columns = new_header
    df = result.select_dtypes(object)
    # Keep only the rows in which no string column contains 'NULL'.
    new_result = ~df.apply(lambda series: series.str.contains('NULL')).any(axis=1)
    f_result = result[new_result]
    return f_result


Answer

Here’s how to do what you say you want. Your description of the desired “table” output is somewhat vague, so I made my best guess.

def get_rows(data):
    rows = []
    for line in data.splitlines():
        fields = line.split(',')
        if not any(field == 'NULL' for field in fields):  # Not defective row.
            rows.append(fields)
    return rows


csv_string = 'id,name,age,score\n1,Jack,NULL,12\n17,Betty,28,11'
rows = get_rows(csv_string)

# Find longest item in each column.
widths = [max(len(item) for item in col) for col in zip(*rows)]

# Create a row of separators and make it the second row of the list.
separator_row = [width*'-' for width in widths]
rows.insert(1, separator_row)  # Insert following header row.

# Create a format specification for rows of table.
field_specs = [f' {{:{width}}} ' for width in widths]
format_spec = '|' + '|'.join(field_specs) + '|'

# Print formatted data.
for row in rows:
    print(format_spec.format(*row))

Plain text sample output:

| id | name  | age | score |
| -- | ----- | --- | ----- |
| 17 | Betty | 28  | 11    |
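
If a single function is preferred over a script, the filtering and formatting steps can be folded together. This is only a sketch under a few assumptions: the name format_table is hypothetical, every row has the same number of fields, no field contains a quoted comma, and a row is dropped only when a field is exactly NULL.

```python
def format_table(csv_text):
    """Return a text table built from csv_text, skipping rows that contain 'NULL'."""
    rows = [line.split(',') for line in csv_text.splitlines()]
    rows = [row for row in rows if 'NULL' not in row]  # Drop defective rows.
    if not rows:
        return ''
    # Width of each column is the length of its longest item.
    widths = [max(len(item) for item in col) for col in zip(*rows)]
    # Separator row goes right after the header row.
    rows.insert(1, ['-' * width for width in widths])
    return '\n'.join(
        '| ' + ' | '.join(item.ljust(width) for item, width in zip(row, widths)) + ' |'
        for row in rows
    )


print(format_table('id,name,age,score\n1,Jack,NULL,12\n17,Betty,28,11'))
```

Returning a string rather than printing makes the result easier to test or write to a file.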
User contributions licensed under: CC BY-SA