Airflow/EC2 – Save CSV from DAG

I looked around but could not find anyone with a similar issue. I’m trying to save some query results to a CSV on my EC2 instance, but for some reason the return value is None. Here is what I have:

# PostgresHook import (Airflow 2.x provider path):
from airflow.providers.postgres.hooks.postgres import PostgresHook

# Saving local_filepath variable for multiple use:
local_filepath = f'/home/ec2-user/{csv_name}'

def load_db():
    db_hook = PostgresHook(postgres_conn_id='DB_CONNECTION')

    # Query to grab desired results:
    df = db_hook.get_pandas_df(QUERY)

    # Save query results to CSV. Drop headers and index, quote all fields (quoting=1 is csv.QUOTE_ALL):
    print(f'Saving in {local_filepath}')
    return df.to_csv(csv_name, header=False, index=False, quoting=1)

I need this output as a CSV, hence the use of pandas. This works in my local environment, but when pushed to EC2 the DAG does not return anything (and it does not fail either).

Any ideas what I should be looking out for? Maybe permissions, a specific path, or the configuration of Airflow itself?

Thanks in advance.


Answer

After a couple of days of trial and error, I was able to save the file by specifying the actual location explicitly rather than relying on the default one (a bare filename is written relative to the worker’s current working directory, and df.to_csv() returns None when given a path, so there is nothing useful to return):

return df.to_csv(local_filepath, header=False, index=False, quoting=1)
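
For reference, here is a minimal sketch of how the corrected task might look end to end, assuming Airflow 2.x with the Postgres provider installed; csv_name, QUERY, and the connection id are placeholders standing in for the values defined elsewhere in the original DAG:

from airflow.providers.postgres.hooks.postgres import PostgresHook

# Placeholders -- the real values are defined elsewhere in the DAG file.
csv_name = 'results.csv'
QUERY = 'SELECT * FROM some_table;'

# Absolute path on the EC2 instance, so the file does not land in the
# worker's current working directory.
local_filepath = f'/home/ec2-user/{csv_name}'

def load_db():
    db_hook = PostgresHook(postgres_conn_id='DB_CONNECTION')

    # Run the query and load the results into a DataFrame.
    df = db_hook.get_pandas_df(QUERY)

    # Write to the absolute path. to_csv() returns None when given a path,
    # so return the path itself if a downstream task needs it via XCom.
    print(f'Saving in {local_filepath}')
    df.to_csv(local_filepath, header=False, index=False, quoting=1)
    return local_filepath

If the file still does not appear, printing os.getcwd() inside the task and confirming that the Airflow worker user can write to /home/ec2-user are the usual next checks.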