I have 10000 JSON files with different ids, and each has 10000 names. How can I flatten nested arrays by merging values by int or str in PySpark? EDIT: I have added the column name_10000_xvz to better explain the data structure. I have also updated the Notes, input df, required output df, and input JSON files. Notes: The input dataframe has more than 10000 columns name_1_a,
Tag: sql
How to clean data so that the correct arrival code is there for the city pair?
From the picture, the CSV is laid out as follows: column 1 is the City Pair (Departure – Arrival), column 2 is meant to be the Departure Code, and column 3 is meant to be the Arrival Code. As you can see for row 319 in the first column,
How to create new table with first name only in table
I have some data that looks like this: I’d like to create a new table with the name column but with the first name only. Answer: This gets the first substring before the space character in name as first_name. Result (first_name): Arizona, Emerald
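The idea in the answer (take everything before the first space) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than whatever database the question targets, and the table names people and first_names, as well as the surnames in the sample rows, are hypothetical.

```python
import sqlite3

# In-memory database with hypothetical sample data (surnames are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("Arizona Smith",), ("Emerald Jones",), ("Cher",)],
)

# Create the new table: keep the substring before the first space,
# or the whole name when it contains no space at all.
conn.execute(
    """
    CREATE TABLE first_names AS
    SELECT CASE
             WHEN instr(name, ' ') > 0
               THEN substr(name, 1, instr(name, ' ') - 1)
             ELSE name
           END AS first_name
    FROM people
    """
)

first_names = [row[0] for row in conn.execute("SELECT first_name FROM first_names")]
print(first_names)
```

Other databases offer their own string functions for this, so the exact expression will likely differ outside SQLite.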
DataFrame comparison with SQL Server table and upload just the differences
I have an SQL table (table_1) that contains data, and I have a Python script that reads a CSV and creates a dataframe. I want to compare the dataframe with the SQL table data and then insert the missing data from the dataframe into the SQL table. I went around and read this comparing pandas dataframe with sqlite table via
Iterating SQL query inside python loop and changing the value of date function in SQL query with every loop
I have a SQL query which I want to iterate using a Python for loop. Is there a way to define a variable inside the SQL query and update its value with each Python loop? date1 = datetime.date(2017, 1, 1) date2 = datetime.date(2017, 12, 31) for d in daterange(date1, date2): SQL = "SELECT * FROM table WHERE TABLE.CREATED_AT =
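One common way to change a value on every iteration is to keep the SQL text fixed and bind the date as a query parameter. The sketch below is a minimal illustration, not the asker's actual code: it assumes SQLite via the standard sqlite3 module, a hypothetical events table, and a hypothetical daterange helper, since the excerpt does not show how daterange is defined.

```python
import datetime
import sqlite3

def daterange(start, end):
    """Hypothetical helper: yield each date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + datetime.timedelta(days=offset)

# Hypothetical table and rows, used only to make the example runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2017-01-01"), (2, "2017-01-03"), (3, "2017-01-03")],
)

# The query text never changes; only the bound parameter does.
SQL = "SELECT id FROM events WHERE created_at = ? ORDER BY id"

date1 = datetime.date(2017, 1, 1)
date2 = datetime.date(2017, 1, 3)

rows_per_day = {}
for d in daterange(date1, date2):
    day = d.isoformat()  # pass the date as an ISO string parameter
    rows_per_day[day] = [row[0] for row in conn.execute(SQL, (day,))]

print(rows_per_day)
```

Binding the value instead of concatenating it into the string also avoids quoting mistakes. The placeholder style (? here) depends on the database driver.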
Snowflake table created with SQLAlchemy requires quotes ("") to query
I am ingesting data into Snowflake tables using Python and SQLAlchemy. The tables I have created all require quotation marks around both the table name and the column names in queries. For example, select * from "database"."schema"."table" where "column" = 2; will run, while select * from database.schema.table where column = 2; will not run. The only difference is the quotes. I
Python and SQL: Getting rows from csv results in ERROR: “There are more columns in the INSERT statement than values specified in the VALUES clause.”
I have a CSV file with several records that I am trying to import into a SQL table via a Python script. My CSV file is (now reduced to) just one row of 1s. Here is what I am trying to do (after successfully connecting to the database etc etc…): No matter how I format the data in the csv (right
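The error in the title says that the number of columns named in the INSERT does not match the number of values supplied. The asker's code is not shown in the excerpt, so the following is only a sketch of one way to keep the two counts in step: build the placeholder list from the CSV header and check each row's length before inserting. It assumes SQLite via the standard sqlite3 module, and the target table, its columns a, b, c, and the inline CSV text are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content: a header line plus one row of 1s.
csv_text = "a,b,c\n1,1,1\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (a INTEGER, b INTEGER, c INTEGER)")

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)

# One placeholder per column, so the column and value counts always match.
# The header is interpolated into the SQL text, so it must come from a trusted source.
placeholders = ", ".join("?" for _ in header)
insert_sql = f"INSERT INTO target ({', '.join(header)}) VALUES ({placeholders})"

for row in reader:
    if len(row) != len(header):
        raise ValueError(f"expected {len(header)} values, got {len(row)}: {row!r}")
    conn.execute(insert_sql, row)

inserted = conn.execute("SELECT a, b, c FROM target").fetchall()
print(inserted)
```

A frequent cause of this mismatch is passing a whole CSV line as a single string instead of a sequence of separate values, which the length check above would surface early.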
How to create a summary table with subheadings in SQL?
I am using PostgreSQL to create summaries from data using a python script. I had large amounts of data in my SQL table and using the following query I was able to get the required data. Below is my query: And below is the table created: I am trying to create a query on this table to further create a
Python – Store cryptography keys in SQL database
Working on a “Password Saver” and will be using the module “cryptography” to encrypt the passwords. I need to save the key you generate from cryptography in the database as well, but I am not sure how you actually do this. Done some google searches myself and it seems to be called a “byte string”? Not really sure what it
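A byte string can usually be stored as-is in a binary column. The sketch below assumes SQLite via the standard sqlite3 module and a hypothetical secret_keys table. To stay self-contained it does not import the cryptography package: the key here is a stand-in built from 32 random bytes, URL-safe base64-encoded, which has the same shape as a key returned by Fernet.generate_key().

```python
import base64
import os
import sqlite3

# Stand-in key (assumption): 32 random bytes, URL-safe base64-encoded, as bytes.
key = base64.urlsafe_b64encode(os.urandom(32))

conn = sqlite3.connect(":memory:")
# Hypothetical table; a BLOB column stores the bytes unchanged.
conn.execute(
    "CREATE TABLE secret_keys (id INTEGER PRIMARY KEY, key_bytes BLOB NOT NULL)"
)
conn.execute("INSERT INTO secret_keys (key_bytes) VALUES (?)", (key,))

# Reading the column back yields a bytes object identical to the original key.
stored = conn.execute("SELECT key_bytes FROM secret_keys WHERE id = 1").fetchone()[0]
print(type(stored), stored == key)
```

Because the encoded key is plain ASCII, decoding it with key.decode("ascii") and storing it in a text column would also work.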
Import data from Python (problem with WHERE condition)
I work in Python and have code that imports a dataset, which works fine. However, my dataset contains 3 different patients and I would like to import only the patient that interests me (possible by adding a WHERE clause to the SQL query). So the following code works: It returns the patient 14 data. But