Skip to content
Advertisement

Pandas dataframe prints row data and column names by characters and as a tuple instead as a single string

TLDR; version: This is what i’m getting when printing the dataframe:

(C, O, L, 1) ... (C, O, L, 3)

(A, B, C) ... (G, H, I)

But the input (passed as variable coming from query) was the following:

colnames = ['COL1', 'COL2', 'COL3']
data = [('abc', 'def', 'ghi')]

I have a guess that it has to do something with a while loop running, waiting for user input to do this pre-defined query.

Long version

I am developing a tool for myself at work and the basic idea is that there is a terminal where you input some commands and things happen. However there is an anomaly with dataframes, namely that the queried results which are used to create a dataframe are printed character-by-character in a tuple instead of as a single string.

Let me explain.

This is the part for starting my console – it is still very much in experimental phase.

def runConsole(self):
    closed = False

    while not closed:
        cmd = input(":>t")
        if cmd.lower() == "quit":
            closed = True
            print("Closing.")
        else:
            self.readCommand(cmd)


def readCommand(self, cmd):

    # Reading cmd
    cmdSplit = cmd.split()
    cmd = cmdSplit[0].lower()  # it's a string
    params = cmdSplit[1:]  # it's a list

    # Figuring out the command
    if cmd == "getissue":
         self.jira.getIssueByKey(params[0])
    elif cmd == "generateddl":
         self.getForeignKey()
    else:
        print(" Command is not found. ")

This is the one i’m calling and where the anomaly happens.

def getForeignKey(self):
    query = ''' SELECT * FROM SYSCAT.REFERENCES '''

    # Establish connection    
    self.conn = jdba.connect(connectionString)
    cur = self.conn.cursor()
    cur.execute(query)
    result_set = cur.fetchall()

    # Obtain column names
    headers = []
    for header in cur.description:
        headers.append(header[0])
    cur.close()

    print(result_set)
    print(headers)

    self.fkTable = pd.DataFrame(data=result_set, columns=headers)

The two prints are printing the data as expected, example:

headers: [‘COL1’, ‘COL2’, ‘COL3’] and result_set: [(‘abc’, ‘def’, ‘ghi’)]

However after I create the dataframe and i do a head() or any other method that would display row or col data, i am getting the following result:

(C, O, L, 1) … (C, O, L, 3)

(A, B, C) … (G, H, I)

However if instead of a variable i’m passing the hardcoded data directly copied from the printouts earlier:

self.fkTable = pd.DataFrame(data=[('abc', 'def', 'ghi')], columns=['COL1', 'COL2', 'COL3'])

then i am getting the expected prints. I am completely baffled what might be causing this. I included the ‘console’ part with the while loop, because it awfully feels like a problem that is caused by that, strings being essentially a list of characters.

Feedback is most welcome on making the question better.

Advertisement

Answer

I am using jpype to pass arguements to jaydebeapi, because i need truststore.jks for the ssl connection to access the DB2 database. Jpype however uses java in the background and strings returned through it are returned as java objects.

The truststore related part looks like this:

        filepath_script = os.path.dirname(os.path.abspath(__file__))
        ssl_trust_store_location = filepath_script + "truststore.jks"

        jar = filepath_script + "/dbDriver/db2jcc4.jar"
        jvm_path = jpype.getDefaultJVMPath()
        jpype.startJVM(jvm_path,
                       '-Djava.class.path=%s' % jar,
                       '-Djavax.net.ssl.trustStore=%s' % ssl_trust_store_location,
                       '-Djavax.net.ssl.trustStorePassword=%s' % DB2_TRUSTSTORE_PW,
                        convertStrings=True)

What is important, and in fact the solution, is the convertStrings=True parameter added here.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement