I am trying to read a PDF using Python, and the content has many newline (CRLF) characters. I tried removing them using the code below: But the output remains unchanged. I also tried using double backslashes, which didn’t fix the issue. Can someone please advise? Answer I don’t have access to your pdf f…
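The question's code was stripped from this excerpt, so the actual cause can't be confirmed. One frequent reason for "output unchanged" is that str.replace() returns a new string instead of modifying the original, so the result has to be assigned back. A minimal sketch, assuming the PDF text has already been extracted into an ordinary Python string (the helper name remove_line_breaks is hypothetical):

```python
def remove_line_breaks(text: str) -> str:
    """Hypothetical helper: replace CRLF, CR and LF with single spaces."""
    # Strings are immutable: replace() returns a NEW string, so chain the
    # calls and return (or reassign) the result.
    return text.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")


raw = "first line\r\nsecond line\nthird line"
raw.replace("\r\n", " ")           # result discarded: raw is still unchanged
cleaned = remove_line_breaks(raw)  # result kept
print(cleaned)  # first line second line third line

# Alternative: collapse every run of whitespace (including CR/LF) into one space.
collapsed = " ".join(raw.split())
```

Here "\r\n" is an escape sequence for the real carriage-return and line-feed characters, so no extra backslashes are needed; "\\r\\n" would instead match the literal four characters backslash, r, backslash, n.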
Tag: python
PySpark 2.4 – Read CSV file with custom line separator
Support for custom line separators (for various text file formats) was added to Spark in 2017 (see: https://github.com/apache/spark/pull/18581). … or maybe it wasn’t added in 2017, or ever (see: https://github.com/apache/spark/pull/18304). Today, with PySpark 2.4.0, I am unable to use custom …
Difference between transpose() and .T in Pandas
I have a sample of data: I want to display simple statistics of the dataset in pandas using the describe() method. Output 1: Is there any difference between the two workflows when I end up with the same result? Output 2: References: Pandas | API documentation | pandas.DataFrame.transpose Answer There is no …
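The answer is cut off, but pandas documents DataFrame.T as an accessor for DataFrame.transpose(), so both spellings should produce the same frame. A small check, using made-up sample data in place of the question's (which was stripped from the excerpt):

```python
import pandas as pd

# Hypothetical sample data standing in for the question's dataset.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.5, 6.0]})
summary = df.describe()  # rows: count, mean, std, min, 25%, 50%, 75%, max

via_T = summary.T                    # property shorthand
via_transpose = summary.transpose()  # method form

print(via_T.equals(via_transpose))  # True
```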
TypeError: train_test_split() got an unexpected keyword argument ‘test_size’
I’m trying to find the best feature set using a random forest approach, so I need to split the dataset into training and test sets. Here is my code. The parameters data and data_y are parsed correctly, but I’m getting the following error, and I couldn’t figure out why. Answer You are using the same function n…
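The answer is truncated, but one way this exact error arises is a name clash: if a script imports scikit-learn's train_test_split and later defines its own function with the same name, the later definition replaces the imported one, and the user-defined signature does not accept test_size. The sketch below reproduces that mechanism without scikit-learn; the helper's body is hypothetical.

```python
# Imagine the script started with:
#   from sklearn.model_selection import train_test_split
# The def below then rebinds that name, hiding scikit-learn's function.
def train_test_split(data, data_y):
    """User-defined helper that happens to reuse scikit-learn's name."""
    return data, data_y


try:
    train_test_split([[1], [2], [3]], [0, 1, 0], test_size=0.3)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # train_test_split() got an unexpected keyword argument 'test_size'
```

Renaming the helper, or importing scikit-learn's function under an alias (from sklearn.model_selection import train_test_split as sk_train_test_split), removes the clash.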
Python: how to find the minimum number of moves for a directory iteration (crawler)
I’m working on a Python 3 program in which I have to return the number of moves for a directory iteration. The input is a list of operations, each denoting an action: ../ moves to the parent folder of the current folder; ./ remains in the same folder; x/ moves to the child fold…
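The excerpt is truncated, so the exact expected output is unknown. Assuming the task matches the common "crawler log" formulation (count the moves needed to return to the starting folder after all operations, where ../ at the starting folder has no effect), the answer is simply the final depth, which a single counter can track. The function name is hypothetical:

```python
def min_moves_to_start(logs):
    """Return the folder depth reached after applying every operation,
    which equals the number of moves needed to get back to the start."""
    depth = 0
    for op in logs:
        if op == "../":
            depth = max(0, depth - 1)  # cannot go above the starting folder
        elif op == "./":
            continue                   # stay in the same folder
        else:
            depth += 1                 # "x/": move into a child folder
    return depth


print(min_moves_to_start(["d1/", "d2/", "../", "d21/", "./"]))  # 2
```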
Using Pytube to download playlist from YouTube
I am looking to download a YouTube playlist using the PyTube library. Currently, I can download only a single video at a time, not more than one at once. My current implementation is: This results in the following output: And the YouTube file is downloaded. When I try this with a playli…
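A minimal sketch, assuming a pytube version that provides the Playlist class, whose videos property yields YouTube objects. pytube depends on YouTube's page format and breaks frequently, so treat this as untested; the function name and the output_path default are hypothetical.

```python
def download_playlist(playlist_url, output_path="."):
    """Download every video in a YouTube playlist (hypothetical helper)."""
    # Imported here so this module can be loaded even where pytube is absent.
    from pytube import Playlist

    playlist = Playlist(playlist_url)
    for video in playlist.videos:
        # Highest-resolution progressive stream (audio and video in one file).
        stream = video.streams.get_highest_resolution()
        stream.download(output_path=output_path)
```

Usage: download_playlist("https://www.youtube.com/playlist?list=...") with the real playlist ID filled in.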
Use Python click command to invoke a class method with variadic arguments
I have a class that is initialized with an unknown number of arguments, and I want to do this from the CLI using Python’s click package. My issue is that I can’t manage to initialize it and run a click command: Setting a fixed number of arguments, like nargs=5, solves the issue of missin…
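Click supports variadic arguments through nargs=-1, which collects all remaining command-line arguments into a tuple that can then be unpacked into the class constructor. A minimal sketch; the Processor class and the cli command name are hypothetical stand-ins for the question's code:

```python
import click


class Processor:
    """Hypothetical class whose constructor takes any number of arguments."""

    def __init__(self, *items):
        self.items = items

    def run(self):
        return ", ".join(self.items)


@click.command()
@click.argument("items", nargs=-1)  # nargs=-1: collect all remaining args into a tuple
def cli(items):
    processor = Processor(*items)  # unpack the tuple into the variadic constructor
    result = processor.run()
    click.echo(result)
    return result


# Entry point when run as a script:
#   if __name__ == "__main__":
#       cli()
```

Running python app.py a b c would print a, b, c. Arguments with nargs=-1 are optional by default (an empty tuple); pass required=True to click.argument to demand at least one.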
Connecting to Cloud SQL from Google Cloud Function using Python and SQLAlchemy
I have read all the documentation about connecting from GCF to MySQL hosted in Cloud SQL and still can’t connect. I also tried all the related hints in the SQLAlchemy documentation. I am using the following connection: The error I got was: (pymysql.err.OperationalError) (2003, “Can’t connect to…
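The excerpt does not include the connection code, so this is only a sketch of the Unix-socket approach described in Google's Cloud SQL documentation for Cloud Functions, assuming MySQL with the pymysql driver: the host part of the URL is left empty, and the socket under /cloudsql/ is passed as the unix_socket query parameter. The helper name is hypothetical; the instance connection name has the form PROJECT:REGION:INSTANCE.

```python
from urllib.parse import quote_plus


def cloud_sql_mysql_url(user, password, database, instance_connection_name):
    """Build a SQLAlchemy URL for MySQL on Cloud SQL via its Unix socket."""
    socket_path = f"/cloudsql/{instance_connection_name}"
    # Percent-encode credentials so characters like '@' or ':' don't break the URL.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@/"
        f"{database}?unix_socket={socket_path}"
    )


# Usage inside the function (requires sqlalchemy and pymysql to be installed):
#   engine = sqlalchemy.create_engine(
#       cloud_sql_mysql_url("user", "secret", "mydb", "my-project:us-central1:my-instance")
#   )
```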
How does one ignore extra arguments passed to a dataclass?
I’d like to create a config dataclass in order to simplify whitelisting of and access to specific environment variables (typing os.environ['VAR_NAME'] is tedious relative to config.VAR_NAME). I therefore need to ignore unused environment variables in my dataclass’s __init__ function, b…
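One common pattern, sketched here rather than taken from the truncated question or its answer, leaves the generated __init__ alone and filters the input in a classmethod using dataclasses.fields(). The field names and the from_mapping name are hypothetical:

```python
import os
from dataclasses import dataclass, fields


@dataclass
class Config:
    # Hypothetical whitelisted variables; add one field per variable you need.
    VAR_NAME: str = ""
    HOME: str = ""

    @classmethod
    def from_mapping(cls, mapping):
        """Build a Config, silently dropping keys that are not init fields."""
        allowed = {f.name for f in fields(cls) if f.init}
        return cls(**{k: v for k, v in mapping.items() if k in allowed})


config = Config.from_mapping(os.environ)  # unused variables are ignored
```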
Calculating min, max without using a list
I am trying to solve a problem from a Python textbook: Write a program that asks the user to enter the number of times that they have run around a racetrack, and then uses a loop to prompt them to enter the lap time for each of their laps. When the loop finishes, the program should display the time of
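The exercise text is cut off, so this sketch covers only what the title asks for: the fastest and slowest lap. The idea is to keep a running minimum and maximum and update them as each time is entered, so no list is ever built. Function names and prompts are hypothetical:

```python
def fastest_and_slowest(lap_times):
    """Track the minimum and maximum while iterating, without storing laps."""
    fastest = slowest = None
    for t in lap_times:
        if fastest is None or t < fastest:
            fastest = t
        if slowest is None or t > slowest:
            slowest = t
    return fastest, slowest


def main():
    laps = int(input("How many laps did you run? "))
    # A generator prompts for one lap at a time; no list is created.
    times = (float(input(f"Time for lap {i}: ")) for i in range(1, laps + 1))
    fastest, slowest = fastest_and_slowest(times)
    print(f"Fastest lap: {fastest}")
    print(f"Slowest lap: {slowest}")


# Run interactively by calling main().
```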