PySpark 2.4 – Read CSV file with custom line separator

Support for custom line separators (for various text file formats) was added to Spark in 2017 (see: https://github.com/apache/spark/pull/18581). … or maybe it wasn’t added in 2017 – or ever (see: https://github.com/apache/spark/pull/18304). Today, with PySpark 2.4.0, I am unable to use custom line separators to parse CSV files. Here’s some code: Here are two sample CSV files: one.csv – lines are separated

Difference between transpose() and .T in Pandas

I have a sample of data: I want to display simple statistics of the dataset in pandas using the describe() method. Output 1: Is there any difference between the two workflows, given that I end up with the same result? Output 2: References: Pandas | API documentation | pandas.DataFrame.transpose Answer There is no difference. As mentioned in the T attribute documentation,
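To illustrate the answer's claim, here is a minimal sketch comparing the two spellings. The sample data is hypothetical (the original data is not shown in this excerpt), and the names `via_method`, `via_attr` and `same` are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; the question's own data is not reproduced here.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0, 40.0]})

# Summary statistics, transposed with the method and with the attribute.
via_method = df.describe().transpose()
via_attr = df.describe().T

# DataFrame.equals compares shape, labels, dtypes and values.
same = via_method.equals(via_attr)
print(same)
```

Both results have one row per original column and one column per statistic, so either spelling can be used interchangeably in a chain.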

Using Pytube to download playlist from YouTube

I am looking to download a YouTube playlist using the PyTube library. Currently, I am able to download a single video at a time, but I cannot download more than one video at once. My current implementation is This results in the following output And the YouTube file is downloaded. When I try this with a playlist link (an example) only

Connecting to Cloud SQL from Google Cloud Function using Python and SQLAlchemy

I have read all the documentation related to connecting to MySQL hosted in Cloud SQL from GCF and still can’t connect. I have also tried all the hints in the SQLAlchemy documentation related to this. I am using the following connection The error I got was: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'localhost' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)

How does one ignore extra arguments passed to a dataclass?

I’d like to create a config dataclass in order to simplify whitelisting of and access to specific environment variables (typing os.environ['VAR_NAME'] is tedious relative to config.VAR_NAME). I therefore need to ignore unused environment variables in my dataclass’s __init__ function, but I don’t know how to extract the default __init__ in order to wrap it with, e.g., a function that also
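One possible approach, sketched here under assumptions rather than taken from the question, is to leave the generated __init__ alone and add an alternative constructor that filters a mapping down to the declared fields. The class name `Config`, the field `OTHER_VAR`, and the method `from_mapping` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Config:
    # Hypothetical variable names, for illustration only.
    VAR_NAME: str
    OTHER_VAR: str = "default"

    @classmethod
    def from_mapping(cls, mapping):
        """Build a Config from a mapping, ignoring keys that are not fields."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in mapping.items() if k in known})


# A stand-in for os.environ, with extra keys that should be ignored.
env = {"VAR_NAME": "abc", "PATH": "/usr/bin", "HOME": "/root"}
config = Config.from_mapping(env)
print(config.VAR_NAME)
```

In real use, `Config.from_mapping(os.environ)` would play the same role. Keeping the filtering in a classmethod means direct calls such as `Config(VAR_NAME="abc")` still reject unexpected keyword arguments.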

Calculating min, max without using a list

I am trying to solve a problem from a Python textbook: Write a program that asks the user to enter the number of times that they have run around a racetrack, and then uses a loop to prompt them to enter the lap time for each of their laps. When the loop finishes, the program should display the time of
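The idea in the title can be sketched by keeping a running minimum and maximum while the values arrive, so nothing needs to be stored. This is a minimal sketch, not the textbook's solution: the function name `running_min_max` is hypothetical, and the lap times are passed in as an iterable instead of being read with input() so the example runs non-interactively.

```python
def running_min_max(lap_times):
    """Return (fastest, slowest) from an iterable without building a list."""
    fastest = None
    slowest = None
    for t in lap_times:
        # Update each extreme as soon as a new value beats it.
        if fastest is None or t < fastest:
            fastest = t
        if slowest is None or t > slowest:
            slowest = t
    return fastest, slowest


# A generator stands in for the values a user would type in one by one.
laps = (t for t in (62.5, 58.1, 60.0))
fastest, slowest = running_min_max(laps)
print(fastest, slowest)
```

In an interactive version, the loop body would call `float(input(...))` once per lap and apply the same two comparisons.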

SMTPAuthenticationError 5.7.14 Please log\n5.7.14 in via your web browser

I have a script which periodically sends reports to a list of recipients. Everything worked fine until 4 am today, when I checked my inbox and the reports had not arrived. By debugging the code: I get the following (familiar) result: (250, b'smtp.gmail.com at your service, [SERVERIP]\nSIZE 35882577\n8BITMIME\nSTARTTLS\nENHANCEDSTATUSCODES\nPIPELINING\nCHUNKING\nSMTPUTF8') (220, b'2.0.0 Ready to start TLS') (250, b'smtp.gmail.com at your service, [SERVERIP]\nSIZE
