I am using R on a MacBook. I have an R Markdown document and I’m trying to use reticulate to run Python within R.
First I load the libraries:
```{r libraries, warning = FALSE, message = FALSE}
library(dplyr)
library(reticulate)
```
Next, in an R chunk, I check my working directory and then write mtcars to my desktop.
```{r chunk, warning = FALSE, message = FALSE}
getwd()
write.csv(mtcars, '/Users/name/Desktop/mtcars.csv', row.names = TRUE)
```
Then I try to use Python instead to read in the CSV that I just wrote to my desktop.
```{python}
import pandas as pd
mtcars = pd.read_csv('/Users/name/Desktop/mtcars.csv')
```
But I get this error:
ModuleNotFoundError: No module named 'pandas'
NameError: name 'pd' is not defined
So I went to this R documentation website and discovered that Python packages have to be installed separately. So I went to the terminal and typed:
python -m pip install pandas
It seemed to download OK? But when I return to my Rmarkdown document I can’t seem to get the python code to run and read in the csv. I still get the same error message.
I also saw a similar question on this SO post, but I’m certain that my RStudio version is newer than the version in that question, so I don’t think the answer addresses exactly the same error.
Answer
One option is to create a virtualenv, install the package into it, and then tell reticulate to use that virtualenv:
library(reticulate)
virtualenv_create("py-proj")
py_install("pandas", envname = "py-proj")
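It can also help to confirm which interpreter a `{python}` chunk actually runs, since the `python` found in the terminal is not necessarily the one reticulate starts (this is a likely cause of the error, not a confirmed diagnosis). A minimal sketch using only the standard library; `describe_environment` is a hypothetical helper name introduced here for illustration:

```python
import importlib.util
import sys


def describe_environment(module_name="pandas"):
    """Report the running interpreter and whether module_name is importable."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,  # path of the Python binary in use
        "module": module_name,
        "available": spec is not None,
    }


info = describe_environment("pandas")
print(info)
```

If `available` is `False`, pandas is not installed for the interpreter shown under `interpreter`, even if it was installed for another Python on the machine.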
In the R Markdown document, we can then use:
---
title: "Testing"
output:
  pdf_document: default
  html_document: default
---

```{r libraries, warning = FALSE, message = FALSE}
library(reticulate)
use_virtualenv("py-proj")
```

```{r chunk, warning = FALSE, message = FALSE}
write.csv(mtcars, "/Users/name/Desktop/mtcars.csv", row.names = TRUE)
```

```{python}
import pandas as pd
mtcars = pd.read_csv("/Users/name/Desktop/mtcars.csv")
mtcars.head(5)
```
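One optional refinement: because the CSV is written with `row.names = TRUE`, its first column holds the car names under an empty header, which pandas would otherwise read as a column called `Unnamed: 0`. The sketch below uses a hand-written three-row, three-column excerpt (an assumption standing in for the real file, which has more rows and columns) to show `index_col=0` turning that column into the row index:

```python
import io

import pandas as pd

# Hand-written excerpt in the shape write.csv(..., row.names = TRUE) produces:
# the row-name column comes first and has an empty header.
sample_csv = io.StringIO(
    '"","mpg","cyl","disp"\n'
    '"Mazda RX4",21,6,160\n'
    '"Mazda RX4 Wag",21,6,160\n'
    '"Datsun 710",22.8,4,108\n'
)

# index_col=0 uses the unnamed first column as the row index.
cars = pd.read_csv(sample_csv, index_col=0)
print(cars.head())
```

With the real file, the same argument would be passed as `pd.read_csv("/Users/name/Desktop/mtcars.csv", index_col=0)`.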