Best way to automate terminal line commands? Mac / get_iPlayer [closed]

I’m looking for some advice as to how to automate commands via Terminal using a Mac.

I’m using a CLI programme called get_iPlayer which essentially allows you to download and store BBC iPlayer content locally. This isn’t actually why I’m using it though. I’m collecting data about its catalogue (for academic research purposes, not commercial use) and it’s been incredibly useful to that end. It caches a whole bunch of data about all the programmes in the catalogue, but not ALL of the available data.

To get that other data you have to specify the particular programme via a command in Terminal, which then downloads a small XML file containing the relevant metadata. Here’s what the command looks like:

get_iplayer --type=tv --pid=<insert unique programme code here> --metadata-only

I have a list of all of the unique programme codes (they’re called pid) in a csv file, and I’m wondering if it would be possible to automate this process. I.e. write a script that runs that line but with a different pid each time.

There are a couple of caveats worth mentioning:

  1. I only really have experience of using R. I suspect python will be better, and if that’s the only option, then fair enough!
  2. There are 129,000 of these to look up, so if it is possible to automate this process, I would need to write a script that adds a delay between each new query so as not to kill the BBC servers!

Any advice on how to approach this would be most welcome. I have a very limited knowledge of programming, so apologies if this is a very obvious question!


Answer

R’s system or system2 can call an external program just fine, though I personally prefer the processx package for its better handling of arguments. My guess is that you can do something like this (unverified, I don’t have that CLI tool):

pids <- c('111', '222', '333')
out <- lapply(sprintf("get_iplayer --type=tv --pid=%s --metadata-only", pids),
              function(cmd) { Sys.sleep(3); system(cmd, intern = TRUE) })
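The pids vector above is hard-coded for illustration; since your codes live in a CSV file, you would read them in first. A sketch, assuming the file is called pids.csv and has a header column named pid — adjust both names to match your actual file (the example codes below are made up):

```r
# Demo CSV standing in for your real file of 129,000 codes (hypothetical pids)
write.csv(data.frame(pid = c("b006m86d", " b006q2x0 ", "b006m86d")),
          "pids.csv", row.names = FALSE)

# Read the pid column (adjust file name and column name to match your CSV)
pids <- read.csv("pids.csv", stringsAsFactors = FALSE)$pid
pids <- unique(trimws(as.character(pids)))  # drop whitespace and duplicates
```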

This uses a 3-second sleep before each command, tune that to your own preferences. I don’t know if this abides by their terms-of-use, nor if they will throttle your connections for you. It might be more robust to do this in a way that preserves all already-retrieved data if the command starts failing or you need to interrupt it:

out <- list()
errcount <- 0
for (pid in pids) {
  Sys.sleep(3)
  out[[pid]] <- tryCatch({
    system(sprintf("get_iplayer --type=tv --pid=%s --metadata-only", pid),
           intern = TRUE)
  }, error = function(e) e)
  if (inherits(out[[pid]], "error")) {
    errcount <- errcount + 1
    if (errcount > 3) {
      warning("errcount over 3, stopping", call. = FALSE)
      break
    }
  }
}
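As an aside on the processx preference mentioned at the top: its run() function passes arguments to the program directly as a character vector, so you never have to worry about shell quoting. A sketch of the per-pid call (assumes processx is installed; I haven’t run this against get_iplayer):

```r
# Per-pid fetch using processx::run (install.packages("processx") first).
# Arguments are passed as a vector, so no shell escaping is needed.
fetch_metadata <- function(pid) {
  processx::run(
    "get_iplayer",
    c("--type=tv", sprintf("--pid=%s", pid), "--metadata-only"),
    error_on_status = FALSE  # check $status yourself rather than erroring
  )
}
# Usage: res <- fetch_metadata("b006m86d"); res$status == 0 means success
```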

If the command returns a non-zero exit status (0 generally means normal operation), R will emit a warning and continue. If something more severe goes wrong it will raise an error (stop), but the tryCatch captures it and the loop carries on anyway. I added the errcount steps to be a little more robust to repeated failures: if the command fails repeatedly (3 may not be the right threshold for your tolerance), you probably don’t want to keep going, especially since you’re handling 129K of these.
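Given that 129,000 queries at a 3-second delay will take over four days, it may also be worth saving out to disk every few hundred iterations so an interruption loses little. saveRDS/readRDS round-trip an R object unchanged (the file name here is arbitrary):

```r
# Stand-in for metadata already collected in the loop (hypothetical content)
out <- list(b006m86d = c("<metadata>", "</metadata>"))

saveRDS(out, "metadata_partial.rds")        # checkpoint, e.g. every 500 pids
restored <- readRDS("metadata_partial.rds")
identical(out, restored)                    # the object survives unchanged
```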

R has no advantage doing this over other languages.

User contributions licensed under: CC BY-SA