% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/batch_import_fun.R
\name{jst_import}
\alias{jst_import}
\alias{jst_import_zip}
\title{Wrapper for file import}
\usage{
jst_import(
  in_paths,
  out_file,
  out_path = NULL,
  .f,
  col_names = TRUE,
  n_batches = NULL,
  files_per_batch = NULL,
  show_progress = TRUE
)

jst_import_zip(
  zip_archive,
  import_spec,
  out_file,
  out_path = NULL,
  col_names = TRUE,
  n_batches = NULL,
  files_per_batch = NULL,
  show_progress = TRUE,
  rows = NULL
)
}
\arguments{
\item{in_paths}{A character vector of paths to the \code{xml}-files which
should be imported.}

\item{out_file}{Base name of the files to export to. An increasing number is
appended to the name for each batch.}

\item{out_path}{Path to export files to (combined with filename).}

\item{.f}{Function to use for import. Can be one of \code{jst_get_article},
\code{jst_get_authors}, \code{jst_get_references}, \code{jst_get_footnotes}, \code{jst_get_book}
or \code{jst_get_chapter}.}

\item{col_names}{Should column names be written to file? Defaults to \code{TRUE}.}

\item{n_batches}{Number of batches, defaults to 1.}

\item{files_per_batch}{Number of files for each batch. Can be used instead of
\code{n_batches}, but not in conjunction with it.}

\item{show_progress}{Displays a progress bar for each batch, if the session
is interactive.}

\item{zip_archive}{A path to a .zip-archive from DfR (JSTOR's Data for Research).}

\item{import_spec}{A specification from \link{jst_define_import}
that defines which parts of a .zip-archive should be imported via which
functions.}

\item{rows}{Mainly used for testing, to decrease the number of files which
are imported (e.g. \code{1:100}).}
}
\value{
Writes \code{.csv}-files to disk.
}
\description{
These functions apply an import function to a list of \code{xml}-files
(\code{jst_import}) or to a .zip-archive (\code{jst_import_zip}) and save
the output to disk in batches of \code{.csv}-files.
}
\details{
Along the way, we wrap three functions, which make the process of converting
many files easier:
\itemize{
\item \code{\link[purrr:safely]{purrr::safely()}}
\item \code{\link[furrr:future_map]{furrr::future_map()}}
\item \code{\link[readr:write_delim]{readr::write_csv()}}
}

When using one of the \verb{jst_get_*} functions, there should usually be no
errors. To prevent the whole computation from failing in the unlikely event
that an error occurs, we use \code{safely()}, which lets us continue the
process and catch the error along the way.

If you have many files to import, you might benefit from executing the
function in parallel. We use futures for this to give you maximum
flexibility. By default the code is executed sequentially. If you want to
run it in parallel, simply call \code{\link[future:plan]{future::plan()}} with
\code{\link[future:multisession]{future::multisession()}} as an argument before
running \code{jst_import} or \code{jst_import_zip}.

After importing all files, they are written to disk with
\code{\link[readr:write_delim]{readr::write_csv()}}.

Since you might run out of memory when importing a large quantity of files,
you can split up the files to import into batches. Each batch is processed
separately, so if you set the
\code{\link[future:multisession]{future::multisession()}} plan, multiple
processes are spawned for each batch. For this reason, very small batches
are not recommended, as starting and ending the processes incurs overhead.
On the other hand, the batches should not be so large that they exceed
memory limitations.
A value of 10000 to 20000 for \code{files_per_batch} should work fine on most
machines. If the session is interactive and \code{show_progress} is \code{TRUE}, a
progress bar is displayed for each batch.
}
\examples{
\dontrun{
# read from file list --------
# find all xml files in the working directory
meta_files <- list.files(pattern = "[.]xml$", full.names = TRUE)

# import them via `jst_get_article`
jst_import(meta_files, out_file = "imported_metadata", .f = jst_get_article,
           files_per_batch = 25000)
           
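# alternatively, fix the number of batches instead of the batch size; this is
# only a sketch (the value 4 is arbitrary), and `n_batches` cannot be combined
# with `files_per_batch`
jst_import(meta_files, out_file = "imported_metadata", .f = jst_get_article,
           n_batches = 4)
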
# do the same, but in parallel
library(future)
plan(multisession)
jst_import(meta_files, out_file = "imported_metadata", .f = jst_get_article,
           files_per_batch = 25000)
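# optionally switch back to sequential processing afterwards (`sequential`
# is the default plan of the future package)
plan(sequential)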

# read from zip archive ------ 
# define imports
imports <- jst_define_import(article = c(jst_get_article, jst_get_authors))

# convert the files to .csv
jst_import_zip("my_archive.zip", out_file = "my_out_file",
               import_spec = imports)
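
# to try out an import specification first, the import can be restricted to a
# few files via `rows` (a sketch: assumes the archive holds at least 100
# files; "my_test_file" is a hypothetical file name)
jst_import_zip("my_archive.zip", out_file = "my_test_file",
               import_spec = imports, rows = 1:100)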
} 
}
\seealso{
\code{\link[=jst_combine_outputs]{jst_combine_outputs()}}
}
