Convenience wrapper that downloads records from OpenAlex via
pro_request() and converts them directly to an Apache Parquet
dataset via pro_request_parquet(). No intermediate JSONL
files are written.
Usage
pro_fetch(
  query_url,
  pages = 10000,
  project_folder = NULL,
  overwrite = FALSE,
  api_key = pro_api_key(),
  delete_input = TRUE,
  workers = 1,
  verbose = FALSE,
  progress = TRUE,
  enrich = TRUE,
  count_only,
  error_log = NULL
)
Arguments
- query_url
The URL of the API query, or a list of URLs returned from pro_query().
- pages
The number of pages to download. Default: 10000, which corresponds to 2,000,000 works. Increasing it beyond 100000 is not recommended because of server load; use the snapshot instead. If NULL, all pages are downloaded.
- project_folder
Directory where intermediate (json) and final (parquet) results are stored. If it does not exist, it is created. If NULL, a temporary directory is created.
- overwrite
Logical. If TRUE, the json and parquet subdirectories are deleted from project_folder before the pipeline starts. If FALSE (the default) and any of those subdirectories already exist, the function stops with an error.
- api_key
Character string API key or NULL. Defaults to pro_api_key(). If NULL or "", requests are sent without an API key (subject to OpenAlex's unauthenticated limits).
- delete_input
Logical. If TRUE (the default), the json subfolder is deleted after successful conversion to Parquet.
- workers
Number of parallel workers to use if query_url is a list. Defaults to 1.
- verbose
Logical indicating whether to show verbose messages. Default FALSE.
- progress
Logical indicating whether to show a progress bar. Default TRUE.
- enrich
Logical. When TRUE (the default) and the inferred schema contains abstract_inverted_index / authorships / publication_year, the computed columns abstract and citation are added.
- count_only
Do not use this argument here. The function aborts if it is set to TRUE and gives a warning if it is set to FALSE.
- error_log
Location of the error log for API calls. Default: NULL (none).
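The relationship between pages and the number of works can be checked with a little arithmetic. The sketch below assumes 200 records per page, which is the value implied by the figure of 2,000,000 works for 10000 pages; the helper pages_needed() is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the package: estimate how many pages
# are needed to cover a given number of records.
# Assumption: 200 records per page (10000 pages = 2,000,000 works).
pages_needed <- function(n_records, per_page = 200) {
  stopifnot(n_records >= 0, per_page > 0)
  ceiling(n_records / per_page)
}

full_default <- pages_needed(2000000)  # 10000, the default page limit
partial_page <- pages_needed(450001)   # 2251, the last page is only partly filled
```

A value computed this way could be passed as pages to keep a download bounded.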
Details
The function downloads records from OpenAlex via pro_request() into a "json" subfolder of project_folder, and then converts the JSON files to an Apache Parquet dataset via pro_request_parquet() into a "parquet" subfolder.
This function assumes count_only == FALSE
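A call might look like the following sketch. It rests on several assumptions: that the package is installed under the name openalexPro, that the query URL shown is a valid OpenAlex query, and that the arrow package is available for reading the result. The helper subfolder_paths() is hypothetical and only mirrors the folder layout described above. The count_only argument is deliberately left unset.

```r
# Hypothetical helper: the two subfolders described in Details.
subfolder_paths <- function(project_folder) {
  c(
    json = file.path(project_folder, "json"),
    parquet = file.path(project_folder, "parquet")
  )
}

project_folder <- file.path(tempdir(), "openalex_demo")
paths <- subfolder_paths(project_folder)

# With overwrite = FALSE, pro_fetch() stops if either subfolder already
# exists, so it can be useful to check beforehand.
already_there <- dir.exists(paths)

# Guarded so the sketch does nothing outside an interactive session or
# when the package is missing.
# Assumption: the package is installed as "openalexPro".
if (interactive() && !any(already_there) &&
    requireNamespace("openalexPro", quietly = TRUE) &&
    requireNamespace("arrow", quietly = TRUE)) {
  openalexPro::pro_fetch(
    query_url = "https://api.openalex.org/works?filter=publication_year:2020",
    pages = 5,
    project_folder = project_folder,
    delete_input = TRUE
  )
  # Open the resulting Parquet dataset lazily with arrow.
  ds <- arrow::open_dataset(paths[["parquet"]])
  print(ds)
}
```

Keeping pages small while testing a query avoids a long download; the limit can be raised once the result looks as expected.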
See also
pro_request() for the download step,
pro_request_parquet() for the conversion step.