7 Pre-Live Session Content
7.1 VBD Hub Overview
The VBD Hub is a non-profit, open-source project funded by UKRI and Defra, which aims to improve accessibility and information sharing. To do this, the project builds infrastructure and tools to allow researchers to combine knowledge and share data within the VBD research community and with policymakers.
The hub site is home to resources to help your research, and spaces for collaboration and networking with the VBD community.
In this session, we will cover some of the key resources available through the VBD Hub and how to use them effectively.
7.3 VBD Hub Resources & How to Use Them
7.3.1 Hub Search
As we have seen, the Hub Search can be found under the Find Data tab on the VBD Hub site.
Hub Search makes discovering datasets much easier by searching multiple data sources in one place. You can identify datasets relevant to your research and explore their metadata before deciding which individual datasets to download.
On the Find Data page, we can find a Filter menu with several drop-down options:
- Category - filter your search based on data type. Hub Search currently allows you to filter by Occurrence, Abundance, Traits, Proteomics, and Epidemiological data.
- Database - filter by source database: VecDyn, VecTraits, GBIF, ProteomeXchange, and VBD Hub.
- Published - set the start date and end date of the publications you want to search.
- Location - draw polygons around the geographical area you want to search.
- Taxonomy - search for a specific taxon.
- Full text search - filter your search by more specific text fields.

Note: Throughout this workshop, we will discuss resources that search and retrieve data from several open-access VBD databases. If you are unfamiliar with these databases, they include:
- VecDyn - vector population dynamics, including how vector populations change over time and across locations.
- VecTraits - vector trait data, such as life history, behavioural, and ecological traits.
- GBIF - species occurrence records showing where and when vector species have been observed.
- ProteomeXchange - proteomic and molecular vector data.
- AreaData - environmental and geographic data.
- NCBI - genetic and genomic data.
Once we have searched for specific data, we can review the search results. Results typically include the dataset name, the source database, and a brief description of the dataset.
Clicking on resulting datasets lets us explore that data in more detail, including metadata, geographic or temporal coverage, and access or download options.
After reviewing your data, you might decide to refine your search depending on:
- Relevance - does the resulting dataset answer your research question?
- Coverage - does the resulting data include the right location or time period for your research?
- Structure - is the data in a usable format or file type?
You can refine your search by adding more parameters, trying more specific key words, or combining terms, such as “species + country”.
Tip: Start broad, then narrow your search by adding more parameters where necessary.
Remember, you don’t have to download everything - you should focus on datasets that are the most useful to you.
7.3.2 Hub Search Task
Use the Hub Search to find datasets on abundance of Ixodes ricinus. How many results did your search return?
Select one dataset from your search and identify the:
- Dataset name
- Publication date
- Source database
Use the Response Form at the end of the Pre-Live Session content to record your answers.
7.3.3 ohvbd package
ohvbd is an R package developed by the VBD Hub that allows you to search for and retrieve data within R, without needing to download files from multiple sources.
It connects to several VBD database sources at once, including VBD Hub (vbdhub), VecTraits (vt), VecDyn (vd), GBIF (gbif), and ProteomeXchange (px), as well as AreaData, and pulls datasets directly into your R workflow.
You can install ohvbd from CRAN by running install.packages("ohvbd") in R, and then load it with library(ohvbd).
ohvbd uses a piped workflow, which allows us to build on each step of our code. For example, if we wanted to search for data on Ixodes ricinus from the VecTraits database, we can run:
ixodes_ricinus_data <- search_hub("Ixodes ricinus") |>
  filter_db("vt") |>
  fetch(connections = 8) |>
  glean()
Let’s break this down a bit so we can understand what is happening:
- search_hub() - searches for datasets matching your criteria. Here we want to search for “Ixodes ricinus”.
- filter_db() - narrows results to a specific database, in this case “vt” (VecTraits).
- fetch() - retrieves the data.
- glean() - converts the data into a usable table format.
Tip: We can consider a basic search with ohvbd as 4 stages:
- Find the data we are looking for.
- Filter the search field.
- Fetch the data.
- Format the data.
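To see how the pipe passes the result of each step on to the next, here is a minimal sketch using toy stand-in functions rather than ohvbd itself. The functions find_ids(), keep_db(), and as_table() are hypothetical: they only mimic the find, filter, and format stages, and they do not contact any database.

```r
# Toy stand-ins for the stages of a piped search (hypothetical, not part of ohvbd).
find_ids <- function(query) {
  # Pretend search: return a small table of matching dataset IDs.
  data.frame(
    id = c(1, 2, 3),
    db = c("vt", "vd", "vt"),
    query = query
  )
}

keep_db <- function(results, db) {
  # Keep only the rows that come from the requested database.
  results[results$db == db, , drop = FALSE]
}

as_table <- function(results) {
  # Reset the row names so the output is a clean table.
  rownames(results) <- NULL
  results
}

# Piped version: each step receives the output of the previous step.
piped <- find_ids("Ixodes ricinus") |>
  keep_db("vt") |>
  as_table()

# Equivalent nested version: same result, but harder to read.
nested <- as_table(keep_db(find_ids("Ixodes ricinus"), "vt"))

print(piped)
```

Both versions give the same table; the piped form simply reads in the order the steps happen, which is why it suits a search, filter, fetch, and format workflow.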
The data we retrieve from ohvbd is often raw and its formatting depends on the original source database. After using the package, you will then need to wrangle and analyse the data yourself.
We can make our search more refined by adding more search parameters and retrieving the IDs for any datasets that match those parameters:
search_hub(
  query = "",
  db = c("vt", "vd", "gbif", "px"),
  fromdate = NULL,
  todate = NULL,
  locationpoly = NULL,
  taxonomy = NULL,
  exact = FALSE,
  withoutpublished = TRUE,
  returnlist = FALSE
)
Let’s also break this down so we can understand what each argument does:
- query - what you are searching for, such as a species name.
- db - which databases we want to search.
- fromdate - the date we want to search from.
- todate - the date we want to search up to.
- locationpoly - set our search to a geographic area.
- taxonomy - advanced search by species ID.
- exact - whether to return exact matches only.
- withoutpublished - whether to return results without a publishing date when filtering by date.
- returnlist - return the raw output list of datasets, rather than a formatted dataframe.
Important: When adding parameters, you do not need to use all of these arguments at once. Start simple and build a pipeline that aligns with your search aims.
Tip: fromdate and todate use ISO format: yyyy-mm-dd.
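As a small illustration of the date format, the sketch below builds fromdate and todate strings in base R and checks that they follow yyyy-mm-dd. The helper to_iso_date() is hypothetical, and the final search_hub() call is left commented out because it needs ohvbd installed and an internet connection; its argument names are taken from the signature shown above.

```r
# Hypothetical helper: convert a Date (or date-like string) to an ISO yyyy-mm-dd string.
to_iso_date <- function(x) {
  format(as.Date(x), "%Y-%m-%d")
}

from_date <- to_iso_date("2015-01-01")
to_date <- to_iso_date(as.Date("2015-01-01") + 364)  # 364 days after the start date

# Simple check that both strings match the yyyy-mm-dd pattern.
iso_pattern <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"
dates_ok <- all(grepl(iso_pattern, c(from_date, to_date)))

print(from_date)
print(to_date)

# With ohvbd loaded, these strings could then be passed on (not run here):
# search_hub("Ixodes ricinus", db = "vd", fromdate = from_date, todate = to_date)
```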
7.3.4 ohvbd Task
Let’s have a go at retrieving data using ohvbd. Consider what we would want to include to search for data on Aedes aegypti from VecTraits.
We first want to define our search string using search_hub():
search_hub("Aedes aegypti")
Next, we want to filter our search so that data is only retrieved from VecTraits:
filter_db("vt")
Now we can combine these lines of code to retrieve and format the data:
aedes_aegypti_data <- search_hub("Aedes aegypti", db = "vt") |>
  filter_db("vt") |>
  fetch(connections = 8) |>
  glean()
Notice we also added fetch() and glean() to our pipeline. These ensure the data is downloaded and formatted for further wrangling and analysis.
Once we have searched and retrieved the data we want, we can inspect the dataset:
head(aedes_aegypti_data)
#> <ohvbd.data.frame>
#> Database: vt
#> Id DatasetID IndividualID OriginalID
#> 1 110069 893 SK055
#> 2 110070 893 SK055
#> 3 110065 892 SK056
#> 4 110066 892 SK056
#> 5 110067 892 SK056
#> 6 110068 892 SK056
#> OriginalTraitName OriginalTraitDef
#> 1 development time mean duration of life stage
#> 2 development time mean duration of life stage
#> 3 body size mean wing length
#> 4 body size mean wing length
#> 5 body size mean wing length
#> 6 body size mean wing length
#> StandardisedTraitName StandardisedTraitDef
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> OriginalTraitValue OriginalTraitUnit OriginalErrorPos
#> 1 21.00 days NA
#> 2 21.00 days NA
#> 3 3.01 mm 0.12
#> 4 3.30 mm 0.10
#> 5 2.80 mm 0.10
#> 6 3.22 mm 0.03
#> OriginalErrorNeg OriginalErrorUnit StandardisedTraitValue
#> 1 NA <NA> NA
#> 2 NA <NA> NA
#> 3 0.12 SE NA
#> 4 0.10 SE NA
#> 5 0.10 SE NA
#> 6 0.03 SE NA
#> StandardisedTraitUnit StandardisedErrorPos
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> StandardisedErrorNeg StandardisedErrorUnit Replicates
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Habitat LabField ArenaValue ArenaUnit ArenaValueSI
#> 1 terrestrial laboratory NA NA NA
#> 2 terrestrial laboratory NA NA NA
#> 3 terrestrial laboratory NA NA NA
#> 4 terrestrial laboratory NA NA NA
#> 5 terrestrial laboratory NA NA NA
#> 6 terrestrial laboratory NA NA NA
#> ArenaUnitSI AmbientTemp AmbientTempMethod AmbientTempUnit
#> 1 NA NA NA NA
#> 2 NA NA NA NA
#> 3 NA NA NA NA
#> 4 NA NA NA NA
#> 5 NA NA NA NA
#> 6 NA NA NA NA
#> AmbientLight AmbientLightUnit SecondStressor
#> 1 NA NA <NA>
#> 2 NA NA <NA>
#> 3 NA NA <NA>
#> 4 NA NA <NA>
#> 5 NA NA <NA>
#> 6 NA NA <NA>
#> SecondStressorDef
#> 1 estimate of total food (tetramin) provided per larvae per day
#> 2 estimate of total food (tetramin) provided per larvae per day
#> 3 estimate of total food (tetramin) provided per larvae per day
#> 4 estimate of total food (tetramin) provided per larvae per day
#> 5 estimate of total food (tetramin) provided per larvae per day
#> 6 estimate of total food (tetramin) provided per larvae per day
#> SecondStressorValue SecondStressorUnit TimeStart TimeEnd
#> 1 0.0952381 mg 2013 2013
#> 2 0.4761905 mg 2013 2013
#> 3 0.0952381 mg 2013 2013
#> 4 0.4761905 mg 2013 2013
#> 5 0.0952381 mg 2013 2013
#> 6 0.4761905 mg 2013 2013
#> TotalObsTimeValue TotalObsTimeUnit TotalObsTimeValueSI
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> TotalObsTimeUnitSI TotalObsTimeNotes ResRepValue
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> ResRepUnit ResRepValueSI ResRepUnitSI
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Location
#> 1 Instituto Oswaldo Cruz Fundacao Oswaldo Cruz RJ Brazil
#> 2 Instituto Oswaldo Cruz Fundacao Oswaldo Cruz RJ Brazil
#> 3 Instituto Oswaldo Cruz Fundacao Oswaldo Cruz RJ Brazil
#> 4 Instituto Oswaldo Cruz Fundacao Oswaldo Cruz RJ Brazil
#> 5 Instituto Oswaldo Cruz Fundacao Oswaldo Cruz RJ Brazil
#> 6 Instituto Oswaldo Cruz Fundacao Oswaldo Cruz RJ Brazil
#> LocationType OriginalLocationDate LocationDate
#> 1 colony NA <NA>
#> 2 colony NA <NA>
#> 3 colony NA <NA>
#> 4 colony NA <NA>
#> 5 colony NA <NA>
#> 6 colony NA <NA>
#> LocationDatePrecision CoordinateType Latitude Longitude
#> 1 0 decimal -22.87736 -43.24356
#> 2 0 decimal -22.87736 -43.24356
#> 3 0 decimal -22.87736 -43.24356
#> 4 0 decimal -22.87736 -43.24356
#> 5 0 decimal -22.87736 -43.24356
#> 6 0 decimal -22.87736 -43.24356
#> Interactor1 Interactor1Common Interactor1Wholepart
#> 1 Aedes albopictus <NA> NA
#> 2 Aedes albopictus <NA> NA
#> 3 Aedes albopictus <NA> NA
#> 4 Aedes albopictus <NA> NA
#> 5 Aedes albopictus <NA> NA
#> 6 Aedes albopictus <NA> NA
#> Interactor1WholePartType Interactor1Number
#> 1 NA 100
#> 2 NA 20
#> 3 NA 100
#> 4 NA 20
#> 5 NA 100
#> 6 NA 20
#> Interactor1Kingdom Interactor1Phylum Interactor1Class
#> 1 Animalia Arthropoda Insecta
#> 2 Animalia Arthropoda Insecta
#> 3 Animalia Arthropoda Insecta
#> 4 Animalia Arthropoda Insecta
#> 5 Animalia Arthropoda Insecta
#> 6 Animalia Arthropoda Insecta
#> Interactor1Order Interactor1Family Interactor1Genus
#> 1 Diptera Culicidae Aedes
#> 2 Diptera Culicidae Aedes
#> 3 Diptera Culicidae Aedes
#> 4 Diptera Culicidae Aedes
#> 5 Diptera Culicidae Aedes
#> 6 Diptera Culicidae Aedes
#> Interactor1Species Interactor1Stage Interactor1Sex
#> 1 albopictus juvenile (L1 to pupa) indeterminate
#> 2 albopictus juvenile (L1 to pupa) indeterminate
#> 3 albopictus adult female
#> 4 albopictus adult female
#> 5 albopictus adult female
#> 6 albopictus adult female
#> Interactor1Temp Interactor1TempUnit Interactor1TempMethod
#> 1 25 Celsius NA
#> 2 25 Celsius NA
#> 3 25 Celsius NA
#> 4 25 Celsius NA
#> 5 25 Celsius NA
#> 6 25 Celsius NA
#> Interactor1GrowthTemp Interactor1GrowthTempUnit
#> 1 NA <NA>
#> 2 NA <NA>
#> 3 NA <NA>
#> 4 NA <NA>
#> 5 NA <NA>
#> 6 NA <NA>
#> Interactor1GrowthDur Interactor1GrowthdDurUnit
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1GrowthType Interactor1Acc Interactor1AccTemp
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor1AccTempNotes Interactor1AccTime
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1AccTimeNotes Interactor1AccTimeUnit
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1OrigTemp Interactor1OrigTempNotes
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1OrigTime Interactor1OrigTimeNotes
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1OrigTimeUnit Interactor1EquilibTimeValue
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1EquilibTimeUnit Interactor1Size
#> 1 NA <NA>
#> 2 NA <NA>
#> 3 NA <NA>
#> 4 NA <NA>
#> 5 NA <NA>
#> 6 NA <NA>
#> Interactor1SizeUnit Interactor1SizeType Interactor1SizeSI
#> 1 <NA> <NA> <NA>
#> 2 <NA> <NA> <NA>
#> 3 <NA> <NA> <NA>
#> 4 <NA> <NA> <NA>
#> 5 <NA> <NA> <NA>
#> 6 <NA> <NA> <NA>
#> Interactor1SizeUnitSI Interactor1DenValue
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 4 <NA> <NA>
#> 5 <NA> <NA>
#> 6 <NA> <NA>
#> Interactor1DenUnit Interactor1DenTypeSI
#> 1 juvenile NA
#> 2 juvenile NA
#> 3 adult NA
#> 4 adult NA
#> 5 adult NA
#> 6 adult NA
#> Interactor1DenValueSI Interactor1DenUnitSI
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1MassValueSI Interactor1MassUnitSI Interactor2
#> 1 NA NA None None
#> 2 NA NA None None
#> 3 NA NA None None
#> 4 NA NA None None
#> 5 NA NA None None
#> 6 NA NA None None
#> Interactor2Common Interactor2Kingdom Interactor2Phylum
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor2Class Interactor2Order Interactor2Family
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor2Genus Interactor2Species Interactor2Stage
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor2Sex Interactor2Temp Interactor2TempUnit
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor2TempMethod Interactor2GrowthTemp
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2GrowthTempUnit Interactor2GrowthDur
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2GrowthDurUnit Interactor2GrowthType
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2Acc Interactor2AccTemp Interactor2AccTempNotes
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor2AccTime Interactor2AccTimeNotes
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2AccTimeUnit Interactor2OrigTemp
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2OrigTempNotes Interactor2OrigTime
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2OrigTimeNotes Interactor2OrigTimeUnit
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2EquilibTimeValue Interactor2EquilibTimeUnit
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2Size Interactor2SizeUnit Interactor2SizeType
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor2SizeSI Interactor2SizeUnitSI
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2DenValue Interactor2DenUnit
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2DenTypeSI Interactor2DenValueSI
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2DenUnitSI Interactor2MassValueSI
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2MassUnitSI PhysicalProcess PhysicalProcess_1
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> PhysicalProcess_2 FigureTable
#> 1 NA figure 1
#> 2 NA figure 1
#> 3 NA figure 1
#> 4 NA figure 1
#> 5 NA figure 1
#> 6 NA figure 1
#> Citation
#> 1 Lima-Camara et al 2022. Body size does not affect locomotor activity of Aedes aegypti and Aedes albopictus females (Diptera:Culicidae). Acta Tropica 231
#> 2 Lima-Camara et al 2022. Body size does not affect locomotor activity of Aedes aegypti and Aedes albopictus females (Diptera:Culicidae). Acta Tropica 231
#> 3 Lima-Camara et al 2022. Body size does not affect locomotor activity of Aedes aegypti and Aedes albopictus females (Diptera:Culicidae). Acta Tropica 231
#> 4 Lima-Camara et al 2022. Body size does not affect locomotor activity of Aedes aegypti and Aedes albopictus females (Diptera:Culicidae). Acta Tropica 231
#> 5 Lima-Camara et al 2022. Body size does not affect locomotor activity of Aedes aegypti and Aedes albopictus females (Diptera:Culicidae). Acta Tropica 231
#> 6 Lima-Camara et al 2022. Body size does not affect locomotor activity of Aedes aegypti and Aedes albopictus females (Diptera:Culicidae). Acta Tropica 231
#> CuratedByCitation
#> 1 Brass et al. 2024. Role of vector phenotypic plasticity in disease transmission as illustrated by the spread of dengue virus by Aedes albopictus. Nat Commun 15: 7823
#> 2 Brass et al. 2024. Role of vector phenotypic plasticity in disease transmission as illustrated by the spread of dengue virus by Aedes albopictus. Nat Commun 15: 7823
#> 3 Brass et al. 2024. Role of vector phenotypic plasticity in disease transmission as illustrated by the spread of dengue virus by Aedes albopictus. Nat Commun 15: 7823
#> 4 Brass et al. 2024. Role of vector phenotypic plasticity in disease transmission as illustrated by the spread of dengue virus by Aedes albopictus. Nat Commun 15: 7823
#> 5 Brass et al. 2024. Role of vector phenotypic plasticity in disease transmission as illustrated by the spread of dengue virus by Aedes albopictus. Nat Commun 15: 7823
#> 6 Brass et al. 2024. Role of vector phenotypic plasticity in disease transmission as illustrated by the spread of dengue virus by Aedes albopictus. Nat Commun 15: 7823
#> CuratedByDOI
#> 1 10.1038/s41467-024-52144-5
#> 2 10.1038/s41467-024-52144-5
#> 3 10.1038/s41467-024-52144-5
#> 4 10.1038/s41467-024-52144-5
#> 5 10.1038/s41467-024-52144-5
#> 6 10.1038/s41467-024-52144-5
#> DOI SubmittedBy
#> 1 10.1016/j.actatropica.2022.106430 Sarah Kelly
#> 2 10.1016/j.actatropica.2022.106430 Sarah Kelly
#> 3 10.1016/j.actatropica.2022.106430 Sarah Kelly
#> 4 10.1016/j.actatropica.2022.106430 Sarah Kelly
#> 5 10.1016/j.actatropica.2022.106430 Sarah Kelly
#> 6 10.1016/j.actatropica.2022.106430 Sarah Kelly
#> ContributorEmail Notes DefaultChartXaxis
#> 1 s.kelly@imperial.ac.uk <NA> SecondStressorValue
#> 2 s.kelly@imperial.ac.uk <NA> SecondStressorValue
#> 3 s.kelly@imperial.ac.uk <NA> SecondStressorValue
#> 4 s.kelly@imperial.ac.uk <NA> SecondStressorValue
#> 5 s.kelly@imperial.ac.uk <NA> SecondStressorValue
#> 6 s.kelly@imperial.ac.uk <NA> SecondStressorValue
#> DefaultChartCategory
#> 1 Location
#> 2 Location
#> 3 Location
#> 4 Location
#> 5 Location
#> 6 Location
Remember, data retrieved using ohvbd can be raw and depends on the format of the source database. When inspecting the data, it is useful to consider:
- What type of data has the search returned?
- How many columns are there?
- Are there any missing values?
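These questions can be answered with a few base R functions. The sketch below uses a small made-up data frame (toy_data, not real VecTraits output) so that it runs without ohvbd; the same calls could be applied to aedes_aegypti_data.

```r
# Small made-up table standing in for retrieved data (values are illustrative only).
toy_data <- data.frame(
  trait_name = c("body size", "body size", "development time"),
  trait_value = c(3.01, NA, 21),
  trait_unit = c("mm", "mm", NA),
  stringsAsFactors = FALSE
)

# What type of data is in each column?
column_types <- sapply(toy_data, class)

# How many rows and columns are there?
n_rows <- nrow(toy_data)
n_cols <- ncol(toy_data)

# How many missing values does each column contain?
missing_per_column <- colSums(is.na(toy_data))

print(column_types)
print(c(rows = n_rows, columns = n_cols))
print(missing_per_column)
```

Counting missing values per column is a quick way to spot fields, such as the standardised trait columns in the output above, that are empty for every row.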
Let’s save our data, as we will use it later in the workshop:
write.csv(aedes_aegypti_data, "aedes_aegypti_data.csv", row.names = FALSE)
Note: If you want to practise more examples of using the ohvbd package, try looking at the package vignettes.
7.3.5 VBD Hub Forum
Earlier in this content, we found the Hub Forum under the Community tab. The VBD Hub Forum is a space to ask questions and share knowledge with the VBD community.
Discussions on the Forum are organised using categories and tags so you can easily find topics relevant to you. When you post in a discussion, this contributes to an ongoing thread with other users.
You can engage with the Forum by:
- Following discussions that interest you.
- Posting a new question or conversation prompt.
- Sharing resources and opportunities.
- Responding to existing threads.
7.3.6 VBD Hub Forum Task
Log in or sign up to the VBD Hub Forum and find the workshop discussion under the Training topic.
Create a short post to introduce yourself, including your name, research interests, and what you hope to gain from this workshop.
You might want to reply to each other’s posts to network with your fellow participants.
7.4 Data Wrangling Principles
Data wrangling is the process of cleaning, transforming, and organising raw data into a format that is suitable for your analysis.
Data retrieved through the Hub Search or the ohvbd package will often be raw, with missing values or messy data formats. Understanding data wrangling principles will help you organise your data into usable formats to make further statistical analysis smoother and easier.
7.4.1 Do’s and Don’ts of Data Wrangling
1. Understand your data first.
- Do: Explore your data before making any changes. We can look at the first few rows of our dataset using head(), which allows us to better understand our data, including checking the column names and data types.
- Don’t: Jump straight into cleaning the data without fully understanding how it is formatted. Without understanding the data, you risk misinterpreting variables and accidentally removing useful data.
2. Save your raw data.
- Do: Keep an unchanged version of the raw dataset so you can access a previous version if something goes wrong, reproduce your work, and verify your results.
- Don’t: Overwrite your original data. When wrangling your data in R, assign your cleaned data to a new object: clean_data <- raw_data.
3. Use clear & consistent naming.
- Do: Use informative column, model, and object names so it is clear what each one contains. Clear naming makes your R workflow easier for others (and yourself) to understand.
- Don’t: Use unclear names or names with messy formatting. Try to avoid spaces, special characters, and obscure abbreviations.
4. Reformat your data.
- Do: Convert your data to long format for analysis, using pivot_longer() from the tidyr package. In this format, each row represents a single observation, which can make the data easier to filter and analyse.
- Don’t: Use data in wide format, where values are spread across multiple columns. This can limit data wrangling and processing, such as grouping across different years and generating effective visualisations.
5. Record what you do.
- Do: Keep track of your progress by recording what changes you made and why you used that approach. It is good practice to note these changes as comments in your code.
- Don’t: Rely on memory alone. It is easy to forget what analytical methods you used, why you chose that approach, and what order you processed your data in. Keeping clear records of your workflow contributes to better reproducibility.
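A short sketch of how several of these principles might look together in R. The data frame and its column names are made up for illustration, and the example assumes the tidyr package is installed so that pivot_longer() is available.

```r
library(tidyr)

# Made-up raw data in wide format: one column per year (illustrative values only).
raw_data <- data.frame(
  Site = c("A", "B"),
  count_2021 = c(5, 3),
  count_2022 = c(8, 2)
)

# Principle 2: work on a copy so raw_data is never overwritten.
clean_data <- raw_data

# Principle 3: use consistent lower-case column names.
names(clean_data) <- tolower(names(clean_data))

# Principle 4: reshape to long format, one row per site-year observation.
long_data <- pivot_longer(
  clean_data,
  cols = starts_with("count_"),
  names_to = "year",
  names_prefix = "count_",
  values_to = "count"
)

# Principle 5: record why a change was made.
# Year is stored as text after pivoting, so convert it to a number for analysis.
long_data$year <- as.integer(long_data$year)

print(long_data)
```

Because clean_data and long_data are new objects, raw_data keeps its original column names and values and can be used to re-run or check the workflow.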
Frequent Mistake: R does not allow spaces in standard object names. There are a few alternative naming styles: we recommend camelCase, where letters are capitalised to indicate new words (e.g. specificModelName), or snake_case, where an underscore connects words (e.g. specific_model_name).
If using camelCase, remember that R is case sensitive. If your object is named specificModelName but you call specificmodelname, R will show an error:
Error: object 'specificmodelname' not found
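This naming behaviour can be checked directly. The sketch below uses only base R; tryCatch() captures the error as text so the script keeps running instead of stopping.

```r
# Create an object with a camelCase name.
specificModelName <- 42

# R is case sensitive: only the exact spelling exists.
exact_exists <- exists("specificModelName")
lower_exists <- exists("specificmodelname")

# Looking up the wrong spelling raises an error; capture its message as text.
error_message <- tryCatch(
  get("specificmodelname"),
  error = function(e) conditionMessage(e)
)

# make.names() shows how R converts a name containing spaces into a valid one.
valid_name <- make.names("specific model name")

print(exact_exists)
print(lower_exists)
print(error_message)
print(valid_name)
```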
7.4.2 Data Wrangling Task
Let’s have a go at applying these data wrangling principles to the dataset we retrieved using ohvbd.
1. Understand your data first - run head() before making any changes to your data:
head(aedes_aegypti_data)
#> <ohvbd.data.frame>
#> Database: vt
#> Id DatasetID IndividualID OriginalID
#> 1 110069 893 SK055
#> 2 110070 893 SK055
#> 3 110065 892 SK056
#> 4 110066 892 SK056
#> 5 110067 892 SK056
#> 6 110068 892 SK056
#> OriginalTraitName OriginalTraitDef
#> 1 development time mean duration of life stage
#> 2 development time mean duration of life stage
#> 3 body size mean wing length
#> 4 body size mean wing length
#> 5 body size mean wing length
#> 6 body size mean wing length
#> StandardisedTraitName StandardisedTraitDef
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> OriginalTraitValue OriginalTraitUnit OriginalErrorPos
#> 1 21.00 days NA
#> 2 21.00 days NA
#> 3 3.01 mm 0.12
#> 4 3.30 mm 0.10
#> 5 2.80 mm 0.10
#> 6 3.22 mm 0.03
#> OriginalErrorNeg OriginalErrorUnit StandardisedTraitValue
#> 1 NA <NA> NA
#> 2 NA <NA> NA
#> 3 0.12 SE NA
#> 4 0.10 SE NA
#> 5 0.10 SE NA
#> 6 0.03 SE NA
#> StandardisedTraitUnit StandardisedErrorPos
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> StandardisedErrorNeg StandardisedErrorUnit Replicates
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Habitat LabField ArenaValue ArenaUnit ArenaValueSI
#> 1 terrestrial laboratory NA NA NA
#> 2 terrestrial laboratory NA NA NA
#> 3 terrestrial laboratory NA NA NA
#> 4 terrestrial laboratory NA NA NA
#> 5 terrestrial laboratory NA NA NA
#> 6 terrestrial laboratory NA NA NA
#> ArenaUnitSI AmbientTemp AmbientTempMethod AmbientTempUnit
#> 1 NA NA NA NA
#> 2 NA NA NA NA
#> 3 NA NA NA NA
#> 4 NA NA NA NA
#> 5 NA NA NA NA
#> 6 NA NA NA NA
#> AmbientLight AmbientLightUnit SecondStressor
#> 1 NA NA <NA>
#> 2 NA NA <NA>
#> 3 NA NA <NA>
#> 4 NA NA <NA>
#> 5 NA NA <NA>
#> 6 NA NA <NA>
#> SecondStressorDef
#> 1 estimate of total food (tetramin) provided per larvae per day
#> 2 estimate of total food (tetramin) provided per larvae per day
#> 3 estimate of total food (tetramin) provided per larvae per day
#> 4 estimate of total food (tetramin) provided per larvae per day
#> 5 estimate of total food (tetramin) provided per larvae per day
#> 6 estimate of total food (tetramin) provided per larvae per day
#> SecondStressorValue SecondStressorUnit TimeStart TimeEnd
#> 1 0.0952381 mg 2013 2013
#> 2 0.4761905 mg 2013 2013
#> 3 0.0952381 mg 2013 2013
#> 4 0.4761905 mg 2013 2013
#> 5 0.0952381 mg 2013 2013
#> 6 0.4761905 mg 2013 2013
#> TotalObsTimeValue TotalObsTimeUnit TotalObsTimeValueSI
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> TotalObsTimeUnitSI TotalObsTimeNotes ResRepValue
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> ResRepUnit ResRepValueSI ResRepUnitSI
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Location
#> 1 Instituto Oswaldo Cruz Fundacao Oswaldo Cruz RJ Brazil
#> 2 Instituto Oswaldo Cruz Fundacao Oswaldo Cruz RJ Brazil
#> 3 Instituto Oswaldo Cruz Fundacao Oswaldo Cruz RJ Brazil
#> 4 Instituto Oswaldo Cruz Fundacao Oswaldo Cruz RJ Brazil
#> 5 Instituto Oswaldo Cruz Fundacao Oswaldo Cruz RJ Brazil
#> 6 Instituto Oswaldo Cruz Fundacao Oswaldo Cruz RJ Brazil
#> LocationType OriginalLocationDate LocationDate
#> 1 colony NA <NA>
#> 2 colony NA <NA>
#> 3 colony NA <NA>
#> 4 colony NA <NA>
#> 5 colony NA <NA>
#> 6 colony NA <NA>
#> LocationDatePrecision CoordinateType Latitude Longitude
#> 1 0 decimal -22.87736 -43.24356
#> 2 0 decimal -22.87736 -43.24356
#> 3 0 decimal -22.87736 -43.24356
#> 4 0 decimal -22.87736 -43.24356
#> 5 0 decimal -22.87736 -43.24356
#> 6 0 decimal -22.87736 -43.24356
#> Interactor1 Interactor1Common Interactor1Wholepart
#> 1 Aedes albopictus <NA> NA
#> 2 Aedes albopictus <NA> NA
#> 3 Aedes albopictus <NA> NA
#> 4 Aedes albopictus <NA> NA
#> 5 Aedes albopictus <NA> NA
#> 6 Aedes albopictus <NA> NA
#> Interactor1WholePartType Interactor1Number
#> 1 NA 100
#> 2 NA 20
#> 3 NA 100
#> 4 NA 20
#> 5 NA 100
#> 6 NA 20
#> Interactor1Kingdom Interactor1Phylum Interactor1Class
#> 1 Animalia Arthropoda Insecta
#> 2 Animalia Arthropoda Insecta
#> 3 Animalia Arthropoda Insecta
#> 4 Animalia Arthropoda Insecta
#> 5 Animalia Arthropoda Insecta
#> 6 Animalia Arthropoda Insecta
#> Interactor1Order Interactor1Family Interactor1Genus
#> 1 Diptera Culicidae Aedes
#> 2 Diptera Culicidae Aedes
#> 3 Diptera Culicidae Aedes
#> 4 Diptera Culicidae Aedes
#> 5 Diptera Culicidae Aedes
#> 6 Diptera Culicidae Aedes
#> Interactor1Species Interactor1Stage Interactor1Sex
#> 1 albopictus juvenile (L1 to pupa) indeterminate
#> 2 albopictus juvenile (L1 to pupa) indeterminate
#> 3 albopictus adult female
#> 4 albopictus adult female
#> 5 albopictus adult female
#> 6 albopictus adult female
#> Interactor1Temp Interactor1TempUnit Interactor1TempMethod
#> 1 25 Celsius NA
#> 2 25 Celsius NA
#> 3 25 Celsius NA
#> 4 25 Celsius NA
#> 5 25 Celsius NA
#> 6 25 Celsius NA
#> Interactor1GrowthTemp Interactor1GrowthTempUnit
#> 1 NA <NA>
#> 2 NA <NA>
#> 3 NA <NA>
#> 4 NA <NA>
#> 5 NA <NA>
#> 6 NA <NA>
#> Interactor1GrowthDur Interactor1GrowthdDurUnit
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1GrowthType Interactor1Acc Interactor1AccTemp
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor1AccTempNotes Interactor1AccTime
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1AccTimeNotes Interactor1AccTimeUnit
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1OrigTemp Interactor1OrigTempNotes
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1OrigTime Interactor1OrigTimeNotes
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1OrigTimeUnit Interactor1EquilibTimeValue
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1EquilibTimeUnit Interactor1Size
#> 1 NA <NA>
#> 2 NA <NA>
#> 3 NA <NA>
#> 4 NA <NA>
#> 5 NA <NA>
#> 6 NA <NA>
#> Interactor1SizeUnit Interactor1SizeType Interactor1SizeSI
#> 1 <NA> <NA> <NA>
#> 2 <NA> <NA> <NA>
#> 3 <NA> <NA> <NA>
#> 4 <NA> <NA> <NA>
#> 5 <NA> <NA> <NA>
#> 6 <NA> <NA> <NA>
#> Interactor1SizeUnitSI Interactor1DenValue
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 4 <NA> <NA>
#> 5 <NA> <NA>
#> 6 <NA> <NA>
#> Interactor1DenUnit Interactor1DenTypeSI
#> 1 juvenile NA
#> 2 juvenile NA
#> 3 adult NA
#> 4 adult NA
#> 5 adult NA
#> 6 adult NA
#> Interactor1DenValueSI Interactor1DenUnitSI
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor1MassValueSI Interactor1MassUnitSI Interactor2
#> 1 NA NA None None
#> 2 NA NA None None
#> 3 NA NA None None
#> 4 NA NA None None
#> 5 NA NA None None
#> 6 NA NA None None
#> Interactor2Common Interactor2Kingdom Interactor2Phylum
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor2Class Interactor2Order Interactor2Family
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor2Genus Interactor2Species Interactor2Stage
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor2Sex Interactor2Temp Interactor2TempUnit
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor2TempMethod Interactor2GrowthTemp
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2GrowthTempUnit Interactor2GrowthDur
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2GrowthDurUnit Interactor2GrowthType
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2Acc Interactor2AccTemp Interactor2AccTempNotes
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor2AccTime Interactor2AccTimeNotes
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2AccTimeUnit Interactor2OrigTemp
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2OrigTempNotes Interactor2OrigTime
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2OrigTimeNotes Interactor2OrigTimeUnit
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2EquilibTimeValue Interactor2EquilibTimeUnit
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2Size Interactor2SizeUnit Interactor2SizeType
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> Interactor2SizeSI Interactor2SizeUnitSI
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2DenValue Interactor2DenUnit
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2DenTypeSI Interactor2DenValueSI
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2DenUnitSI Interactor2MassValueSI
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> Interactor2MassUnitSI PhysicalProcess PhysicalProcess_1
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> PhysicalProcess_2 FigureTable
#> 1 NA figure 1
#> 2 NA figure 1
#> 3 NA figure 1
#> 4 NA figure 1
#> 5 NA figure 1
#> 6 NA figure 1
#> Citation
#> 1 Lima-Camara et al 2022. Body size does not affect locomotor activity of Aedes aegypti and Aedes albopictus females (Diptera:Culicidae). Acta Tropica 231
#> 2 Lima-Camara et al 2022. Body size does not affect locomotor activity of Aedes aegypti and Aedes albopictus females (Diptera:Culicidae). Acta Tropica 231
#> 3 Lima-Camara et al 2022. Body size does not affect locomotor activity of Aedes aegypti and Aedes albopictus females (Diptera:Culicidae). Acta Tropica 231
#> 4 Lima-Camara et al 2022. Body size does not affect locomotor activity of Aedes aegypti and Aedes albopictus females (Diptera:Culicidae). Acta Tropica 231
#> 5 Lima-Camara et al 2022. Body size does not affect locomotor activity of Aedes aegypti and Aedes albopictus females (Diptera:Culicidae). Acta Tropica 231
#> 6 Lima-Camara et al 2022. Body size does not affect locomotor activity of Aedes aegypti and Aedes albopictus females (Diptera:Culicidae). Acta Tropica 231
#> CuratedByCitation
#> 1 Brass et al. 2024. Role of vector phenotypic plasticity in disease transmission as illustrated by the spread of dengue virus by Aedes albopictus. Nat Commun 15: 7823
#> 2 Brass et al. 2024. Role of vector phenotypic plasticity in disease transmission as illustrated by the spread of dengue virus by Aedes albopictus. Nat Commun 15: 7823
#> 3 Brass et al. 2024. Role of vector phenotypic plasticity in disease transmission as illustrated by the spread of dengue virus by Aedes albopictus. Nat Commun 15: 7823
#> 4 Brass et al. 2024. Role of vector phenotypic plasticity in disease transmission as illustrated by the spread of dengue virus by Aedes albopictus. Nat Commun 15: 7823
#> 5 Brass et al. 2024. Role of vector phenotypic plasticity in disease transmission as illustrated by the spread of dengue virus by Aedes albopictus. Nat Commun 15: 7823
#> 6 Brass et al. 2024. Role of vector phenotypic plasticity in disease transmission as illustrated by the spread of dengue virus by Aedes albopictus. Nat Commun 15: 7823
#> CuratedByDOI
#> 1 10.1038/s41467-024-52144-5
#> 2 10.1038/s41467-024-52144-5
#> 3 10.1038/s41467-024-52144-5
#> 4 10.1038/s41467-024-52144-5
#> 5 10.1038/s41467-024-52144-5
#> 6 10.1038/s41467-024-52144-5
#> DOI SubmittedBy
#> 1 10.1016/j.actatropica.2022.106430 Sarah Kelly
#> 2 10.1016/j.actatropica.2022.106430 Sarah Kelly
#> 3 10.1016/j.actatropica.2022.106430 Sarah Kelly
#> 4 10.1016/j.actatropica.2022.106430 Sarah Kelly
#> 5 10.1016/j.actatropica.2022.106430 Sarah Kelly
#> 6 10.1016/j.actatropica.2022.106430 Sarah Kelly
#> ContributorEmail Notes DefaultChartXaxis
#> 1 s.kelly@imperial.ac.uk <NA> SecondStressorValue
#> 2 s.kelly@imperial.ac.uk <NA> SecondStressorValue
#> 3 s.kelly@imperial.ac.uk <NA> SecondStressorValue
#> 4 s.kelly@imperial.ac.uk <NA> SecondStressorValue
#> 5 s.kelly@imperial.ac.uk <NA> SecondStressorValue
#> 6 s.kelly@imperial.ac.uk <NA> SecondStressorValue
#> DefaultChartCategory
#> 1 Location
#> 2 Location
#> 3 Location
#> 4 Location
#> 5 Location
#> 6 Location
Notice what the columns represent, what the data types are, and whether there are any missing values.
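If scanning the printed output by eye is awkward for a dataset this wide, the same checks can be summarised in a few lines. The sketch below uses a small stand-in data frame (`example_data`, with made-up values and column names loosely modelled on the output above) so that it runs without a download; swapping in `aedes_aegypti_data` should give the equivalent summary for the retrieved dataset.

```r
# Small stand-in data frame. The values are illustrative only and are not
# taken from VecTraits. Replace `example_data` with `aedes_aegypti_data`
# to run the same checks on the retrieved dataset.
example_data <- data.frame(
  LifeStage = c("juvenile", "juvenile", "adult"),  # hypothetical column name
  Interactor2Genus = c(NA, NA, NA),
  FigureTable = c("figure 1", "figure 1", "figure 1")
)

# Data type of each column
column_types <- sapply(example_data, class)

# Number of missing values in each column
missing_counts <- colSums(is.na(example_data))

# Columns that contain no data at all
empty_columns <- names(missing_counts)[missing_counts == nrow(example_data)]

column_types
missing_counts
empty_columns
```

Columns listed in `empty_columns` hold only `NA` values, which is a useful prompt to decide whether they are needed for your analysis.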
2. Save your raw data - save your data as a new object. Any changes you make will use this new object, rather than the raw data:
clean_aedes_data <- aedes_aegypti_data
3. Use clear and consistent naming - rename at least two columns with more informative names:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
clean_aedes_data <- clean_aedes_data |>
  rename(
    original_trait_name = OriginalTraitName,
    original_trait_def = OriginalTraitDef
  )
4. Reformat your data - check whether your dataset is in wide or long format. Remember, we typically want each row to represent a single observation. If our variables are spread across several columns, we can reformat our data using pivot_longer() from the tidyr package. For this dataset, we can see the data is already in long format.
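Because the Aedes aegypti dataset is already in long format, there is nothing to reshape here. For reference, a minimal sketch of what the reshaping step might look like on a wide dataset is given below. The data frame `wide_example`, its column names, and its values are hypothetical and exist only to illustrate the call.

```r
library(tidyr)

# Hypothetical wide-format data: one column per temperature treatment.
# Names and values are illustrative only and are not taken from VecTraits.
wide_example <- data.frame(
  individual_id = c("a", "b", "c"),
  temp_20 = c(1.2, 1.4, 1.1),
  temp_25 = c(1.8, 1.9, 1.7),
  temp_30 = c(2.1, 2.3, 2.0)
)

# Gather the three temp_* columns into a temperature/value pair,
# so that each row holds a single observation.
long_example <- wide_example |>
  pivot_longer(
    cols = starts_with("temp_"),
    names_to = "temperature",
    names_prefix = "temp_",
    values_to = "trait_value"
  )

long_example
```

Each individual now appears on three rows, one per temperature, which makes it simpler to filter or group by treatment later on.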
5. Record what you do - add comments to your code explaining what changes you made and why:
# Convert data from wide to long format so it is easier to filter and analyse.
Note: Don’t worry if you weren’t able to retrieve a dataset from ohvbd; we will recap this in the Live Session.
7.5 Response Form
Please complete this Response Form after finishing the tasks above.
This form is anonymous and is not an assessment. Your responses will help us to understand which areas may require more support during the Live Session. We aim to tailor the content to the group’s needs, so you gain the most from this workshop.
7.6 Conclusion & Preparation for Live Session
Ahead of the Live Session, ensure you keep R and RStudio installed on your device, as well as the packages we prepared earlier.
Please make sure you have Teams set up on your device and that your microphone is working. We will aim to send the link 48 hours before the Live Session. Please be aware that the Live Session will be recorded.




