9 Challenge Task
9.1 Introduction
This Challenge Task provides an opportunity for you to independently apply the skills and concepts discussed throughout this online training, including:
- Navigating the VBD Hub website and finding key resources.
- Using Hub Search and the ohvbd package to search and retrieve data.
- Data wrangling techniques commonly applied to VBD datasets, including merging and fixing species names.
- Applying data wrangling principles to real-world datasets.
- Using the VBD Hub Forum and collaborating within the VBD community.
The Challenge Task has multiple levels and is designed to encourage applied thinking. Feel free to work through the levels that apply to you, but we encourage you to try all levels to make the most of the training.
During the Challenge Task, we encourage you to experiment with different approaches and discuss potential difficulties with each other via the VBD Hub Forum. Our demonstrators and I will be monitoring the Forum if you need any additional support.
After approximately 2 hours, a workbook version of this challenge will be made available on the Online Training site. This is not an answer sheet, and we encourage you to continue coding yourself rather than reading through the solutions. The workbook will walk you through the tasks as in the examples used throughout the training, but with a bit more independence before providing the answers.
9.2 Level 1 - Retrieve a dataset
Use the ohvbd package to find and retrieve a VBD dataset of your choice. Feel free to choose a dataset that aligns with your own interest, but try to choose one that includes species data, location data, and environmental or trait variables.
Hint: Remember to use the arguments from the Pre-Live Session content to refine your search:
- query - what you are searching for, such as a species name.
- db - which databases to search.
- fromdate - the date to search from.
- todate - the date to search up to.
- locationpoly - restrict the search to a geographic area.
- taxonomy - advanced search by species ID.
- exact - whether to return exact matches only.
- withoutpublished - whether to return results without a publishing date when filtering by date.
- returnlist - return the raw output list of datasets, rather than a formatted dataframe.
You can also find vignettes with examples under Reading & Resources.
View your dataset:
- Identify the data types and potential key columns.
- Check if your data contains missing values or inconsistent column names.
- Consider whether your dataset needs converting from wide to long format.
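The checks above can be sketched in base R as follows. The data frame `obs` and all of its column names and values are made up for illustration and do not come from any VBD Hub dataset; swap in the dataset you retrieved.

```r
# Toy data in wide format (hypothetical values, for illustration only)
obs <- data.frame(
  Site = c("A", "B", "C"),
  species = c("Aedes aegypti", "Aedes aegypti", "Culex pipiens"),
  count_2020 = c(12, NA, 7),
  count_2021 = c(15, 3, NA),
  stringsAsFactors = FALSE
)

# Data type of each column
col_types <- sapply(obs, class)

# Number of missing values in each column
n_missing <- colSums(is.na(obs))

# Column names that are not fully lowercase (one common inconsistency)
mixed_case <- names(obs)[names(obs) != tolower(names(obs))]

# Convert from wide (one count column per year) to long (one row per site-year)
obs_long <- reshape(
  obs,
  direction = "long",
  varying = c("count_2020", "count_2021"),
  v.names = "count",
  timevar = "year",
  times = c(2020, 2021),
  idvar = "Site"
)

print(col_types)
print(n_missing)
print(mixed_case)
print(obs_long)
```

Here `Site` and `species` are candidate key columns, and the long version has one `count` value per site and year, which is usually easier to filter, summarise, and merge.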
9.3 Level 2 - Wrangle your data
Apply at least two data wrangling techniques to improve the usability of your dataset. Consider why you chose those changes for your specific dataset.
Hint: Refer back to the do’s and don’ts of Data Wrangling Principles from the Pre-Live Session content for general data wrangling advice. Apply what is necessary to your chosen dataset.
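As one possible illustration, the sketch below applies three common techniques: standardising column names, converting columns to appropriate types, and recoding a missing-value placeholder. The data frame `raw`, its columns, and the "n/a" placeholder are hypothetical; your dataset may need different steps.

```r
# Toy data with awkward column names and text-encoded values (hypothetical)
raw <- data.frame(
  "Sample Date" = c("2021-06-01", "2021-06-15", "2021-07-01"),
  "Trap.ID" = c("T1", "T2", "T1"),
  Count = c("10", "n/a", "4"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# 1. Standardise column names: non-alphanumerics become underscores, then lowercase
names(raw) <- tolower(gsub("[^A-Za-z0-9]+", "_", names(raw)))

# 2. Convert the date column from text to Date
raw$sample_date <- as.Date(raw$sample_date, format = "%Y-%m-%d")

# 3. Recode the "n/a" placeholder as a true missing value, then convert to numeric
raw$count[raw$count == "n/a"] <- NA
raw$count <- as.numeric(raw$count)

str(raw)
```

Recoding the placeholder before calling `as.numeric()` avoids a coercion warning and makes the choice of what counts as missing explicit.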
9.4 Level 3 - Cleaning species names
Identify the species column of your dataset and check if the species names are formatted consistently. Apply name cleaning techniques where appropriate.
Hint: Identify the inconsistencies in your species names using unique(). Once you know how the species names vary in your dataset, you can use mutate() and:
- Set all text to lowercase.
- Remove unwanted characters, such as underscores.
- Remove additional text, such as “spp.”.
- Remove any extra spaces.
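The cleaning steps above can be sketched with base R string functions. The helper `clean_species_name()` and the example names are hypothetical, written for illustration; adapt the patterns to the inconsistencies that `unique()` reveals in your own data.

```r
# Hypothetical helper that applies the four cleaning steps in order
clean_species_name <- function(x) {
  x <- tolower(x)                        # set all text to lowercase
  x <- gsub("_", " ", x, fixed = TRUE)   # replace underscores with spaces
  x <- gsub("spp.", "", x, fixed = TRUE) # remove the literal text "spp."
  x <- gsub("\\s+", " ", x)              # collapse repeated whitespace
  trimws(x)                              # drop leading and trailing spaces
}

# Toy species names showing typical inconsistencies (made up for illustration)
species <- c("Aedes_aegypti", "AEDES AEGYPTI", " aedes  aegypti ",
             "Culex spp.", "culex_pipiens")

print(unique(species))            # inspect the variation before cleaning
cleaned <- clean_species_name(species)
print(unique(cleaned))            # fewer distinct names after cleaning
```

Within a dplyr workflow, the same helper could be applied as `mutate(species = clean_species_name(species))`, assuming the column is called `species`.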
9.5 Level 4 - Merging datasets
For this level, you have a choice of two options (we recommend trying Level 4a to make the most of this training session):
Level 4a - Find a second dataset that can be combined with your first. For instance, if your first dataset focused on mosquito abundance, you might look for a second dataset on environmental factors. Identify a suitable key and try to merge the datasets.
Hint: Refer back to Level 1 for advice on refining your ohvbd search. Common keys in VBD data include species name, location, or date columns.
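A minimal sketch of a merge on a shared key, using base R `merge()`. Both data frames, the `site` key, and all values are made up for illustration; in practice the key might be a species name, location, or date column, as noted above.

```r
# Toy abundance data (hypothetical)
abundance <- data.frame(
  species = c("aedes aegypti", "aedes aegypti", "culex pipiens"),
  site = c("A", "B", "C"),
  count = c(12, 3, 7),
  stringsAsFactors = FALSE
)

# Toy environmental data sharing the "site" key (hypothetical)
climate <- data.frame(
  site = c("A", "B", "D"),
  mean_temp_c = c(24.1, 26.3, 22.8),
  stringsAsFactors = FALSE
)

# Check which key values in the first dataset have no match in the second
unmatched_sites <- setdiff(abundance$site, climate$site)
print(unmatched_sites)

# Left join: keep every abundance row and add temperature where the site matches
merged <- merge(abundance, climate, by = "site", all.x = TRUE)
print(merged)
```

Checking for unmatched keys before merging makes it easier to tell whether missing values in the result reflect genuinely absent data or a formatting mismatch in the key.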
Level 4b - If a suitable second dataset is not available, focus on preparing your first dataset for merging by identifying potential key columns and whether they are in a usable format.
Hint: Common keys in VBD data include species name, location, or date columns. Consider how these columns are formatted in your dataset, and how this formatting might vary across other datasets.
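One way to check whether candidate key columns are in a usable format is sketched below. The data frame `traps`, its values, and the assumed day/month/year date format are hypothetical; the expected formats in your dataset may differ.

```r
# Toy data with inconsistently formatted key columns (hypothetical)
traps <- data.frame(
  location = c("Site A", "site a ", "Site B"),
  date = c("01/06/2021", "15/06/2021", "2021-07-01"),
  stringsAsFactors = FALSE
)

# Dates: parse with the expected format; values in another format become NA
parsed_dates <- as.Date(traps$date, format = "%d/%m/%Y")
bad_dates <- traps$date[is.na(parsed_dates)]
print(bad_dates)

# Locations: compare the number of distinct values before and after normalising
location_clean <- tolower(trimws(traps$location))
n_before <- length(unique(traps$location))
n_after <- length(unique(location_clean))
print(c(before = n_before, after = n_after))
```

A drop in the number of distinct values after normalising suggests that the same key is written in several ways, which would cause rows to go unmatched in a later merge.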