Contributed by Kathryn E. McDougal PhD, Towson University
Although the Researcher Workbench can be daunting at first, it gives undergraduates and other early-stage trainees a rare opportunity to carry out independent research projects with a unique data resource. Potential student research options range from small group projects in a Course-based Undergraduate Research Experience (CURE) to the more traditional independent research format of one student working with a professor or post-doctoral fellow. Preparing students before they log into the Researcher Workbench is essential. This article complements the featured workspace "Genomics Undergrad Lesson Plan Exemplar (v6)".
These suggested steps were optimized in a CURE; they may need to be modified based on the education level of your students and the time they have available to work on their project.
Step 1: Put your students in a mindset to tackle this project.
For many students, this is a completely new type of project using new tools and a new vocabulary. It can be daunting, especially for students new to research or students from populations historically underrepresented in research who are unsure if they belong in the research space. Positive reinforcement and direction are imperative at this early stage. Make sure they know it is okay if they feel overwhelmed and have a lot to learn. This is a normal part of doing science! Assign them articles like this one to read.
Step 2: Have them perfect their experimental design.
It is useful if your students have a clear understanding of their project design and goals before logging into the Researcher Workbench. If they understand the design of their project well, they will be more successful in selecting cohorts and concepts in the Researcher Workbench and it will be easier for them to work independently. A week or two of literature searches and mini journal clubs can be a great way to get them started. Another option is to have your students write their own research proposal.
Step 3: Assign them tasks to teach (or refresh) coding skills.
Although excellent R and Python code snippets and Tutorial/Featured Workspaces can be found in the Researcher Workbench, it is useful if students have some coding knowledge before they begin their projects. Websites such as Codecademy have excellent free and paid resources to help students learn coding skills independently. Additionally, giving students a practice dataset to play with outside of the Researcher Workbench (in RStudio, whose developer is now named Posit, or in Jupyter Notebook) may help them feel more confident in their coding and data manipulation skills. Step 3 can be done concurrently with Steps 1 and 2.
Step 4: Take advantage of the Getting Started/User Support/Office Hours videos in the Researcher Workbench.
Students who take advantage of these resources report having an easier time using and understanding the bespoke Researcher Workbench interface. Assigning specific tutorial videos or Office Hour sessions can direct students to the best resources to get them started. The Getting Started videos give a good overview of the system, and the New User Orientation is helpful to learn about All of Us and how the data resource is run. Additionally, do not be afraid to reach out to the User Support team with questions. They are there to help and love working with students.
Step 5: Encourage your students to be thoughtful in building their Cohorts and Concept Sets.
This can be one of the most challenging steps for students. Strong knowledge of their project focus and design is imperative. It is not uncommon for students to select Cohorts and Concepts, start analyzing their data, and then realize they are missing a variable or part of a population necessary for their project. Once your students make their Cohort/Concept selections and start their Notebooks, encourage them to look closely at what their datasets contain and make sure they have all the information they need before they start their data manipulation and analyses.
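One way to make this check concrete is to have students list the variables their design requires and compare that list against the dataset before any analysis. The following is a minimal Python sketch using pandas; the toy table, the column names, and the helper `missing_columns` are hypothetical stand-ins, not actual Researcher Workbench output.

```python
import pandas as pd

def missing_columns(df, required):
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# Toy stand-in for a dataset built from a Cohort and Concept Sets.
# All column names here are hypothetical.
dataset = pd.DataFrame({
    "person_id": [1, 2, 3],
    "date_of_birth": ["1980-05-01", "1975-11-23", "1990-02-14"],
    "sex_at_birth": ["Female", "Male", "Female"],
})

# Variables the (hypothetical) project design calls for.
required = ["person_id", "date_of_birth", "sex_at_birth", "smoking_status"]

missing = missing_columns(dataset, required)
print("Missing variables:", missing)

# Also check that each subgroup the project needs is actually represented.
print(dataset["sex_at_birth"].value_counts(dropna=False))
```

If the list of missing variables is not empty, it is usually easier to go back and revise the Concept Sets now than after several analysis steps have been written.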
Step 6: Investigate Observational Medical Outcomes Partnership (OMOP)
OMOP is the common data model used by All of Us. Most students have never seen it before and need an introduction. OHDSI has some great resources on YouTube, particularly their tutorials and workshops where they explain what OMOP is and how it came to be. Learning the history behind OMOP and then watching the All of Us Office Hour session on OMOP is a good way to help students comprehend the data organization they are seeing in their Notebook. Gradually, as they work with the data, the OMOP common data model will become easier to understand.
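A small toy example can help students see the basic idea behind OMOP: data live in separate tables that are linked by identifiers, and clinical terms are stored as concept IDs that are looked up in a concept table. The sketch below uses pandas and borrows OMOP table and column names (`person`, `condition_occurrence`, `concept`), but the rows and concept IDs are invented for illustration and are not real OMOP concepts or All of Us data.

```python
import pandas as pd

# Three tiny tables shaped like OMOP tables. All values are invented.
person = pd.DataFrame({
    "person_id": [1, 2, 3],
    "year_of_birth": [1980, 1975, 1990],
})
condition_occurrence = pd.DataFrame({
    "person_id": [1, 1, 3],
    "condition_concept_id": [9001, 9002, 9001],  # made-up concept IDs
})
concept = pd.DataFrame({
    "concept_id": [9001, 9002],
    "concept_name": ["Example condition A", "Example condition B"],
})

# Step 1: translate concept IDs into readable names.
conditions_named = condition_occurrence.merge(
    concept, left_on="condition_concept_id", right_on="concept_id", how="left"
)

# Step 2: attach conditions to people; a left merge keeps people
# who have no recorded condition (their concept_name is missing).
per_person = person.merge(conditions_named, on="person_id", how="left")
print(per_person[["person_id", "year_of_birth", "concept_name"]])
```

Note that one person can appear on several rows after the merge, one per recorded condition, which is a common source of surprise for students new to this layout.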
Step 7: Reinforce the importance of looking at the data
Many of the coding and data manipulation issues students encounter in the Notebook can be avoided by students looking at their data. For example, a student will merge two files and continue on, not stopping to check that the new file created after the merge is exactly what they expect it to be. They do eventually realize they have a problem, perhaps five steps later when they are trying to graph data that isn’t there. Reminding your students frequently to look at their data, especially those students inexperienced in coding, can help projects run more smoothly.
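The merge scenario described above can be checked in a few lines. This is a minimal sketch in Python with pandas, assuming two small invented tables; the column names are hypothetical. The `indicator=True` option of `merge` adds a `_merge` column that records whether each row was found in the left table, the right table, or both.

```python
import pandas as pd

# Two invented tables that share a person_id column.
demographics = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "age": [43, 58, 35, 61],
})
measurements = pd.DataFrame({
    "person_id": [1, 2, 2, 5],
    "value": [5.1, 6.3, 6.0, 4.8],
})

# An outer merge keeps every row from both tables so nothing vanishes silently.
merged = demographics.merge(
    measurements, on="person_id", how="outer", indicator=True
)

# Look at the result before moving on.
print(merged.shape)                     # how many rows and columns?
print(merged["_merge"].value_counts())  # how many rows matched?
print(merged.head())                    # do the values look sensible?

# Count rows that appear in only one of the two tables.
n_unmatched = int((merged["_merge"] != "both").sum())
print("Rows without a match:", n_unmatched)
```

Comparing the row count before and after a merge, and asking why any rows failed to match, catches most of these problems at the step where they occur.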
Step 8: Don’t forget the code snippets and the Tutorial/Featured Workspaces
Need to change birthdate into age? There is a code snippet for that! Students do not need to come up with every line of code on their own. Resources exist in the Researcher Workbench to help users manipulate data. Encourage them to take the time to explore the code snippets and the Tutorial/Featured Workspaces.
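To give students a sense of what such a snippet does, here is a minimal sketch of one way to turn a birthdate into an age in whole years with pandas. It is not the Researcher Workbench snippet itself; the function `age_in_years`, the column name `date_of_birth`, and the reference date are assumptions made for illustration.

```python
import pandas as pd

def age_in_years(birth_dates, as_of):
    """Return age in completed years on the date as_of.

    birth_dates is a pandas Series of dates (or date strings).
    """
    birth = pd.to_datetime(birth_dates)
    as_of = pd.Timestamp(as_of)
    # Has this year's birthday already happened by the reference date?
    had_birthday = (birth.dt.month < as_of.month) | (
        (birth.dt.month == as_of.month) & (birth.dt.day <= as_of.day)
    )
    # Subtract one year for anyone whose birthday is still to come.
    return as_of.year - birth.dt.year - (~had_birthday).astype(int)

# Invented example data.
people = pd.DataFrame({
    "person_id": [1, 2, 3],
    "date_of_birth": ["1980-05-01", "1975-11-23", "2000-02-29"],
})
people["age"] = age_in_years(people["date_of_birth"], as_of="2023-06-15")
print(people)
```

Counting completed years this way avoids the small errors that come from dividing a number of days by 365.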
Step 9: Keep your code neat and organized.
With a project like this, students will generate a lot of code. Keeping it neat, organized, documented, and in the correct order is imperative. Messy code with no documentation can make it hard, if not impossible, to identify problems and fix errors. Make sure students have a clear understanding of how their code should be organized and documented when they start. Examples of how students may want to arrange their code can be found in the Student Example Workspace.
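As a rough illustration of what organized and documented code can look like, the Python sketch below separates setup, loading, cleaning, and analysis into clearly labeled parts that run in order. The function names, the `MIN_AGE` criterion, and the toy data are hypothetical; the Student Example Workspace may arrange things differently.

```python
# --- 1. Setup: imports and project-wide constants ---
import pandas as pd

MIN_AGE = 18  # inclusion criterion for this (hypothetical) project

# --- 2. Load the data ---
def load_data():
    """Return a toy table standing in for a Workbench dataset."""
    return pd.DataFrame({
        "person_id": [1, 2, 2, 3],
        "age": [43, 17, 17, 61],
    })

# --- 3. Clean the data ---
def clean_data(df):
    """Drop duplicate rows, then keep participants who meet MIN_AGE."""
    deduplicated = df.drop_duplicates()
    return deduplicated[deduplicated["age"] >= MIN_AGE].reset_index(drop=True)

# --- 4. Analyze ---
def summarize(df):
    """Return the number of participants and their mean age."""
    return {
        "n": int(df["person_id"].nunique()),
        "mean_age": float(df["age"].mean()),
    }

# --- 5. Run the steps in order ---
raw = load_data()
clean = clean_data(raw)
summary = summarize(clean)
print(summary)
```

Because each step is a short, named function with a one-line description, a mentor reviewing the Notebook can find where a problem was introduced without rerunning everything.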
Step 10: Technical errors happen. Don’t be afraid to use the User Support Team.
As mentioned previously, the team at All of Us is committed to helping users access and analyze the data. All of Us welcomes researchers at all stages of their careers, including students, so students should not feel intimidated to submit questions to the User Support Team or ask questions during Office Hours. In fact, if they listen to an Office Hour session, they may find that mid-career and late-career scientists have some of the same questions they do!
What might a student project look like?
For an example of the type of project a pair of undergraduates can produce in 3 to 4 months with weekly mentorship, see the Student Example Workspace. There is a Student Example Workspace in R for the Controlled Tier. It is just one example; depending on a particular project’s focus and design, a project could look very different. However, the example notebooks are useful for students to see the flow of a project and how code is organized and used, and they give ideas on how to graph data. The Controlled Tier Student Example Workspace also shows how to incorporate genetic variants from the genotyping array data.
The Student Example Workspace for the Controlled Tier is ‘Genomics Undergrad Lesson Plan Exemplar (v6)’ and the Notebooks are ‘Example COPD Student Project’ and ‘Python Hail Array Data Extraction’.