Tuesday, June 2: Workshop Information

Workshops take place on Tuesday, June 2nd. Detailed descriptions of individual workshops follow the schedule.

9:00-12:00
  • W1: Hands-on Big Data
    • Presenter: Ryan Womack
  • W2: Where Everybody Knows Your Name: Building Credible and Sustainable Data Services in a Liberal Arts College
    • Presenters: Kristin Partlo, Danya Leebaw, Paula Lackie, Peter Rogers, Diana Symons, & Aaron Albertson
  • W3: Introduction to International Microdata: IPUMS-International and the Integrated Demographic & Health Surveys
    • Presenters: Lara Cleveland, Patricia Kelly Hall, & Miriam King
  • W4: Using NVivo 10 for Qualitative Data Analysis
    • Presenter: Mandy Swygart-Hobaugh
  • W5: Metadata Management Using DDI and Colectica
    • Presenters: Jeremy Iverson & Dan Smith

12:30-13:30: Break

13:30-15:30
  • W6: Data Quality in Qualtrics: Applying data management practices during design, collection, and analysis
    • Presenters: Andrew Sell, Thomas Lindsay, & Alicia Hofelich Mohr
  • W7: Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Access Tools, and Long-Term Availability
    • Presenters: Johanna Bleckman & Kaye Marz
  • W8: Managing and Sharing Qualitative Data
    • Presenters: Colin Elman & Dessislava Kirilova

13:30-16:30
  • W9: New data from IPUMS-CPS and ATUS-X
    • Presenters: Sarah Flood & Katie Genadek
  • W10: The Art of the Merge: How to Merge Data in Three Statistical Software Programs
    • Presenters: Ashley Jester, Tara Das, & Starr Hoffman

 W1: Hands-on Big Data 

  • Time: 9:00 - 12:00
  • Presenter: Ryan Womack, Rutgers University

Abstract: This workshop is for those of you who, having read about Big Data and seen some of its results in academic studies and the commercial world, would like to get a sense of what actually working with Big Data entails.

The workshop will provide an overview of key technologies for the handling and analysis of large-scale datasets, including Hadoop/MapReduce, the RHadoop package, other R packages used for large-scale analysis, and Big Data handling environments such as Cloudera, Hortonworks, Tessera, and Amazon Web Services. We will also discuss a few of the primary challenges in successfully completing analysis of large-scale data, such as integrating and structuring heterogeneous data, handling sparse matrices, and devising effective analytical routines using parallel processing and data splitting. Participants will work with a live demonstration environment that provides a realistic introduction to Big Data analytics, using scripts that run both on a scaled-down demonstration dataset and on truly large-scale data.

 W2: Where Everybody Knows Your Name: Building Credible and Sustainable Data Services in a Liberal Arts College 

  • Time: 9:00 - 12:00
  • Presenters:
    • Kristin Partlo, Carleton College
    • Danya Leebaw, Carleton College
    • Paula Lackie, Carleton College
    • Peter Rogers, Colgate University
    • Diana Symons, College of Saint Benedict/Saint John's University
    • Aaron Albertson, Macalester College

Abstract: Providing data services within a liberal arts college setting presents unique challenges and opportunities. Residential liberal arts colleges are characterized by a focus on teaching undergraduates, small class sizes, and individualized support from staff and faculty provided with a fraction of the technical infrastructure of research institutions.

This workshop will cover topics particularly relevant to those with emerging or established data services in a liberal arts college. Practicing librarians from four institutions will lead discussion and interactive activities designed to help participants learn more about the following as they pertain to the particular institutional context of liberal arts colleges: developing a sustainable and credible model, building on the strengths of a small community, outreach to faculty and students, identifying allies, empowering other colleagues to respond to data questions and needs, establishing data management practices, partnering with related campus initiatives like digital scholarship, integrating data into a traditional collection development model, and curating campus data projects. Participants will leave with strategies to advance data services on their own campuses. Beyond addressing these topics, an important goal for the workshop is for liberal arts data practitioners to build relationships with their colleagues at similar institutions.

 W3: Introduction to International Microdata: IPUMS-International and the Integrated Demographic & Health Surveys 

  • Time: 9:00 - 12:00
  • Presenters:
    • Lara Cleveland, University of Minnesota
    • Patricia Kelly Hall, University of Minnesota
    • Miriam King, University of Minnesota

Abstract: IPUMS-International (Integrated Public Use Microdata Series, International) and IDHS (Integrated Demographic & Health Surveys) are international microdata dissemination projects of the Minnesota Population Center (MPC). IPUMS-International provides large samples of census microdata from 79 countries, spanning the 1960s through the latest census rounds. These records, covering over 500 million individuals, report on demographics, education, household structure, labor force participation, dwelling characteristics, and other topics. IDHS offers data on African and Indian women of childbearing age and children under 5, with information on health topics ranging from contraceptive use and prenatal care to HIV and intimate partner violence. Data from IPUMS-International and IDHS are ideal for comparative analyses across time and space. The user-friendly web interface shows variable availability at a glance; offers variable-specific information on question wording, codes and frequencies, and comparability issues; and merges files to create customized data extracts. This hands-on session will introduce participants to the power and ease of use of IPUMS and IDHS. After an introduction to the datasets, participants will complete a series of exercises showcasing the interactive metadata, customized microdata extract system, online tabulator, and classroom registration system.

 W4: Using NVivo 10 for Qualitative Data Analysis 

  • Time: 9:00 - 12:00
  • Presenter: Mandy Swygart-Hobaugh, Georgia State University

Abstract: Many social scientists like to “get their hands dirty” by delving into deep analysis of qualitative data – be it discourse analysis, in-depth interviews, ethnographic observations, visual and textual media analysis, etc. Manually coding these data sources can become cumbersome and cluttered – and may even hinder drawing out the rich content in the data.

Through hands-on work with provided qualitative data, participants will explore ways to organize, analyze, and present qualitative research data using NVivo 10 analysis software. The workshop will cover the following topics:

  • Coding of text and multimedia sources
  • Using Queries to explore and code data
  • Creating Attribute Value Classifications to facilitate comparative analyses
  • Data visualizations

 W5: Metadata Management Using DDI and Colectica 

  • Time: 9:00 - 12:00
  • Presenters:
    • Jeremy Iverson, Colectica
    • Dan Smith, Colectica

Abstract: The DDI Lifecycle metadata standard supports creating, documenting, managing, distributing, and discovering data. Colectica is a software tool, built on open metadata standards, that facilitates adopting DDI in the research data management process.

This workshop starts with a high-level overview of the DDI content model and then teaches participants how to create DDI XML, both manually and with Colectica. Finally, participants will learn how to publish DDI metadata.

This workshop covers the following topics:
  • Introduction to DDI 3.2
  • Introduction to Colectica
  • Documenting concepts and general study design
  • Designing and documenting data collection instruments and surveys
  • Documenting variables and creating linkages
  • Ingesting existing resources
  • Publishing resources
  • Hands-on: use Colectica and DDI to manage a sample study

 W6: Data Quality in Qualtrics: Applying data management practices during design, collection, and analysis 

  • Time: 13:30 - 15:30
  • Presenters:
    • Andrew Sell, University of Minnesota
    • Thomas Lindsay, University of Minnesota
    • Alicia Hofelich Mohr, University of Minnesota

Abstract: Many universities use Qualtrics for online data collection. Its low learning curve makes it popular but also raises significant data management challenges, because many decisions made during implementation can affect later use of the data. In this workshop, we will discuss strategies for good data management while designing, collecting, and extracting survey data in Qualtrics. Examples and hands-on exercises will address survey flow, interoperability with statistical tools, adding embedded data fields to track experimental conditions, and workflows for downloading and preprocessing data.

We expect this workshop to be useful both for users of Qualtrics and for those who encounter data collected in Qualtrics. Some knowledge of creating surveys in Qualtrics is expected. Participants who do not use Qualtrics are advised to explore the tool beforehand.

 W7: Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Access Tools, and Long-Term Availability 

  • Time: 13:30 - 15:30
  • Presenters:
    • Johanna Bleckman, ICPSR
    • Kaye Marz, ICPSR

Abstract: Federal data sharing requirements increase public access to federally funded scientific data. For researchers, data sharing is key to translating research into knowledge, policies, and practices. This workshop will help participants facilitate data sharing within the cycle of science, which starts with deposited data that, through additional use, leads to shared knowledge that inspires new data collection.

The workshop will cover several deposit options (fully curated archives and the public access archive, openICPSR), differences between sharing public-use and restricted-use data, and benefits available to depositors through the ICPSR website. A hands-on demonstration of making a deposit is planned.

Finding data for the unique needs of a research project can be challenging, particularly in a world that values both the liberal use and the protection of research data. The workshop will describe and demonstrate the array of discovery and exploration tools that leverage ICPSR’s vast data catalog, metadata, and online analysis options; discuss discovering, using, and publishing from restricted-use data; and include group discussion of disclosure issues and hands-on time with ICPSR data tools.

Participants will become more familiar with:
  • Federal data sharing requirements
  • Options for sharing data
  • Data discovery tools
  • Protection of confidentiality when sharing data

 W8: Managing and Sharing Qualitative Data 

  • Time: 13:30 - 15:30
  • Presenters:
    • Colin Elman, Qualitative Data Repository
    • Dessislava Kirilova, Qualitative Data Repository 

Abstract: While data access and research transparency are becoming standard practices across the social sciences, the transition has been easier in the quantitative tradition. In part this is because most scholars who use quantitative data and analytical techniques have long accepted the norm, even if they have not regularly complied with it. Standards for making quantitative data accessible are widely acknowledged, and substantial infrastructure for that sharing has been in place for many years. 

The idea that qualitative data should be shared is much more recent and controversial. Part of the debate arises from the absence of widely shared understandings of the concrete operational practices for sharing qualitative data. Of course, many of the best practices for dealing with data that librarians, archivists, data center staff, and other information professionals typically employ remain applicable. However, qualitative data present a variety of additional challenges due to their close proximity to the social world from which they were drawn. Their often textual nature likewise poses special challenges to sharing, particularly internationally. The workshop highlights these challenges and provides a basic framework that research data professionals can use when called upon to advise their user community about managing qualitative data.

Workshop organizers are associated with the Qualitative Data Repository (QDR). Funded by the National Science Foundation, QDR was established in 2014 to provide the infrastructure to safely store and share qualitative data and to contribute to developing the expertise and tools needed to share such data. 

Specific techniques, tools, and resources will be presented on the following topics:
  • Planning to manage qualitative data before a research project begins
  • Organizing qualitative data for analysis and writing, research transparency, and potential sharing
  • Sharing qualitative data ethically and legally and in a way that facilitates broad international access
  • The uses to which shared qualitative data can be put

W9: New data from IPUMS-CPS, ATUS-X, and IPUMS-SESTAT

  • Time: 13:30 - 16:30
  • Presenters: 
    • Sarah Flood, University of Minnesota
    • Devon Kristiansen, University of Minnesota

Abstract: IPUMS-CPS (Integrated Public Use Microdata Series, Current Population Survey), ATUS-X (American Time Use Survey Extract System), and IPUMS-SESTAT are microdata dissemination projects of the Minnesota Population Center (MPC). The IPUMS-CPS data project was recently expanded and now includes the March Annual Social and Economic Supplement data from 1962 to 2014 and CPS Basic Monthly Samples from 1989 to 2013. In addition to the Basic Monthly data, 13 supplements, including the food security, veterans, fertility, tobacco use, and voter supplements, are currently available. ATUS-X contains annual time diary data from 2003 to 2014 and includes newly available health and well-being data. IPUMS-SESTAT is a new project that makes information about college graduates in the United States more easily accessible; it includes data from 1993 onward. All MPC data are harmonized for consistency across time, fully documented, and easily accessible online for the research community.

This hands-on session will introduce participants to IPUMS-CPS, ATUS-X, and IPUMS-SESTAT, with an overview of the available data and the research topics these data cover. The presenters will lead attendees through a series of exercises on obtaining the data, accessing the web-based documentation and metadata, and using the data in basic analyses.

W10: The Art of the Merge: How to Merge Data in Three Statistical Software Programs

  • Time: 13:30 - 16:30
  • Presenters: 
    • Ashley Jester, Columbia University
    • Tara Das, Columbia University
    • Starr Hoffman, Columbia University

    "I need to add additional years to my dataset for a longitudinal analysis..."
    "I need to add additional variables to my dataset…"
    "I’ve found all of my variables and need to bring them into a single file…"

Have you ever heard (or said) this? Most researchers will need to merge data at some point in their research process, as it is rare that all of the variables relevant to an analysis are found in a single source.

This workshop will focus on merging datasets using three statistical software packages: Stata, R, and SAS. It will teach the basic research principles and data requirements necessary to execute a successful merge and give participants practice applying this knowledge. Instructors will provide sample datasets and guide participants step by step through preparing data, completing a merge, and validating results. This will be of use to researchers as well as to librarians and others who support research. If you need to merge data or assist those who do, this workshop will give you the knowledge to make your data merge a success.

Learning objectives:
  • Execute successful data merges in Stata, R, and SAS
  • Understand the general principles needed to complete a data merge in any application