Pecha Kucha

  • Date: Thursday, June 5
  • Time: 15:45 - 16:45
  • Location: TBC
  • Chair: Jennifer Doty, Emory University

University data ownership and management policies
Abigail Goben, University of Illinois-Chicago; Lisa Zilinski, Kristin Briney

OMERO: Finally, file-level preservation for natural scientists
Alex Garnett, Simon Fraser University

Call me maybe? It's not crazy! Data collection offices are a good partner in data management
Alicia Hofelich Mohr and Andrew Sell, University of Minnesota

Uncorked: Leveraging Data to Drink Better
Ashley Jester, Columbia University

The Data Service Centre (DSC) at Statistics Netherlands: storing and exchanging statistical data and metadata
Harold Kroeze, Statistics Netherlands

Trends in Data Submissions at a Social Science Data Archive
Amy Pienta, ICPSR, University of Michigan

The Census with Kittens: Not Just a Gimmick
Amy West, University of Minnesota Libraries

  • Presenter: Alex Garnett, Simon Fraser University
  • Abstract:  Digital preservation and data librarians have typically found it difficult to provide services to natural scientists. The sheer bit-scale of their research, as well as the proliferation of proprietary file formats output by specialized imaging devices, has often led them to prefer their own tools to libraries'. While some open solutions exist for maintaining and publishing lab notes, such as OpenWetWare, these are typically only deployed at the research group level. As more academic libraries begin to establish their own research data curation and repository initiatives with broad interdisciplinary focus, science faculties are often neglected for lack of standardized support. OMERO, a tool developed by researchers that has yet to make headway into the digital preservation or library data communities, offers a potential solution by providing a scriptable ingest and normalization server for many imaging file formats that are not supported by popular tools such as ImageMagick. At minimum, it could serve as a drop-in addition to these tools within digital collections and preservation platforms such as Islandora and Archivematica, assisting with metadata extraction and indexing, file-level preservation, and in-browser display and annotation. It has been released under the GPL by the Open Microscopy Environment.

  • Presenters: Abigail Goben, University of Illinois-Chicago; Lisa Zilinski, Kristin Briney
  • Abstract:  Data ownership and management policies can affect how research data are supported at a university. This Pecha Kucha presentation will highlight the preliminary results of our current research on university data ownership and management policies. In contrast to previous studies on institutional data management policies, we examined the university websites of 206 institutions with a Carnegie Classification of Institutions of Higher Education of either "High" or "Very High" research level as of July 2014. Some of the major questions we asked included: Does the institution have a data sharing or management policy? What does the policy cover? Who owns the policy (e.g. Office of Research, Information Technology, Libraries)? What happens to the ownership of the data if a researcher leaves the institution? Are universities with data management services provided by the library more likely to have a policy on data management? Ultimately, our goal is to determine if universities support data management comprehensively with complementary policies and services. The topics that will be covered include data stewardship, ownership, retention, and sharing with regard to university research data policies.

  • Presenters: Alicia Hofelich Mohr and Andrew Sell, University of Minnesota
  • Abstract:  For data management professionals, attention is largely focused on the beginning and end of the research process, as many researchers are worried about meeting federal requirements for data management plans (DMPs) and are looking for ways to share and archive their data. As a University office specializing in survey and experimental data collection, we have seen how the "middle" steps of data collection and analysis can be influenced by, and be an influence on, these upstream and downstream data management processes. In this Pecha Kucha, we will present relevant data management lessons we have learned from designing, developing, and hosting data collection tools. Challenges of anonymity and paying participants, quirks of statistical files produced by data collection tools, and transparency in the research process are among the issues we will discuss. As many of these challenges directly impact later sharing and curation of the data collected, we emphasize that data collection offices can be important partners in data management efforts.

  • Presenter: Ashley Jester, Columbia University
  • Abstract:  Since 2010, my partner and I have been recording information about the wine that is consumed by our household, making a point to gather data about each unique bottle. Together, we have accumulated detailed information about over 400 bottles of wine, including tasting notes, varietals, origin, and importer. While this data collection was originally intended to keep us from buying “bad” wines again, it has turned out to be a rich trove of information about the varietals we like, the importers we can trust, and the years that have proven to be good vintages. This Pecha Kucha will present an overview of this data, revealing both some of the substantive findings from our dataset and also the methodologies that have been applied to create the analysis. This Pecha Kucha will be a quick and fun tour of the international landscape through the lens of wine, with a focus on finding out the best way to use data to make more informed consumption choices.

  • Presenter: Harold Kroeze, Statistics Netherlands
  • Abstract:  The Data Service Centre (DSC) is the central repository for datasets across the entire statistical field of Statistics Netherlands (SN). Its purpose is to archive the datasets as well as to enable easy, secure and monitored exchange of data and metadata. The DSC has the following characteristics:
      - Metadata first, data second.
      - Datasets are stored as text files (csv or fixed-width) and are described according to a metadata model.
      - Public access to metadata within SN, data access only after authorisation by data owner.
      - Service-oriented approach: the backend system uses web services for communication with client tools.
      - It promotes re-use of variables and definitions.
    An organisation-wide project ("The treasure chest unlocked") was set up to describe and store the microdata sets that form the basis of our published data. The project also produced a number of tools to manage metadata and data (for example a Metadata editor and a Catalogue). This resulted in a very noticeable increase in the volume of metadata and datasets stored at the DSC. Data and metadata can now easily be shared within the organisation, but also with external researchers through our remote access facility.

  • Presenter: Amy Pienta, ICPSR, University of Michigan
  • Abstract:  In recent years, new data sharing policies in the US have encouraged scientists to share research data with others, many accomplishing this through archiving their data with a domain repository. Related to this trend, there is strong demand from social scientists for access to research data for a variety of secondary data analysis uses, including support for new grant applications, classroom research papers, and research projects that lead to conference presentations and publications. Given that many users search for potential secondary data through Google or through the search feature of a data repository, it is possible to create and mine a database for emerging patterns in search behavior that help us better understand the demand for data and how well a domain repository is able to meet that demand. We explore data from the 100 most frequently searched keywords/phrases at ICPSR in 2014. We match these popular terms to the depth of the ICPSR holdings related to these searches to determine areas where ICPSR may be lacking data. We also identify common search terms where the users exit the ICPSR web site after searching for data. We find, for example, that "demoralization" was searched for 323 times in 2014 and 94% of users exited the ICPSR web site after results from the search were returned. Looking forward, ICPSR expects the number of scientists wanting access to research data collected by others to increase, and this user search model may provide a greater understanding of data user needs.

  • Presenter: Amy West, University of Minnesota Libraries
  • Abstract:  Yes, illustrating a two-hour virtual class session on the history of the Census Bureau and its surveys with cat pictures was initially a gimmick to maintain student engagement. Turns out, though, cats are particularly effective at illustrating the more complex aspects of how the Census Bureau has developed, the functions that decennial censuses serve and the controversies they engender. I'll demonstrate this unique qualification with comparisons to other charismatic fauna such as puppies, red pandas and otters.