Session Block 5 – Thursday, June 4, 13:30-15:30
- Time: 13:30 - 15:30
- Location: Blegen Hall 150
- Chair: Amber Leahey
- Track: Data Infrastructure and Applications
Mixed Method Approaches to GIS: Qualities, Quantities, and Quandaries
- Presenter: Andy Rutkowski, University of Southern California
- Abstract: Geographic Information Systems (GIS) have the potential to make sense out of large collections of data. Historically, GIS projects have been focused on quantitative data and analysis, whereas qualitative data has been mostly limited to classifying or labeling categories or types. More recently, GIS work has shown how different types of qualitative data (such as interviews, Tweets, archival newspaper classifieds, photographs, etc.) can improve our understanding of quantitative data and therefore produce more meaningful maps. I will outline some recent cases of mixed methods approaches to GIS projects and discuss how these approaches benefited from including qualitative data. I will also consider the challenges of collecting, using, and archiving qualitative data. Lastly, I will consider the politics of mixing your methods in academic and other settings.
The Landscape of Geospatial Research: A Content Analysis of Recently Published Articles
- Presenters: Mara Blake, Nicole Scholtz, and Justin Joque, University of Michigan
- Abstract: Researchers at all levels frequently refer to existing journal articles for references to data sources, tools and methods, but the lack of clear information about these often prevents continuity and reproducibility in research practices. The authors undertook this study to capture information about the body of published literature utilizing geospatial research methods. The authors present the preliminary results of an exploratory content analysis of published articles that used methods of geospatial analysis as a part of their research methodology. In order to better understand the landscape of current publishing practices and methodological approaches, the authors coded a sample of articles from a selection of journals drawn from a variety of disciplines that utilize geospatial analyses. They coded the articles for content, including: data citation; software and tools used; and specificity of research methodology description. In addition to the coded variables, the authors also compiled metadata about the articles, including: journal title; journal subject area; primary author subject affiliation; primary author sex; and number of authors. The authors present an exploration of the current state of data and geospatial related practices, especially transparency and quality of sources and methods, as well as some key challenges in applying content analysis to this domain.
GoGeo: A Jisc-funded service to promote and support spatial data management and sharing across UK academia
- Presenter: Tony Mathys, EDINA, The University of Edinburgh
- Abstract: The implementation and encouragement of good data management practices and data sharing in the social sciences is a formidable challenge, especially for spatial data within academic disciplines that embrace the use of Geographical Information Systems (GIS), image processing and statistical software for research and teaching. The Joint Information Systems Committee (Jisc) has taken the lead to provide resources to support data management and sharing across UK academia. The GoGeo service is an example of Jisc's commitment to provide resources to securely manage and share spatial data. These resources include the Geodoc online metadata tool, which allows users to create, edit, manage, import, export and publish standards-compliant (ISO 19115, UK GEMINI, INSPIRE, DDI and Dublin Core) metadata records; the GoGeo portal, which allows users to publish their records into public or private metadata catalogues; and ShareGeo, a repository for users to upload and download spatial data. The service also offers geospatial metadata workshops to introduce academics and students to geospatial metadata, standards and the GoGeo service's resources. This presentation will provide an overview of the GoGeo service, which started as a project between EDINA and the UK Data Archive in 2002. Its successes and shortcomings will be summarised as well.
“Quantitative, Quantitative, Quantitative!” Is Qualitative Research the Jan Brady of Social Sciences Data Services?
- Presenter: Mandy Swygart-Hobaugh, Georgia State University
- Abstract: Librarians providing data services for researchers and learners in the social sciences should be offering data support and management services to qualitative researchers as well as quantitative ones. But, is this the case in practice? Do social sciences data services librarians devote their primary attention to quantitative researchers to the detriment of qualitative researchers? Is qualitative research the Jan Brady of social sciences data services? This presentation will present findings from: (1) a content analysis of IASSIST job repository postings from 2005-2014, gauging their requirements/responsibilities regarding qualitative data services; and (2) a survey of social sciences data services librarians and other data-support professionals, exploring the extent of qualitative data and research support they presently provide at their academic institutions and their thoughts regarding the relevance of qualitative data and research for the future of data support services.
- Gary Berg-Cross
- Reagan Moore
- Ingrid Dillo
- Mary Vardigan
- Abstract: An international group of collaborating data professionals launched the Research Data Alliance (RDA) in March 2013 with the vision of sharing data openly across technologies, disciplines, and countries to address the grand challenges of society. RDA is supported by the European Commission, the U.S. National Science Foundation, and the Australian government, and it meets in plenary twice a year. Members of the RDA voluntarily work together in Working Groups with concrete deliverables or in exploratory Interest Groups. Some of the foundational RDA Working Groups have completed the first phase of their projects and have produced results. This session is intended to highlight their activities and accomplishments.
- Peter Wittenburg, Rob Pennington, Yunquiang Zhu, and Gary Berg-Cross: Data Fabric Interest Group
- Early work by five RDA Working Groups (DTR, DFT, PIT, PP, MDSD) developed a foundation that was important for progress and common understanding. As these groups completed their efforts, continued interaction and expansion to other groups was deemed useful to form a more integrated view. As a result, a new Interest Group entitled "Data Fabric" was formed. Starting with a white paper, the DFIG will broadly consider and illustrate possible directions for making data practices more efficient and cost-effective. We will describe important common components and their services, along with principles of component and service interaction and associated best practices. Over time we will seek consensus on conceptual views of the ecological landscape of components and services that are required. The intent is to promote ingredients, such as policy-based automatic procedures adhering to basic data organization principles, that are necessary to deal professionally with large datasets in ways based on well-accepted concepts and mechanisms. These discussions, concretized by spin-off WGs, are expected to benefit RDA groups as well as the broader research and data community.
- Andreas Rauber, Ari Asmi, Reagan Moore, and Dieter van Uytvanck: Dynamic Data Citation Working Group: Approaches to Data Citation in Non-Trivial Settings: How to Precisely Identify Subsets in Static and Dynamic Data
- Being able to reliably and efficiently identify entire or subsets of data in large and dynamically growing or changing datasets constitutes a significant challenge for a range of research domains. To repeat an earlier study, or to apply data from an earlier study to a new model, we need to be able to precisely identify the very subset of data used. While verbal descriptions of how the subset was created are hardly precise enough and do not support automated handling, keeping redundant copies of the data in question does not scale up to the big data settings encountered in many disciplines today. Furthermore, we need to handle situations where new data gets added or existing data gets corrected or modified over time. Conventional approaches are not sufficient. We will review the challenges identified above and discuss solutions that are currently elaborated within the context of the Working Group of the Research Data Alliance (RDA) on Data Citation: Making Dynamic Data Citable. The approach is based on versioned and time-stamped data sources, with persistent identifiers being assigned to the time-stamped queries/expressions that are used for creating the subset of data. We will further review results from the first pilots evaluating the approach.
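The query-based citation approach described above can be illustrated with a minimal sketch. This is an illustrative toy, not the Working Group's actual implementation: all class and function names (`VersionedStore`, `cite`) are hypothetical, integer timestamps stand in for real time-stamping, and a hash stands in for a real persistent identifier.

```python
import hashlib

class VersionedStore:
    """Toy versioned, time-stamped data source: rows are never
    physically deleted, only closed with a valid_to timestamp,
    so any past subset can be re-derived exactly."""

    def __init__(self):
        self.rows = []  # each row: {"value", "valid_from", "valid_to"}

    def insert(self, value, ts):
        self.rows.append({"value": value, "valid_from": ts, "valid_to": None})

    def delete(self, predicate, ts):
        # A correction/removal closes the row rather than erasing it.
        for row in self.rows:
            if row["valid_to"] is None and predicate(row["value"]):
                row["valid_to"] = ts

    def query(self, predicate, as_of):
        # Re-executing the query with the stored timestamp
        # reproduces the cited subset, even after later changes.
        return [
            r["value"] for r in self.rows
            if r["valid_from"] <= as_of
            and (r["valid_to"] is None or r["valid_to"] > as_of)
            and predicate(r["value"])
        ]

def cite(query_text, as_of):
    # In the RDA approach a persistent identifier is assigned to the
    # (query, timestamp) pair; a short hash stands in for a PID here.
    return hashlib.sha256(f"{query_text}@{as_of}".encode()).hexdigest()[:12]
```

The key design point is that the citation stores the query plus its execution timestamp rather than a redundant copy of the data, which is what lets the approach scale to large, dynamic datasets.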
- Ingrid Dillo and Simon Hodson: Data Publication: Cost Recovery for Data Centres Interest Group
- Much work is underway to understand the costs of maintaining long-term accessibility to digital resources, to identify different cost components, and, on that basis, to develop cost models. However, in a broader context that considers data as part of research communication, the identification of costs and the development of cost models address only part of the problem. In times of tightening budgets, it is important to address the challenge of ensuring the sustainability of data centres -- and to consider this in the context of the broader processes for data publication. Many established national and international data centres have reliable sources of income from research funders. However, these income sources are generally inelastic and may be vulnerable. There is concern that basic funding of data infrastructure may not keep pace with increasing costs. There is a need, therefore, to consider alternative cost recovery options and a diversification of revenue streams. The RDA/WDS Interest Group on Cost Recovery for Data Centres aims to contribute to strategic thinking on cost recovery by conducting research to understand current and possible cost recovery strategies for data centres. This presentation will provide an overview of the activities of the interest group.
- Mary Vardigan and Lesley Rickards: DSA-WDS Basic Certification for Repositories Working Group
- Created under the auspices of the RDA Interest Group on Audit and Certification, this Working Group is a partnership between the Data Seal of Approval (DSA) and the World Data System (WDS) to develop a common set of requirements for basic assessment and certification of data repositories. Both the DSA and the WDS are lightweight certification mechanisms and their criteria have much in common, so it makes sense to bring them together. In addition the Working Group seeks to develop common assessment procedures, a shared testbed for assessment, and ultimately a framework for certification that includes other standards like Nestor and ISO 16363 as well. This presentation will provide an overview of the activities of the working group, including a review of the harmonized requirements and procedures.
- Time: 13:30 - 15:30
- Location: Blegen Hall 135
- Chair: Bobray Bordelon
- Track: Data Services Professional Development
Bridging the business data divide: insights into primary and secondary data use by business researchers
- Presenter: Linda Lowry, Brock University Library
- Abstract: Academic librarians and data specialists use a variety of approaches to gain insight into how researcher data needs and practices vary by discipline, including surveys, focus groups, and interviews. Some published studies have included small numbers of business school faculty and graduate students in their samples, but provided little, if any, insight into variations within the business discipline. Business researchers employ a variety of research designs and methods and engage in quantitative and qualitative data analysis. The purpose of this paper is to provide deeper insight into primary and secondary data use by business graduate students at one Canadian university based on a content analysis of a corpus of 32 Master of Science in Management theses. This paper explores variations in research designs and data collection methods between and within business subfields (e.g., accounting, finance, operations and information systems, marketing, or organization studies) in order to better understand the extent to which these researchers collect and analyze primary or secondary data sources, including commercial and open data sources. The results of this analysis will inform the work of data specialists and liaison librarians who provide research data management services for business school researchers.
Listening to the user-voice
- Presenters: Sarah King Hele (presenter) and Vanessa Higgins, UK Data Service
- Abstract: The UK Data Service is a resource funded to support researchers, students, lecturers and policymakers who depend on high-quality social and economic data. This presentation will discuss the methods we use to consult with users and track their behaviour on the website in order to improve our services to them. Our approaches include an annual stakeholder consultation, a continuous pop-up survey on the website, ad-hoc consultations with specific user groups, regular user-testing of the website, monitoring of Google Analytics, user conferences, and monitoring of feedback and attendance figures from training events. These developments allow the "user-voice" to come through loud and clear in a variety of formats; by listening to the user voice we are able to deliver an improved and targeted service. We also discuss future plans to reach new audiences, including expanding our use of data visualisation and a new dissertation zone.
Was it good for you? User Experience Research to improve dissemination of census aggregate data via InFuse
- Presenter: Richard Wiseman, UK Data Service - Census Support
- Abstract: InFuse (infuse.mimas.ac.uk) provides easy access to aggregate data from the UK's 2001 and 2011 censuses, based on a fundamental remodelling of the thousands of disparate aggregate datasets produced by the three UK census agencies into a single, integrated, standards-compliant dataset suitable for global and automated operations. To date, efforts have mainly been focussed on the enormous task of data processing. This presentation will outline the next phase of development, aimed at enhancing users' experiences of InFuse. It will include details of user experience research already carried out and the ways in which its results have guided current development, as well as describing future plans, challenges and opportunities.
Understanding Academic Users' Data Needs through Virtual Reference Transcripts
- Presenters: Margaret Smith (New York University), Jill Conte (New York University) and Samantha Guss (University of Richmond)
- Abstract: New York University Libraries has a very high volume chat reference service--averaging more than 14,500 transactions per academic year for the past few years. This popularity offers a unique opportunity for insight into our patrons' conceptualization of their data needs and how these needs are changing. Through analysis of four years' worth of chat transcripts, we assessed user needs and familiarity related to locating secondary data and statistics, performing data analysis, and using existing data services. We used a grounded theory approach, exploring the data through coding and categorization. We will discuss the process and results of our investigation, as well as implications for training virtual reference service staff on the data reference interview and other data topics, and improving overall service quality.
Publishing Codebooks via CED2AR to enable variable cross-searching between datasets
- Presenters: Janet Heslop and Ben Perry, Cornell University - CISER
- Abstract: The Comprehensive Extensible Data Documentation and Access Repository (CED2AR) is designed to improve the discoverability of data collections based upon codebooks and metadata of the holdings. CED2AR utilizes the DDI 2.5 metadata standard for documenting the holdings, along with schema.org microdata markup to allow search engines to parse the semantic information from the DDI metadata. This combined solution enhances the discoverability of DDI metadata and displays it through a user-friendly web interface. In addition to making individual codebooks searchable, CED2AR also facilitates cross-codebook searching and browsing. The Cornell Institute for Social and Economic Research (CISER) is currently in the midst of bringing its data archive metadata into DDI 2.5 through the CED2AR application. The presentation will describe the steps taken to accomplish this task and demonstrate the status of producing an extensible data archive down to the variable level.
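The pairing of DDI codebook metadata with schema.org markup can be sketched as follows. This is an illustrative mapping under assumptions, not CED2AR's actual code: the function name `ddi_to_schema_org` and the input field names are hypothetical, though `Dataset`, `variableMeasured`, and `PropertyValue` are real schema.org types.

```python
import json

def ddi_to_schema_org(title, abstract, variables):
    """Map a few codebook-level DDI fields onto schema.org Dataset
    markup (JSON-LD), so search engines can parse the semantics --
    including variable-level description via variableMeasured."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,
        "description": abstract,
        "variableMeasured": [
            {"@type": "PropertyValue", "name": v["name"], "description": v["label"]}
            for v in variables
        ],
    }

# Hypothetical codebook content for illustration.
markup = ddi_to_schema_org(
    "Example Survey 2014",
    "A demonstration codebook.",
    [{"name": "AGE", "label": "Respondent age in years"}],
)
print(json.dumps(markup, indent=2))
```

Exposing variables as `variableMeasured` entries is what makes variable-level cross-searching visible outside the repository itself.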
Update on Taxonomy / Lexicon Project at the US Bureau of Labor Statistics
- Presenter: Daniel Gillman, US Bureau of Labor Statistics
- Abstract: The taxonomy and lexicon project at the US Bureau of Labor Statistics was started in summer 2013 with the goal of providing consistent access to BLS data and documents. Each search criterion should return data and documents that are related. The taxonomy portion of the work is to improve searching for data, and the lexicon portion is to improve tagging, cataloging, and searching for documents. The work has advanced significantly since it was initially described at IASSIST 2014. There are seven areas of note: 1) the development of a three-level hierarchy over all the measures and characteristics encompassing BLS data; 2) linkage of all low-level characteristics across measures; 3) the identification of common confusions and plain-language similarities for all BLS data; 4) cognitive evaluations of the three-level hierarchies; 5) the development of a web-based implementation of the taxonomy; 6) the inclusion of the taxonomy into the new DataFinder series dissemination tool; and 7) assessment of the impact of standardizing all BLS terms. Each of these developments will be discussed in more detail, with particular attention to its impact.
Doing DDI: Operationalising DDI for longitudinal studies
- Presenter: Gemma Seabrook, Institute of Education
- Abstract: Funders are rightly concerned that they get the maximum value for their investments. The UK longitudinal studies represent a unique data collection that continues to give value, and it is important to ensure that this collection remains relevant. Providing the quality of documentation that modern researchers expect, for studies that began as early as 1946, poses a significant challenge. The CLOSER project brings together nine of these studies. It seeks to enhance the metadata available, providing complete questionnaire and variable metadata that can support ongoing data management and be used to populate a search platform, enabling discovery in general and cross-cohort research in particular. Bringing historic metadata up to the DDI standard being adopted presents a variety of challenges, such as the condition of supporting documentation for the oldest parts of the studies, the wide variety of methods and formats, the sheer scale of the number of questionnaires and tools used, and the priorities and capacity of the various stakeholders. This paper will detail how CLOSER has addressed these at an operational level (protocols, processes, planning, etc.) and how others might learn from these experiences and make use of the outputs CLOSER provides.
Big Metadata: Bringing Researchers CLOSER to Longitudinal Data with an Advanced Discovery Platform
- Presenter: Jeremy Iverson (Colectica) and Jack Kneeshaw (UK Data Archive - University of Essex)
- Abstract: CLOSER (www.closer.ac.uk) -- funded by the ESRC and MRC -- aims to maximise the use, value and impact of nine of the UK's longitudinal studies. A central component of CLOSER will be a metadata discovery platform that will enable the discovery of a range of data collection and variable metadata from each of the participating studies. The scale and detail of the metadata to be included will make it amongst the largest and most detailed of such repositories in the world. The discovery platform -- a customisation of the Colectica search portal -- will offer cutting-edge search technologies and innovative and user-friendly ways to discover, navigate and display the metadata. This presentation will describe how the metadata were created and harmonized, discuss how the search portal was built, and showcase the innovative search and discovery interfaces that are critical to allowing researchers to understand and leverage a massive data resource.
- Jen Darragh (Johns Hopkins)
- Ryan Womack (Rutgers)
- Jennifer Doty (Emory)
- Jamene Brooks-Kieffer (University of Kansas)
- Sarah Irwin (Penn State)
- Abstract: The increasing availability of secondary datasets has also stoked demand for restricted data, and researchers often expect these data to be as accessible as the other digital resources they use. Yet restricted data pose a special challenge due to the strict security requirements typically associated with their use. Scholars who wish to work with restricted data but lack unit-level or departmental resources may have no central, secure research services at their institutions; this lack may be felt especially strongly at poorly resourced institutions. Consequently, many researchers are forced to independently obtain and secure restricted data for research, creating ad hoc, unmonitored workspaces across campuses. This proliferation of restricted data use in unsecured spaces is an urgent problem facing many institutions. The session will explore the academic library as a trusted central resource for restricted data support. Session panelists will describe resources provided by their libraries, along with a history of how their support has evolved and plans for future development. A broader conversation will focus on the value of libraries in providing restricted data support, unique challenges and opportunities for libraries in this domain, and pragmatic suggestions for how to initiate services.