OCLC Research



Koopman R.,OCLC Research | Wang S.,OCLC Research | Scharnhorst A.,Anna van Saksenlaan 51
Scientometrics | Year: 2017

This paper describes how semantic indexing can help to generate a contextual overview of topics and visually compare clusters of articles. The method was originally developed for an innovative information exploration tool, called Ariadne, which operates on bibliographic databases with tens of millions of records (Koopman et al. in Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems. doi:10.1145/2702613.2732781, 2015b). In this paper, the method behind Ariadne is further developed and applied to the research question of the special issue “Same data, different results”: the better understanding of topic (re-)construction by different bibliometric approaches. For the Astro dataset of 111,616 articles in astronomy and astrophysics, a new instantiation of the interactive exploration tool, LittleAriadne, has been created. This paper contributes in two ways to the overall challenge of delineating and defining topics. First, we produce two clustering solutions based on vector representations of articles in a lexical space. These vectors are built on semantic indexing of entities associated with those articles. Second, we discuss how LittleAriadne can be used to browse through the network of topical terms, authors, journals, citations and various cluster solutions of the Astro dataset. More specifically, we treat the assignment of an article to the different clustering solutions as an additional element of its bibliographic record. By applying the principle of semantic indexing to this extended list of entities in the bibliographic record, LittleAriadne in turn provides a visualization of the context of a specific clustering solution. It also conveys the similarity of article clusters produced by different algorithms, and thus offers a complementary approach to other possible means of comparison. © 2017 Akadémiai Kiadó, Budapest, Hungary

Koopman R.,OCLC Research | Wang S.,OCLC Research
Scientometrics | Year: 2017

After a clustering solution is generated automatically, labelling the clusters becomes important for understanding the results. In this paper, we propose a mutual-information-based method to label clusters of journal articles. The topical terms that have the highest normalised mutual information with a given cluster are selected as the labels of that cluster. The labelling technique was discussed with a domain expert to check that the labels are discriminating not only lexically but also semantically. Based on a common set of topical terms, we also propose to generate lexical fingerprints as a representation of individual clusters. Finally, we visualise and compare the fingerprints of different clusters, taken either from one clustering solution or from different ones. © 2017 Akadémiai Kiadó, Budapest, Hungary
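The labelling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the mutual information is normalised, so the geometric mean of the two entropies is assumed here, and the rule that a term must be over-represented in the cluster is an added assumption that keeps anti-correlated terms out of the labels. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(probabilities):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def normalised_mi(term_flags, cluster_flags):
    """Mutual information between two binary indicators over the same articles,
    divided by sqrt(H(term) * H(cluster)) (normalisation is an assumption)."""
    n = len(term_flags)
    joint = Counter(zip(term_flags, cluster_flags))
    term_counts = Counter(term_flags)
    cluster_counts = Counter(cluster_flags)
    mi = 0.0
    for (t, c), count in joint.items():
        p_tc = count / n
        mi += p_tc * math.log2(p_tc / ((term_counts[t] / n) * (cluster_counts[c] / n)))
    h_term = entropy(v / n for v in term_counts.values())
    h_cluster = entropy(v / n for v in cluster_counts.values())
    denominator = math.sqrt(h_term * h_cluster)
    return mi / denominator if denominator > 0 else 0.0


def label_cluster(article_terms, assignments, cluster, top_k=3):
    """Return the top_k terms with the highest normalised MI for one cluster.

    article_terms: list of sets of topical terms, one set per article.
    assignments:   list of cluster ids, aligned with article_terms.
    """
    in_cluster = [a == cluster for a in assignments]
    cluster_share = sum(in_cluster) / len(in_cluster)
    scores = {}
    for term in set().union(*article_terms):
        has_term = [term in terms for terms in article_terms]
        hits_in_cluster = sum(h and c for h, c in zip(has_term, in_cluster))
        # Assumption: skip terms that are not over-represented in the cluster,
        # since mutual information alone is also high for terms that avoid it.
        if hits_in_cluster / sum(has_term) <= cluster_share:
            continue
        scores[term] = normalised_mi(has_term, in_cluster)
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_k]


# Toy data, invented purely for illustration.
articles = [
    {"galaxy", "redshift"}, {"galaxy", "cluster"}, {"galaxy", "redshift"},
    {"planet", "orbit"}, {"planet", "transit"}, {"planet", "orbit"},
]
assignments = [0, 0, 0, 1, 1, 1]
labels_for_cluster_0 = label_cluster(articles, assignments, 0, top_k=2)
```

A term that occurs in every article of a cluster and nowhere else reaches the maximum score of 1, while a term distributed independently of the cluster scores 0.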

Wang S.,OCLC Research | Koopman R.,OCLC Research
Scientometrics | Year: 2017

Document clustering is generally the first step for topic identification. Since many clustering methods operate on the similarities between documents, it is important to build representations of these documents which keep their semantics as much as possible and are also suitable for efficient similarity calculation. As we describe in Koopman et al. (Proceedings of ISSI 2015 Istanbul: 15th International Society of Scientometrics and Informetrics Conference, Istanbul, Turkey, 29 June to 3 July, 2015. Bogaziçi University Printhouse. http://www.issi2015.org/files/downloads/all-papers/1042.pdf, 2015), the metadata of articles in the Astro dataset contribute to a semantic matrix, which uses a vector space to capture the semantics of entities derived from these articles and consequently supports the contextual exploration of these entities in LittleAriadne. However, this semantic matrix does not allow similarities between articles to be calculated directly. In this paper, we describe in detail how we build a semantic representation for an article from the entities that are associated with it. Based on such semantic representations of articles, we apply two standard clustering methods, K-Means and the Louvain community detection algorithm, which lead to our two clustering solutions, labelled OCLC-31 (for K-Means) and OCLC-Louvain (for Louvain). We also give the implementation details and a basic comparison with other clustering solutions that are reported in this special issue. © 2017 Akadémiai Kiadó, Budapest, Hungary
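As a rough illustration of building an article representation from its entities, the sketch below takes a weighted mean of entity vectors and compares two articles by cosine similarity. The abstract does not specify how the entity vectors are combined, so the weighted mean, the optional `weights` argument, the entity naming scheme and the two-dimensional toy vectors are all assumptions. Article vectors of this kind could then be passed to a clustering method such as K-Means.

```python
import math


def article_vector(entities, entity_vectors, weights=None):
    """Weighted mean of the vectors of the entities attached to one article.

    entities:       iterable of entity keys (authors, terms, journal, ...).
    entity_vectors: dict mapping entity key -> list of floats (equal length).
    weights:        optional dict mapping entity key -> weight (default 1.0).
    """
    known = [e for e in entities if e in entity_vectors]
    if not known:
        raise ValueError("article has no entity with a known vector")
    dim = len(entity_vectors[known[0]])
    total = [0.0] * dim
    weight_sum = 0.0
    for entity in known:
        w = 1.0 if weights is None else weights.get(entity, 1.0)
        weight_sum += w
        for i, x in enumerate(entity_vectors[entity]):
            total[i] += w * x
    if weight_sum == 0:
        raise ValueError("entity weights sum to zero")
    return [x / weight_sum for x in total]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical entity vectors in a tiny two-dimensional semantic space.
entity_vectors = {
    "author:a": [1.0, 0.0],
    "term:galaxy": [0.0, 1.0],
    "journal:apj": [1.0, 1.0],
}
first = article_vector(["author:a", "term:galaxy"], entity_vectors)
second = article_vector(["journal:apj"], entity_vectors)
similarity = cosine(first, second)
```

Averaging keeps every article in the same space as its entities, which is what makes direct article-to-article similarity possible in this sketch.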

O'Neill E.,Senior Research Scientist | Zumer M.,University of Ljubljana | Mixter J.,OCLC Research
Library Resources and Technical Services | Year: 2015

Aggregates have been a frequent topic of discussion among library science researchers. This study seeks to better understand aggregates through the analysis of a sample of bibliographic records and a review of the cataloging treatment of aggregates. The study focuses on determining how common aggregates are in library collections, what types of aggregates exist, how aggregates are described in bibliographic records, and the criteria for identifying aggregates from the information in bibliographic records. A sample of bibliographic records representing textual resources was taken from OCLC's WorldCat database. More than 20 percent of the sampled records represented aggregates, and more works were embodied in aggregates than were embodied in single-work manifestations. A variety of issues, including cataloging practices and the varying definitions of aggregates, made it difficult to accurately identify and quantify the presence of aggregates using only the information from bibliographic records.

Shah C.,Rutgers University | Radford M.L.,Rutgers University | Connaway L.S.,OCLC Research
Library and Information Science Research | Year: 2015

Virtual reference services (VRS) and social question and answer (SQA) are two different platforms that share many facets of their functionality, leading to an opportunity to create synergistic solutions by bringing complementary aspects of these services together. This article describes the use of participatory design, a method commonly used in human-computer interaction (HCI), for investigating design and deployment challenges to create a new hybrid question-answer (Q&A) system. A set of three design sessions was conducted with 17 experts from academia and industry. These semi-guided discussions asked the experts for their opinions and suggestions on various issues concerning what a potential hybrid Q&A system could look like. In addition, the participants were encouraged to provide design and implementation ideas based on expertise in their respective fields. The suggestions, comments, and ideas resulted in the development of 11 themes within three categories: (1) provision of more information; (2) provision of control; and (3) focus on user-friendly design. This paper provides details of the method, the sessions, and the design suggestions including the 11 themes and three broad categories. The paper provides a synthesis of the implications of the findings for virtual reference and social Q&A service providers and system designers. Finally, the participatory design method is compared to other methods, and implications for its use in library and information science are presented. © 2015 Elsevier Inc.

Radford M.L.,Rutgers University | Connaway L.S.,OCLC Research
Library and Information Science Research | Year: 2013

Research reveals that users of virtual reference services (VRS) value accurate answers to their queries and a pleasant interpersonal encounter. Findings from a longitudinal study compare two sets of randomly selected VRS transcripts, one of 850 live chat sessions from 2004 to 2006, and the second of 560 live chat and instant messaging (Qwidget) sessions from 2010. The investigation of the international QuestionPoint (OCLC, 2012) transcripts includes comparisons by query type (e.g., ready reference, policy and procedural, subject search) and by accuracy of answers to the subset identified as ready reference (e.g., fact-based queries). Findings indicate that percentages of ready reference queries are remaining stable, having increased slightly from 27% (243 of 915 queries found in 850 transcripts) in 2004-2006 to 31% (179 of 575 queries found in 560 transcripts) in the 2010 dataset. Additionally, accuracy of answers was found to have improved. The percentage of correct and complete responses with citations given by VRS librarians or staff members answering ready reference questions was found to have increased from 78% (141) in 2004-2006 to 90% (151) in 2010. © 2012 OCLC Online Computer Library Center, Inc.

Connaway L.S.,OCLC Research | Dickey T.J.,OCLC Research | Radford M.L.,Rutgers University
Library and Information Science Research | Year: 2011

In today's fast-paced world, anecdotal evidence suggests that information tends to inundate people, and users of information systems want to find information quickly and conveniently. Empirical evidence for convenience as a critical factor is explored in the data from two multi-year user study projects funded by the Institute of Museum and Library Services. The theoretical framework for this understanding is founded in the concepts of bounded rationality and rational choice theory, with Savolainen's (2006) concept of time as a context in information seeking, as well as gratification theory, informing the emphasis on the seekers' time horizons. Convenience is a situational criterion in people's choices and actions during all stages of the information-seeking process. The concept of convenience can include their choice of an information source, their satisfaction with the source and its ease of use, and their time horizon in information seeking. The centrality of convenience is especially prevalent among the younger subjects ("millennials") in both studies, but also holds across all demographic categories: age, gender, academic role, and user or non-user of virtual reference services. These two studies further indicate that convenience is a factor for making choices in a variety of situations, including both academic information seeking and everyday-life information seeking, although it plays different roles in different situations. © 2011 Elsevier Inc.

Koopman R.,OCLC Research | Wang S.,OCLC Research
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries | Year: 2014

Finding similar journals among a large number of existing ones is not a trivial task. Based on the hypothesis that similar journals publish similar articles, we propose in this paper a scalable method based on random projection to calculate the similarities between 35K journals based on the 67 million articles published in them. We evaluate our results against Dewey Decimal codes and analyse the networks of similar journals. © 2014 IEEE.
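The general technique named in this abstract, random projection, can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: each journal is represented here as a bag of term counts, each term is mapped to a fixed pseudo-random vector of +1/-1 entries, and journals are compared by cosine similarity in the reduced space. The function names, the `dim` and `seed` parameters and the toy journals are hypothetical.

```python
import math
import random


def term_projection(term, dim, seed=0):
    """Fixed pseudo-random +1/-1 vector for a term, reproducible across runs."""
    rng = random.Random(f"{seed}:{term}")
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def project(term_counts, dim=256, seed=0):
    """Project a sparse term-count vector into a dense dim-dimensional vector."""
    vector = [0.0] * dim
    for term, count in term_counts.items():
        for i, sign in enumerate(term_projection(term, dim, seed)):
            vector[i] += count * sign
    return vector


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy journals described by term counts, invented for illustration.
astro_a = project({"galaxy": 5, "redshift": 3, "quasar": 1})
astro_b = project({"galaxy": 4, "redshift": 4, "quasar": 2})
biochem = project({"protein": 5, "enzyme": 3})

similar = cosine(astro_a, astro_b)
dissimilar = cosine(astro_a, biochem)
```

Because the projected vectors have a fixed, small dimension regardless of vocabulary size, and each article's terms can be added to its journal's vector incrementally, this style of projection scales to large collections. Similarities in the reduced space only approximate those in the original term space, with the error shrinking as `dim` grows.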

Connaway L.S.,OCLC Research | White D.,University of Oxford | Lanclos D.,University of North Carolina at Charlotte
Proceedings of the ASIST Annual Meeting | Year: 2011

This 3-year project is funded by JISC, OCLC, Oxford University, and the University of North Carolina, Charlotte. It does not aim to answer 'What works?' but 'Why does it work?'. If we gain a better understanding of student and scholar motivations for engaging in the information environment, we have a greater chance of meeting expectations and creating services which are used and ultimately good value for money. We cannot continue to provide an educational version of every available platform in an attempt to mirror the web within institutions. We must make informed decisions on how to move forward to ensure that we will not be at the mercy of every 'new' technology that becomes available, nor will we be expending funds on services, systems, and facilities that are not used. The project is an attempt to fill the gap in user behaviour studies identified in the JISC Digital Information Seeker Report (2010). Connaway and Dickey (2010) call for a longitudinal study "to identify how individuals engage in both the virtual and physical worlds to get information for different situations could be conducted" (p. 56). They believe that "Such an investigation would contribute to a better understanding of how individuals navigate in multiple information environments and could influence the design and integration of systems and services for devices and applications, as well as cloud computing" (Connaway and Dickey 2010, p. 56). The project utilises the visitors and residents principle described in the TALL blog (White 2008), which hypothesizes that neither age nor gender determines whether one is a visitor (one who logs on to the virtual environment, performs a specific task or acquires specific information, and then logs off) or a resident (one who has an ongoing, developing presence online).

Faniel I.M.,OCLC Research | Kriesberg A.,University of Michigan | Yakel E.,University of Michigan
Proceedings of the ASIST Annual Meeting | Year: 2012

We know little about the data reuse practices of novice data users. Yet large-scale data reuse over the long term depends in part on uptake from early career researchers. This paper examines 22 novice social science researchers and how they make sense of social science data. Novices are particularly interested in understanding how data: 1) are transformed from qualitative to quantitative data, 2) capture concepts not well-established in the literature, and 3) can be matched and merged across multiple datasets. We discuss how novice data users make sense of data in these three circumstances. We find that novices seek to understand the data producer's rationale for methodological procedures and measurement choices, which is broadly similar to researchers in other scientific communities. However, we also find that they not only reflect on whether they can trust the data producers' decisions, but also seek guidance from members of their disciplinary community. Specifically, novice social science researchers are heavily influenced by more experienced social science researchers when it comes to discovering, evaluating, and justifying their reuse of others' data.
