Cuzzocrea A.,CNR Institute for High Performance Computing and Networking |
Fisichella M.,search Center
Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics | Year: 2011
One of the main advantages of Web services is that they can be composed into more complex processes in order to achieve a given business goal. However, this potential cannot be fully exploited until suitable methods and techniques for the automatic discovery of composed processes are provided. Indeed, service discovery today still focuses on matching atomic services, typically by checking the similarity of functional parameters such as inputs and outputs. However, more effective process discovery can be achieved if both internal structure and component services are taken into account. Based on this intuition, in this paper we describe a method for discovering composite OWL-S processes that builds on the following main contributions: (i) proposing a graph-based representation of composite OWL-S processes; and (ii) introducing an algorithm that matches over such (graph-based) representations and computes their degree of matching by combining the similarity of the atomic services they comprise and the similarity of the control flow among them. Finally, as another contribution of our research, we conducted a comprehensive experimental campaign in which we tested our proposed algorithm and derived insightful trade-offs between the benefits and limitations of the overall framework for discovering Semantic Web services. © 2011 IEEE.
Krestel R.,search Center |
Fankhauser P.,German Research Center for Artificial Intelligence
Neurocomputing | Year: 2012
More and more content on the Web is generated by users. To organize this information and make it accessible via current search technology, tagging systems have gained tremendous popularity. Especially for multimedia content, they allow resources to be annotated with keywords (tags), which opens the door to classic text-based information retrieval. To support the user in choosing the right keywords, tag recommendation algorithms have emerged. In this setting, not only the content is decisive for recommending relevant tags but also the user's preferences. In this paper we introduce an approach to personalized tag recommendation that combines a probabilistic model of tags from the resource with tags from the user. As models we investigate simple language models as well as Latent Dirichlet Allocation. Extensive experiments on a real-world dataset crawled from a big tagging system show that personalization improves tag recommendation, and our approach significantly outperforms state-of-the-art approaches. © 2011 Elsevier B.V.
Doerfel S.,University of Kassel |
Jaschke R.,search Center |
Stumme G.,University of Kassel
ACM Transactions on Intelligent Systems and Technology | Year: 2016
Social bookmarking systems have established themselves as an important part of today's Web. In such systems, tag recommender systems support users during the posting of a resource by suggesting suitable tags. Tag recommender algorithms have often been evaluated in offline benchmarking experiments. Yet, the particular setup of such experiments has rarely been analyzed. In particular, since the recommendation quality usually suffers from difficulties such as the sparsity of the data or the cold-start problem for new resources or users, datasets have often been pruned to so-called cores (specific subsets of the original datasets), without much consideration of the implications for the benchmarking results. In this article, we generalize the notion of a core by introducing the new notion of a set-core, which is independent of any graph structure, to overcome a structural drawback in the previous constructions of cores on tagging data. We show that problems caused by some types of cores can be eliminated using set-cores. Further, we present a thorough analysis of tag recommender benchmarking setups using cores. To that end, we conduct a large-scale experiment on four real-world datasets, in which we analyze the influence of different cores on the evaluation of recommendation algorithms. We show that the results of the comparison of different recommendation approaches depend on the selection of core type and level. For the benchmarking of tag recommender algorithms, our results suggest that the evaluation must be set up more carefully and should not be based on one arbitrarily chosen core type and level. © 2016 ACM.
Doerfel S.,University of Kassel |
Jaschke R.,search Center
RecSys 2013 - Proceedings of the 7th ACM Conference on Recommender Systems | Year: 2013
Since the rise of collaborative tagging systems on the web, the tag recommendation task (suggesting suitable tags to users of such systems while they add resources to their collection) has been tackled. However, the (offline) evaluation of tag recommendation algorithms usually suffers from difficulties like the sparseness of the data or the cold-start problem for new resources or users. Previous studies therefore often used so-called post-cores (specific subsets of the original datasets) for their experiments. In this paper, we conduct a large-scale experiment in which we analyze different tag recommendation algorithms on different cores of three real-world datasets. We show that a recommender's performance depends on the particular core and explore correlations between performances on different cores. © 2013 ACM.
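The pruning to post-cores mentioned above can be illustrated with a minimal Python sketch. It assumes the commonly used definition in which a post-core at level k keeps only those users, tags, and resources that each occur in at least k posts, where a post is one user's annotation of one resource. The function name `post_core` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter

def post_core(tag_assignments, k):
    """Iteratively prune (user, tag, resource) triples until every user,
    tag, and resource occurs in at least k posts (assumed definition)."""
    tas = set(tag_assignments)
    while True:
        # A post is a (user, resource) pair together with its tags.
        posts = {(u, r) for u, _, r in tas}
        user_posts = Counter(u for u, _ in posts)
        resource_posts = Counter(r for _, r in posts)
        # Triples are unique, so counting triples per tag counts posts per tag.
        tag_posts = Counter(t for _, t, _ in tas)
        kept = {
            (u, t, r) for u, t, r in tas
            if user_posts[u] >= k and resource_posts[r] >= k and tag_posts[t] >= k
        }
        if kept == tas:  # fixed point reached: nothing left to prune
            return tas
        tas = kept

# Hypothetical toy data: user u3 and tag b each occur in only one post.
example = {
    ("u1", "a", "r1"), ("u1", "a", "r2"),
    ("u2", "a", "r1"), ("u2", "a", "r2"),
    ("u3", "b", "r1"),
}
core = post_core(example, 2)
```

The loop is needed because removing one sparse entity can push another entity below the threshold, so pruning repeats until nothing changes.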
Siersdorfer S.,search Center |
Chelaru S.,search Center |
Nejdl W.,search Center |
San Pedro J.,Telefonica
Proceedings of the 19th International Conference on World Wide Web, WWW '10 | Year: 2010
An analysis of the social video sharing platform YouTube reveals a high amount of community feedback through comments for published videos as well as through meta ratings for these comments. In this paper, we present an in-depth study of commenting and comment rating behavior on a sample of more than 6 million comments on 67,000 YouTube videos for which we analyzed dependencies between comments, views, comment ratings and topic categories. In addition, we studied the influence of sentiment expressed in comments on the ratings for these comments using the SentiWordNet thesaurus, a lexical WordNet-based resource containing sentiment annotations. Finally, to predict community acceptance for comments not yet rated, we built different classifiers for the estimation of ratings for these comments. The results of our large-scale evaluations are promising and indicate that community feedback on already rated comments can help to filter new unrated comments or suggest particularly useful but still unrated comments. © 2010 International World Wide Web Conference Committee (IW3C2).
Bruni E.,University of Trento |
Tran N.K.,search Center |
Baroni M.,University of Trento
Journal of Artificial Intelligence Research | Year: 2014
Distributional semantic models derive computational representations of word meaning from the patterns of co-occurrence of words in text. Such models have been a success story of computational linguistics, being able to provide reliable estimates of semantic relatedness for the many semantic tasks requiring them. However, distributional models extract meaning information exclusively from text, which is an extremely impoverished basis compared to the rich perceptual sources that ground human semantic knowledge. We address the lack of perceptual grounding of distributional models by exploiting computer vision techniques that automatically identify discrete "visual words" in images, so that the distributional representation of a word can be extended to also encompass its co-occurrence with the visual words of images it is associated with. We propose a flexible architecture to integrate text- and image-based distributional information, and we show in a set of empirical tests that our integrated model is superior to the purely text-based approach, and it provides somewhat complementary semantic information with respect to the latter. © 2014 AI Access Foundation.
Hennig P.,search Center |
Balke W.-T.,search Center
ICWS 2010 - 2010 IEEE 8th International Conference on Web Services | Year: 2010
Data-intensive applications, e.g. in the life sciences, pose new efficiency challenges to the service composition problem. Since computing power today is mainly increased by multiplying CPU cores, algorithms have to be redesigned to benefit from this evolution. In this paper we present a framework for parallelizing service composition algorithms, investigating how to partition the composition problem into multiple parallel threads. Contrary to intuition, however, straightforward parallelization techniques do not lead to superior performance, as our baseline evaluation reveals. To harness the full power of multi-core architectures, we propose two novel approaches to evenly distribute the workload in a sophisticated fashion. In fact, our extensive experiments on practical life science data resulted in an impressive speedup of over 300% using only 4 cores. Moreover, we show that our techniques can also benefit from all advanced pruning heuristics used in sequential algorithms. © 2010 IEEE.
Alrifai M.,search Center |
Skoutas D.,search Center |
Risse T.,search Center
Proceedings of the 19th International Conference on World Wide Web, WWW '10 | Year: 2010
Web service composition enables seamless and dynamic integration of business applications on the web. The performance of the composed application is determined by the performance of the involved web services. Therefore, non-functional, quality of service aspects are crucial for selecting the web services to take part in the composition. Identifying the best candidate web services from a set of functionally-equivalent services is a multi-criteria decision making problem. The selected services should optimize the overall QoS of the composed application, while satisfying all the constraints specified by the client on individual QoS parameters. In this paper, we propose an approach based on the notion of skyline to effectively and efficiently select services for composition, reducing the number of candidate services to be considered. We also discuss how a provider can improve its service to become more competitive and increase its potential of being included in composite applications. We evaluate our approach experimentally using both real and synthetically generated datasets. © 2010 International World Wide Web Conference Committee (IW3C2).
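The skyline notion referred to above can be sketched in a few lines of Python. This is only an illustration of the generic skyline (Pareto dominance) operator, not the selection method of the paper. The function names, the QoS attributes (response time, price), and the convention that lower values are better are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """a dominates b if a is at least as good in every QoS attribute and
    strictly better in at least one (assumption: lower values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(services: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the names of services that no other service dominates."""
    return sorted(
        name for name, qos in services.items()
        if not any(dominates(other, qos)
                   for other_name, other in services.items()
                   if other_name != name)
    )

# Hypothetical candidates: (response time in ms, price per call).
candidates = {
    "s1": (100.0, 0.5),
    "s2": (120.0, 0.4),
    "s3": (130.0, 0.6),  # worse than s1 in both attributes
    "s4": (100.0, 0.5),  # ties with s1, so neither dominates the other
}
result = skyline(candidates)
```

Only services on the skyline need to be considered further, since any dominated service can be swapped for its dominator without worsening any QoS attribute.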
Velasco E.,Robert Koch Institute |
Agheneza T.,Robert Koch Institute |
Denecke K.,search Center |
Kirchner G.,Robert Koch Institute |
Eckmanns T.,Robert Koch Institute
Milbank Quarterly | Year: 2014
Context: The exchange of health information on the Internet has been heralded as an opportunity to improve public health surveillance. In a field that has traditionally relied on an established system of mandatory and voluntary reporting of known infectious diseases by doctors and laboratories to governmental agencies, innovations in social media and so-called user-generated information could lead to faster recognition of cases of infectious disease. More direct access to such data could enable surveillance epidemiologists to detect potential public health threats such as rare, new diseases or early-level warnings for epidemics. But how useful are data from social media and the Internet, and what is the potential to enhance surveillance? The challenges of using these emerging surveillance systems for infectious disease epidemiology, including the specific resources needed, technical requirements, and acceptability to public health practitioners and policymakers, have wide-reaching implications for public health surveillance in the 21st century. Methods: This article divides public health surveillance into indicator-based surveillance and event-based surveillance and provides an overview of each. We did an exhaustive review of published articles indexed in the databases PubMed, Scopus, and Scirus between 1990 and 2011 covering contemporary event-based systems for infectious disease surveillance. Findings: Our literature review uncovered no event-based surveillance systems currently used in national surveillance programs. While much has been done to develop event-based surveillance, the existing systems have limitations. Accordingly, there is a need for further development of automated technologies that monitor health-related information on the Internet, especially to handle large amounts of data and to prevent information overload. The dissemination to health authorities of new information about health events is not always efficient and could be improved. No comprehensive evaluations show whether event-based surveillance systems have been integrated into actual epidemiological work during real-time health events. Conclusions: The acceptability of data from the Internet and social media as a regular part of public health surveillance programs varies and is related to a circular challenge: the willingness to integrate is rooted in a lack of effectiveness studies, yet such effectiveness can be proved only through a structured evaluation of integrated systems. Issues related to changing technical and social paradigms in both individual perceptions of and interactions with personal health data, as well as social media and other data from the Internet, must be further addressed before such information can be integrated into official surveillance systems. © 2014 Milbank Memorial Fund.
Marenzi I.,search Center |
Zerr S.,search Center
IEEE Transactions on Learning Technologies | Year: 2012
This paper discusses the development of LearnWeb2.0, a search and collaboration environment for supporting searching, organizing, and sharing distributed resources, and our pedagogical setup based on the multiliteracies approach. In LearnWeb2.0, collaborative and active learning is supported through project-focused search and aggregation, with discussion and comments directly linked to the resources. We are developing the LearnWeb2.0 platform through an iterative, evaluation-driven, design-based research approach; this paper describes the first iteration and part of the second one. In the first iteration, we developed LearnWeb2.0 and evaluated it in two Content and Language Integrated Learning (CLIL) courses. We followed the multiliteracies approach, using authentic content from a variety of sources and contexts to provide important input for CLIL. We present the evaluation design and results for both courses, and discuss how the differences between the two scenarios influenced student performance and satisfaction. In the second iteration, we improved LearnWeb2.0 based on these experiences, and we describe the improvements as well as the problems addressed. Finally, we sketch the evaluation planned for the second cycle, and close with a reflection on our experiences with the design-based research approach for developing a collaborative learning environment, and on multiliteracies as a suitable approach for CLIL. © 2011 IEEE.