Cuzzocrea A.,CNR Institute for High Performance Computing and Networking |
Fisichella M.,search Center
Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics | Year: 2011
One of the main advantages of Web services is that they can be composed into more complex processes in order to achieve a given business goal. However, such potentiality cannot be fully exploited until suitable methods and techniques allowing us to enable automatic discovery of composed processes are provided. Indeed, nowadays service discovery still focuses on matching atomic services by typically checking the similarity of functional parameters, such as inputs and outputs. However, a more profitable process discovering can be reached if both internal structure and component services are taken into account. Based on this main intuition, in this paper we describe a method for discovering composite OWL-S processes that founds on the following main contributions: (i) proposing a graph-based representation of composite OWL-S processes; and (ii) introducing an algorithm that matches over such (graph-based) representations and computes their degree of matching via combining the similarity of the atomic services they comprise and the similarity of the control flow among them. Finally, as another contribution of our research, we conducted a comprehensive experimental campaign where we tested our proposed algorithm by deriving insightful trade-offs of benefits and limitations of the overall framework for discovering Semantic Web services. © 2011 IEEE.
Hoang T.-A.,search Center |
Lim E.-P.,Singapore Management University
ACM Transactions on Intelligent Systems and Technology | Year: 2017
Microblogging encompasses both user-generated content and behavior. When modeling microblogging data, one has to consider personal and background topics, as well as how these topics generate the observed content and behavior. In this article, we propose the Generalized Behavior-Topic (GBT) model for simultaneously modeling background topics and users' topical interest in microblogging data. GBT considersmultiple topical communities (or realms) with different background topical interests while learning the personal topics of each user and the user's dependence on realms to generate both content and behavior. This differentiates GBT from other previous works that consider either one realm only or content data only. By associating user behavior with the latent background and personal topics, GBT helps to model user behavior by the two types of topics. GBT also distinguishes itself from other earlier works by modeling multiple types of behavior together. Our experiments on two Twitter datasets show that GBT can effectively mine the representative topics for each realm. We also demonstrate that GBT significantly outperforms other state-of-The-Art models in modeling content topics and user profiling. © 2017 ACM 2157-6904/2017/04-ART44 15.00.
Gottschalk S.,search Center |
Demidova E.,search Center
ACM Transactions on the Web | Year: 2017
In this article, we address the problem of text passage alignment across interlingual article pairs in Wikipedia. We develop methods that enable the identification and interlinking of text passages written in different languages and containing overlapping information. Interlingual text passage alignment can enable Wikipedia editors and readers to better understand language-specific context of entities, provide valuable insights in cultural differences, and build a basis for qualitative analysis of the articles. An important challenge in this context is the tradeoff between the granularity of the extracted text passages and the precision of the alignment. Whereas short text passages can result in more precise alignment, longer text passages can facilitate a better overview of the differences in an article pair. To better understand these aspects from the user perspective, we conduct a user study at the example of the German, Russian, and English Wikipedia and collect a user-Annotated benchmark. Then we propose MultiWiki, a method that adopts an integrated approach to the text passage alignment using semantic similarity measures and greedy algorithms and achieves precise results with respect to the user-defined alignment. The MultiWiki demonstration is publicly available and currently supports four language pairs. © 2017 ACM 1559-1131/2017/04-ART6 $15.00.
Krestel R.,search Center |
Fankhauser P.,German Research Center for Artificial Intelligence
Neurocomputing | Year: 2012
More and more content on the Web is generated by users. To organize this information and make it accessible via current search technology, tagging systems have gained tremendous popularity. Especially for multimedia content they allow to annotate resources with keywords (tags) which opens the door for classic text-based information retrieval. To support the user in choosing the right keywords, tag recommendation algorithms have emerged. In this setting, not only the content is decisive for recommending relevant tags but also the user's preferences.In this paper we introduce an approach to personalized tag recommendation that combines a probabilistic model of tags from the resource with tags from the user. As models we investigate simple language models as well as Latent Dirichlet Allocation. Extensive experiments on a real world dataset crawled from a big tagging system show that personalization improves tag recommendation, and our approach significantly outperforms state-of-the-art approaches. © 2011 Elsevier B.V.
Doerfel S.,University of Kassel |
Jaschke R.,search Center |
Stumme G.,University of Kassel
ACM Transactions on Intelligent Systems and Technology | Year: 2016
Social bookmarking systems have established themselves as an important part in today's Web. In such systems, tag recommender systems support users during the posting of a resource by suggesting suitable tags. Tag recommender algorithms have often been evaluated in offline benchmarking experiments. Yet, the particular setup of such experiments has rarely been analyzed. In particular, since the recommendation quality usually suffers from difficulties such as the sparsity of the data or the cold-start problem for new resources or users, datasets have often been pruned to so-called cores (specific subsets of the original datasets), without much consideration of the implications on the benchmarking results. In this article, we generalize the notion of a core by introducing the new notion of a set-core, which is independent of any graph structure, to overcome a structural drawback in the previous constructions of cores on tagging data. We show that problems caused by some types of cores can be eliminated using set-cores. Further, we present a thorough analysis of tag recommender benchmarking setups using cores. To that end, we conduct a large-scale experiment on four real-world datasets, in which we analyze the influence of different cores on the evaluation of recommendation algorithms. We can show that the results of the comparison of different recommendation approaches depends on the selection of core type and level. For the benchmarking of tag recommender algorithms, our results suggest that the evaluation must be set up more carefully and should not be based on one arbitrarily chosen core type and level. © 2016 ACM.
Doerfel S.,University of Kassel |
Jaschke R.,search Center
RecSys 2013 - Proceedings of the 7th ACM Conference on Recommender Systems | Year: 2013
Since the rise of collaborative tagging systems on the web, the tag recommendation task suggesting suitable tags to users of such systems while they add resources to their collection has been tackled. However, the (offine) evaluation of tag recommendation algorithms usually suffers from dificulties like the sparseness of the data or the cold start problem for new resources or users. Previous studies therefore often used so-called post-cores (specific subsets of the original datasets) for their experiments. In this paper, we conduct a large-scale experiment in which we analyze different tag recommendation algorithms on different cores of three real-world datasets. We show, that a recommender's performance depends on the particular core and explore correlations between performances on different cores. © 2013 ACM.
Siersdorfer S.,search Center |
Chelaru S.,search Center |
Nejdl W.,search Center |
San Pedro J.,Telefonica
Proceedings of the 19th International Conference on World Wide Web, WWW '10 | Year: 2010
An analysis of the social video sharing platform YouTube reveals a high amount of community feedback through comments for published videos as well as through meta ratings for these comments. In this paper, we present an in-depth study of commenting and comment rating behavior on a sample of more than 6 million comments on 67,000 YouTube videos for which we analyzed dependencies between comments, views, comment ratings and topic categories. In addition, we studied the influence of sentiment expressed in comments on the ratings for these comments using the SentiWordNet thesaurus, a lexical WordNet-based resource containing sentiment annotations. Finally, to predict community acceptance for comments not yet rated, we built different classifiers for the estimation of ratings for these comments. The results of our large-scale evaluations are promising and indicate that community feedback on already rated comments can help to filter new unrated comments or suggest particularly useful but still unrated comments. © 2010 International World Wide Web Conference Committee (IW3C2).
Bruni E.,University of Trento |
Tran N.K.,search Center |
Baroni M.,University of Trento
Journal of Artificial Intelligence Research | Year: 2014
Distributional semantic models derive computational representations of word meaning from the patterns of co-occurrence of words in text. Such models have been a success story of computational linguistics, being able to provide reliable estimates of semantic relatedness for the many semantic tasks requiring them. However, distributional models extract meaning information exclusively from text, which is an extremely impoverished basis compared to the rich perceptual sources that ground human semantic knowledge. We address the lack of perceptual grounding of distributional models by exploiting computer vision techniques that automatically identify discrete "visual words" in images, so that the distributional representation of a word can be extended to also encompass its co-occurrence with the visual words of images it is associated with. We propose a flexible architecture to integrate text- and image-based distributional information, and we show in a set of empirical tests that our integrated model is superior to the purely text-based approach, and it provides somewhat complementary semantic information with respect to the latter. © 2014 AI Access Foundation.
Alrifai M.,search Center |
Skoutas D.,search Center |
Risse T.,search Center
Proceedings of the 19th International Conference on World Wide Web, WWW '10 | Year: 2010
Web service composition enables seamless and dynamic integration of business applications on the web. The performance of the composed application is determined by the performance of the involved web services. Therefore, non-functional, quality of service aspects are crucial for selecting the web services to take part in the composition. Identifying the best candidate web services from a set of functionally-equivalent services is a multi-criteria decision making problem. The selected services should optimize the overall QoS of the composed application, while satisfying all the constraints specified by the client on individual QoS parameters. In this paper, we propose an approach based on the notion of skyline to effectively and efficiently select services for composition, reducing the number of candidate services to be considered. We also discuss how a provider can improve its service to become more competitive and increase its potential of being included in composite applications. We evaluate our approach experimentally using both real and synthetically generated datasets. © 2010 International World Wide Web Conference Committee (IW3C2).
Velasco E.,Robert Koch Institute |
Agheneza T.,Robert Koch Institute |
Denecke K.,search Center |
Kirchner G.,Robert Koch Institute |
Eckmanns T.,Robert Koch Institute
Milbank Quarterly | Year: 2014
Context The exchange of health information on the Internet has been heralded as an opportunity to improve public health surveillance. In a field that has traditionally relied on an established system of mandatory and voluntary reporting of known infectious diseases by doctors and laboratories to governmental agencies, innovations in social media and so-called user-generated information could lead to faster recognition of cases of infectious disease. More direct access to such data could enable surveillance epidemiologists to detect potential public health threats such as rare, new diseases or early-level warnings for epidemics. But how useful are data from social media and the Internet, and what is the potential to enhance surveillance? The challenges of using these emerging surveillance systems for infectious disease epidemiology, including the specific resources needed, technical requirements, and acceptability to public health practitioners and policymakers, have wide-reaching implications for public health surveillance in the 21st century. Methods This article divides public health surveillance into indicator-based surveillance and event-based surveillance and provides an overview of each. We did an exhaustive review of published articles indexed in the databases PubMed, Scopus, and Scirus between 1990 and 2011 covering contemporary event-based systems for infectious disease surveillance. Findings Our literature review uncovered no event-based surveillance systems currently used in national surveillance programs. While much has been done to develop event-based surveillance, the existing systems have limitations. Accordingly, there is a need for further development of automated technologies that monitor health-related information on the Internet, especially to handle large amounts of data and to prevent information overload. The dissemination to health authorities of new information about health events is not always efficient and could be improved. No comprehensive evaluations show whether event-based surveillance systems have been integrated into actual epidemiological work during real-time health events. Conclusions The acceptability of data from the Internet and social media as a regular part of public health surveillance programs varies and is related to a circular challenge: the willingness to integrate is rooted in a lack of effectiveness studies, yet such effectiveness can be proved only through a structured evaluation of integrated systems. Issues related to changing technical and social paradigms in both individual perceptions of and interactions with personal health data, as well as social media and other data from the Internet, must be further addressed before such information can be integrated into official surveillance systems. © 2014 Milbank Memorial Fund.