Institute of Information Science
Peng H.-P.,Academia Sinica, Taiwan |
Peng H.-P.,National Yang Ming University |
Peng H.-P.,Institute of Information Science |
Lee K.H.,Academia Sinica, Taiwan |
And 4 more authors.
Proceedings of the National Academy of Sciences of the United States of America | Year: 2014
Natural antibodies are frequently elicited to recognize diverse protein surfaces, where the sequence features of the epitopes are frequently indistinguishable from those of nonepitope protein surfaces. It is not clearly understood how the paratopes are able to recognize sequence-wise featureless epitopes and how a natural antibody repertoire with limited variants can recognize seemingly unlimited protein antigens foreign to the host immune system. In this work, computational methods were used to predict the functional paratopes with the 3D antibody variable domain structure as input. The predicted functional paratopes were reasonably validated by the hot spot residues known from experimental alanine scanning measurements. The functional paratope (hot spot) predictions on a set of 111 antibody-antigen complex structures indicate that aromatic, mostly tyrosyl, side chains constitute the major part of the predicted functional paratopes, with short-chain hydrophilic residues forming the minor portion. These aromatic side chains interact mostly with the epitope main chain atoms and side-chain carbons. The functional paratopes are surrounded by favorable polar atomistic contacts in the structural paratope-epitope interfaces; more than 80% of these polar contacts are electrostatically favorable and about 40% of them form direct hydrogen bonds across the interfaces. These results indicate that a limited repertoire of antibodies bearing paratopes with diverse structural contours enriched with aromatic side chains among short-chain hydrophilic residues can recognize all sorts of protein surfaces, because the determinants for antibody recognition are common physicochemical features ubiquitously distributed over all protein surfaces.
Bartol T.,University of Ljubljana |
Budimir G.,Institute of Information Science |
Juznic P.,University of Ljubljana |
Stopar K.,University of Ljubljana
Scientometrics | Year: 2016
Fields of science (FOS) can be used for the assessment of publishing patterns and scientific output. To this end, WOS JCR (Web of Science/Journal Citation Reports) subject categories are often mapped to Frascati-related OECD FOS (Organization for Economic Co-operation and Development). Although WOS categories are widely employed, they reflect agriculture (one of six major FOS) less comprehensively. Other fields may benefit from agricultural WOS mapping. The aim was to map all articles produced nationally (Slovenia) by agricultural research groups, over two decades, to their corresponding journals and categories in order to visualize the strength of links between the categories and the scatter of articles, based on WOS-linked raw data in the COBISS/SciMet portal (Co-operative Online Bibliographic System and Services/Science Metrics) and the national CRIS (Slovenian Current Research Information System). Agricultural groups are mapped into four subfields: Forestry and Wood Science, Plant Production, Animal Production, and Veterinary Science. Food science is included as either plant- or animal-product-related. On average, 50% of relevant articles are published outside the scope of journals mapped to WOS agricultural categories. The other half are mapped mostly to OECD Natural Sciences, Medical and Health Sciences, and Engineering and Technology. A few selected journals and principal categories account for an important part of all relevant documents (core). Even many core journals/categories, as ascertained with power laws (Bradford’s law), are not mapped to agriculture. Research evaluation based on these classifications may underestimate multidisciplinary dimensions of agriculture, affecting its position among scientific fields and also subsequent funding if established on such ranking. © 2016 Akadémiai Kiadó, Budapest, Hungary
Bartol T.,University of Ljubljana |
Budimir G.,Institute of Information Science |
Dekleva-Smrekar D.,University of Ljubljana |
Pusnik M.,University of Ljubljana |
Juznic P.,University of Ljubljana
Scientometrics | Year: 2014
Web of Science (WOS) and Scopus have often been compared with regard to user interface, countries, institutions, author sets, etc., but rarely employing a more systematic assessment of major research fields and national production. The aim of this study was to appraise the differences among major research fields in Scopus and WOS based on a standardized classification of fields and assessed for the case of an entire country (Slovenia). We analyzed all documents and citations received by authors who were actively engaged in research in Slovenia between 1996 and 2011 (50,000 unique documents by 10,000 researchers). Documents were tracked and linked to Scopus and WOS using complex algorithms in the Slovenian COBISS bibliographic system and SICRIS research system, where the subject areas or research fields of all documents are harmonized by the Frascati/OECD classification, thus offsetting some major differences between WOS and Scopus in database-specific subject schemes as well as limitations of deriving data directly from databases. Scopus leads over WOS in indexed documents as well as citations in all research fields. This is especially evident in social sciences, humanities, and engineering & technology. The fewest citations per document were received in humanities and the most in medical and natural sciences, which exhibit similar counts. Engineering & technology reveals only half the citations per document compared to the previous two fields. Agriculture is found in the middle. The established differences between databases and research fields provide the Slovenian research funding agency with additional criteria for a more balanced evaluation of research. © 2013 Akadémiai Kiadó, Budapest, Hungary.
Bosnjak A.,Institute of Information Science |
Podgorelec V.,University of Maribor
Expert Systems with Applications | Year: 2016
Harmonising the metadata format alone does not solve the issue of efficient access to relevant information in heterogeneous environments, when different systems use different content, contextual and semantic concepts for certain entities. Current Research Information Systems (CRIS) are one such type of heterogeneous system: they store their data primarily in local relational databases, using different formats and various local concepts. In this article, we study the possibilities and propose a new ontologically supported semantic search engine (OSSSE) which, in addition to harmonising the metadata format among local CRIS systems, also ensures that the meaning of data and/or concepts that belong to various metadata entities is harmonised. A special model of ontological infrastructure was designed, and a dedicated test ontology was created along with a new simplified algorithm for creating ontologies, the basis of which is the distinction between new and already existing classes in terms of content. Finally, we evaluated the proposed OSSSE model using a simulation of the search process on the basis of 41,113 real searches within SICRIS. The obtained results show that regardless of the search situation, the proposed OSSSE is always at least as efficient as a search without ontological support in terms of precision, while recall remains the same; the improvement has been shown to be statistically significant (p < 0.005). The proposed OSSSE model is able to solve the issue of harmonising the data where different heterogeneous systems use different content, contextual and semantic concepts, which is the case in many advanced expert systems. In this manner, the more the search is carried out based on the properties described by the supporting ontology, the more the infrastructure can help a searcher. The proposed concepts, ontological infrastructure and the designed semantic search engine may well help to improve search precision in several information retrieval systems. © 2016 Elsevier Ltd
Tai C.-H.,National Taiwan University |
Yu P.S.,University of Illinois at Chicago |
Yang D.-N.,Institute of Information Science |
Chen M.-S.,National Taiwan University
Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011 | Year: 2011
How to protect individual privacy in public data is always a concern. For social networks, the challenge is that the structure of the social network graph can be utilized to infer the private and sensitive information of users. The existing anonymity schemes mostly focus on the anonymity of vertex identities, such that a malicious attacker cannot associate a user with a specific vertex. In real social networks, however, each vertex is usually associated with not only a vertex identity but also a community identity, which could represent private information for the corresponding user, such as political party affiliation or sensitive disease information. In this paper, we first show that the attacker can still infer the community identity of a user even though the graph is protected by previous anonymity schemes. Afterward, we propose structural diversity, which ensures the existence of at least k communities containing vertices with the same degree for every vertex in the graph, to provide anonymity of the community identities. Specifically, we formulate a new problem, k-Structural Diversity Anonymization (k-SDA), which protects the community identity of each individual in published social networks. We propose an Integer Programming formulation to find the optimal solutions to k-SDA. Moreover, we devise three scalable heuristics to solve large instances of k-SDA from different perspectives. The experiments on real data sets demonstrate the practical utility of our privacy model and our approaches. Copyright © SIAM.
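The structural diversity condition stated in the abstract can be checked with a few lines of code. The following is a minimal sketch, assuming the graph is given as two hypothetical dictionaries (vertex to degree, vertex to community label); the function name and the toy data are illustrative and are not taken from the paper, and the sketch only tests the condition, it does not anonymize a graph.

```python
from collections import defaultdict


def satisfies_k_structural_diversity(degrees, communities, k):
    """Return True if, for every vertex, the vertices sharing its degree
    are spread over at least k distinct communities.

    degrees: dict mapping vertex -> degree (hypothetical input format)
    communities: dict mapping vertex -> community label
    """
    communities_by_degree = defaultdict(set)
    for vertex, degree in degrees.items():
        communities_by_degree[degree].add(communities[vertex])
    # Every degree value that occurs must be covered by >= k communities.
    return all(len(found) >= k for found in communities_by_degree.values())


# Toy data: degree 2 occurs in two communities, degree 1 in only one.
toy_degrees = {"a": 2, "b": 2, "c": 1, "d": 1}
toy_communities = {"a": "red", "b": "blue", "c": "red", "d": "red"}
print(satisfies_k_structural_diversity(toy_degrees, toy_communities, 2))
```

In the toy data, an attacker who knows a target has degree 1 can conclude that the target belongs to the "red" community, which is the kind of inference the condition is meant to rule out.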
Wu C.-J.,National Taiwan University |
Wu C.-J.,Institute of Information Science |
Ho J.-M.,Academia Sinica, Taiwan |
Chen M.-S.,Academia Sinica, Taiwan |
Chen M.-S.,National Taiwan University
IEEE Transactions on Mobile Computing | Year: 2013
Social network applications are becoming increasingly popular on mobile devices. A mobile presence service is an essential component of a social network application because it maintains each mobile user's presence information, such as the current status (online/offline), GPS location and network address, and also updates the user's online friends with the information continually. If presence updates occur frequently, the enormous number of messages distributed by presence servers may lead to a scalability problem in a large-scale mobile presence service. To address the problem, we propose an efficient and scalable server architecture, called PresenceCloud, which enables mobile presence services to support large-scale social network applications. When a mobile user joins a network, PresenceCloud searches for the presence of his/her friends and notifies them of his/her arrival. PresenceCloud organizes presence servers into a quorum-based server-to-server architecture for efficient presence searching. It also leverages a directed search algorithm and a one-hop caching strategy to achieve small constant search latency. We analyze the performance of PresenceCloud in terms of the search cost and search satisfaction level. The search cost is defined as the total number of messages generated by the presence server when a user arrives; and search satisfaction level is defined as the time it takes to search for the arriving user's friend list. The results of simulations demonstrate that PresenceCloud achieves performance gains in the search cost without compromising search satisfaction. © 2012 IEEE.
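The abstract says only that presence servers are organized in a quorum-based server-to-server architecture. As an illustration of why quorums help with searching, here is a minimal sketch that assumes a square grid quorum (each server's quorum is its own row plus its own column); the grid layout, the integer server ids, and the function name are assumptions for this example and may differ from the construction PresenceCloud actually uses.

```python
def grid_quorum(server, side):
    """Return the quorum of `server` in a side x side grid of servers.

    Servers are numbered 0 .. side*side - 1 in row-major order
    (a hypothetical numbering chosen for this sketch).
    """
    row, col = divmod(server, side)
    row_members = {row * side + c for c in range(side)}
    col_members = {r * side + col for r in range(side)}
    return row_members | col_members


# With 9 servers, each quorum holds 5 servers, and any two quorums overlap.
side = 3
quorums = [grid_quorum(s, side) for s in range(side * side)]
print(sorted(quorums[0]))
```

Because every pair of quorums shares at least one server, information replicated to one server's quorum can be found by querying only the members of another server's quorum, roughly 2·sqrt(n) servers instead of all n.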
Chen S.C.-C.,National Yang Ming University |
Chen S.C.-C.,Institute of Information Science |
Chen S.C.-C.,Academia Sinica, Taiwan |
Chuang T.-J.,Academia Sinica, Taiwan |
And 2 more authors.
Molecular Biology and Evolution | Year: 2011
Many indicators of protein evolutionary rate have been proposed, but some of them are interrelated. The purpose of this study is to disentangle their correlations. We assess the strength of each indicator by controlling for the other indicators under study. We find that the number of microRNA (miRNA) types that regulate a gene is the strongest rate indicator (a negative correlation), followed by disorder content (the percentage of disordered regions in a protein, a positive correlation); the strength of disorder content as a rate indicator is substantially increased after controlling for the number of miRNA types. By dividing proteins into lowly and highly intrinsically disordered proteins (L-IDPs and H-IDPs), we find that proteins interacting with more H-IDPs tend to evolve more slowly, which largely explains the previous observation of a negative correlation between the number of protein-protein interactions and evolutionary rate. Moreover, all of the indicators examined here, except for the number of miRNA types, have different strengths in L-IDPs and in H-IDPs. Finally, the number of phosphorylation sites is weakly correlated with the number of miRNA types, and its strength as a rate indicator is substantially reduced when other indicators are considered. Our study reveals the relative strength of each rate indicator and increases our understanding of protein evolution. © 2011 The Author.
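Controlling one indicator for another, as described above, can be illustrated with a first-order partial correlation. The sketch below assumes Pearson correlation and a single control variable; the abstract does not state which correlation measure or procedure the authors used, and the numbers are made up for illustration, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Made-up values: both `indicator` and `rate` largely follow `control`.
control = [1, 2, 3, 4, 5, 6, 7, 8]
indicator = [2, 1, 4, 3, 6, 5, 8, 7]
rate = [2, 3, 2, 3, 6, 7, 6, 7]

raw = pearson(indicator, rate)
controlled = partial_correlation(indicator, rate, control)
print(f"raw={raw:.3f} controlled={controlled:.3f}")
```

In this toy data the raw correlation is strong while the controlled one is close to zero, which is the pattern of an indicator whose apparent strength is mostly explained by another, interrelated indicator.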
Hsieh C.-H.,Institute of Information Science |
Liu J.-S.,Institute of Information Science
IEEE/ASME International Conference on Advanced Intelligent Mechatronics, AIM | Year: 2012
This paper addresses the set-point regulation problem of a nonholonomic wheeled mobile robot with obstacle avoidance in a known dynamic environment populated with static and moving obstacles, subject to robot kinematic and dynamic constraints, using nonlinear model predictive control in polar coordinates. The terminal state penalty, terminal state constraints, and the input saturation constraints are taken into consideration in this optimization problem to guarantee the closed-loop regulation performance and stability. Simulation results illustrate the effectiveness of the control algorithm in steering a nonholonomic wheeled mobile robot. © 2012 IEEE.
Lin J.-W.,Tunghai University |
Lin F.-S.,Institute of Information Science
11th International Symposium on Communications and Information Technologies, ISCIT 2011 | Year: 2011
Most legacy computer systems properly support input and display of only the 20,902 Han characters (Hanzis for short) encoded in Unicode 1.0. In 2010, Unicode 6.0 encoded 75,616 Hanzis. However, it is not easy to use these newly encoded Hanzis, even on the latest computers. Most of them are rarely used in daily life. Some are only used in ancient literature or in individual Sinospheric countries. Users may be confused about their glyph shapes, pronunciations, meanings, and usages. Most Chinese IMEs (input method editors) require users to have good knowledge of Hanzis. As a result, users cannot input these Hanzis. We present an auxiliary Unicode Hanzi lookup service based on glyph shape similarity. One can key in a similar Hanzi with any IME to look up the wanted Hanzi. Each Unicode Hanzi is decomposed into a glyph expression. The similarity of the glyph shapes of two Hanzis is calculated based on a derived edit distance on their glyph expressions. As a result, the system provides users a convenient way to look up unfamiliar Hanzis. © 2011 IEEE.
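The lookup idea can be sketched by treating each glyph expression as a flat sequence of components and ranking candidates by edit distance. This is only an illustration: the abstract does not give the form of the glyph expressions or the derived edit distance, so plain Levenshtein distance is swapped in here, and the small decomposition table and function names are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two component sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical flat decompositions; a real glyph expression would likely
# also encode how the components are arranged.
GLYPHS = {
    "明": ["日", "月"],
    "朋": ["月", "月"],
    "昌": ["日", "日"],
    "晶": ["日", "日", "日"],
}


def rank_similar(query, table):
    """Return the other characters in `table`, most similar to `query` first."""
    target = table[query]
    others = [ch for ch in table if ch != query]
    return sorted(others, key=lambda ch: (edit_distance(target, table[ch]), ch))


print(rank_similar("明", GLYPHS))
```

A user who can type 明 but not the character they want would scan the ranked list, with characters that differ by a single component appearing before those that differ by more.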
Huang P.-C.,Institute of Information Science |
Chang Y.-H.,Institute of Information Science |
Lam K.-Y.,University of Hong Kong |
Wang J.-T.,University of Hong Kong |
Huang C.-C.,Institute of Information Science
ACM Transactions on Design Automation of Electronic Systems | Year: 2014
Recently, flash-based embedded databases have gained momentum in various control and monitoring systems, such as cyber-physical systems (CPSes). To support access to historical data, a multiversion index is adopted to simultaneously maintain multiple versions of data items, as well as their index information. However, maintaining a multiversion index on flash memory incurs considerable performance overheads on garbage collection, which reclaims the space occupied by outdated/invalid data items and their index information on flash memory. In this work, we propose an efficient garbage collection strategy to solve the garbage collection issues of flash-based multiversion databases. In particular, a version-tracking method is proposed to accelerate the process of identifying and reclaiming the space of invalid data and their indexes, and a pre-summary method is designed to solve the cascading update problem that is caused by the write-once nature of flash memory and is worsened when more versions refer to the same data item. The capability of the proposed strategy is then verified by analytical and experimental studies. © 2014 ACM 1084-4309/2014/06-ART25 $15.00.