Liu J., Key Laboratory of Data Engineering and Knowledge Engineering | Liu J., Renmin University of China | Liu J., National Computer System Engineering Research Institute of China | Chai Y., Key Laboratory of Data Engineering and Knowledge Engineering | and 3 more authors.
IEEE Symposium on Mass Storage Systems and Technologies | Year: 2014

Data deduplication techniques improve cost efficiency by dramatically reducing the space requirements of storage systems. SSD-based data caches have been adopted to remedy the I/O performance degradation that deduplication operations cause in latency-sensitive primary storage. Unfortunately, the frequent data updates caused by classical cache algorithms (e.g., FIFO, LRU, and LFU) inevitably slow down SSDs' I/O processing while significantly shortening their lifetime. To address this problem, we propose a new approach, PLC-Cache, that greatly improves both the I/O performance and the write durability of SSDs. PLC-Cache increases the proportion of Popular and Long-term Cached (PLC) data, that is, data that is infrequently written and stays in the SSD cache for a long period to serve cache hits, within the entire set of data written to the SSD. PLC-Cache follows a two-phase approach. First, non-popular data are prevented from being written to the SSD. Second, PLC-Cache converts as much of the data written to the SSD as possible into PLC data. Our experimental results on a practical deduplication system indicate that, compared with existing caching schemes, PLC-Cache shortens data access latency by an average of 23.4%. Importantly, PLC-Cache extends the lifetime of SSD-based caches by reducing the amount of data written to SSDs by a factor of 15.7. © 2014 IEEE.
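The first phase, keeping non-popular data out of the SSD, can be illustrated with a generic popularity-gated admission filter in front of an LRU cache. This is a minimal sketch under stated assumptions, not PLC-Cache itself: the class name AdmissionGatedCache, the request-count threshold used as the popularity test, and the LRU eviction are all hypothetical choices made for illustration, and the second phase (converting written data into PLC data) is not modeled.

```python
from collections import OrderedDict, defaultdict


class AdmissionGatedCache:
    """Hypothetical LRU cache that admits a block only after it has been
    requested `threshold` times (a stand-in for a popularity test)."""

    def __init__(self, capacity, threshold=2):
        self.capacity = capacity
        self.threshold = threshold
        self.cache = OrderedDict()        # block id -> True, kept in LRU order
        self.requests = defaultdict(int)  # request count for every block seen
        self.ssd_writes = 0               # blocks written into the SSD cache
        self.hits = 0

    def access(self, block):
        self.requests[block] += 1
        if block in self.cache:
            self.cache.move_to_end(block)  # refresh recency on a hit
            self.hits += 1
            return True
        # Miss: only blocks that have proven popular enough are written.
        if self.requests[block] >= self.threshold:
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[block] = True
            self.ssd_writes += 1
        return False


# Hypothetical trace: one hot block interleaved with one-off blocks.
trace = ["x1", "hot", "x2", "hot", "x3", "hot", "x4", "hot"]
lru = AdmissionGatedCache(capacity=2, threshold=1)    # threshold 1 = plain LRU
gated = AdmissionGatedCache(capacity=2, threshold=2)
for block in trace:
    lru.access(block)
    gated.access(block)
print(lru.ssd_writes, lru.hits)      # 5 3
print(gated.ssd_writes, gated.hits)  # 1 2
```

On this toy trace the gate cuts SSD writes from 5 to 1 at the cost of one hit, which is the kind of write-versus-hit trade-off the admission phase targets.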


Gao R., University of Chinese Academy of Sciences | Hao B., University of Chinese Academy of Sciences | Li H., National Computer System Engineering Research Institute of China | Gao Y., University of Chinese Academy of Sciences | Zhu T., University of Chinese Academy of Sciences
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2013

The words that people use can reveal their emotional states, intentions, thinking styles, individual differences, and so on. LIWC (Linguistic Inquiry and Word Count) has been widely used for psychological text analysis, and its dictionary is its core. A Traditional Chinese version of the LIWC dictionary, translated from the English LIWC dictionary, has been released. However, Simplified Chinese, the world's most widely used language, differs subtly from Traditional Chinese. Furthermore, both the English and the Traditional Chinese LIWC dictionaries were developed for relatively formal text. Microblogs have become increasingly popular in China, yet the original LIWC dictionaries give little consideration to words popular on microblogs, which makes them less applicable to microblog text analysis. In this study, a Simplified Chinese LIWC dictionary is established according to the LIWC categories. After the Traditional Chinese dictionary was converted into Simplified Chinese, the five thousand words most frequently used on microblogs were added to it. Four psychology graduate students rated whether each word belonged to a given category, and the reliability and validity of the Simplified Chinese LIWC dictionary were tested using these four judges' ratings. This new dictionary could support future text analysis of microblogs. © Springer International Publishing 2013.
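LIWC-style analysis reduces to looking up each word in the dictionary and reporting, per category, the share of words that fall into it. The sketch below assumes the text has already been segmented into words (Chinese has no spaces, so a segmenter would be needed first) and uses a tiny hypothetical DICTIONARY; its entries and category names are illustrative only, not the real LIWC lexicon or the dictionary built in this study.

```python
from collections import Counter

# Hypothetical toy dictionary: word -> categories it belongs to.
# A real LIWC dictionary maps thousands of words to its own category set.
DICTIONARY = {
    "我": {"pronoun", "i"},
    "开心": {"affect", "posemo"},
    "难过": {"affect", "negemo"},
    "今天": {"time"},
}


def category_percentages(tokens, dictionary=DICTIONARY):
    """Return, for each category, the percentage of tokens that belong to it.

    Assumes `tokens` is an already word-segmented list; words missing from
    the dictionary still count toward the total but toward no category.
    """
    if not tokens:
        return {}
    counts = Counter()
    for token in tokens:
        for category in dictionary.get(token, ()):
            counts[category] += 1
    return {cat: 100.0 * n / len(tokens) for cat, n in counts.items()}


tokens = ["我", "今天", "很", "开心"]  # "I am very happy today", pre-segmented
print(category_percentages(tokens))
```

Because a word can belong to several categories (as 开心 does here), category percentages are not expected to sum to 100.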


Ning Y., University of Chinese Academy of Sciences | Zhu T., University of Chinese Academy of Sciences | Wang Y., National Computer System Engineering Research Institute of China
ICPCA10 - 5th International Conference on Pervasive Computing and Applications | Year: 2010

When browsing news on the web, readers may experience various emotions, which can in turn influence their minds and lives in different ways. We expect that emotional analysis and classification of text can be valuable to people using the Internet. Most previous research focuses only on two-class emotion classification, that is, positive versus negative, e.g., identifying whether a comment praises or criticizes. In this paper, we propose a χ2-based Chinese text emotion classification method with five sentiment categories. We run two experiments: one uses sentiment words extracted from HowNet and the Chinese thesaurus TongYiCi CiLin, and the other does not. The results show that adding affective words improves prediction in sentiment classification. ©2010 IEEE.
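The abstract does not say exactly how χ2 is applied; assuming it is used in the common way, to score how strongly a term is associated with an emotion category during feature selection, the statistic comes from a 2×2 document count table: A = documents in the category containing the term, B = documents outside it containing the term, C = documents in the category without the term, D = the rest. Then χ2 = N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)). The function name and toy documents below are hypothetical.

```python
def chi_square(docs, term, category):
    """chi^2 association between `term` and `category`.

    `docs` is a list of (set_of_terms, label) pairs.
    """
    a = b = c = d = 0
    for terms, label in docs:
        has_term = term in terms
        in_cat = label == category
        if has_term and in_cat:
            a += 1  # term present, document in category
        elif has_term:
            b += 1  # term present, document outside category
        elif in_cat:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document outside category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


# Hypothetical labeled documents (already segmented into term sets).
docs = [
    ({"喜欢", "电影"}, "joy"),
    ({"喜欢", "音乐"}, "joy"),
    ({"愤怒", "新闻"}, "anger"),
    ({"电影", "新闻"}, "anger"),
]
print(chi_square(docs, "喜欢", "joy"))  # 4.0
print(chi_square(docs, "电影", "joy"))  # 0.0
```

Note that χ2 is two-sided: 新闻 also scores 4.0 for joy because it never occurs in joy documents, so a selector that ranks terms purely by χ2 keeps strongly negative indicators as well as positive ones.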


Fang Z., National Computer System Engineering Research Institute of China | Ning Y., University of Chinese Academy of Sciences | Zhu T., University of Chinese Academy of Sciences
ICPCA10 - 5th International Conference on Pervasive Computing and Applications | Year: 2010

The Internet is becoming an increasingly important platform for everyday life and work. Keyword extraction is expected to help people quickly find hot spots on the web, since the keywords of a document provide important information about its content. In this paper, we propose a text clustering method based on semi-supervised learning to identify the focuses of social topics in large amounts of text. We develop a novel keyword extraction method named NATF-PDF, which builds on the TF-PDF algorithm and incorporates supervised learning for keyword extraction. We compare its performance with TF-IDF, and the results show that our method achieves better accuracy and recall. ©2010 IEEE.
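For reference, the base TF-PDF weighting that NATF-PDF builds on scores a term by summing, over news channels c, its normalized frequency multiplied by an exponential of the fraction of the channel's documents that contain it: W_j = Σ_c (F_jc / sqrt(Σ_k F_kc²)) · exp(n_jc / N_c). The sketch below implements that formula as we understand it; the NATF-PDF modifications are not described in the abstract and are not shown, and the channel names and documents are hypothetical.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Base TF-PDF weights.

    `channels` maps a channel name to a list of documents, each a list of
    tokens. Returns a dict term -> weight summed over all channels.
    """
    weights = Counter()
    for docs in channels.values():
        if not docs:
            continue
        freq = Counter(tok for doc in docs for tok in doc)           # F_jc
        doc_freq = Counter(tok for doc in docs for tok in set(doc))  # n_jc
        norm = math.sqrt(sum(f * f for f in freq.values()))          # sqrt(sum_k F_kc^2)
        n_docs = len(docs)                                           # N_c
        for term, f in freq.items():
            weights[term] += (f / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)


# Hypothetical single channel with two short documents.
channels = {"news_a": [["quake", "city"], ["quake", "rescue"]]}
w = tf_pdf(channels)
print(sorted(w, key=w.get, reverse=True)[0])  # quake
```

Unlike TF-IDF, which penalizes terms that appear in many documents, the exp(n_jc / N_c) factor rewards terms spread across many documents of a channel, which is why TF-PDF favors terms describing widely reported hot topics.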


Li J., CAS Beijing Institute of Acoustics | Xu Y., CAS Beijing Institute of Acoustics | Xiong H., CAS Beijing Institute of Acoustics | Wang Y., National Computer System Engineering Research Institute of China
Proceedings - 2010 IEEE 2nd Symposium on Web Society, SWS 2010 | Year: 2010

Much recent work has addressed text emotion classification. However, it has mainly focused on the emotions expressed by authors rather than those of readers. In addition, research on simplified Chinese text emotion classification is very scarce. In this paper, we propose a simplified Chinese text emotion classification method based on readers' emotions. A large collection of documents tagged with readers' emotions is used as the raw text set, and the Vector Space Model is used to represent each document. An emotion dictionary, created semi-automatically using WordNet, is used to build the text vectors. We then train a Support Vector Machine classifier on the preprocessed data with four emotion classes and compare its predictions with those of a Naive Bayes classifier. Experimental results indicate that our approach performs much better in classification accuracy and efficiency. © 2010 IEEE.
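The pipeline of dictionary-restricted Vector Space Model vectors feeding a classifier can be sketched as follows. This is a minimal sketch under stated assumptions: it shows the Naive Bayes baseline (multinomial, with Laplace smoothing) rather than the SVM, and EMOTION_WORDS, the class labels, and the training documents are hypothetical placeholders for the WordNet-derived dictionary and the reader-tagged corpus.

```python
import math
from collections import Counter

# Hypothetical emotion dictionary; each word is one VSM dimension.
EMOTION_WORDS = ["happy", "glad", "sad", "angry", "fear", "surprise"]


def vectorize(tokens, vocab=EMOTION_WORDS):
    """VSM representation: term counts over the emotion dictionary only."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def fit(self, vectors, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        n_features = len(vectors[0])
        totals = {c: [0] * n_features for c in self.classes}
        class_counts = Counter(labels)
        for vec, label in zip(vectors, labels):
            for i, v in enumerate(vec):
                totals[label][i] += v  # per-class term counts
        self.log_prior = {c: math.log(class_counts[c] / len(labels))
                          for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c]) + alpha * n_features
            self.log_lik[c] = [math.log((t + alpha) / denom) for t in totals[c]]
        return self

    def predict(self, vec):
        def score(c):
            return self.log_prior[c] + sum(
                v * lp for v, lp in zip(vec, self.log_lik[c]))
        return max(self.classes, key=score)


# Hypothetical training data: token lists with reader-emotion labels.
train = [["happy", "glad", "news"], ["glad", "happy"],
         ["sad", "fear"], ["angry", "sad", "sad"]]
labels = ["joy", "joy", "sadness", "sadness"]
model = MultinomialNB().fit([vectorize(t) for t in train], labels)
print(model.predict(vectorize(["happy", "today"])))  # joy
```

Restricting the vector to dictionary words keeps the feature space small; words such as "news" or "today" that are not in the dictionary are simply ignored.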