Sun X.,Australian Council for Educational Research
Studies in Computational Intelligence | Year: 2014

In this chapter, we propose a new approach to finding the most dependent test items in students' response data by adopting the concept of entropy from information theory. We define a distance metric that measures the degree of mutual independence between two items, and use it to quantify how independent two items in a test are. Based on this measure, we present a simple yet efficient dependency tree search algorithm that finds the best dependency tree from the students' response data, showing the hierarchical relationship between test items. An extensive experimental study on synthetic datasets shows that the proposed algorithm for finding the best dependency tree is fast and scalable, and a comparison with item correlations confirms the effectiveness of the approach. Finally, we discuss possible extensions of the method to find dependent item sets and to determine dimensions and sub-dimensions from the data. © 2014 Springer International Publishing Switzerland.
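The abstract does not state the distance metric itself. As a rough illustration only, the sketch below assumes a normalized entropy-based distance, 1 - I(X;Y)/H(X,Y), which is 0 when two items are fully dependent and 1 when they are independent. The function names `entropy` and `item_distance` are hypothetical and are not taken from the chapter.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete outcomes."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def item_distance(x, y):
    """Assumed metric (not the chapter's definition): 1 - I(X;Y) / H(X,Y).

    x and y are equal-length sequences of responses to two items,
    one entry per student. Returns 0.0 for fully dependent items
    and 1.0 for independent items.
    """
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X,Y)
    if h_xy == 0:
        return 0.0  # both items are constant; treat them as fully dependent
    mutual_info = entropy(x) + entropy(y) - h_xy  # I(X;Y)
    return 1.0 - mutual_info / h_xy
```

A dependency tree could then be built by treating these pairwise distances as edge weights and extracting a minimum spanning tree, although whether the chapter's search algorithm works this way is not stated in the abstract.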

Edwards D.,Monash University | Friedman T.,Australian Council for Educational Research | Pearce J.,University of Melbourne
BMC Medical Education | Year: 2013

Background: Admission to medical school is one of the most highly competitive entry points in higher education. Considerable investment is made by universities to develop selection processes that aim to identify the most appropriate candidates for their medical programs. This paper explores data from three undergraduate medical schools to offer a critical perspective of predictive validity in medical admissions. Methods: This study examined 650 undergraduate medical students from three Australian universities as they progressed through the initial years of medical school (accounting for approximately 25 per cent of all commencing undergraduate medical students in Australia in 2006 and 2007). Admissions criteria (aptitude test score based on UMAT, school result and interview score) were correlated with GPA over four years of study. Standard regression of each of the three admissions variables on GPA, for each institution at each year level, was also conducted. Results: Overall, the data showed positive correlations between performance in medical school and both school achievement and UMAT, but not interview. However, substantial differences emerged between schools, across year levels, and within sections of UMAT. Despite this, each admission variable was shown to add towards explaining course performance, net of other variables. Conclusion: The findings suggest the strength of multiple admissions tools in predicting outcomes of medical students. However, they also highlight the large differences in outcomes achieved by different schools, thus emphasising the pitfalls of generalising results from predictive validity studies without recognising the diverse ways in which they are designed and the variation in the institutional contexts in which they are administered. The assumption that high-positive correlations are desirable (or even expected) in these studies is also problematised. © 2013 Edwards et al.; licensee BioMed Central Ltd.

Sun X.,Australian Council for Educational Research | Li M.,University of Southern Queensland | Wang H.,University of Southern Queensland
Future Generation Computer Systems | Year: 2011

Privacy preservation is an important issue in the release of data for mining purposes. Recently, a novel l-diversity privacy model was proposed. However, even an l-diverse data set may have some severe problems leading to the revelation of individual sensitive information. In this paper, we remedy the problem by introducing distinct (l,α)-diversity, which, intuitively, demands that the total weight of the sensitive values in a given QI-group is at least α, where the weight is controlled by a pre-defined recursive metric system. We provide a thorough analysis of distinct (l,α)-diversity and prove that the optimal distinct (l,α)-diversity problem and its two variants, entropy (l,α)-diversity and recursive (c,l,α)-diversity, are NP-hard, and propose a top-down anonymization approach to solve the distinct (l,α)-diversity problem and its variants. We show in extensive experimental evaluations that the proposed methods are practical in terms of utility measurements and can be implemented efficiently. © 2010 Elsevier Inc. All rights reserved.
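To make the distinct (l,α)-diversity condition concrete, here is a minimal sketch of a checker. It rests on several assumptions not confirmed by the abstract: that the weight of a QI-group is the sum of the weights of its distinct sensitive values, and that weights are supplied as a plain lookup table rather than derived from the paper's recursive metric system. The name `satisfies_l_alpha_diversity` and the `weights` mapping are hypothetical.

```python
from collections import defaultdict


def satisfies_l_alpha_diversity(records, weights, l, alpha):
    """Check an assumed reading of distinct (l, alpha)-diversity.

    records: iterable of (qi_group_key, sensitive_value) pairs.
    weights: dict mapping each sensitive value to a weight
             (hypothetical stand-in for the paper's metric system).
    A table passes if every QI-group has at least l distinct sensitive
    values whose weights sum to at least alpha.
    """
    groups = defaultdict(set)
    for qi, sensitive in records:
        groups[qi].add(sensitive)  # collect distinct sensitive values per group
    return all(
        len(values) >= l and sum(weights[v] for v in values) >= alpha
        for values in groups.values()
    )
```

For example, a group containing only "flu" and "cold" meets l = 2, but under this reading it would still fail if both values carry low weights and their total falls below α.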

Li M.,University of Southern Queensland | Sun X.,Australian Council for Educational Research | Wang H.,University of Southern Queensland | Zhang Y.,Victoria University of Melbourne | Zhang J.,University of Southern Queensland
World Wide Web | Year: 2011

With the significant development of mobile commerce, privacy has become a major concern for both customers and enterprises. Although data generalization can provide significant protection of an individual's privacy, over-generalized data may be of little value or useless. In this paper, we devise generalization boundary techniques to maximize data usability while minimizing disclosure of privacy. Inspired by the fact that the permissible generalization level results in a much finer level of access control, we propose a privacy-aware access control model in web service environments. We also analyze how to manage a valid access process through a trust-based decision and ongoing access control policies. Extensive experiments on both real-world and synthetic data sets show that the proposed privacy-aware access control model is practical and effective. © 2011 Springer Science+Business Media, LLC.

Sun X.,Australian Council for Educational Research | Wang H.,University of Southern Queensland | Li J.,University of South Australia | Zhang Y.,Victoria University of Melbourne
Computer Journal | Year: 2012

In this paper, we study the problem of protecting the privacy of individuals in large public survey rating data. We propose a novel (k,ε,l)-anonymity model to protect privacy in large survey rating data, in which each survey record is required to be similar to at least k-1 other records based on the non-sensitive ratings, where the similarity is controlled by ε, and the standard deviation of sensitive ratings is at least l. We study an interesting yet non-trivial satisfaction problem of the proposed model, which is to decide whether a survey rating data set satisfies the privacy requirements given by the user. For this problem, we investigate its inherent properties theoretically, and devise a novel slicing technique to solve it. We analyze the computational complexity of the proposed slicing technique and conduct extensive experiments on two real-life data sets. The results show that the slicing technique is fast and scalable with data size, and much more efficient in terms of execution time and space overhead than the heuristic pairwise method. © 2011 The Author. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved.
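The satisfaction problem can be illustrated with a brute-force pairwise check. This is a sketch of a naive baseline, not the paper's slicing technique, and it makes assumptions the abstract does not confirm: two records count as similar when every non-sensitive rating differs by at most ε, each record has a single sensitive rating, and the population standard deviation is used. The name `satisfies_kel_anonymity` is hypothetical.

```python
from statistics import pstdev


def satisfies_kel_anonymity(records, k, eps, l):
    """Naive O(n^2) check of an assumed reading of (k, eps, l)-anonymity.

    records: list of (non_sensitive_ratings, sensitive_rating), where
             non_sensitive_ratings is an equal-length tuple of numbers.
    For each record, gather the sensitive ratings of all records
    (including itself) whose non-sensitive ratings are within eps in
    every position; require at least k such records and a standard
    deviation of their sensitive ratings of at least l.
    """
    for ratings, _ in records:
        neighbours = [
            sensitive
            for other, sensitive in records
            if max(abs(a - b) for a, b in zip(ratings, other)) <= eps
        ]
        if len(neighbours) < k or pstdev(neighbours) < l:
            return False
    return True
```

Because every record is compared with every other record, the cost grows quadratically with the number of records, which is the kind of overhead the abstract says the slicing technique avoids.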
