UCT

The University of Cape Town is a public research university located in Cape Town in the Western Cape province of South Africa. UCT was founded in 1829 as the South African College, and is the oldest university in South Africa and the second oldest extant university in Africa. The language of instruction is English. Wikipedia.


Background: Anorexia Nervosa (AN) is a debilitating, sometimes fatal eating disorder (ED) whereby restraint of appetite and emotion is concomitant with an inflexible, attention-to-detail perfectionist cognitive style and obsessive-compulsive behaviour. Intriguingly, people with AN are less likely to engage in substance use, whereas those who suffer from an ED with a bingeing component are more vulnerable to substance use disorder (SUD). Discussion: This insight into a beneficial consequence of appetite control in those with AN, which is shrouded by the many other unhealthy, excessive and deficit symptoms, may provide some clues as to how the brain could be trained to exert better, sustained control over appetitive and impulsive processes. Structural and functional brain imaging studies implicate the executive control network (ECN) and the salience network (SN) in the neuropathology of AN and SUD. Additionally, excessive employment of working memory (WM), alongside more prominent cognitive deficits, may be utilised to cope with the experience of negative emotions and may account for aberrant brain function. Summary: WM enables mental rehearsal of cognitive strategies while regulating, restricting or avoiding neural responses associated with the SN. Therefore, high versus low WM capacity may be one of the factors that unites common cognitive and behavioural symptoms in those suffering from AN and SUD, respectively. Furthermore, emerging evidence suggests that by evoking neural plasticity in the ECN and SN with WM training, improvements in neurocognitive function and cognitive control can be achieved. Thus, considering the neurocognitive processes of excessive appetite control, and how they link to WM in AN, may aid the application of adjunctive treatment for SUD. © 2016 Brooks. Source


Weinberg E.G., UCT
Current Allergy and Clinical Immunology | Year: 2011

The World Allergy Organization (WAO) White Book on Allergy presents an up-to-date review of the specialty of allergology from a global perspective. This book is of great importance and relevance for the practice of allergology in South Africa. It is of particular interest at this momentous time in the history of allergology in this country. Allergology has just been recognised as a subspecialty of internal medicine, paediatrics and general practice. Source


Anderson P.M.L., UCT | O'Farrell P.J., South African Council for Scientific and Industrial Research
Ecology and Society | Year: 2012

Rapid global urbanization and the knowledge that ecological systems underpin the future sustainability and resilience of our cities make an understanding of urban ecology critical. The way humans engage with ecological processes within cities is highly complex, and from both a social and an ecological perspective these engagements cannot be interpreted meaningfully on the basis of a single timeframe. Historical analyses offer useful insights into the nature of social-ecological interactions under diverse conditions, enabling improved decision-making into the future. We present an historical review of the evolving relationship between the urban settlement of Cape Town and the ecological processes inherent to its natural surroundings. Since its establishment, the people of Cape Town have been acutely aware of, and have exploited, the natural resources presented by Table Mountain and its surrounding wilderness area. An examination of this pattern of engagement, explored through an ecological process lens and in particular drawing on the terminology provided by the ecosystem services framework, reflects a journey of the changing needs and demands of a growing urban settlement. Ecological processes, and their ensuing flow of ecosystem services, have been exploited, overexploited, interrupted, reestablished, conserved, and variably valued through time. Processes of significance, for example water provision, soil erosion, the provision of wood and natural materials, and the role of fire, are presented. This historical analysis documents the progression from a wilderness to a tamed and largely benign urban environment. Also evident is the variable valuing of ecosystem service attributes through time, and by different people at the same time, depending on their immediate needs. © 2012 by the author(s). Source


News Article | April 13, 2016
Site: http://www.techtimes.com/rss/sections/science.xml

A mysterious alignment has been observed in a remote region of the universe. Sixty-four supermassive black holes have been found to be spinning out radio jets from their centers, all pointing in the same direction. Black holes are well known to produce radio emission. However, this is the first time an alignment has been seen on such a large scale. The phenomenon implies that the influence governing these black holes is something much larger and older; hence the alignment has been linked to "primordial mass fluctuations" in the early universe. "Since these black holes don't know about each other, or have any way of exchanging information or influencing each other directly over such vast scales, this spin alignment must have occurred during the formation of the galaxies in the early universe," said Professor Andrew Russ Taylor, joint UWC/UCT SKA Chair, Director of the recently launched Inter-University Institute for Data Intensive Astronomy, and principal author of the Monthly Notices study. The astronomers are puzzled by the alignment and have put forward a few mechanisms that could be responsible for triggering this large-scale phenomenon. The candidates include cosmic strings (theoretical fault lines in the universe), exotic particles such as axions, cosmic magnetic fields, or perhaps something else entirely that is yet to be identified. Experts said the recent observation of black hole alignment could provide evidence of the environmental influences that contributed to the formation and evolution of galaxies, as well as of the primordial fluctuations that brought about the structure of the universe. The strange phenomenon was captured through three years of deep radio imaging carried out by the Giant Metrewave Radio Telescope (GMRT) in India. The alignment may hold clues about the early universe, when the black holes initially formed. The study was published in the Monthly Notices of the Royal Astronomical Society. © 2016 Tech Times, All rights reserved. Do not reproduce without permission.


News Article
Site: http://www.nature.com/nature/current_issue/

Many games of perfect information, such as chess, checkers, othello, backgammon and Go, may be defined as alternating Markov games [39]. In these games, there is a state space S (where the state includes an indication of the current player to play); an action space A(s) defining the legal actions in any given state s ∈ S; a state transition function f(s, a, ξ) defining the successor state after selecting action a in state s and random input ξ (for example, dice); and finally a reward function r_i(s) describing the reward received by player i in state s. We restrict our attention to two-player zero-sum games, r_1(s) = −r_2(s) = r(s), with deterministic state transitions, f(s, a, ξ) = f(s, a), and zero rewards except at a terminal time step T. The outcome of the game, z_t = ±r(s_T), is the terminal reward at the end of the game from the perspective of the current player at time step t. A policy p(a|s) is a probability distribution over legal actions a ∈ A(s). A value function is the expected outcome if all actions for both players are selected according to policy p, that is, v^p(s) = E[z_t | s_t = s, a_{t…T} ~ p]. Zero-sum games have a unique optimal value function v*(s) that determines the outcome from state s following perfect play by both players: v*(s) = z_T if s is terminal, and v*(s) = max_a −v*(f(s, a)) otherwise.

The optimal value function can be computed recursively by minimax (or equivalently negamax) search [40]. Most games are too large for exhaustive minimax tree search; instead, the game is truncated by using an approximate value function v(s) ≈ v*(s) in place of terminal rewards. Depth-first minimax search with alpha–beta pruning [40] has achieved superhuman performance in chess [4], checkers [5] and othello [6], but it has not been effective in Go [7].

Reinforcement learning can learn to approximate the optimal value function directly from games of self-play [39]. The majority of prior work has focused on a linear combination v_θ(s) = φ(s) · θ of features φ(s) with weights θ. Weights were trained using temporal-difference learning [41] in chess [42, 43], checkers [44, 45] and Go [30]; or using linear regression in othello [6] and Scrabble [9]. Temporal-difference learning has also been used to train a neural network to approximate the optimal value function, achieving superhuman performance in backgammon [46], and achieving weak kyu-level performance in small-board Go [28, 29, 47] using convolutional networks.

An alternative approach to minimax search is Monte Carlo tree search (MCTS) [11, 12], which estimates the optimal value of interior nodes by a double approximation, V^n(s) ≈ v^{P^n}(s) ≈ v*(s). The first approximation, V^n(s) ≈ v^{P^n}(s), uses n Monte Carlo simulations to estimate the value function of a simulation policy P^n. The second approximation, v^{P^n}(s) ≈ v*(s), uses the simulation policy P^n in place of minimax optimal actions. The simulation policy selects actions according to a search control function argmax_a (Q^n(s, a) + u(s, a)), such as UCT [12], that selects children with higher action values, Q^n(s, a) = −V^n(f(s, a)), plus a bonus u(s, a) that encourages exploration; or, in the absence of a search tree at state s, it samples actions from a fast rollout policy p_π(a|s). As more simulations are executed and the search tree grows deeper, the simulation policy becomes informed by increasingly accurate statistics. In the limit, both approximations become exact and MCTS (for example, with UCT) converges [12] to the optimal value function v*(s).
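As an illustration of the generic selection rule argmax_a (Q^n(s, a) + u(s, a)), here is a minimal Python sketch of UCT-style child selection. The Node class, the logarithmic bonus and the constant c_uct are illustrative assumptions, not AlphaGo's variant (its PUCT rule is described further below).

```python
import math

# Minimal sketch of generic UCT-style child selection (illustrative only).
class Node:
    def __init__(self):
        self.children = {}   # action -> child Node
        self.N = {}          # action -> visit count
        self.Q = {}          # action -> mean action value, from the current player's view

def uct_select(node, c_uct=1.4):
    """Return the action maximizing Q(s, a) plus an exploration bonus u(s, a)."""
    total_visits = sum(node.N.values()) or 1
    def score(a):
        bonus = c_uct * math.sqrt(math.log(total_visits + 1) / (1 + node.N[a]))
        return node.Q[a] + bonus
    return max(node.children, key=score)
```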
The strongest current Go programs are based on MCTS [13, 14, 15, 36]. MCTS has previously been combined with a policy that is used to narrow the beam of the search tree to high-probability moves [13], or to bias the bonus term towards high-probability moves [48]. MCTS has also been combined with a value function that is used to initialize action values in newly expanded nodes [16], or to mix Monte Carlo evaluation with minimax evaluation [49]. By contrast, AlphaGo's use of value functions is based on truncated Monte Carlo search algorithms [8, 9], which terminate rollouts before the end of the game and use a value function in place of the terminal reward. AlphaGo's position evaluation mixes full rollouts with truncated rollouts, resembling in some respects the well-known temporal-difference learning algorithm TD(λ). AlphaGo also differs from prior work by using slower but more powerful representations of the policy and value function; evaluating deep neural networks is several orders of magnitude slower than linear representations and must therefore occur asynchronously.

The performance of MCTS is to a large degree determined by the quality of the rollout policy. Prior work has focused on handcrafted patterns [50] or on learning rollout policies by supervised learning [13], reinforcement learning [16], simulation balancing [51, 52] or online adaptation [30, 53]; however, it is known that rollout-based position evaluation is frequently inaccurate [54]. AlphaGo uses relatively simple rollouts, and instead addresses the challenging problem of position evaluation more directly using value networks.

To efficiently integrate large neural networks into AlphaGo, we implemented an asynchronous policy and value MCTS algorithm (APV-MCTS). Each node s in the search tree contains edges (s, a) for all legal actions a ∈ A(s). Each edge stores a set of statistics, {P(s, a), N_v(s, a), N_r(s, a), W_v(s, a), W_r(s, a), Q(s, a)}, where P(s, a) is the prior probability, W_v(s, a) and W_r(s, a) are Monte Carlo estimates of total action value accumulated over N_v(s, a) leaf evaluations and N_r(s, a) rollout rewards, respectively, and Q(s, a) is the combined mean action value for that edge. Multiple simulations are executed in parallel on separate search threads. The APV-MCTS algorithm proceeds in the four stages outlined in Fig. 3.

Selection (Fig. 3a). The first in-tree phase of each simulation begins at the root of the search tree and finishes when the simulation reaches a leaf node at time step L. At each of these time steps, t < L, an action is selected according to the statistics in the search tree, a_t = argmax_a (Q(s_t, a) + u(s_t, a)), using a variant of the PUCT algorithm [48], u(s, a) = c_puct P(s, a) √(Σ_b N_r(s, b)) / (1 + N_r(s, a)), where c_puct is a constant determining the level of exploration; this search control strategy initially prefers actions with high prior probability and low visit count, but asymptotically prefers actions with high action value.

Evaluation (Fig. 3c). The leaf position s_L is added to a queue for evaluation v_θ(s_L) by the value network, unless it has previously been evaluated. The second rollout phase of each simulation begins at leaf node s_L and continues until the end of the game. At each of these time steps, t ≥ L, actions are selected by both players according to the rollout policy, a_t ~ p_π(·|s_t). When the game reaches a terminal state, the outcome z_t = ±r(s_T) is computed from the final score.

Backup (Fig. 3d). At each in-tree step t ≤ L of the simulation, the rollout statistics are updated as if it has lost n_vl games: N_r(s_t, a_t) ← N_r(s_t, a_t) + n_vl; W_r(s_t, a_t) ← W_r(s_t, a_t) − n_vl; this virtual loss [55] discourages other threads from simultaneously exploring the identical variation. At the end of the simulation, the rollout statistics are updated in a backward pass through each step t ≤ L, replacing the virtual losses by the outcome: N_r(s_t, a_t) ← N_r(s_t, a_t) − n_vl + 1; W_r(s_t, a_t) ← W_r(s_t, a_t) + n_vl + z_t.
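The edge statistics and the rollout backup with virtual loss described above can be sketched as follows. The dataclass and helper names, the default n_vl = 3, and the assumption that the outcome z is already signed from the perspective of each edge's player are illustrative choices, not details taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    P: float          # prior probability from the policy network
    Nv: int = 0       # number of leaf (value-network) evaluations
    Nr: int = 0       # number of rollouts
    Wv: float = 0.0   # total value-network value
    Wr: float = 0.0   # total rollout value

def apply_virtual_loss(path, n_vl=3):
    # Pretend the traversed edges have already lost n_vl rollouts, so that
    # other threads avoid simultaneously exploring the identical variation.
    for edge in path:
        edge.Nr += n_vl
        edge.Wr -= n_vl

def backup_rollout(path, z, n_vl=3):
    # Replace the virtual losses by the actual rollout outcome z in {-1, +1}
    # (assumed here to be expressed from the perspective of each edge's player).
    for edge in path:
        edge.Nr += 1 - n_vl
        edge.Wr += n_vl + z
```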
Asynchronously, a separate backward pass is initiated when the evaluation of the leaf position s_L completes. The output of the value network, v_θ(s_L), is used to update the value statistics in a second backward pass through each step t ≤ L: N_v(s_t, a_t) ← N_v(s_t, a_t) + 1; W_v(s_t, a_t) ← W_v(s_t, a_t) + v_θ(s_L). The overall evaluation of each state–action pair is a weighted average of the Monte Carlo estimates, Q(s, a) = (1 − λ) W_v(s, a)/N_v(s, a) + λ W_r(s, a)/N_r(s, a), which mixes together the value network and rollout evaluations with weighting parameter λ. All updates are performed lock-free [56].

Expansion (Fig. 3b). When the visit count exceeds a threshold, N_r(s, a) > n_thr, the successor state s′ = f(s, a) is added to the search tree. The new node is initialized to {N_v(s′, a) = N_r(s′, a) = 0, W_v(s′, a) = W_r(s′, a) = 0, P(s′, a) = p_τ(a|s′)}, using a tree policy p_τ(a|s′) (similar to the rollout policy but with more features, see Extended Data Table 4) to provide placeholder prior probabilities for action selection. The position s′ is also inserted into a queue for asynchronous GPU evaluation by the policy network. Prior probabilities are computed by the SL policy network with a softmax temperature set to β; these replace the placeholder prior probabilities, P(s′, a) ← p_σ^β(a|s′), using an atomic update. The threshold n_thr is adjusted dynamically to ensure that the rate at which positions are added to the policy queue matches the rate at which the GPUs evaluate the policy network. Positions are evaluated by both the policy network and the value network using a mini-batch size of 1 to minimize end-to-end evaluation time.

We also implemented a distributed APV-MCTS algorithm. This architecture consists of a single master machine that executes the main search, many remote worker CPUs that execute asynchronous rollouts, and many remote worker GPUs that execute asynchronous policy and value network evaluations. The entire search tree is stored on the master, which only executes the in-tree phase of each simulation. The leaf positions are communicated to the worker CPUs, which execute the rollout phase of simulation, and to the worker GPUs, which compute network features and evaluate the policy and value networks. The prior probabilities of the policy network are returned to the master, where they replace the placeholder prior probabilities at the newly expanded node. The rewards from rollouts and the value network outputs are each returned to the master and backed up along the originating search path.

At the end of search, AlphaGo selects the action with maximum visit count; this is less sensitive to outliers than maximizing action value [15]. The search tree is reused at subsequent time steps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded. The match version of AlphaGo continues searching during the opponent's move. It extends the search if the action maximizing visit count and the action maximizing action value disagree. Time controls were otherwise shaped to use most time in the middle-game [57]. AlphaGo resigns when its overall evaluation drops below an estimated 10% probability of winning the game, that is, max_a Q(s, a) < −0.8. AlphaGo does not employ the all-moves-as-first [10] or rapid action value estimation [58] heuristics used in the majority of Monte Carlo Go programs; when using policy networks as prior knowledge, these biased heuristics do not appear to give any additional benefit. In addition, AlphaGo does not use progressive widening [13], dynamic komi [59] or an opening book [60].
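A sketch of how the two sets of edge statistics might be combined and used at the root, following the description above: the mixed action value with weighting parameter λ (lam), final move choice by maximum visit count, and the resignation test. Using N_r as the visit count for move selection and reusing the Edge fields from the previous sketch are assumptions made for illustration.

```python
def combined_Q(edge, lam):
    # Q(s, a) = (1 - lambda) * Wv/Nv + lambda * Wr/Nr
    v = edge.Wv / edge.Nv if edge.Nv else 0.0
    r = edge.Wr / edge.Nr if edge.Nr else 0.0
    return (1.0 - lam) * v + lam * r

def select_move(root_edges):
    # root_edges: dict mapping action -> Edge. Play the most-visited action,
    # which is less sensitive to outliers than maximizing the action value.
    return max(root_edges, key=lambda a: root_edges[a].Nr)

def should_resign(root_edges, lam, threshold=-0.8):
    # Resign when the best achievable evaluation falls below the threshold,
    # corresponding to an estimated ~10% probability of winning.
    return max(combined_Q(e, lam) for e in root_edges.values()) < threshold
```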
The parameters used by AlphaGo in the Fan Hui match are listed in Extended Data Table 5. The rollout policy p_π(a|s) is a linear softmax policy based on fast, incrementally computed, local pattern-based features consisting of both 'response' patterns around the previous move that led to state s, and 'non-response' patterns around the candidate move a in state s. Each non-response pattern is a binary feature matching a specific 3 × 3 pattern centred on a, defined by the colour (black, white, empty) and liberty count (1, 2, ≥3) for each adjacent intersection. Each response pattern is a binary feature matching the colour and liberty count in a 12-point diamond-shaped pattern [21] centred around the previous move. Additionally, a small number of handcrafted local features encode common-sense Go rules (see Extended Data Table 4). Similar to the policy network, the weights π of the rollout policy are trained from 8 million positions from human games on the Tygem server to maximize log likelihood by stochastic gradient descent. Rollouts execute at approximately 1,000 simulations per second per CPU thread on an empty board.

Our rollout policy p_π(a|s) contains less handcrafted knowledge than state-of-the-art Go programs [13]. Instead, we exploit the higher-quality action selection within MCTS, which is informed both by the search tree and the policy network. We introduce a new technique that caches all moves from the search tree and then plays similar moves during rollouts; a generalization of the 'last good reply' heuristic [53]. At every step of the tree traversal, the most probable action is inserted into a hash table, along with the 3 × 3 pattern context (colour, liberty and stone counts) around both the previous move and the current move. At each step of the rollout, the pattern context is matched against the hash table; if a match is found, then the stored move is played with high probability.

In previous work, the symmetries of Go have been exploited by using rotationally and reflectionally invariant filters in the convolutional layers [24, 28, 29]. Although this may be effective in small neural networks, it actually hurts performance in larger networks, as it prevents the intermediate filters from identifying specific asymmetric patterns [23]. Instead, we exploit symmetries at run-time by dynamically transforming each position s using the dihedral group of eight reflections and rotations, d_1(s), …, d_8(s). In an explicit symmetry ensemble, a mini-batch of all 8 positions is passed into the policy network or value network and computed in parallel. For the value network, the output values are simply averaged, (1/8) Σ_j v_θ(d_j(s)). For the policy network, the planes of output probabilities are rotated/reflected back into the original orientation and averaged together to provide an ensemble prediction, (1/8) Σ_j d_j^(-1)(p_σ(·|d_j(s))); this approach was used in our raw network evaluation (see Extended Data Table 3). Instead, APV-MCTS makes use of an implicit symmetry ensemble that randomly selects a single rotation/reflection j ∈ [1, 8] for each evaluation. We compute exactly one evaluation for that orientation only: in each simulation we compute the value of leaf node s_L by v_θ(d_j(s_L)), and allow the search procedure to average over these evaluations. Similarly, we compute the policy network for a single, randomly selected rotation/reflection, d_j^(-1)(p_σ(·|d_j(s))).
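The explicit eight-fold symmetry ensemble described above can be sketched with NumPy as follows. Here policy_net and value_net are hypothetical stand-ins assumed to map a (19, 19) board array to a probability plane and a scalar, respectively; the real networks take full feature stacks, so this shows only the transform-and-average bookkeeping.

```python
import numpy as np

def dihedral_transforms(board):
    # The eight rotations/reflections d_1(s), ..., d_8(s) of a square board.
    out = []
    for k in range(4):
        r = np.rot90(board, k)
        out.append((k, False, r))
        out.append((k, True, np.fliplr(r)))
    return out

def ensemble_value(board, value_net):
    # Average the value network over all eight transformed boards.
    return float(np.mean([value_net(b) for _, _, b in dihedral_transforms(board)]))

def ensemble_policy(board, policy_net):
    # Evaluate each transform, undo the transform on the output plane, then average.
    total = np.zeros(board.shape, dtype=float)
    for k, flipped, b in dihedral_transforms(board):
        p = policy_net(b)
        if flipped:
            p = np.fliplr(p)          # undo the reflection first
        total += np.rot90(p, -k)      # then undo the rotation
    return total / 8.0
```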
We trained the policy network p_σ to classify positions according to expert moves played in the KGS data set. This data set contains 29.4 million positions from 160,000 games played by KGS 6 to 9 dan human players; 35.4% of the games are handicap games. The data set was split into a test set (the first million positions) and a training set (the remaining 28.4 million positions). Pass moves were excluded from the data set. Each position consisted of a raw board description s and the move a selected by the human. We augmented the data set to include all eight reflections and rotations of each position. Symmetry augmentation and input features were pre-computed for each position. For each training step, we sampled a randomly selected mini-batch of m samples from the augmented KGS data set and applied an asynchronous stochastic gradient descent update to maximize the log likelihood of the action, Δσ = (α/m) Σ_k ∂log p_σ(a^k|s^k)/∂σ. The step size α was initialized to 0.003 and was halved every 80 million training steps, without momentum terms, and a mini-batch size of m = 16. Updates were applied asynchronously on 50 GPUs using DistBelief [61]; gradients older than 100 steps were discarded. Training took around 3 weeks for 340 million training steps.

We further trained the policy network by policy gradient reinforcement learning [25, 26]. Each iteration consisted of a mini-batch of n games played in parallel, between the current policy network p_ρ that is being trained, and an opponent that uses parameters ρ− from a previous iteration, randomly sampled from a pool of opponents, so as to increase the stability of training. Weights were initialized to ρ = ρ− = σ. Every 500 iterations, we added the current parameters ρ to the opponent pool. Each game i in the mini-batch was played out until termination at step T^i, and then scored to determine the outcome from each player's perspective. The games were then replayed to determine the policy gradient update, Δρ = (α/n) Σ_i Σ_t ∂log p_ρ(a_t^i|s_t^i)/∂ρ · (z_t^i − v(s_t^i)), using the REINFORCE algorithm [25] with baseline v(s_t^i) for variance reduction. On the first pass through the training pipeline, the baseline was set to zero; on the second pass we used the value network v_θ(s) as a baseline; this provided a small performance boost. The policy network was trained in this way for 10,000 mini-batches of 128 games, using 50 GPUs, for one day.

We trained a value network v_θ(s) to approximate the value function of the RL policy network p_ρ. To avoid overfitting to the strongly correlated positions within games, we constructed a new data set of uncorrelated self-play positions. This data set consisted of over 30 million positions, each drawn from a unique game of self-play. Each game was generated in three phases by randomly sampling a time step U ~ unif{1, 450}, and sampling the first t = 1, …, U − 1 moves from the SL policy network, a_t ~ p_σ(·|s_t); then sampling one move uniformly at random from the available moves, a_U ~ unif{1, 361} (repeatedly until a_U is legal); then sampling the remaining sequence of moves until the game terminates, t = U + 1, …, T, from the RL policy network, a_t ~ p_ρ(·|s_t). Finally, the game is scored to determine the outcome z_{U+1} = ±r(s_T). Only a single training example (s_{U+1}, z_{U+1}) is added to the data set from each game. This data provides unbiased samples of the value function v^{p_ρ}(s_{U+1}). During the first two phases of generation we sample from noisier distributions so as to increase the diversity of the data set. The training method was identical to SL policy network training, except that the parameter update was based on the mean squared error between the predicted values and the observed rewards, Δθ = (α/m) Σ_k (z^k − v_θ(s^k)) ∂v_θ(s^k)/∂θ. The value network was trained for 50 million mini-batches of 32 positions, using 50 GPUs, for one week.
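A sketch of the three-phase self-play scheme for generating uncorrelated value-network training examples, per the description above. new_game, sl_policy, rl_policy, legal_moves, play and score are hypothetical stand-ins for the real game and network code, and sampling the random move directly from the legal set is a simplification of the rejection sampling in the text.

```python
import random

def generate_value_example(new_game, sl_policy, rl_policy, legal_moves, play, score):
    """Return one (branch position, outcome) training example from a single self-play game."""
    U = random.randint(1, 450)                   # U ~ unif{1, 450}
    s = new_game()
    for _ in range(U - 1):                       # phase 1: moves 1..U-1 from the SL policy
        if s.is_terminal():
            return None                          # rare early end: discard this game
        s = play(s, sl_policy(s))
    if s.is_terminal():
        return None
    s = play(s, random.choice(legal_moves(s)))   # phase 2: one uniformly random legal move
    branch_position = s
    while not s.is_terminal():                   # phase 3: RL policy plays until the game ends
        s = play(s, rl_policy(s))
    z = score(s, branch_position)                # outcome from the branch player's perspective
    return branch_position, z                    # a single example per game
```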
Each position s was pre-processed into a set of 19 × 19 feature planes. The features that we use come directly from the raw representation of the game rules, indicating the status of each intersection of the Go board: stone colour, liberties (adjacent empty points of a stone's chain), captures, legality, turns since the stone was played, and (for the value network only) the current colour to play. In addition, we use one simple tactical feature that computes the outcome of a ladder search [7]. All features were computed relative to the current colour to play; for example, the stone colour at each intersection was represented as either player or opponent rather than black or white. Each integer feature value is split into multiple 19 × 19 planes of binary values (one-hot encoding). For example, separate binary feature planes are used to represent whether an intersection has 1 liberty, 2 liberties, …, ≥8 liberties. The full set of feature planes is listed in Extended Data Table 2.

The input to the policy network is a 19 × 19 × 48 image stack consisting of 48 feature planes. The first hidden layer zero pads the input into a 23 × 23 image, then convolves k filters of kernel size 5 × 5 with stride 1 with the input image and applies a rectifier nonlinearity. Each of the subsequent hidden layers 2 to 12 zero pads the respective previous hidden layer into a 21 × 21 image, then convolves k filters of kernel size 3 × 3 with stride 1, again followed by a rectifier nonlinearity. The final layer convolves 1 filter of kernel size 1 × 1 with stride 1, with a different bias for each position, and applies a softmax function. The match version of AlphaGo used k = 192 filters; Fig. 2b and Extended Data Table 3 additionally show the results of training with k = 128, 256 and 384 filters. The input to the value network is also a 19 × 19 × 48 image stack, with an additional binary feature plane describing the current colour to play. Hidden layers 2 to 11 are identical to the policy network, hidden layer 12 is an additional convolution layer, hidden layer 13 convolves 1 filter of kernel size 1 × 1 with stride 1, and hidden layer 14 is a fully connected linear layer with 256 rectifier units. The output layer is a fully connected linear layer with a single tanh unit.
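As an illustration of the policy-network topology just described, here is a sketch written with PyTorch. The framework choice, the class name, the use of padded convolutions in place of explicit zero-padding layers, and the explicit per-position bias parameter are implementation assumptions, not details given in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    """Sketch of the 13-layer policy network described above (k = 192 in the match version)."""
    def __init__(self, k=192, in_planes=48):
        super().__init__()
        # Layer 1: 5x5 convolution; padding=2 corresponds to the 23x23 zero-padded input.
        self.conv1 = nn.Conv2d(in_planes, k, kernel_size=5, padding=2)
        # Layers 2-12: 3x3 convolutions; padding=1 corresponds to the 21x21 zero padding.
        self.convs = nn.ModuleList(nn.Conv2d(k, k, kernel_size=3, padding=1) for _ in range(11))
        # Final layer: 1x1 convolution, then a different bias for each board position.
        self.head = nn.Conv2d(k, 1, kernel_size=1, bias=False)
        self.pos_bias = nn.Parameter(torch.zeros(19 * 19))

    def forward(self, x):                        # x: (batch, 48, 19, 19)
        x = F.relu(self.conv1(x))
        for conv in self.convs:
            x = F.relu(conv(x))
        logits = self.head(x).flatten(1) + self.pos_bias
        return F.softmax(logits, dim=1)          # probability over the 361 intersections
```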
We evaluated the relative strength of computer Go programs by running an internal tournament and measuring the Elo rating of each program. We estimate the probability that program a will beat program b by a logistic function, p(a beats b) = 1 / (1 + exp(c_elo (e(b) − e(a)))), and estimate the ratings e(·) by Bayesian logistic regression, computed by the BayesElo program [37] using the standard constant c_elo = 1/400. The scale was anchored to the BayesElo rating of professional Go player Fan Hui (2,908 at date of submission) [62]. All programs received a maximum of 5 s computation time per move; games were scored using Chinese rules with a komi of 7.5 points (extra points to compensate white for playing second). We also played handicap games where AlphaGo played white against existing Go programs; for these games we used a non-standard handicap system in which komi was retained but black was given additional stones on the usual handicap points. Using these rules, a handicap of K stones is equivalent to giving K − 1 free moves to black, rather than K − 1/2 free moves using standard no-komi handicap rules. We used these handicap rules because AlphaGo's value network was trained specifically to use a komi of 7.5.

With the exception of distributed AlphaGo, each computer Go program was executed on its own single machine, with identical specifications, using the latest available version and the best hardware configuration supported by that program (see Extended Data Table 6). In Fig. 4, approximate ranks of computer programs are based on the highest KGS rank achieved by that program; however, the KGS version may differ from the publicly available version.

The match against Fan Hui was arbitrated by an impartial referee. Five formal games and five informal games were played with 7.5 komi, no handicap, and Chinese rules. AlphaGo won these games 5–0 and 3–2 respectively (Fig. 6 and Extended Data Table 1). Time controls for formal games were 1 h main time plus three periods of 30 s byoyomi. Time controls for informal games were three periods of 30 s byoyomi. Time controls and playing conditions were chosen by Fan Hui in advance of the match; it was also agreed that the overall match outcome would be determined solely by the formal games. To approximately assess the relative rating of Fan Hui to computer Go programs, we appended the results of all ten games to our internal tournament results, ignoring differences in time controls.
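For reference, the logistic win-probability model behind the Elo ratings above can be written as a one-line function; this is a minimal sketch, with c_elo = 1/400 as quoted in the text.

```python
import math

def win_probability(elo_a, elo_b, c_elo=1.0 / 400):
    # p(a beats b) = 1 / (1 + exp(c_elo * (e(b) - e(a))))
    return 1.0 / (1.0 + math.exp(c_elo * (elo_b - elo_a)))
```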
