Posts Tagged ‘techtalk’

Seattle Conference on Scalability: Scalable Wikipedia with E

February 8, 2010 - 7:37 pm 6 Comments

Google Tech Talks
June 14, 2008

ABSTRACT

Global online services at Amazon, eBay, Myspace, YouTube, or Google serve millions of customers with tens of thousands of servers located throughout the world. At this scale, components fail continuously and it is difficult to maintain a consistent state while hiding failures from the application. Peer-to-peer protocols provide availability by replicating services among peers, but they are mostly limited to write-once/read-many data sharing. To extend them beyond typical file sharing, support for fast transactions on distributed hash tables (DHTs) is an important yet missing feature.

We will present a distributed key/value store based on a DHT that supports consistent writes. Our system comprises three layers:

- a DHT layer for scalable, reliable access to replicated data,
- a transaction layer to ensure data consistency in the face of concurrent write operations,
- an application layer with an extremely high access rate.

For the application layer, we selected a distributed, scalable Wiki with full transaction support. We will show that our Wiki outperforms the public Wikipedia in terms of served page requests per second and we will discuss how the development of the distributed code benefited from the use of Erlang.
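
The layering described above can be illustrated with a small single-process sketch. It is not the speakers' system, which the abstract says was written in Erlang: the class names (`Ring`, `ReplicatedStore`), the replication factor of 3, and the majority-quorum, version-checked write rule are all assumptions chosen for illustration, and node failure is only simulated with a `down` set.

```python
import hashlib
from bisect import bisect_right


def ring_hash(value: str) -> int:
    """Map a string to a position on the hash ring (hypothetical helper)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """DHT layer (sketch): consistent hashing over a fixed set of node names."""

    def __init__(self, nodes):
        self._points = sorted((ring_hash(n), n) for n in nodes)

    def replicas(self, key: str, count: int):
        """Return up to `count` distinct nodes clockwise from the key's position."""
        hashes = [h for h, _ in self._points]
        start = bisect_right(hashes, ring_hash(key))
        return [self._points[(start + i) % len(self._points)][1]
                for i in range(min(count, len(self._points)))]


class ReplicatedStore:
    """Transaction layer (sketch): versioned writes that need a majority."""

    def __init__(self, nodes, replication=3):
        self.ring = Ring(nodes)
        self.replication = replication
        self.data = {n: {} for n in nodes}  # node -> {key: (version, value)}
        self.down = set()                   # nodes simulated as failed

    def _live_replicas(self, key):
        return [n for n in self.ring.replicas(key, self.replication)
                if n not in self.down]

    def read(self, key):
        """Query all reachable replicas (a majority is required) and
        return the (version, value) pair with the highest version."""
        live = self._live_replicas(key)
        if len(live) <= self.replication // 2:
            raise RuntimeError("no majority of replicas reachable")
        return max((self.data[n].get(key, (0, None)) for n in live),
                   key=lambda pair: pair[0])

    def write(self, key, value, expected_version):
        """Commit only if a majority is reachable and the version is unchanged."""
        live = self._live_replicas(key)
        if len(live) <= self.replication // 2:
            return False
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False  # another writer committed first
        for n in live:
            self.data[n][key] = (expected_version + 1, value)
        return True
```

In this sketch a wiki page edit would call `read` to obtain the current version, then `write` with that version; a stale version is rejected, which is one simple way to keep concurrent writers from overwriting each other. A real system would also need failure detection, replica repair, and an atomic commit protocol across replicas, none of which is shown here.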

This is joint work of Zuse Institute Berlin and onScale solutions GmbH.

Speaker: Thorsten Schuett, Zuse Institute Berlin
Thorsten Schütt is a senior researcher with the Zuse Institute Berlin (ZIB) and a co-founder of onScale solutions GmbH. He received a CS diploma with distinction in 2002 from the Technical University Berlin. Since then he has worked as a research staff member in the Computer Science Research Department at ZIB and has participated in several EU projects, including GridLab, XtreemOS and Selfman. He is the principal system architect of the scalable, transactional key/value store at ZIB. His research interests include distributed data management, scalable grid systems, p2p algorithms and self-managing transactional storage systems.

Slides for this talk are available at http://groups.google.com/group/seattle-scalability-conference

Duration : 0:26:31


Seattle Conference on Scalability: CARMEN: A Scalable Scienc

December 28, 2009 - 4:05 am 4 Comments

Google Tech Talks
June 14, 2008

ABSTRACT

CARMEN is a $9M project building a scalable science cloud. Its focus is on supporting neuroscientists who will use it to store, share and analyze 100s of TBs of data. Understanding how the brain works is a major scientific challenge which will benefit medicine, biology and computer science. Globally, over 100,000 neuroscientists are working on this problem. However, the data that forms the basis for their work is rarely shared even though it is difficult and expensive to produce.

The CARMEN project (www.carmen.org.uk) is addressing these challenges by developing a scalable cloud architecture to enable data sharing, integration, and analysis supported by metadata. An expandable range of services is provided in the cloud to extract value from raw and transformed data. This promotes the sharing of analysis services as well as data, and allows services to execute close to the data on which they operate. This is essential to avoid having to ship vast quantities (TBs) of data out of the cloud to the user’s machine for analysis. Internally, the CARMEN cloud is built as a set of Web Services. Through experience of a wide variety of e-science projects over the past 8 years, we have identified a core set of generic services that we believe are needed to support science. These services, their scalability issues and novel features are:

- Data repository. Most of the primary data is time series signal data. Searching for patterns (such as neuronal spikes) is a key requirement. CARMEN uses a novel parallel search infrastructure to find patterns quickly, even in vast quantities of data.

- Metadata repository. Users need to be able to quickly search metadata describing tens of thousands of datasets in order to locate data that is of interest. Ontologies are used to structure experimental metadata, and techniques are needed to quickly search this type of data.

- Service repository and dynamic deployment. A novel feature of the architecture is that the analysis services are stored in a repository in the cloud. Users can write services in a variety of languages, package them as web services and then upload them into the cloud. These are then dynamically deployed on compute nodes as required to meet user requests.

- Workflow Enactment Engine. Users can build workflows from the available services in order to orchestrate the entire process of analysis. These are then executed in the cloud.

- Security. Scientists wish to control precisely who has access to their data and services. This service ensures that these desires are met.
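
The pattern search mentioned under the data repository can be illustrated with a toy example. The abstract does not describe CARMEN's search infrastructure, so everything here is an assumption: spikes are approximated as upward threshold crossings, the function names (`find_crossings`, `parallel_spike_search`) are hypothetical, and a thread pool merely stands in for the cloud's compute nodes (in CPython, threads would not speed up this pure-Python loop).

```python
from concurrent.futures import ThreadPoolExecutor


def find_crossings(signal, start, end, threshold):
    """Return indices in [start, end) where the signal rises through threshold."""
    hits = []
    # Index 0 has no predecessor, so it can never be a crossing.
    for i in range(max(start, 1), end):
        if signal[i - 1] < threshold <= signal[i]:
            hits.append(i)
    return hits


def parallel_spike_search(signal, threshold, chunk_size=4, workers=4):
    """Split the index range into chunks and scan the chunks concurrently."""
    bounds = [(s, min(s + chunk_size, len(signal)))
              for s in range(0, len(signal), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker reads signal[start - 1] when needed, so a crossing
        # that falls exactly on a chunk boundary is still detected.
        parts = pool.map(
            lambda b: find_crossings(signal, b[0], b[1], threshold), bounds)
        return [i for part in parts for i in part]
```

Because every chunk is scanned independently, the same split could in principle be handed to separate machines that each hold part of the data, which matches the abstract's point about running analysis close to where the data lives.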

The talk will describe the design of the CARMEN system and show how it addresses the key scalability issues. It will cover the cloud services, explaining how each is designed to scale up to support thousands of users analysing TBs of data. We will present results from the CARMEN prototype to illustrate solutions and issues.

Speaker: Paul Watson
Paul Watson is Professor of Computer Science and Director of the North East Regional e-Science Centre. He graduated in 1983 with a BSc (I) in Computer Engineering from Manchester University, followed by a PhD in 1986. In the 80s, as a Lecturer at Manchester University, he was a designer of the Alvey Flagship and Esprit EDS systems. From 1990 to 1995 he worked for ICL as a system designer of the Goldrush MegaServer parallel database server, which was released as a product in 1994. In August 1995 he moved to Newcastle University, where he has been an investigator on research projects worth over $20M. His research interests are in scalable information management, in particular parallel database systems and data-intensive e-science.

Slides for this talk are available at http://groups.google.com/group/seattle-scalability-conference

Duration : 0:27:39
