Microsoft's Internet Explorer browser
has no built-in vector graphics machinery
required for "loss-free" gradient background themes.
Please upgrade to a better browser
such as Firefox, Opera,
Safari or others
with built-in vector graphics machinery and much more.
(Learn more or post questions or comments
at the Slide Show (S9)
project site. Thanks!)
Plasma
graph based distributed computing
Big Picture
Europe has about 490,000,000 people, 7.3% of global population
60% penetration => 20.8 % of total internet usage
(Internet World Stats)
Between 50% and 65% of internet traffic is from P2P downloads
61% video
11.3% audio
27.2% applications
avg. file size of 1 gigabyte
(CacheLogic)
Google has more than 450,000 servers, with another couple hundred
thousand on the way.
8,180 racks per data center x 40 dual core CPUs per rack = 654,400 CPUs!
$1.5 billion per year to fund server growth
(New York Times, ZDnet)
Centralized servers are being decentralized
Modern servers are distributed systems using P2P technology
Amazon's Dynamo, EC2, Simple-DB
Google's MapReduce, Big Table
IBM and Sun compute clusters
Who are the mothers
Data driven information assistant
Mother please
Distributed recommendation systems
community structuring of shared content
research areas
document ratings
interest feeds
Community based paper review
And this is just the beginning...
The state of networking
Transparency:
location
concurrency
replication
failures
mobility
Protocol level
Structured streams
Asynchronous message passing and mailboxes
SOAP, RMI, CORBA
REST and resource based interfaces
Middleware
tuple spaces
shared queues
distributed object systems
P2P frameworks
Distributed hash table (DHT)
probabilistic search (gossip, random walk)
P2P Topologies: Structured Geometries
network[object-id] => object-address
Unstructured "Random" Graphs
network.search(query) => result-objects
A shared substrate
Published at IEEE Hot P2P 2007
NAT, sockets, memory, bandwidth, and complexity
A multi-service abstraction
Service
Topology
Purpose
Distributed hash table
ring (hypercube)
object lookup by ID
Random walk search
random graph
supports arbitrary queries
Local flood
star field
neighborhood communications
Probabilistic Gossip
random graph
maintenance, monitoring, measurement
Torrent and data stream
trees and forrests
media streaming
Data driven flow
trees and forrests
content based routing
Dynamic flow graphs
directed graph pipeline
media streams and streaming databases
Distributed clustering
small world graph
recommendation systems
P2P and distributed computing systems are too complex
NAT traversal, resource usage, timeouts
failure, churn, maintenance
consistency and synchronization with massive parallelism
Research Problem
How can we support:
data driven algorithms
multiple topologies
concise, re-usable data operations
efficient implementations
On a wide range of targets:
desktop applications (torrents, tunes and bibliographies)
data center services (Amazon, Google, Slashdot)
distributed data mining (research documents, gene profiles)
Back to the roots
Complexity, meetabstraction.
Data driven = information structures
Data driven algorithms require
mechanisms to represent application information
a way to query and manipulate the information
networking capabilities and topology representation
Bachman's Network Model
Navigational data access
Codd's Relational Model
Logical layer vs storage layer
Simple to understand schema
Sound theoretical framework for data access
What's in a database?
Efficient Storage Layer
B+ tree
Semi-structured data and RDF
XML databases
The Semantic web (RDF, OWL)
Logic statements vs navigational structure
Research Problem
How can we support:
data driven algorithms
multiple topologies
concise, re-usable data operations
efficient implementations
On a wide range of targets:
desktop applications (torrents, tunes and bibliographies)
data center services (Amazon, Google, Slashdot)
distributed data mining (research documents, gene profiles)
Plasma
Organize data in a semi-structured manner
Also need to support networking, in at least a semi-transparent way
Network of resources as a database
Both data and topology in the same structure
A new logical abstraction
Work at the logical level, letting optimizations happen underneath
grid oriented service mapping, discovery, monitoring
Goals and quantitative evaluation
We aren't worse in performance in large deployments
With automatic optimizations we can do better in many cases
Algorithms implementations are concise
Components implemented on graphs are re-usable
Plasma Project Plan
Platform Development
Time Estimate
Basic set of graph operators
1 month
Distributed graph queries
2 months
Paper on data model and operators
1 month
Improved query engine (optimizer)
2 months
Implement evaluation algorithms
3 months
System deployment and testing
2 month
Analysis and final paper
2 month
(In Amsterdam!)
Questions?
Teaching and Mentoring
Atelier 1 => bash scripts, html, css
Atelier 4 => full course design, web application development
Atelier 4 => improved course, web application development
Atelier 2 => user interface design, java projects
UROP => mentored two undergraduate students
Atelier 4
Developed a new course from scratch
Creating database backed web applications
Using Ruby on Rails
Self assessment
Student directed content
Open source development model
Research Foci
Focus is P2P networking and graph based distributed computing
Sub-focus is graph databases
Summer school on game theory
Amer's course on profiling
Lectures on information retrieval
Reading group on pattern classification
Reading group on neuro-science
Presentations and Writing
Presentations
Ruby on Rails tutorial at swiss object oriented society
Presented Spinneret paper at Hot P2P
Gave final survey talk at the MICS workshop
Peer Reviewed Articles:
J. Rose, A. Carzaniga, "Plasma: a graph based distributed computing model," Tech Report, 2008.
J. Rose, C. Hall, A. Carzaniga, "Spinneret: A Log Random Substrate for P2P Networks," IEEE HotP2P, 2007
S. Bhatti, J. Carlson, H. Dai, J. Deng, J. Rose, A. Sheth, B. Shucker, C. Gruenwald, A. Torgerson, R. Han, "MANTIS OS: An Embedded Multithreaded Operating System for Wireless Micro Sensor Platforms, " ACM/Kluwer Mobile Networks & Applications (MONET), Special Issue on Wireless Sensor Networks, vol. 10, no. 4, August 2005, guest co-editors P. Ramanathan, R. Govindan and K. Sivalingam, pp. 563-579.
H. Abrach, S. Bhatti, J. Carlson, H. Dai, J. Rose, A. Sheth, B. Shucker, J. Deng, R. Han, " MANTIS: System Support For MultimodAl NeTworks of In-situ Sensors," 2nd ACM International Workshop on Wireless Sensor Networks and Applications (WSNA) 2003, pp. 50-59.
Book Chapters:
M. Jazayeri, C. Mesnage, J. Rose, "Modern Web Application Development," Wiley, 2007.
B. Shucker, J. Rose, A. Sheth, J. Carlson, S. Bhatti, H. Dai, J. Deng, and R. Han, Book chapter 6 on "Embedded Operating Systems for Wireless Microsensor Nodes", in Handbook of Sensor Networks: Algorithms and Architectures, Wiley, 2005, editor Ivan Stojmenovic, pp. 173-197.