[T-1] The Azure workshop at the PDC pre-conference, leaked in its entirety on twitter
The PDC09 contingent arrived in Los Angeles between yesterday and today, and excitement is building for tomorrow's Day 1 keynote and the technical sessions that follow. Today, the 16th local time, conference registration opened, along with the pre-conference: a packed, full-day workshop (separately priced).
Unfortunately, as one of the people on the inside, I had to attend an internal meeting that had been called separately and couldn't take part. But by all accounts from those who did attend, the Azure workshop
Architecting and Developing for Windows Azure
seems to have been extremely well received.
Mr. Asami, well known in the Java and software-architecture communities, attended this session, and his twitter timeline is a great reference, weaving in comparisons with Google App Engine at just the right points.
The instructor was Chris Auld, an outside speaker from Intergen, a company in New Zealand, who taught with infectious energy, though I also hear his Aussie accent was pretty thick. Out of respect for Mr. Asami, who relayed the session live, I would like to quote @asami224's timeline here. Hashtags and the like have been stripped with a bulk replace, but otherwise this is the uncut, unedited version.
Incidentally, I'm told there are no plans to publish slides or streaming video for the pre-conference sessions, including this one, which makes these notes all the more valuable an asset.
Once again: thank you, Asami-san, from all of us!
Keep an eye on how PDC unfolds from tomorrow onward with the hashtag #pdc09 and check it out!
The tour group from Japan is meeting at the hotel at 7:00, a little on the early side, leaning forward and ready to grab front-row seats. I want to bring as much of the excitement on the ground as possible to everyone back in Japan.
---
Arrived at the session room. The wireless LAN is working.
『Architecting and Developing for the Windows Azure Platform』
How far can I get with REST? That's a point I want to check.
『Introduction. Intro to example app』
TicketDirect : example application
The demo won't run.
The UI is a Mac-style web app built with Ajax.
After relaunching it, it worked.
Azure Roles, Azure Storages, SQL Azure, Client Applications
Software as a Service, White-label VARs/ISVs, Users/Customers/Consumers
Windows Azure covers everything up through Platform as a Service; Amazon EC2 only goes as far as Infrastructure as a Service.
Salesforce.com and Microsoft CRM Live cover about half of Software as a Service.
Market Quadrant: Pay as you go / Buy up-front, Platform niche player / Vertically Integrated
Microsoft, Google, and Salesforce are Pay as you go + Vertically Integrated.
Amazon is pay as you go.
Pay as you go smooths out cash flow.
Consolidation Cloud, Scale-Out Cloud
Reliability through Software
High Scale Application Archetype
Periodicity of Demand: capacity shortfall at the peaks, waste during the quiet periods.
Forward processing: handle the 'stupid' (simple) requests in the cloud and send them on to on-premises.
A Sisyphean Task: maintaining a data center.
Data Sovereignty and Security
Service Level Agreements
How do you get customers to accept the security and SLA issues? I take this as raising the question rather than answering it.
『Scalability, Caching and Elasticity』
vertical scale up, horizontal scale out
For small scenarios scale-up is cheaper.
For larger scenarios scale-out is the only solution.
Scale-out offers the promise of linear, infinite scale.
The important realization is that scale-out is the only solution.
Scalability != Performance
Some people mistake 'real-time' for 'fast'; there will probably be cases where people mistake 'scalable' for 'fast' too.
Achieving Linear Scale Out. Scaling linearly is what matters.
Reduce or eliminate shared resources
homogeneous, stateless computation nodes
Azure lets you specify the minimum number of running instances in *.cscfg. (On App Engine everything goes down when there is no traffic, hence the spin-up problem.)
Keeping instances running all the time should cost accordingly, though that's fine as long as you pay for it.
Cores: Small 1, Medium 2, Large 4, Extra Large 8
Small: 1 core, 1.7 GB RAM, 250 GB disk / Medium: 2 cores, 3.5 GB, 500 GB / Large: 4 cores, 7 GB, 1000 GB / Extra Large: 8 cores, 15 GB, 2000 GB
Being able to choose the VM size is a feature App Engine doesn't have.
The presenter's browser has the Google Toolbar in it.
An app that doesn't run fast on multicore won't run fast in the cloud either.
Parallel.ForEach
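For reference, a minimal .NET 4 Parallel.ForEach sketch (the work items here are made up for illustration):

```csharp
using System;
using System.Threading.Tasks;

class ParallelDemo
{
    static void Main()
    {
        var workItems = new[] { "img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg" };

        // Fan the work out across all available cores; an app that cannot exploit
        // this kind of parallelism will not get faster in the cloud either.
        Parallel.ForEach(workItems, item =>
        {
            Console.WriteLine("Processing {0} on thread {1}",
                item, System.Threading.Thread.CurrentThread.ManagedThreadId);
            // ... CPU-bound work such as resizing an image would go here ...
        });
    }
}
```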
Caching
Caching is a key technology for the cloud. Few Azure applications can do without caching.
Caching can improve both performance and scalability.
Caching strategies: client-side caching, static content generation, output caching, fragment caching, +1
Client caching - ETags: soft caching, a header added on the HTTP response.
ETag problems: still requires a round trip to the server; may require execution of server-side code to recreate the ETag before checking.
Client caching - Cache-Control
Cache-Control: hard caching, a header added on the HTTP response; the client may cache the file without further requests for 30 days.
very useful for static files, no way to implement in Blob storage
Cache-Control problems: no blob storage support; what if files do change within the 30 days?
Benefits: prevents unnecessary HTTP requests, prevents unnecessary downloads.
azure technique: put static files in web role, use cache-control + url flipping
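To make the ETag / Cache-Control notes concrete, here is a minimal self-hosted sketch with HttpListener (the ETag value and the 30-day max-age are arbitrary; in a real web role this would be done in ASP.NET):

```csharp
using System;
using System.Net;
using System.Text;

class CachingHeadersDemo
{
    static void Main()
    {
        var listener = new HttpListener();
        listener.Prefixes.Add("http://localhost:8080/");
        listener.Start();

        const string etag = "\"v42\"";   // would normally be derived from the content version
        byte[] body = Encoding.UTF8.GetBytes("<html><body>static content</body></html>");

        while (true)
        {
            HttpListenerContext ctx = listener.GetContext();

            // Soft caching (ETag): if the client already has this version, answer 304, no body.
            if (ctx.Request.Headers["If-None-Match"] == etag)
            {
                ctx.Response.StatusCode = 304;
                ctx.Response.Close();
                continue;
            }

            // Hard caching (Cache-Control): let the client reuse the file for 30 days
            // without asking again.
            ctx.Response.AddHeader("ETag", etag);
            ctx.Response.AddHeader("Cache-Control", "public, max-age=2592000");
            ctx.Response.OutputStream.Write(body, 0, body.Length);
            ctx.Response.Close();
        }
    }
}
```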
Static content generation
generate content periodically in worker role
content may be full pages, resource, content fragments
push static content into Blob storage
serve direct out of blob storage, retrieve via a web role, may be local database ...
problems - need to deal with stale data.
benefits - reduce load on web roles, potentially reduce load on data tier, response times improved,
can combine with Cache-Control and ETags
Output Caching
ASP.NET feature
problems - multiple copies of cached data (1 per instance); still requires a client round trip on each request,
may need URL flipping to refresh the client cache.
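For reference, output caching in ASP.NET MVC terms looks roughly like this (the controller, action, and duration are made up; classic Web Forms pages would use the @ OutputCache page directive instead):

```csharp
using System.Web.Mvc;

public class EventsController : Controller
{
    // Cache the rendered output for 60 seconds per event id. Each web role instance
    // keeps its own copy, which is exactly the 'multiple copies, 1/instance' problem
    // these notes call out.
    [OutputCache(Duration = 60, VaryByParam = "id")]
    public ActionResult Details(int id)
    {
        return View();
    }
}
```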
Fragment Caching
problems - multiple copies of cached data (1 per instance),
still requires client round trip each request, + 1
Data Caching
ASP.NET Cache / Memcached, Velocity, Shared Cache / could arguably 'cache' in Blob storage.
most flexible - most code to write
specialized cache servers vs caching on all roles
Problems: the basic ASP.NET cache suffers consistency issues;
3rd-party cache tools require work for Azure support.
stale data still an issue.
Benefits - very flexible, reduces load on the data tier, most efficient use of memory, any type of data.
Web Roles, WCF roles, dynamic worker, distributed cache worker, partitioner worker --- the Azure roles in the demo
It uses a 3rd-party cache framework.
web and wcf roles share common cache.
session state (our own impl) is stored in cache.
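The 'most flexible - most code to write' option is essentially the cache-aside pattern. A hand-rolled sketch with an in-process dictionary, standing in for the ASP.NET Cache, memcached, Velocity, or the shared-cache worker the demo actually uses:

```csharp
using System;
using System.Collections.Concurrent;

// Minimal cache-aside helper: look in the cache first, fall back to the data tier,
// and remember the result for a fixed time-to-live. Stale data and one-copy-per-instance
// are exactly the consistency problems the session calls out.
class SimpleCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Tuple<TValue, DateTime>> _entries =
        new ConcurrentDictionary<TKey, Tuple<TValue, DateTime>>();
    private readonly TimeSpan _ttl;

    public SimpleCache(TimeSpan ttl) { _ttl = ttl; }

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> loadFromDataTier)
    {
        Tuple<TValue, DateTime> entry;
        if (_entries.TryGetValue(key, out entry) && entry.Item2 > DateTime.UtcNow)
            return entry.Item1;                      // cache hit: no load on the data tier

        TValue value = loadFromDataTier(key);        // cache miss: hit SQL Azure / table storage
        _entries[key] = Tuple.Create(value, DateTime.UtcNow + _ttl);
        return value;
    }
}

class CacheDemo
{
    static void Main()
    {
        var cache = new SimpleCache<string, string>(TimeSpan.FromMinutes(5));
        string value = cache.GetOrAdd("event:123", key => "loaded from the data tier: " + key);
        Console.WriteLine(value);
    }
}
```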
『Elastic Scale Out』
application load almost always varies
regular and predictable, irregular and predictable, unpredictable
maintaining excess capacity or headroom, adding/removing additional capacity --- Dealing with variable load takes two forms
Headroom in Azure platform services - web roles: run additional web roles, handle additional load before performance degrades
worker roles - if possible just buffer into queues, will be driven by tolerable level of latency,
start additional roles only if the queues are not clearing.
Azure Storage - host nodes *may* have headroom. 3 replicas give the storage fabric options, opaque to the Azure customer.
SQL Azure - Non-deterministic throttle gives little indication, run extra instances
web roles/worker roles - enable more instances, editing instance count in config leaves existing instances running
Changing to larger VMs will require a redeploy. Need to see pricing.
Azure Storage - opaque to use; partition aggressively,
could 'heat up' a partition to encourage scale-up - costly due to transaction costs.
SQL Azure - add more databases, very difficult to achieve mid-stream,
requires moving hot data, maintaining consistency across multiple dbs, will ***
Rule Based Scaling
use service management api
predictable or periodic demand, unpredictable demand
Monitor metrics
requests per second, queue messages processed/ interval --- primary metrics
cpu utilization, queue length, response time -- secondary metrics
gathering metrics
capture various metrics - Azure logs, performance counters, IIS logs, etc...
Evaluating business rules
requests, jobs in my queue, money this month?
take action -- add/remove instances, change role size, send notifications
How to gather the data, and how to schedule scaling from the data you've gathered.
Dynamic Scaling Engine
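A rule-based scaling engine of the kind sketched in these notes boils down to a loop like the one below. IMetricSource and IInstanceController are hypothetical interfaces standing in for diagnostics data and the Service Management API, and the thresholds are invented:

```csharp
using System;
using System.Threading;

// Hypothetical abstractions: in a real engine these would be backed by
// Azure diagnostics / IIS logs and by the Service Management API.
interface IMetricSource { double RequestsPerSecond(); int QueueLength(); }
interface IInstanceController { int CurrentInstanceCount(); void SetInstanceCount(int count); }

class DynamicScalingEngine
{
    private readonly IMetricSource _metrics;
    private readonly IInstanceController _controller;

    public DynamicScalingEngine(IMetricSource metrics, IInstanceController controller)
    {
        _metrics = metrics;
        _controller = controller;
    }

    public void Run()
    {
        while (true)
        {
            double rps = _metrics.RequestsPerSecond();   // primary metric
            int queueLength = _metrics.QueueLength();    // secondary metric
            int instances = _controller.CurrentInstanceCount();

            // Business rules: the numbers below are made up for illustration.
            if (rps > 100 * instances || queueLength > 500)
                _controller.SetInstanceCount(instances + 1);   // scale out
            else if (rps < 30 * instances && instances > 2)
                _controller.SetInstanceCount(instances - 1);   // scale back, keep headroom

            Thread.Sleep(TimeSpan.FromMinutes(5));       // re-evaluate on an interval
        }
    }
}
```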
summary -- designing for multiple instances provides scale out, availability, elasticity options.
Caching should be a key component of any Azure application.
Lunch: cheeseburger pizza.
I missed the last item in the summary of the morning's Scalability and Elasticity part. It was probably about managing and automating scalability.
First up in the afternoon: 『Asynchronous Work』
improve apparent responsiveness.
establish multiple units of scale.
basic asynchronous patterns
The Web Role requests work from the Worker Role via a Queue.
Put the uploaded image into a Blob, then pass the Blob URI and metadata to the worker role via the queue.
soft transaction concept.
The worker role generates the thumbnail, puts it into a table, and deletes the queue message.
idempotency f(x) = f(f(x))
Messages are processed At Least Once.
Idempotency and At Least Once... that takes me back.
solving the idempotency problem
Check by comparing transaction IDs.
CRUD - read and delete are idempotent; create is not; update is...
So I suppose the story from here is: distinguish the idempotent operations from the ones that aren't, and for the non-idempotent ones, check the transaction ID and handle them accordingly.
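A minimal sketch of that transaction-ID guard (the processed-ID set is in memory here; a real worker would keep it in durable storage such as an Azure table so a restarted instance still remembers what it has done):

```csharp
using System;
using System.Collections.Generic;

class IdempotentWorker
{
    // In-memory stand-in for a durable record of completed transaction IDs.
    private readonly HashSet<Guid> _processedTransactionIds = new HashSet<Guid>();

    public void Handle(Guid transactionId, Action doWork)
    {
        // At-least-once delivery means the same message can arrive twice;
        // skip work we have already completed for this transaction ID.
        if (_processedTransactionIds.Contains(transactionId))
            return;

        doWork();                                    // e.g. generate thumbnail, write to table
        _processedTransactionIds.Add(transactionId);
        // Only after the work and the record succeed would the queue message be deleted.
    }
}

class IdempotencyDemo
{
    static void Main()
    {
        var worker = new IdempotentWorker();
        var txn = Guid.NewGuid();
        worker.Handle(txn, () => Console.WriteLine("processed once"));
        worker.Handle(txn, () => Console.WriteLine("this duplicate is ignored"));
    }
}
```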
Replay Log Storage
Poison Message Handling
poison messages may become zombies
even idempotent messages can be poisonous
spitting out the poison pill - use the queue; the message ID is a GUID.
a GUID created by the queue service on enqueue, immutable and not user-defined.
insert/update ID + Count in storage.
keep a count of the number of times a message has been processed.
set thresholds before a message is placed into a poison queue and deleted.
ensure that your poison test is at the top of the batch. +1
Replay Log vs Poison Tracker
run the risk of the worker crashing before the poison check.
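A sketch of the poison-message bookkeeping described above, with an in-memory attempt counter standing in for the ID + count record in storage; note the count is bumped before processing, which is the 'poison test at the top' point:

```csharp
using System;
using System.Collections.Generic;

class PoisonMessageGuard
{
    private const int MaxAttempts = 3;    // threshold before quarantining a message
    private readonly Dictionary<Guid, int> _attempts = new Dictionary<Guid, int>();

    // Returns true if the message was processed, false if it was quarantined.
    public bool TryProcess(Guid messageId, Action process, Action<Guid> sendToPoisonQueue)
    {
        int count;
        _attempts.TryGetValue(messageId, out count);
        count++;
        _attempts[messageId] = count;                // do the poison test FIRST

        if (count > MaxAttempts)
        {
            sendToPoisonQueue(messageId);            // park it and delete the original message
            return false;
        }

        process();                                   // may throw; the message will reappear later
        return true;
    }
}
```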
Dynamic Worker Roles
worker roles most cost effective at 100% utilization
often have many work types, none of which requires 100% of 1 instance
want to add new work types without redeploying
solution: Use a generic queue.
encode the message with info to resolve the work type; load the assembly to process the message from blob storage; dynamically instantiate and execute
http://tinyurl.com/lokad-cloud
In the demo he logged in with OpenID.
use a smart polling approach for queues
use a new AppDomain to separate loaded types
may be value in including Replay Log and Poison tracking as part of Dynamic Worker framework.
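The core of the dynamic-worker idea, stripped down: resolve the work type from the generic queue message, then load and invoke it via reflection. The assembly/type names and the Execute() convention are invented for illustration (the real implementation in the demo is the lokad-cloud framework linked above), and the new-AppDomain isolation the notes mention is omitted here:

```csharp
using System;
using System.Reflection;

class DynamicWorkerSketch
{
    // The generic queue message would carry a descriptor such as an assembly name
    // ("MyJobs") plus a type name ("MyJobs.ThumbnailJob"); the assembly itself would
    // have been downloaded from blob storage and saved locally beforehand.
    static void ExecuteWorkItem(string assemblyName, string typeName)
    {
        Assembly assembly = Assembly.Load(assemblyName);     // resolve the work assembly
        Type workType = assembly.GetType(typeName, true);    // throw if the type is missing
        object handler = Activator.CreateInstance(workType);

        // Hypothetical convention: every work type exposes a parameterless Execute().
        workType.GetMethod("Execute").Invoke(handler, null);
    }
}
```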
The MapReduce Pattern
very parallelizable - less efficient. -- Trading performance for increased scale again.
A specific impl. of dynamic workers.
Map function, grouping, reduce function
generally processor intensive / RAM light - do worker roles have too much RAM?
No out-of-the-box support yet. - see the lokad demo shortly
still limitations in Azure with colocation of data and computation
The problem is that data and computation can't be colocated. This might be a teaser for a new Azure service.
In this demo they build MapReduce themselves, but it may be a teaser for MapReduce arriving as a new Azure feature.
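As a reminder of the map / group / reduce shape when you roll it yourself, here is a tiny word count in LINQ, with none of the Azure queue plumbing:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class MapReduceSketch
{
    static void Main()
    {
        var documents = new[] { "azure scales out", "scale out not up", "azure azure" };

        // Map: each document emits (word, 1) pairs. In the cloud version each map call
        // would be a queue message handled by some worker instance.
        IEnumerable<KeyValuePair<string, int>> mapped =
            documents.SelectMany(doc => doc.Split(' ')
                                           .Select(w => new KeyValuePair<string, int>(w, 1)));

        // Group by key, then Reduce: sum the counts for each word.
        var reduced = mapped.GroupBy(kv => kv.Key)
                            .Select(g => new { Word = g.Key, Count = g.Sum(kv => kv.Value) });

        foreach (var r in reduced)
            Console.WriteLine("{0}: {1}", r.Word, r.Count);
    }
}
```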
asynchronous processing can massively increase the perceived speed of your application
dispatch via Azure queues requires some care - ensure that your messages are idempotent or have a compensation. ...
『Partitioning Strategy』
Horizontal partitioning
Spread Data Across Similar Nodes
Achieve Massive Scale Out (Data and Load)
Intra-Partition Queries Simple
Cross-Partition Queries Harder
Vertical Partitioning
Spread Data Across Dissimilar Nodes
Place frequently queried data in 'expensive' indexed storage
place large data in 'cheap' binary storage
retrieving a whole row requires > 1 query
Hybrid partitioning
Horizontal partitioning - Azure tables
Azure Table Storage
auto-balanced, hot partitions may be scaled up, partition key AND row key = primary key
continuation tokens may be returned from cross-partition queries
key columns up to 1 KB in size
be aggressive
SQL Azure - Key Points... (continuing)
partition for data volume > 10 GB
all partition logic is up to the developer -- algorithmic, lookup based
partitions are not Auto-Balanced
transaction throttle (non-deterministic) - always code for retry
choosing a partition key -- natural keys, mathematical, ...
Using Modulo
Using Hash Values (partitioning)
do not use a cryptographic hash function
http://tinyurl.com/part-hash
be careful if using Object.GetHashCode()
partition stability over time
may need to change the partitioning scheme
re-partition all data, or version the partitioning scheme
v1 = GUID mod 4 v2 = GUID mod 10
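The modulo and hash points, as a sketch: a stable, non-cryptographic string hash (FNV-1a here), modulo a partition count that is versioned so the scheme can change later. Object.GetHashCode() is avoided because its value is not guaranteed to be stable across processes or runtime versions:

```csharp
using System;
using System.Text;

class PartitioningSketch
{
    // Simple FNV-1a hash: stable across machines and runtime versions, unlike
    // Object.GetHashCode(), and far cheaper than a cryptographic hash.
    static uint StableHash(string key)
    {
        uint hash = 2166136261;
        foreach (byte b in Encoding.UTF8.GetBytes(key))
        {
            hash ^= b;
            hash *= 16777619;
        }
        return hash;
    }

    // Versioned scheme: v1 spread keys over 4 partitions, v2 over 10, so existing
    // data can stay where it is while new data uses the new layout.
    static int PartitionFor(string key, int schemeVersion)
    {
        int partitionCount = schemeVersion == 1 ? 4 : 10;
        return (int)(StableHash(key) % (uint)partitionCount);
    }

    static void Main()
    {
        string customerId = Guid.NewGuid().ToString();
        Console.WriteLine("v1 partition: {0}", PartitionFor(customerId, 1));
        Console.WriteLine("v2 partition: {0}", PartitionFor(customerId, 2));
    }
}
```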
Just-in-time partitioning
In SQL Azure partitions cost money; in highly elastic scenarios partitions may be needed for just a few hours or days.
If load is predictable - partition before the load commences, de-partition after the load has subsided.
goals for vertical partitioning
balance performance vs cost
use appropriate storage for type of data
SQL Azure - fully indexable, no query transaction charge, $9.99/GB
Azure Storage - limited indexing ...
vertical partitioning example
searchable data in azure tables or SQL azure - indexed, no cost per query
thumbnails in azure tables
binary properties < 64Kb
batch queries save transaction costs - duplicated for EACH user
full photos in Azure blob storage - can handle large data - can stream full-sized files direct...
Cost per month = .0026 cents
de-normalized - cost per month = .0017 cents
『NoSQL/Non-Relational Data Modeling』
Azure Tables != RDBMS; storage is cheap; cross-partition queries are resource intensive; de-normalization is often the name of the game
A worker role generates a TweetIndex from the tweets; everything goes in there. With Azure tables we go the whole way.
Correction: it was MentionIndex, not TweetIndex.
And then we start going crazy.
just-in-time partitioning is everything.
we partition just for the peak sales period.
partition our seating tables into new databases
summary - partitioning data is key to cloud-scale apps
horizontally partition for scale-out
vertically partition for cost/performance
azure storage requires different approach to data modeling
don't be afraid to aggressively de-normalize and duplicate data.
The modeling part was just a brief taste.
『Pricing and Business Model』
I thought 'ah, the pricing talk,' but since operating costs change depending on the application architecture, this matters technologically as well.
Data within a datacenter is unmetered; between datacenters it is metered.
12c per hour for each deployed instance.
workload resource usage -- storage txns, data in/out
static resource usage -- bytes on disk, required SQL Azure instances
required instances -- based on a simple average of requests/second, or a more sophisticated model
add instances to make numbers 'reasonable'.
compress blob data on disk -- set the Content-Encoding header when putting the blob (a gzip sketch follows this list)
use caching to reduce request txn count
use CSS sprites and data URIs to reduce txn count
batch your requests
in the web role -- incurs additional 'free' but resource consuming requests to retrieve
In blob storage -- ...
sql azure - vertically partition large data columns out to Azure Storage
support dynamic partitioning if possible
consider just-in-time partitioning
pull archive data out of the cloud to cheaper on premise storage or to Azure storage.
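The gzip sketch referenced above: compress the content before putting the blob so storage and bandwidth are billed on the compressed size, and mark the blob with Content-Encoding: gzip so clients decompress it. Only the compression step is shown; the blob-upload call itself is omitted:

```csharp
using System.IO;
using System.IO.Compression;

class BlobCompressionSketch
{
    // Compress content before putting it into blob storage; the blob should then be
    // stored with a "Content-Encoding: gzip" header so clients know to decompress it.
    static byte[] GzipCompress(byte[] content)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(content, 0, content.Length);
            }
            return output.ToArray();   // safe even after the streams are closed
        }
    }
}
```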
『Diagnostics and Deployment』
I guess you define the application's meta-information to the fabric.
He's demoing pulling diagnostics data.
dev fabric prevents off machine calls
To work around it: a reverse proxy on the local machine - use an internet redirection server (rinetd) - use Fiddler.
reduces the security risk footprint, makes running test/integration servers hard ← dev fabric prevents off-machine calls
fault domains, upgrade domains
Apparently you can set up both the production system and the system under development, and swap the instances to switch them over: VIP Swap Upgrade.
for the best user experience. Invest in warming up instances in staging before swapping.
In-place Upgrade
He's tweaking the fabric XML to change the upgrade strategy. Demo.
In-place upgrade -- only 2 upgrade domains. - instances torn down before new instances brought up
geo-location & affinity groups
affinitized - groups services with dependent resources - ensures close geolocation
- un-affinitized - can still set explicit geo region - may not be guaranteed
Programmatic Deployment
Automating Deployment
And we're done!