The registration code is S4 in the System Engineering section.
Hope to see you there.
Intel met or beat their projected availability schedule (depending on how you count) for Penryn by essentially rebooting their foundries. Very impressive.
In my Guerrilla classes I like to pose the question: Can you procure a 10 GHz microprocessor? On thinking about it (usually for the first time), most people begin to realize that they can't, but they don't know why not. Clearly, clock frequency limitations have an impact on both performance and server-side capacity. Then, I like to point out that programming multicores (since that decision has already been made for you) is much harder than programming uniprocessors. Moreover, there is not much help from compilers and other development tools at the moment, although that situation will presumably continue to improve. Intel TSX (Transactional Synchronization Extensions) for Haswell multicores offers assistance of that type at the hardware level. In particular, TSX instructions are built into Haswell cores to boost the performance and scalability of certain types of multithreaded applications. But more about that in a minute.
I also like to point out that Intel and other microprocessor vendors (of which there are fewer and fewer due to the enormous cost of entry) have little interest in how well your database, web site, or commercial application runs on their multicores. Rather, their goal is to produce the cheapest chip for the largest commodity market, e.g., PCs, laptops, and, more recently, mobile devices. Since that's where the profits are, the emphasis is on the simplest design, not the best design.
Server-side performance is usually relegated to being the low man on the totem pole because of its relatively smaller market share. The implicit notion is that if you want more performance, just add more cores. But that depends on the threadedness of the applications running on those cores. Of course, there can also be side benefits, such as inheriting lower-power servers from advances in mobile chip technology.

Fast, cheap, reliable: pick two.
Intel officially announced multicore processors based on the Haswell architecture in 2013. Because scalability analysis can reveal a lot about the limitations of an architecture, it's generally difficult to come across any quantitative data in the public domain. In their 2012 marketing build-up, however, Intel showed some qualitative scalability characteristics of the Haswell multicore with TSX. See the figure above. You can take it as read that these plots are based on actual measurements.
Most significantly, note the classic USL scaling profiles of transaction throughput vs. number of threads. For example, going from coarse-grain locking without TSX (red curve exhibiting retrograde throughput) to coarse-grain locking with TSX (green curve) has reduced the amount of contention (i.e., the USL α coefficient). It's hard to say what the impact of TSX is on the coherency delay (i.e., the USL β coefficient) without access to the actual data. As expected, however, the impact of TSX on fine-grain locking appears to be far more moderate. A 2012 AnandTech review summed things up this way:
TSX will be supported by GCC v4.8, Microsoft's latest Visual Studio 2012, and of course Intel's C compiler v13. GLIBC support (rtm-2.17 branch) is also available. So it looks like the software ecosystem is ready for TSX. The coolest thing about TSX (especially HLE) is that it enables good scaling on our current multi-core CPUs without an enormous investment of time in the fine tuning of locks. In other words, it can give developers "fine grained performance at coarse grained effort" as Intel likes to put it. In theory, most application developers will not even have to change their code besides linking to a TSX enabled library. Time will tell if unlocking good multi-core scaling will be that easy in most cases. If everything goes according to Intel's plan, TSX could enable a much wider variety of software to take advantage of the steadily increasing core counts inside our servers, desktops, and portables.
With claimed clock frequencies of 4.6 GHz (i.e., nominal 5000 MIPS), Haswell with TSX offers superior performance at the usual price point. That's two. What about reliability? Ah, there's the rub. TSX has been disabled in the current manufacturing schedule due to a design bug.
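For readers who want to experiment with the USL curve shapes discussed above, here's a minimal R sketch. The α and β coefficients are purely illustrative values of mine, not numbers fitted to Intel's (unpublished) measurements; they merely reproduce the qualitative shapes: retrograde throughput for coarse-grain locking without TSX, and a more linear profile when contention is reduced.

# Universal Scalability Law: relative capacity as a function of N threads
usl <- function(N, alpha, beta) N / (1 + alpha * (N - 1) + beta * N * (N - 1))

N <- 1:32                                        # number of threads

# Illustrative coefficients only (not Intel measurements)
X.noTSX <- usl(N, alpha = 0.10, beta = 0.0100)   # high contention: retrograde throughput
X.TSX   <- usl(N, alpha = 0.02, beta = 0.0005)   # reduced contention with lock elision

plot(N, X.TSX, type = "b", col = "green4",
     xlab = "Threads (N)", ylab = "Relative throughput X(N)",
     main = "Qualitative USL profiles")
lines(N, X.noTSX, type = "b", col = "red")
legend("topleft",
       legend = c("coarse-grain + TSX (illustrative)", "coarse-grain, no TSX (illustrative)"),
       col = c("green4", "red"), lty = 1)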
To see the Little's law triplet, consider the line of customers at the grocery store checkout lane shown in Figure 1. Following the usual queueing theory convention, the queue includes not only the customers waiting but also the customer currently in service.
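To make the triplet concrete, here's a small R calculation using made-up numbers for the checkout lane in Figure 1 (the arrival rate and residence time are my assumptions, not data from the figure).

# Little's law: Q = lambda * R
lambda <- 0.5   # assumed arrival rate: customers per minute
R      <- 4.0   # assumed residence time: minutes spent waiting plus being served
Q      <- lambda * R
Q       # expected number of customers in the lane (waiting + in service)
# [1] 2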
As an aside, it is useful to keep in mind that there are only three types of performance metric: a time, a rate, or a number (i.e., a count). Little's law ties one of each together.
In this webinar, Neil Gunther (Performance Dynamics Company) and Sai Subramanian (Cognizant Technology Solutions) will present a new type of model that is not a simulation, but instead acts like continuous radar that warns developers of potential performance and scalability issues during the CI process. This radar model corresponds to a virtual testing framework that precludes the need for developing performance test scripts or setting up a separate load testing environment. Far from being a mere idea, radar methodology is based on a strong analytic foundation that will be demonstrated by examining a successful case study.
Broadcast Date and Time: Tuesday, July 22, 2014, at 11 am Pacific
Using the data supplied in the story, I wanted to see how the restaurant performance would look when expressed as a PDQ model. First, I created a summary data frame in R, based on the observed times:
> df
obs.2004 obs.2014
wifi.data 0 5
menu.data 8 8
menu.pix 0 13
order.data 6 6
eat.mins 46 43
eat.pix 0 20
paymt.data 5 5
paymt.pix 0 15
total.mins 65 115
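The construction of that data frame isn't shown in the post, but something along the following lines reproduces the output above (column and row names are taken directly from it).

# Observed times (in minutes) for each activity, 2004 vs. 2014
df <- data.frame(
  obs.2004 = c(0,  8,  0, 6, 46,  0, 5,  0,  65),
  obs.2014 = c(5,  8, 13, 6, 43, 20, 5, 15, 115),
  row.names = c("wifi.data", "menu.data", "menu.pix", "order.data",
                "eat.mins", "eat.pix", "paymt.data", "paymt.pix", "total.mins")
)
df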
The 2004 situation can be represented schematically by the following queueing network
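As a rough illustration of that network, here is a PDQ-R sketch of the 2004 case, treating each activity as a FIFO queueing stage with the observed time as its service demand. The arrival rate of one party every two hours is purely an assumption on my part, chosen only to keep every stage well below saturation; it is not taken from the story.

library(pdq)

arrivRate <- 1/120   # assumed: one party every 120 minutes (illustrative only)

Init("Restaurant 2004")
CreateOpen("Party", arrivRate)

stages  <- c("Menu", "Order", "Eat", "Payment")
demands <- c(8, 6, 46, 5)          # observed minutes from the 2004 column

for (i in seq_along(stages)) {
  CreateNode(stages[i], CEN, FCFS)
  SetDemand(stages[i], "Party", demands[i])
}

Solve(CANON)
Report()   # per-stage utilizations and residence times (in minutes) under the assumed load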
Referring to Figure 1:
> ppois(0:15,4)
[1] 0.01831564 0.09157819 0.23810331 0.43347012 0.62883694 0.78513039 0.88932602 0.94886638
[9] 0.97863657 0.99186776 0.99716023 0.99908477 0.99972628 0.99992367 0.99998007 0.99999511
As the number of events increases from 0 to 15, the CDF approaches 1. See Figure.
[njg]~/Desktop% uptime
16:18  up 9 days, 15 mins, 4 users, load averages: 2.11 1.99 1.99
For the book, I used Linux 2.6 source because it was accessible on the web with convenient hyperlinks to navigate the code. Somewhere in the kernel scheduler, the following C code appeared:
#define FSHIFT 11 /* nr of bits of precision */
#define FIXED_1 (1<<FSHIFT) /* 1.0 as fixed-point */
#define LOAD_FREQ (5*HZ) /* 5 sec intervals */
#define EXP_1 1884 /* 1/exp(5sec/1min) fixed-pt */
#define EXP_5 2014 /* 1/exp(5sec/5min) */
#define EXP_15 2037 /* 1/exp(5sec/15min) */
#define CALC_LOAD(load,exp,n) \
load *= exp; \
load += n*(FIXED_1-exp); \
load >>= FSHIFT;
where the C macro CALC_LOAD computes the following exponentially smoothed average:

$$\text{load}(t) = \text{load}(t-1)\, e^{-5/(60\tau)} + n(t)\left(1 - e^{-5/(60\tau)}\right)$$

with $\tau$ the reporting period of 1, 5, or 15 minutes and $n(t)$ the number of active processes sampled at time $t$.
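The fixed-point magic numbers in the listing can be checked directly in R: each one is just $2^{11}$ times the corresponding exponential decay factor for the 5-second sampling interval.

# Reconstruct the kernel's fixed-point constants: FIXED_1 * exp(-5 sec / tau)
FIXED_1  <- 2^11               # 1.0 in fixed-point (FSHIFT = 11)
interval <- 5                  # LOAD_FREQ: 5-second sampling interval
tau      <- c(60, 300, 900)    # 1, 5 and 15 minute decay windows, in seconds

round(FIXED_1 * exp(-interval / tau))
# [1] 1884 2014 2037   i.e., EXP_1, EXP_5, EXP_15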
"...the topic was raised about the notion that we are Capacity Management not Performance Management. It made me think about whether performance is indeed a facet of Capacity, or if it belongs completely separate."
As a matter of course, I address this question in my Guerrilla training classes. There, I like to appeal to a simple example, the multiserver queue, to exhibit how performance characteristics are intimately related to system capacity. Not only are they related but, as the multiserver queue illustrates, the relationship is nonlinear. In terms of daily operations, you may choose to focus on one aspect more than the other, but they remain related nonetheless.
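As a sketch of that nonlinearity, here is an R implementation of the standard Erlang C formula for an M/M/m queue; the server count and service time are arbitrary values chosen for illustration. The residence time stays nearly flat at low utilization and then climbs steeply as the servers approach saturation, which is exactly why performance and capacity cannot be treated as separate concerns.

# Erlang C: probability an arrival has to wait in an M/M/m queue
erlangC <- function(m, rho) {            # m servers, per-server utilization rho
  a   <- m * rho                         # offered load in Erlangs
  k   <- 0:(m - 1)
  top <- a^m / factorial(m)
  top / ((1 - rho) * sum(a^k / factorial(k)) + top)
}

# Mean residence time R = S + mean waiting time, for service time S
residence <- function(m, rho, S = 1) S + erlangC(m, rho) * S / (m * (1 - rho))

rho <- seq(0.05, 0.95, by = 0.05)
plot(rho, sapply(rho, function(r) residence(m = 8, r)), type = "b",
     xlab = "Per-server utilization", ylab = "Mean residence time (service units)",
     main = "M/M/8 queue: performance vs. consumed capacity")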
XLConnect doesn't require a running installation of Microsoft Excel or any other special drivers to be able to read and write Excel files. The only requirement is a recent version of a Java Runtime Environment (JRE). Moreover, XLConnect can handle older .xls (BIFF) as well as the newer .xlsx (Office XML) file formats. Internally, XLConnect uses Apache POI (Poor Obfuscation Implementation) to manipulate Microsoft Office documents.
As a simple demonstration, the following worksheet, from a Guerrilla Capacity Planning workbook, will be displayed in R.
First, the Excel workbook is loaded as an R object:
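The loading step looks something like the following; the file and sheet names are placeholders of mine, not the actual names used in the Guerrilla workbook.

library(XLConnect)

# Load the workbook as an R object (file name is a placeholder)
wb <- loadWorkbook("GCaP-workbook.xlsx")

# Read one worksheet into a data frame (sheet name is a placeholder)
gcap.df <- readWorksheet(wb, sheet = "Sheet1")
head(gcap.df)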
Classic topics include:
All classes are held at the Larkspur Landing hotel in Pleasanton, California. Directions are available on the registration page. Larkspur Landing also provides free wi-fi Internet in their residence-style rooms as well as the training room.
Discounts are available for 3 or more people from the same company. Enquire for details.
All classes are held at our lovely Larkspur Landing location in Pleasanton. Larkspur Landing also provides free Wi-Fi Internet access in their residence-style rooms, as well as our classroom.
Some of the topics covered will include:
Attendees should bring their laptops to the class as course materials are provided on a flash drive and calculational tools like OpenOffice, Excel and R will be useful.
Before registering online, take a look at what former students have said about their Guerrilla training experience.
In general, Melbourne is said to have a Mediterranean climate, but it can also be subject to cold blasts of air coming up from the Antarctic at any time, especially during the winter. Fortunately, the island state of Tasmania acts as something of a geographical barrier against those winds. Understanding possible relationships between these effects presents an interesting exercise in correlation analysis.
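For anyone who wants to try that exercise, the following R fragment shows its general shape using fabricated daily temperature and southerly wind-speed numbers; the data are invented purely to demonstrate cor() and cor.test(), not actual Melbourne observations.

set.seed(42)
# Fabricated daily data for illustration only (not real Melbourne observations)
southerly.wind <- runif(90, min = 0, max = 40)                   # km/h
max.temp       <- 22 - 0.2 * southerly.wind + rnorm(90, sd = 2)  # degrees C

cor(southerly.wind, max.temp)        # sample correlation coefficient
cor.test(southerly.wind, max.temp)   # significance test for the correlation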
The idea is that none of the hosts should use more than 75% of their available capacity: the blue areas on the left side of Fig. 1. The total consumed capacity is assumed to be $4 \times 3/4 = 3$ or 300% of the total host configuration (rather than all 4 hosts or 400% capacity). Then, when any single host fails, its lost capacity is compensated by redistributing that same load across the remaining three available hosts (each running 100% busy after failover). As we shall show in the next section, this is a misconception.
The circles in Fig. 1 represent hosts and rectangles represent incoming requests buffered at the load-balancer. The blue area in the circles signifies the available capacity of a host, whereas white signifies unavailable capacity. When one of the hosts fails, its load must be redistributed across the remaining three hosts. What Fig. 1 doesn't show is the performance impact of this capacity redistribution.
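The size of that impact is easy to sketch in R by treating each host as an M/M/1 queue with some nominal service time. The one-second service time and the 75% pre-failure utilization come from the scenario above, but the single-queue-per-host simplification is my assumption for illustration.

# Residence time for an M/M/1 queue: R = S / (1 - rho)
S <- 1.0                         # assumed service time per request (seconds)

rho.before <- 0.75               # 4 hosts each at 75% busy
rho.after  <- 4 * rho.before / 3 # same total load spread over 3 hosts

S / (1 - rho.before)             # 4 seconds per request before the failure
S / (1 - rho.after)              # rho.after = 1, so R prints Inf: the survivors saturate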
In the aftermath of a discussion about software management, I looked up the Mythical Man-Month concept on Wikipedia. The main thesis of Fred Brooks, often referred to as "Brooks's law [sic]," can simply be stated as:
Adding manpower to a late software project makes it later.
In other words, some number of cooks are necessary to prepare a dinner, but adding too many cooks in the kitchen can inflate the delivery schedule.
They use epidemiological models to explain adoption and abandonment of social networks, where user adoption is analogous to infection and user abandonment is analogous to recovery from disease, e.g., the precipitous attrition witnessed by MySpace. To this end, they employ variants of an SIR (Susceptible Infected Removed) model to predict a precipitous decline in Facebook activity in the next few years.
Channeling Mark Twain†, FB engineers lampooned this conclusion by pointing out that Princeton would suffer a similar demise under the same assumptions.
Irrespective of the merits of the Princeton paper, I was impressed that they used an SIR model. It's the same one I used, in R, last year to reinterpret Florence Nightingale's zymotic disease data during the Crimean War as resulting from epidemic spreading.
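For the curious, a generic SIR model of that kind can be coded in a few lines of R with the deSolve package; the infection and recovery rates below are arbitrary illustrative values, not the ones fitted in either the Princeton paper or my Nightingale analysis.

library(deSolve)

# Standard SIR equations: dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I
sir <- function(t, y, parms) {
  with(as.list(c(y, parms)), {
    dS <- -beta * S * I
    dI <-  beta * S * I - gamma * I
    dR <-  gamma * I
    list(c(dS, dI, dR))
  })
}

parms <- c(beta = 0.5, gamma = 0.1)      # illustrative rates only
init  <- c(S = 0.99, I = 0.01, R = 0)    # fractions of the population
times <- seq(0, 100, by = 1)

out <- ode(y = init, times = times, func = sir, parms = parms)
matplot(out[, "time"], out[, c("S", "I", "R")], type = "l", lty = 1,
        xlab = "Time", ylab = "Population fraction")
legend("right", legend = c("Susceptible", "Infected", "Removed"), col = 1:3, lty = 1)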
Another way in which FB was inadvertently dinged by incorrect interpretation of information—this time it was the math—occurred in the 2010 movie, "The Social Network" that tells the story of how FB (then called Facemash) came into being. While watching the movie, I noticed that the ranking metric that gets written on a dorm window (only in Hollywood) is wrong! The correct ranking formula is analogous to the Fermi-Dirac distribution, which is key to understanding how electrons "rank" themselves in atoms and semiconductors.
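For comparison, the ranking formula in question is the Elo expected-score formula, and its resemblance to the Fermi-Dirac occupation function is easiest to see when the two are written side by side (standard textbook forms, quoted here for reference, with $R_A$ and $R_B$ denoting the two players' ratings):

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}} \qquad \text{vs.} \qquad f(\epsilon) = \frac{1}{e^{(\epsilon - \mu)/k_B T} + 1}$$

Since $10^{x/400} = e^{x \ln 10 / 400}$, the rating difference plays the same role as the energy difference $\epsilon - \mu$: both expressions are logistic functions of that difference.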
New topics include:
Classic topics include:
As usual, all classes are held at our lovely Larkspur Landing Pleasanton location in California. Attendees should bring their laptops to the class as course materials are provided on a flash drive. Larkspur Landing also provides free wi-fi Internet in their residence-style rooms as well as the training room.
In a nutshell, the original question concerned whether or not it was possible for a single core to be observed running at 200% busy, as reported by Linux top, when HT is enabled.
This question is an old canard (well, "old" for multicore technology). I call it the "Missing MIPS" paradox. Regarding the question, "Is it really possible for a single core to be 200% busy?" the short answer is: never! So, you are quite right to be highly suspicious and confused.

You don't say which make of processor is running on your hardware platform, but I'll guess Intel. Very briefly, the OS (Linux in your case) is being lied to. Each core has 2 registers where inbound threads are stored for processing. Intel calls these AS (Architectural State) registers. With HT *disabled*, the OS only sees a single AS register as being available. In that case, the mapping between state registers and cores is 1:1.

The idea behind HT is to allow a different application thread to run when the currently running app stalls due to branch misprediction, bubbles in the pipeline, and so on. To make that possible, there has to be another port or AS register. That register becomes visible to the OS when HT is enabled. However, the OS (and everything up the food chain to whatever performance tools you are using) now thinks twice the processor capacity is available, i.e., 100% CPU at each AS port.