
Coding for a New House

I don’t want to read about it, just take me to your super cool interactive map.

We are moving to the greater New York City area this summer, where Tim joins the leadership team of Colgate-Palmolive. As tempting as it is to spend all our time on Trulia, Estately or Zillow looking at specific houses, we knew our focus was best spent on understanding the different areas and the trade-offs they present. I’m an analyst at heart and always try to do the analysis at the right level of detail. At this stage, that means a map that incorporates (in order) schools, commute times, and lifestyle options. As an advisor to spatial.ai, Tim has been inspired to create insightful mashups. Maps are one of the best examples of useful data: you can do analysis quickly without any voice track. The right map can serve as a common medium for discussion with friends and realtors, and for our own understanding as we try to home in on the right area. With a good contextualized spatial understanding, we can be ready to make the quick decisions that house-hunting demands.

This is why a large number of sites display helpful data geared toward house-hunters. As we started looking at different map-based real estate search engines online, we found merits to each one, but no site gave us the commute, school, and lifestyle information we care about in one interface. Estately was the most promising: the product side was clearly driven by developers, with clean URL lookups and clever metrics like walkability. Trulia is the most full-featured, with some really cool features like price heatmaps that would be useful if they didn’t have so many blank regions. I enjoy Trulia the most, but it doesn’t have the latest listings.

Zillow has an awesome API but legally can’t provide anything that can be called "bulk data". Redfin’s killer feature is the ability to search by school district. This is pretty much critical, since the school district often doesn’t match the town name, and we started falling in love with houses online that we had to give up once we found out they weren’t in a school district we were comfortable with.

Schools

In Alexandria, we love our house, elementary school, church and community. In order to find the best school system possible, we relied heavily on the rankings assigned by njmonthly.com. Their ranking is a composite of school environment, student performance and student outcomes. These scores are based entirely on data reported by the schools to the state Department of Education (DOE) and published in the School Performance Reports section of the DOE website; you can read more about the methodology at njmonthly.com. We also used Great Schools to cross-check the list. Tim used Python, the Google Geocoding API and Google Sheets to get geocoordinates for each school. He then pulled these into Google's map builder and assigned a color corresponding to each school's rank. While there is a lot more work ahead to better understand the potential of each school, the map below was very helpful for us.
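For reference, here is a minimal sketch of the geocoding step. It assumes you have a Google Geocoding API key and a plain-text list of school names; the key, file names and input layout below are placeholders, not the exact script Tim ran.

```python
import csv
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
API_KEY = "YOUR_GOOGLE_API_KEY"  # placeholder, use your own key

def geocode(address):
    """Return (lat, lng) for an address via the Google Geocoding API."""
    resp = requests.get(GEOCODE_URL, params={"address": address, "key": API_KEY})
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        return None, None
    loc = results[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

# schools.txt: one "School Name, Town, NJ" per line (hypothetical input format)
with open("schools.txt") as f, open("schools_geocoded.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["school", "lat", "lng"])
    for line in f:
        name = line.strip()
        if name:
            lat, lng = geocode(name)
            writer.writerow([name, lat, lng])
```

The resulting CSV imports directly into a custom Google map layer, where each row can be colored by rank.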

Commute Time

OK, this is the fun part where Tim gets to use his ninja programming skillz. Tim is going to be traveling a lot, but when he is home he will often be in Piscataway, NJ and Manhattan. Nothing online would show the average, maximum or minimum commute times for multiple locations. Additionally, we wanted to combine different traffic patterns with the optimal route found by comparing public transit and driving. To build this, Tim wrote a Python script that used the Google Directions API and the associated Python library to provide transportation times. He then used matplotlib and basemap to put a grid across the region of interest and used the contour features to generate contour lines for regions that were 20, 30, 40, 50, 60, and 70 minutes away. This produced lots of plots that helped us get a feel for the major transportation routes and how traffic varied by time of day.
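A rough sketch of that idea is below. It uses the googlemaps client library and plain matplotlib contours as a stand-in for the basemap overlay Tim used; the API key, destination, grid bounds and departure time are all placeholders.

```python
from datetime import datetime
import googlemaps
import numpy as np
import matplotlib.pyplot as plt

gmaps = googlemaps.Client(key="YOUR_GOOGLE_API_KEY")  # placeholder key
DEST = "300 Park Ave, New York, NY"                    # placeholder destination
LATS = np.linspace(40.3, 41.0, 15)                     # placeholder grid bounds
LNGS = np.linspace(-74.8, -73.9, 15)

def drive_minutes(lat, lng, when):
    """Driving time in minutes from (lat, lng) to DEST for a given departure time."""
    routes = gmaps.directions((lat, lng), DEST, mode="driving", departure_time=when)
    if not routes:
        return np.nan
    return routes[0]["legs"][0]["duration"]["value"] / 60.0

when = datetime.now()  # the API expects a present or future departure time
Z = np.array([[drive_minutes(lat, lng, when) for lng in LNGS] for lat in LATS])

X, Y = np.meshgrid(LNGS, LATS)
cs = plt.contour(X, Y, Z, levels=[20, 30, 40, 50, 60, 70])
plt.clabel(cs, fmt="%d min")
plt.xlabel("longitude"); plt.ylabel("latitude")
plt.show()
```

Each time-of-day excursion is just a different `when`, which maps naturally onto separate layers in the final mashup.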

Of course, Tim did excursions over time of day and built maps that looked at optimistic- and worst-case scenarios. In the end, it worked best to make each excursion a map layer and to bring in different data sets as we had questions. The most helpful map presented the contour lines made from averaging the best commute from each grid point (in this case a 15 x 15 grid):

How much does commute vary?

The sparkline in each row below shows the commute time for 12 departure times between 4:30 am and 10 am each morning. Transit options weren’t available to Colgate’s technology center, but they generally were to NYC. Commute times below are in minutes. I was expecting to see more variance in the commute time; either there is an error in my code or Google really isn’t forecasting commute times based on historical traffic.


| Location | Colgate, driving | NYC, driving | NYC, transit |
| --- | --- | --- | --- |
| Westfield, Union | 31 | 46 | 93 |
| Livingston | 40 | 48 | 118 |
| Chatham, Morris | 40 | 45 | 122 |
| West Windsor-Plainsboro South, Mercer | 34 | 70 | 132 |
| Holmdel, Monmouth | 34 | 59 | |
| West Windsor-Plainsboro North, Mercer | 34 | 71 | 195 |
| Livingston, Essex | 44 | 43 | 106 |
| Montgomery, Somerset | 34 | 78 | |
| Haddonfield Memorial, Camden | 64 | 98 | 223 |
| Princeton, Mercer | 33 | 71 | 170 |
| Short Hills | 36 | 39 | 137 |
| New Providence, Union | 35 | 46 | 138 |
| Ridge (Basking Ridge), Somerset | 25 | 53 | 131 |
| Westfield | 30 | 46 | 85 |
| Watchung Hills Regional (Warren), Somerset | 26 | 48 | |
| Millburn, Essex | 37 | 40 | |
| Glen Rock, Bergen | 55 | 35 | 105 |
| Kinnelon, Morris | 52 | 50 | 128 |


Lifestyle

Our social structure revolves around our church, then around fitness (CrossFit and rock climbing gyms) and other town-centered options (like shopping at Whole Foods, or a charming downtown). We wanted to live as close to the city as possible while still being able to find a nice home with the right school. The most helpful way to incorporate this information was to build several lists and then use the Google Geocoding API to get the geocoordinates. From there, it was simple to export to CSV and upload into the mashup. This produced this insanely cool composite map.
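For completeness, here is a minimal sketch of that export step, reusing the hypothetical geocode() helper from the schools section; the category lists and file name are generic placeholders, not our actual lists.

```python
import csv

# Placeholder lists; ours were longer and lived in a spreadsheet.
places = {
    "church": ["Redeemer Church of Montclair, Montclair, NJ"],
    "fitness": ["Example CrossFit Gym, Westfield, NJ"],
    "grocery": ["Whole Foods Market, Millburn, NJ"],
}

with open("lifestyle.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["category", "name", "lat", "lng"])
    for category, names in places.items():
        for name in names:
            lat, lng = geocode(name)  # helper sketched in the schools section
            writer.writerow([category, name, lat, lng])
```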

Results: Potential Locations

Princeton, Montgomery, West Windsor

We love the downtown, schools and academic atmosphere of Princeton. It is close to cool companies like SRI and major research centers. It also has nice neighborhoods and is very walkable. It has a train to NYC and good church options. However, it is much farther from the city than we want to be, and house prices in Princeton proper are much higher than in the surrounding area.

Westfield, Millburn, Short Hills, Livingston, Montclair

There was another cluster much closer to the city. We also like the option of attending Redeemer Church of Montclair. However, we hate to give up the university-town, high-tech feel of Princeton.

Summary

In all, we now look forward to visiting in person and getting a feel for these neighborhoods. I feel like we have a good map that we can update as we get to know the area better. Hats off to Google for making so much accessible through APIs and for making such nice interfaces to view everything with. Open standards just plain rock.

We put this post together to collect our thoughts, share our code and methodology, but also to help the dialogue with our friends. If you have any thoughts on the above, please email us at tim_and_chrissy@www.theboohers.org.

Resources

Please feel free to use, modify and enjoy the code we wrote for this, and to view and edit our spreadsheet.

Links

Code

Code to build commute maps

Code to build commute table


Common Mathematical Libraries: Computation, History, and Trust

A common complaint among engineers is that modern computers obscure the details of the complicated math they perform. We cringe when children are exalted as “knowing computers” from a young age. Bill Gosper and Richard Greenblatt perhaps got the maximum benefit from truly learning computers: they didn’t have Wolfram Alpha or even spreadsheets; they had to understand arithmetic logic units and figure out how to optimize code to get basic math operations to run at all on the machines of the day. Today there is a desire to re-learn hardware through Arduino and other low-level platforms, but most folks start very high on the stack. That is a shame, because under the hood is a rich and exciting legacy of numerical computing software that forms the basis of the trust we place in computers.

My trust in computers started to erode when Mark Abramson gave a lecture on the possible errors of floating-point arithmetic. All programmers are aware of the problems caused by a division by zero or a narrowing conversion that loses information. But sometimes the cause is the futile attempt of a software developer to round a floating-point number. This was shocking: one of the most basic operations in math, something we learn to do before we can ride a bike, eludes the combined efforts of the finest engineers of the last 30 years. To me, this was intuitively nonsensical: if computers can’t consistently and reliably round numbers, how can we trust them to do anything?

This turns out to be one of those fundamental limits, like the halting problem, which shows that a computer cannot, in general, analyze arbitrary computer code. Floating-point numbers are represented internally according to the IEEE 754 specification. Just like the impossible realization of a truly random number1, there is no such thing as a true fractional number in a computer. The key thing is to recognize that floating-point types do not represent numbers exactly. Internally, the value is not drawn from a continuous range of numbers; instead, it is represented as an exponent multiplied by a finite sum of powers of one half. For example:

$$
\frac{1}{2^1} + \frac{1}{2^2} + \frac{1}{2^3} + \ldots + \frac{1}{2^{30}}
$$

From the above, you can see that in any range of floating-point numbers there are gaps. Between any two representable floating-point numbers there is a difference at the granularity of the smallest element in the series (here $\frac{1}{2^{30}}$). If a real number falls in such a gap, the difference between it and the nearest representable value is the approximation error. This leads to a fascinating and esoteric subject that is best covered by others.
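A quick, minimal illustration of these gaps in Python (any language using IEEE 754 double precision behaves the same way):

```python
# 0.1, 0.2 and 0.3 all fall in gaps between representable doubles,
# so the sum picks up a tiny approximation error.
print(0.1 + 0.2 == 0.3)        # False
print(0.1 + 0.2)               # 0.30000000000000004

# Rounding inherits the same problem: 2.675 is actually stored as a value
# slightly below 2.675, so the "round half up" intuition fails.
print(round(2.675, 2))         # 2.67, not 2.68

# The gap size near 1.0 (machine epsilon for doubles):
import sys
print(sys.float_info.epsilon)  # 2.220446049250313e-16
```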

In grad school, my faith in computers only decreased when I realized the challenges associated with eigenvalue problems, singular value decomposition and LU/Cholesky/QR/… decompositions. If you want to understand how these are handled, a rough understanding of high-performance computing libraries is necessary. These libraries are important because of the issues described above: it is very important (and very hard) to have an algorithm for matrix decomposition that is both efficient and correct.

I spent my teens coding in BASIC, my college years in MATLAB, and professionally I’ve used lots of Julia, FORTRAN, Ruby, Python, C, Haskell, and x86 or MIPS assembly. As I’ve used different languages, I kept coming back to familiar libraries that every language used with strange names like LAPACK, BLAS, FFTW, and ATLAS.

MATLAB will always be the language that I call home. It is probably the only language I was paid to write production code in. In building and running large models, I had to get into the internals of MATLAB optimization and learned that MATLAB is basically a wrapper around excellent matrix optimization libraries. As Cleve Moler writes, the basic motivation for MATLAB was to make LINPACK and EISPACK easy to use. Finally, over a New Year’s holiday, I had the chance to pull together a quick summary of the main math libraries I’ve come across: LAPACK, BLAS, ATLAS and FFTW.

LAPACK – Linear Algebra Package

LAPACK is the most common library for numerical linear algebra. I’ve used its routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition. LAPACK is written in FORTRAN and is powerful and popular since it handles both real and complex matrices in both single and double precision (subject to the caveats above).
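To make this concrete, here is a minimal sketch of reaching LAPACK from Python through SciPy, which (like MATLAB) is essentially a wrapper around these routines. The LAPACK driver names in the comments are the usual ones SciPy dispatches to, mentioned for orientation rather than as a guarantee.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

x = linalg.solve(A, b)            # solve a linear system (LU-based *gesv family)
lu, piv = linalg.lu_factor(A)     # LU factorization (*getrf)
w = linalg.eigvals(A)             # eigenvalues (*geev)
U, s, Vt = linalg.svd(A)          # singular value decomposition (*gesdd)

print(np.allclose(A @ x, b))      # True, up to floating-point error
```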

LAPACK is the successor of EISPACK and LINPACK, both libraries for linear algebra algorithms developed in the early ’70s (credit to Jack Dongarra, Jim Bunch, Cleve Moler and Pete Stewart). LINPACK is still used as a benchmark for the most powerful supercomputers. The EISPACK and LINPACK software libraries were designed for early supercomputers (think the CDC-7600, Cyber 205, and Cray-1). These machines featured multiple functional units pipelined for increased performance. The CDC-7600 was basically a high-performance scalar computer, while the Cyber 205 and Cray-1 were early vector computers.2 One drawback of these early architectures is the absence of optimized locality in data access. Consequently, EISPACK and LINPACK performed poorly on computers with deep memory hierarchies, which became popular in the late ’80s.

LAPACK, designed by Jim Demmel, Jack Dongarra and many others, specifically reimplemented the algorithms as “block-based” to incorporate locality. (See a great history here.) Many computers have cache memory that is much faster than main memory; keeping matrix manipulations localized allows better use of the cache. LAPACK was able to use this knowledge to run efficiently on shared-memory and vector supercomputers. More recently, the ScaLAPACK software library has extended LAPACK to distributed-memory concurrent supercomputers, and all routines have been redesigned for distributed-memory MIMD (multiple instruction, multiple data) parallel computers.

BLAS – Basic Linear Algebra Subroutines

LAPACK is implemented on top of BLAS, a collection of low-level matrix and vector arithmetic operations. BLAS performs basic operations such as “multiply a vector by a scalar” or “multiply two matrices and add the result to a third matrix”. As a programmer, this is the type of thing I would definitely either get wrong or implement inefficiently. By contrast, LAPACK is a collection of higher-level linear algebra operations. LAPACK has routines for matrix factorizations (LU, LLt, QR, SVD, Schur, etc.) that are used to answer more complex questions such as “find the eigenvalues of a matrix”, or potentially expensive operations like “find the singular values of a matrix” or “solve a linear system”. LAPACK is built on top of the BLAS; many users of LAPACK only use the LAPACK interfaces and never need to be aware of the BLAS at all. LAPACK is generally compiled separately from the BLAS and can use whatever highly optimized BLAS implementation you have available.

In this sense, Basic Linear Algebra Subprograms (BLAS) is more of a specification than a specific software package. It prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. Due to the academic focus in this area, they are the de facto standard for low-level routines for linear algebra libraries. Although the BLAS specification is general, BLAS implementations are often optimized for speed on a particular machine, so using them can bring substantial performance benefits. Modern BLAS implementations take advantage of special floating point hardware such as vector registers or Single instruction, multiple data instructions.

BLAS is often divided into three main areas:

  • BLAS1: vector-vector operations (e.g., vector sum)
  • BLAS2: matrix-vector operations (e.g., matrix-vector product)
  • BLAS3: matrix-matrix operations (mainly matrix-matrix product)
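For orientation, here is a minimal sketch of the three levels through SciPy’s thin BLAS wrappers; whichever optimized BLAS your SciPy was built against does the actual work.

```python
import numpy as np
from scipy.linalg import blas

x = np.ones(3)
y = np.arange(3.0)
A = np.eye(3)
B = np.full((3, 3), 2.0)

z = blas.daxpy(x, y, a=2.0)      # BLAS1: z = 2.0*x + y   (vector-vector)
v = blas.dgemv(1.0, A, y)        # BLAS2: v = 1.0*(A @ y)  (matrix-vector)
C = blas.dgemm(1.0, A, B)        # BLAS3: C = 1.0*(A @ B)  (matrix-matrix)

print(z, v, C, sep="\n")
```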

ATLAS — Automatically Tuned Linear Algebra Software

ATLAS is a modern attempt to make a BLAS implementation with higher performance and more portability. As excellent as BLAS is, it has to be specifically compiled and optimized for different hardware. ATLAS is a portable and reasonably good implementation of the BLAS interfaces, that also implements a few of the most commonly used LAPACK operations. ATLAS defines many BLAS operations in terms of some core routines and then tries to automatically tailor the core routines to have good performance. For example, it performs a search to choose good block sizes that may depend on the computer’s cache size and architecture. It also accounts for differing implementations by providing tests to see if copying arrays and vectors improves performance.3

FFTW — Fastest Fourier Transform in the West (FFTW)

For an example of a much more specific library, FFTW is a software library for computing discrete Fourier transforms (DFTs).4 According to its regularly published benchmarks, FFTW is known as the fastest free-software implementation of the fast Fourier transform. It can compute transforms of real- and complex-valued arrays of arbitrary size and dimension in $O(n \log n)$ time.

The magic of FFTW is to choose the optimal algorithm from a wide array of options. It works best on arrays of sizes with small prime factors, with powers of two being optimal and large primes being worst case (but still $O(n \log n)$). For example, to decompose transforms of composite sizes into smaller transforms, it chooses among several variants of the Cooley–Tukey FFT algorithm, while for prime sizes it uses either Rader’s or Bluestein’s FFT algorithm. Once the transform has been broken up into subtransforms of sufficiently small sizes, FFTW uses hard-coded unrolled FFTs for these small sizes that were produced by code generation. It goes meta.
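A minimal sketch of the size-sensitivity point, using NumPy’s FFT as a stand-in (NumPy does not use FFTW, but any $O(n \log n)$ FFT shows the same flavor of behavior): transforms whose lengths have small prime factors tend to run faster than transforms of a nearby prime length.

```python
import time
import numpy as np

def time_fft(n, repeats=20):
    """Average wall-clock time of an FFT of length n."""
    x = np.random.standard_normal(n)
    start = time.perf_counter()
    for _ in range(repeats):
        np.fft.fft(x)
    return (time.perf_counter() - start) / repeats

n_pow2 = 2**16       # 65536: all small prime factors (best case)
n_prime = 65537      # a prime length right next to it (worst case)
print(f"power of two ({n_pow2}): {time_fft(n_pow2) * 1e3:.2f} ms")
print(f"prime        ({n_prime}): {time_fft(n_prime) * 1e3:.2f} ms")
```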

For more detail from some real experts, check out Numerical Analysis: Historical Developments in the 20th Century by C. Brezinski and L. Wuytack.


  1. It is impossible to get a truly random number from a computer. Even if the sequence never repeats, which is not guaranteed for random numbers, another run of the program with the same inputs will produce the same results. So, someone else can reproduce your random numbers at a later time, which means it wasn’t really random at all. This causes all kinds of problems, but the opposite case would cause many more. 
  2. In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors, compared to scalar processors, whose instructions operate on single data items. 
  3. For example, it may be advantageous to copy arguments so that they are cache-line aligned so user-supplied routines can use SIMD instructions. Today, most commodity CPUs implement architectures that feature instructions for a form of vector processing on multiple (vectorized) data sets, typically known as SIMD (Single Instruction, Multiple Data). 
  4. Yay! FFTW was developed by Matteo Frigo and Steven G. Johnson at the Massachusetts Institute of Technology. 

Tax time automation

Happy New Year! Ring in the new year with some code to transform USAA bank statements into a set of transactions for 2016. It is a little tricky because there are some 2015 transactions in the list and USAA puts each transaction on two lines.

To use it, just save the bank statement PDF as text and run this script; then you can open the resulting CSV in Excel. You will need a basic version of Ruby installed.
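The actual script is in Ruby, but the idea is simple enough to sketch in Python. The statement layout assumed below (a date/amount line followed by a description line) is a guessed placeholder, not USAA’s real format; adjust the regex to your own export.

```python
import csv
import re

# Hypothetical layout: a "MM/DD/YYYY  amount" line, then the description line.
DATE_AMOUNT = re.compile(r"^(\d{2}/\d{2}/\d{4})\s+(-?\d[\d,]*\.\d{2})$")

rows = []
pending = None
with open("statement.txt") as f:
    for line in f:
        line = line.strip()
        m = DATE_AMOUNT.match(line)
        if m:
            pending = (m.group(1), m.group(2))   # first half of a two-line record
        elif pending:
            date, amount = pending
            if date.endswith("/2016"):           # drop the stray 2015 transactions
                rows.append([date, line, amount.replace(",", "")])
            pending = None

with open("transactions_2016.csv", "w", newline="") as out:
    csv.writer(out).writerows([["date", "description", "amount"]] + rows)
```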


Review: Our Kids

Following a polarizing election that has left many in my community surprised and confused, Robert Putnam’s “Our Kids” provides a refreshing collection of research and engaging anecdotes told with generosity of spirit. Perhaps more than any other modern voice, Putnam excels at describing American society in a bipartisan way. He and his previous best-seller, “Bowling Alone,” have been influential in multiple administrations. His emphasis on improving society appeals to traditionalist conservatives who emphasize historic institutions and to progressives who value his call for a more proactive government.

Increasing economic inequality and decreasing upward mobility will hurt our children

While his previous work provided a critique of declining civic life, “Our Kids” fits nicely with the narrative of “Coming Apart” by Charles Murray and “Hillbilly Elegy” that describe the dynamics and effects of widening income disparity and decreasing economic mobility. Where Putnam differs is his emphasis on a topic unprecedented in its importance and urgency to a parent: what is happening to the dreams of our children?

To answer this question, he focuses on the state of upward mobility and its consequences for the future of our children, through an emphasis on recent subtle but profound changes to three parts of society: family life, neighborhoods and schools. In all his analysis, he defines rich and poor simply: “rich” parents finished college; “poor” parents have high school degrees or less. In the end, he describes a radical reordering of social capital toward those with the resources, networks and know-how to use it. He labels this “an incipient class apartheid”. His narrative is a constant meta-analysis of academic studies along with contrasting portraits of high- and low-income families.

While “The Second Machine Age” is a much better book for understanding how we got here, Putnam makes a compelling and personal appeal to understand just how bleak life is, and is becoming, for those without the resources to buffer their children from life’s dangers. While the rich children profiled faced plenty of challenges (divorce, substance and physical abuse, academic struggles), their parents were able to change their environment to help, with dramatically different outcomes than for the poor children profiled. When life became difficult, their networks responded and their resources opened doors closed to the poor.


With the benefit of specific names and locations, he describes the differing ways that rich and poor experience school sports, obesity, maternal employment, single parenthood, financial stress, college graduation, church attendance, friendship networks and family dinners.

Every story sets the stage for a set of new statistics. An example: affluent kids with low high-school test scores are as likely to get a college degree (30%) as high-scoring kids from poor families (29%). Education is supposed to help level the playing field, but it is increasingly achieving the opposite.

Another example is that rich kids get almost 50 percent more nurturing time from their parents, when there used to be no class difference. In fact, many factors are diverging that used to be independent of class (e.g. church attendance rates and political engagement). Most troubling to me is that increasingly only the rich retain hope and pass it on to their children. While Horatio Alger marked the aspirations of a generation of upwardly mobile children at the beginning of the 20th century, Putnam makes the case that apathy and pessimism reign at the bottom of the social order at the end.

Some of the stories were heart-wrenching to a degree that distracts from the larger story. One subject, “Elijah”, was beaten by his father after the son was jailed for arson, thrown out of the house by his mother because of drinking and drugs, and unable to escape the lure of the streets. Such stories arouse my passion and compassion. They take up an elevated place in my mind, and leave me hungry for more information and for statistics to know how unique their stories are.

Current social dynamics are making it worse

Putnam makes it clear that things aren’t getting better. Everyone knows rich kids have advantages, but he shows that their advantages are large and growing with no bounds to stop the trend. He shows the growing gap through numerous “scissor charts”. One way to look at the diverging trends is through the Adverse Childhood Experiences Scale:

  1. Household adult humiliated or threatened you physically
  2. Household adult hit, slapped, or injured you
  3. Adult sexually abused you
  4. Felt no one in family loved or supported you
  5. Parents separated/divorced
  6. You lacked food or clothes or your parents were too drunk or high to care for you
  7. Mother/stepmother was physically abused
  8. Lived with an alcoholic or drug user
  9. Household member depressed or suicidal
  10. Household member imprisoned

Each of these factors is becoming more frequent in poor families, and he has both charts to convince the left side of our brain and stories to convince the right.

One chart that he highly emphasizes is the trend in family dinners. From the mid-1970s to the early 1990s family dinners became rarer in all social echelons, but since then that trend leveled off for households led by college-educated parents, and only continued downward for high school–only educated families.

Causes

In the end, he argues that inadequate empathy and weakened civic institutions are the primary causes. While “Bowling Alone” made a solid case for the decline of civic institutions, here he makes a strong argument for a growing failure of mutual understanding, reinforced by self-selectivity and the dynamics of technology and productivity.

Current dynamics provide big advantages to children at the top and restrict the ability of those without resources to work their way up the social ladder. This has cascading effects and is a particularly difficult problem with long time constants. Rich parents have a growing edge in access to good day care, and then to better schools with a full suite of extracurricular activities; the spending gap on such activities has tripled. Malcolm Gladwell covered these dynamics well in Outliers.

Potential Improvements

While his reliance on anecdotes risks informing our emotions more than our minds, the biggest weakness of the book is the absence of a discussion of the political forces that shape the world he so aptly describes. The stories of the poor are heartbreaking. It is clear that Putnam was moved by the people he met and the stories are moving, but how well do the stats hold up? It is nice that he goes beyond raw data and listens to those who are otherwise voiceless in our society. By blending portraits of individual people with aggregate data, he gives us a remarkably clear picture of inequality in the United States. But is this clear picture an accurate one? That said, I’m ready to avoid politics for a bit, and the dialogue is definitely enhanced by his refusal to fall into the neat bins of our current culture wars.

He never tackles the finances of the individuals involved. His reluctance to wade into politics prevents him from making politically inconvenient observations, such as that rich kids still tend to grow up with two parents and poor kids don’t. For me, he also invites cognitive dissonance between the narrative of victimhood and entitlement and the gumption (maybe even intentional obliviousness) that characterized others who have risen above their class.

Most maddeningly, he omits a discussion of the political or economic forces driving the changes he laments. In particular, I don’t think any story about inequality is possible without describing the intersection of economics, productivity, globalization and technology.

Potential Solutions

I resonate with his emphasis on the importance of family dinners. His policy suggestions include expanded tax credits for the poor, greater access to quality day care and more money for community colleges. He also highlights increased investment in early education, expanded child care options and formal mentoring programs.

While logical, these sound a bit like the status quo and might be missing an emphasis on the moral foundations which provided the glue for the institutions whose importance he highlights.

In all he provides a vivid description of the diverging life chances of children in rich and poor families. I’m hoping that the main impact of this book will be the many conversations it creates and the poignancy of its character-based narratives will force me to think of ways I can help counter the trends described.


Endurance and Altitude for Propeller-driven Aircraft

Lower altitude = higher endurance?

At first glance this didn’t make sense to me. Air Force pilots will be the first to tell you that you don’t hug the earth en route to maximize your time on station; it is harder slogging in thicker air. At higher altitudes you have lower air pressure and colder air temperatures: low pressure is bad for thrust, but low temperatures are more favorable. How does this all shake out? And what are the differences between jet- and prop-driven aircraft?

My initial hunch was that for jets, higher altitude requires more speed for level flight, so for every gram of fuel burned you cover more range in rarefied air, but endurance is really about turbine efficiency. Higher altitudes decrease the mass flow through the engine, but engine cycle performance is mostly correlated with the ratio of the maximum temperature achievable in the turbine, $T_{max}$, to the ambient temperature. $T_{max}$ is fixed by engine material properties, so the main way to increase efficiency with altitude is to decrease the ambient temperature. On a standard day, the isotherm begins around 36,089 feet, where the temperature remains close to -70 F for roughly another 30,000 feet. Above about 70,000 feet, the outside temperature actually begins to increase. Pressure decreases at a rate that is not subject to the same complex interplay between the sun’s incoming radiation and the earth’s outbound radiation, which means any altitude above roughly 36,000 feet should really not give you a performance advantage.

However, modern high-endurance unmanned platforms are propeller-driven aircraft. While I haven’t done the math to show how much more efficient these are compared to jets, I want to explore how the efficiency of propeller-driven aircraft in particular is affected by altitude.

All aircraft share common interactions with the air, so I had to start with the basics of the airfoil. The lift coefficient is traditionally defined as the ratio of the lift force to the force exerted by the dynamic pressure over the planform area. Dynamic pressure (the difference between the stagnation and static pressures) is simply the pressure due to the motion of the air particles; applied over a given area, it is the force that lift must overcome for upward motion. If $L$ is the lift force and $q$ is the dynamic pressure applied across a planform area $A$ for an air density of $\rho$, then by rearranging the lift equation we have

$$ C_L = \frac{L}{q_{\infty}\,A}=\frac{2\,L}{\rho\,v^2 A}. $$

Solving for velocity gives us our first insight on how altitude is going to impact the performance of any airfoil:

$$ v = \sqrt{\frac{2\,L}{A\,C_L}}\,\sqrt{\frac{1}{\rho}}, $$

so lower altitudes (higher density) require a lower velocity to generate the same lift force. But how does lower velocity equate to more endurance?

Using climatic data from MIL-STD-210C, under the conservative assumption of a high density that occurs 20% of the time, we have a basic 1/x relationship, with density decreasing dramatically with altitude.

From this alone, we can then plot the velocity needed to keep $C_L$ constant.
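As a rough sketch of that plot, here is the calculation using the standard exponential-atmosphere approximation rather than the MIL-STD-210C tables, with made-up values for weight, wing area and $C_L$:

```python
import numpy as np
import matplotlib.pyplot as plt

# Exponential approximation to the standard atmosphere (an assumption here;
# the post actually used MIL-STD-210C climatic data).
RHO_SL = 1.225          # sea-level density, kg/m^3
H_SCALE = 8500.0        # density scale height, m

def density(h_m):
    return RHO_SL * np.exp(-h_m / H_SCALE)

# Placeholder aircraft parameters, not from any real platform.
W = 10000.0             # lift required = weight, N
A = 20.0                # planform area, m^2
CL = 0.8                # lift coefficient held constant

h = np.linspace(0, 15000, 200)                 # altitude, m
v = np.sqrt(2 * W / (A * CL)) / np.sqrt(density(h))

plt.plot(h / 1000, v)
plt.xlabel("altitude (km)")
plt.ylabel("velocity for constant $C_L$ (m/s)")
plt.show()
```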

To understand the impact this has on endurance, we have to look at brake-specific fuel consumption (BSFC). This is a measure of fuel efficiency within a shaft reciprocating engine. It is simply the rate of fuel consumption divided by the power produced, and can also be considered a power-specific fuel consumption. If we consume fuel at a rate $r$ (grams per second) for a given power $P$,

$$ \text{BSFC} = \frac{r}{P} = \frac{r}{\tau \, \omega}, $$

with $\tau$ as engine torque (N·m) and $\omega$ as engine speed (rad/s).

$\text{BSFC}$ varies through complex interactions with engine speed and pressure.

To understand how this affects endurance, let’s just consider how much fuel we have divided by the rate at which we use it. Because fuel consumption for piston engines is proportional to power output, we use the average-value method to predict the endurance of a propeller-driven aircraft. This simply treats endurance as the fuel weight you have divided by the rate at which fuel is spent, all the way to zero fuel. If $W_f$ is the weight of fuel available to be burned:

$$ E = \frac{\Delta W_f}{\dot W_f} = \frac{\Delta W_f}{(\text{BSFC}/\eta_{prop})\,D_{avg}\,V_{\infty}}, $$

where $\eta_{prop}$ is the propeller efficiency factor and $V_{\infty}$ is the free-stream airspeed. The speed for maximum endurance corresponds to flight at the minimum power required,

and at this power, we should be able to maximize our endurance at the appropriate velocity. For increased accuracy, let’s consider that fuel is burned continuously,
$$ E = \int_{W_2}^{W_1} \frac{\eta_{prop}}{\text{BSFC}} \frac{dW}{D\,V_{\infty}}. $$

If $C_L$ is constant and $W=L$, then we can substitute $V_{\infty} = \sqrt{2W/(\rho\,A\,C_L)}$,

$$ E = \frac{\eta_{prop}\,C_L^{3/2}}{\text{BSFC}\,C_D} \sqrt{\frac{\rho\,A}{2}} \int_{W_2}^{W_1} \frac{dW}{W^{3/2}} = \frac{\eta_{prop}\,C_L^{3/2}}{\text{BSFC}\,C_D} \sqrt{2\,\rho\,A} \left( W_2^{-1/2} - W_1^{-1/2} \right)$$
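As a sanity check on this closed form, here is a small numerical comparison against direct integration of the endurance integral; all parameter values below are made up purely for illustration.

```python
import numpy as np
from scipy.integrate import quad

# Illustrative (made-up) values in consistent units.
eta_prop, bsfc = 0.8, 7.0e-8     # propeller efficiency, BSFC
cl, cd = 0.9, 0.05               # lift and drag coefficients
rho, area = 1.1, 20.0            # air density, planform area
w1, w2 = 12000.0, 10000.0        # start and end weights (w1 > w2)

prefactor = eta_prop * cl**1.5 / (bsfc * cd)

# Closed form: prefactor * sqrt(2*rho*A) * (W2^-1/2 - W1^-1/2)
closed = prefactor * np.sqrt(2 * rho * area) * (w2**-0.5 - w1**-0.5)

# Direct integration: prefactor * sqrt(rho*A/2) * integral of W^-3/2 from W2 to W1
integral, _ = quad(lambda w: w**-1.5, w2, w1)
numeric = prefactor * np.sqrt(rho * area / 2) * integral

print(closed, numeric, np.isclose(closed, numeric))   # the two agree
```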

This tells us that if we want maximum endurance, we want high propeller efficiency, low BSFC, high density (both low altitude and low temperature), a large weight of fuel available, and a maximum value of the ratio $C_L^{3/2}/C_D$. Naturally, the higher density helps directly, while the ratio of the lift and drag coefficients is maximized when we minimize the power required,

$$ P_R = V_{\infty} \, D = V_{\infty} \, \frac{W}{C_L / C_D} = \sqrt{\frac{2\,W^3}{\rho\,A}}\,\frac{C_D}{C_L^{3/2}} = \text{constant} \times \frac{C_D}{C_L^{3/2}} $$

So we want to minimize $P_R$. For an assumed drag polar, the condition for minimizing $C_D/C_L^{3/2}$ is found by expressing the ratio in terms of the drag polar, taking the derivative, and setting it equal to zero:

$$ 3 C_{D_0}=k C_L^2 $$

This has the interesting result that the velocity which results in minimum power required has an induced drag three times higher than parasitic drag.

In all, it seems that platform endurance is a function of a number of design parameters and the air density, so in general the higher the air density, the higher the endurance. Please comment to improve my thoughts on this.


Review: Sapiens

Yuval Harari (יובל נח הררי) has written a scholarly, thought-provoking and crisply written survey of “big history” that he uses to preach his starkly different views of philosophy and economics. Dr. Harari teaches at the Hebrew University of Jerusalem, where he wrote Sapiens in Hebrew and accomplished the very idiomatic translation himself. Provocative and broad, he argues for a different perspective on history with the confidence and swagger of someone who has it all figured out. Certainly, his very use of “Sapiens” for the title is itself provocative: it reminds us that, long ago, the world held half a dozen species of human, of which only Homo sapiens survives. By highlighting the remarkably unpredictable path of history, he continually emphasizes that our present state is merely one of many possible, and he applies this both to the nature of our species and to our beliefs (what he calls “shared myths”). This contrasts with my worldview, the Christian narrative that we were uniquely created in God’s likeness and image and uniquely entrusted with a creation mandate.

Despite my differences, I appreciate any book that unleashes my imagination and challenges my worldview. Sapiens delivers, with surveys of the impact of language, agriculture, economics, and science and technology against the unique backdrop of our species. The final conclusion is that we have won. We are no longer competing for dominance on the planet; our greatest danger is our own inability to govern ourselves and to anticipate the impact of our technology. Thus, for all our advances, our victory may be Pyrrhic: we have spectacularly failed to use our powers to increase our well-being and happiness. Because of this, Dr. Harari predicts that we will vanish within a few centuries, either because we’ve gained such godlike powers as to become unrecognizable or because we’ve destroyed ourselves through environmental mismanagement.

From all this, I found Sapiens contains several new and powerful ideas: that Sapiens’ takeover of the planet is complete, that our greatest power is the ability to collectively create, believe and share ideas, that technology often enslaves us and can’t be resisted, and that the scientific revolution is the result of a willingness to admit ignorance.

We win! Sapiens now dominate the Planet

While the image below is not from Sapiens (it is from xkcd), it makes the point well that our species has dominated the planet. If you consider humans and our domesticated animals, our takeover is complete and the remaining wildlife is simply there to amuse us and give meaning to the pictures in our children’s books.

I wish he had calculated the date Sapiens “won”, the date when our species could basically do whatever it wanted with nature. While the industrial revolution might be seen by many as the period when we conquered the world, I suspect the rise of the agrarian economy is where we took a different route and established a dominant position over nature. For most of its history, Homo sapiens lived a nomadic lifestyle: the vast majority of our ancestors spent their lives hunting prey and gathering vegetation. Rather than settling in one area, they traveled to wherever food was plentiful. Around 12,000 years ago, however, all that changed, and that is when our population started to explode. On a small patch of land, farmers could grow a mass of edible plants. The consequent increase in the food supply meant that human societies could sustain much higher populations, but that required the systems we use today: money, government, war and politics.

However, the systems that provide us this dominance may also provide the means for our downfall. Sapiens reminds us that we may be collectively acting in a way that is harmful to our future as a species by establishing such a dominant position on our planet without a matching increase in our wisdom or ability to govern. I often worry that we have developed several generations of technology but are morally equivalent to our ancestors of thousands of years ago. What politician today has the gravitas or ethics of Cicero? Clearly, our iPads don’t make us better people.

This wouldn’t be a problem, except that our interdependent economic systems and the explosion of the money supply (i.e., leverage) make our society dependent on high expectations about the future, while bank interdependencies have increased systemic risk to a level unprecedented in history.

Sapiens’ power is our ability to share and jointly believe in ideas

Sapiens have accomplished this domination because we can uniquely cooperate in very large numbers across vast distances. This is Harari’s most strident theme: the physical un-reality of all our ideas. He claims that all large-scale human cooperation is based on myths, which he highlights are fiction. He describes shared concepts like the nation, money, and human rights as fabrications that have no relation to our biological reality as they don’t exist objectively. He claims that the fact that we share and ascribe such a lofty status to fictions is our most unique characteristic as a species.

While he remarks that we tend to over-value the reality of all our ideas, he reserves his sharpest criticism for religion and its role in forming a shared human story. He covers Zoroastrian sacred texts, the Book of Genesis and the Popol Vuh. To him, gossip forms the most fundamental bond between local groups, but larger groups require religion, which can sweep past trifling details and unite nations. In his narrative, religion was the necessary glue for human society until the 19th century, when scientific knowledge was able to create a standardized set of world beliefs. However, he notes that without religion there is no basis for many of the values we hold dear, such as human rights.

This denigration of the reality of our ideas and institutions is one place where Harari overplays his hand. I believe our ideas and institutions, what Harari calls myths, have a complex ontological status. While nationhood, the Pythagorean theorem and the fundamental equality of human beings before the law are all non-physical notions formed in human brains, to label them all as fictions, as Harari does, without distinguishing more carefully diminishes the entire book.

Technology and Progress are a Mixed Blessing

When he talks about technology and the Scientific Revolution, Harari is in an area I’m much more familiar with. He makes clear that our obsession with technology is a modern phenomenon. To him, the relationship between science and technology is itself a recent development, and when Bacon connected the two in the early seventeenth century, it was a revolutionary idea.

While our current society looks at the blessings of technology as an absolute win, Harari highlights the darker shadow behind technical advances. Namely, despite appearances, consumers don’t really have a choice about adopting new technology. Someone can decide to forgo email and credit cards, but they will not be able to participate in the modern economy. New technology thus creates addictions and what he calls a “luxury trap”.

For example, examine the rise of farming. Agriculture increased the amount of available food, yet the result of prosperity was not happiness but “population explosions and pampered elites.” Farmers worked harder than foragers and had a worse diet and poorer health, but the foragers had to adapt to the new economy. The surplus went to the privileged few, who used it to oppress. “The Agricultural Revolution,” Harari says, “was history’s biggest fraud.”

At the end of the book, Harari expresses an ambivalence about what we consider today to be a species-wide increase in well-being. “Unfortunately,” he says, “the Sapiens regime on earth has so far produced little that we can be proud of.” While I’m personally in awe of quantum electrodynamics, the modern financial system and anesthesia, Harari is arguing that living better has not made us more content. Citing recent research in psychology, he states that happiness “depends on the correlation between objective conditions and subjective expectations.” He cites the example of one person who wins the lottery and another who becomes disabled: while one will most likely experience short-term happiness and the other depression, research has shown that both will be roughly equally happy in a year, even though their circumstances remain different.

More worrying, our current dependence on technology may be our downfall. Not only are we interconnected to an unprecedented degree, but we are also addicted to growth and to near-impossible expectations that technology will keep increasing productivity and making scarce resources abundant. While this has historically staved off a Malthusian collapse as population has grown, Harari persuasively argues that history is notoriously difficult to predict.

We might be at the beginning of the end of our species

At DARPA, we are starting to understand and experiment with human-machine symbiosis. Recently, we have not only wired a sense of touch from a mechanical hand directly into the brain, but have also figured out how to connect the brain to sensors that feel natural. Sapiens highlights the transplantation of ears onto mice and some of the fascinating and terrifying implications of stem cell manipulation.

Harari notes that for the first time in history, “we will see real changes in humans themselves – in their biology, in their physical and cognitive abilities”. History reveals that while we have enough imagination to invent new technologies, we are unable to foresee their consequences. Harari states:

It was the same with the agricultural revolution about 10,000 years ago. Nobody sat down and had a vision: ‘This is what agriculture is going to be for humankind and for the rest of the planet.’ It was an incremental process, step by step, taking centuries, even thousands of years, which nobody really understood and nobody could foresee the consequences.

The Scientific Revolution is the result of an Admission of Ignorance

Harari takes a stab at what caused the scientific revolution: a willingness to admit ignorance. Before the modern scientific era, the State (the King) and the Church were the source of all truth. There was no excuse to be ignorant. With pith and awe, Harari describes the Scientific Revolution as the point in history when “humankind admits its ignorance and begins to acquire unprecedented power.”

It is worth quoting him here. He writes:

“But modern science differs from all previous traditions of knowledge in three critical ways:

a) The willingness to admit ignorance. Modern science is based on the Latin injunction ignoramus – ‘we do not know’. It assumes that we don’t know everything. Even more critically, it accepts that the things that we think we know could be proven wrong as we gain more knowledge. No concept, idea or theory is sacred and beyond challenge.

b) The centrality of observation and mathematics. Having admitted ignorance, modern science aims to obtain new knowledge. It does so by gathering observations and then using mathematical tools to connect these observations into comprehensive theories.

c) The acquisition of new powers. Modern science is not content with creating theories. It uses these theories in order to acquire new powers, and in particular to develop new technologies.

The Scientific Revolution has not been a revolution of knowledge. It has been above all a revolution of ignorance. The great discovery that launched the Scientific Revolution was the discovery that humans do not know the answers to their most important questions.”

Also, this scientific progress, he asserts, was fueled by the twin forces of imperialism and capitalism.

He writes:

“What forged the historical bond between modern science and European imperialism? Technology was an important factor in the nineteenth and twentieth centuries, but in the early modern era it was of limited importance. The key factor was that the plant-seeking botanist and the colony-seeking naval officer shared a similar mindset. Both scientist and conqueror began by admitting ignorance – they both said, ‘I don’t know what’s out there.’ They both felt compelled to go out and make new discoveries. And they both hoped the new knowledge thus acquired would make them masters of the world.”

While I find his views fascinating here, they still don’t match my worldview. I consider that the grand vision of using the scientific method to gain mastery over the physical world arose from the long-standing Christian vision, dating back at least to St. Augustine in the fourth century, of nature as the second book through which God made himself known to humanity (the first being the Bible). Galileo justified science as an attempt to know the mind of God through his handiwork.

To miss this connection is only possible by looking through the lens of currently accepted groupthink. This is where Harari disappoints. He defers too much to current orthodoxies, often resisting the logic of his own arguments for fear of affronting feminists, or avoiding conclusions that criticize his gay lifestyle, vegan sensibilities or postmodern worldview. There seems to be an inner conflict between the author’s freethinking scientific mind and a fuzzier worldview hobbled by political correctness.

In any case, I find Sapiens breathtaking in its scope and fascinating in its perspective.


There are some great quotes from the book here.


July 4th Home Project: Thermostat Bracket

This post is about how to use design tools to save time and build nice stuff at home using computer controlled machines (CNC). In addition to describing my design process, I’ve also included the helpful references I found along the way.

Our old thermostat was too small for our wall. I could have replaced the drywall, but I needed a starter project to help me understand 3D CNC work with wood, and replacing the drywall would have taken a good bit of time because of the lath behind the old thermostat. The bracket still took a long time because I had to learn wood CNC and spent way too long finishing the surface. In the end, this was a great use of a wood CNC machine: it would have been difficult to get the corners right and route out the inside pocket by hand. Additionally, I could prototype and correct parts of my design with the rapid iteration that CNC machines provide.

We have a programmable thermostat with Z-Wave, the 2gig CT100 Z-Wave Touch Screen Programmable Thermostat. It works perfectly and is easy to control with our Mi Casa Verde VeraLite Home Controller. This gives us the ability to set our temperature from our phones or do Nest-like things like learning our patterns and adjusting the temperature. We can also set up multiple thermostats to regulate temperature throughout the different regions of our house.

In case you are working with the CT100 or the VeraLite, you might find the following links helpful:

Design

I designed the bracket in Fusion 360. I’m still figuring out how to use Fusion, but it is a computer-aided design application for creating 3D digital prototypes, including design, visualization and simulation. Fusion 360 is easy to use, and it lets you go from design to rendering, analysis and production in one tool. Most important, it is free for hobbyists.

The design was pretty straightforward: a one-inch offset with fillets that match the radius of the CT100. One quirk of CNC routing is that I tend to design features that take advantage of what the machine can do, which leads to more curves. I just had to get the measurements right. I shouldn’t have needed to do this, but I used a laser cutter to cut the frame out of a piece of cardboard to check the fit. I’m glad I did, because I hadn’t accounted for some of the curves and the opening was too small. In general, I love using the laser cutter to prototype designs: the prototype let me see how the final design would look on the wall, and it would have been an easy way to test different designs. Chrissy and I tend to like 18th-century English and 19th-century neoclassical millwork, but I didn’t put too much thought into this design, partly because I could change it so easily.

Here is the final, dimensioned, design:


Construction

I found a piece of scrap plywood at TechShop that I cut on the ShopBot buddy.

ShopBot Buddy

To cut the workpiece I used the 1/4″ High Speed Steel Two Flute Downcut bit. You can see the purchase page here. As this was my first cut, I had to understand the definitions and the different cutter parameters to build the tool in Fusion.

For the High Speed Steel Two Flute Downcut, the parameters are:

  • CED: 1/4″
  • CEL: 1″
  • SHK: 1/4″
  • OAL: 3″

Here are some terms that helped me:

CED: cutting edge diameter, the width of the cut the tool makes through the workpiece. CED has a tolerance in thousandths of an inch (.xxx decimal places).

CEL: cutting edge length, the maximum thickness of material the tool can cut. CEL has a tolerance in hundredths of an inch (.xx decimal places).

SHK: shank diameter, the nominal size of the shank, which should match the collet size of the spindle the tool will be used in. SHK has a tolerance in ten-thousandths of an inch (.xxxx decimal places).

OAL: overall length, the total nominal length of the tool from end to end. OAL has a tolerance in hundredths of an inch (.xx decimal places).

HSS: High Speed Steel, typically applied to non-abrasive plastic, solid wood and aluminum, where keen edges perform best. High Speed Steel tools perform well in hand-routing applications where a tough core is necessary to prevent tool breakage.

Carbide Tipped: used for a variety of applications in composite woods, hardwoods, and abrasive and composite plastics up to hard aluminum. Limited by geometry in some applications due to the brazed nature of the tool. Carbide-tipped tools work well in hand-routing applications due to the tough HSS core and hard carbide cutting edges.

Solid Carbide: typically used for the widest variety of applications, including man-made board, solid wood, abrasive plastics and some aluminums. Solid carbide does not deflect, allowing the tool to be fed at higher feed rates than PCD or straight insert cutters, decreasing cycle times. Solid tools also have an edge keenness advantage thought possible only in HSS until a few years ago.

Chipload: the thickness of the chip formed during the machining of a material. Chipload is critical because a properly sized chip carries the heat away, promoting long tool life. If the chip is too small, the heat transfers to the cutting tool, causing premature dulling. Too high a chipload will cause an unsatisfactory edge finish or part movement.

The most important reason to understand cutter parameters is to set the correct feed rate, which follows from the spindle speed (RPM), the number of flutes and the chip load. To get this right, I consulted this reference from ShopBot and read up on end mills in general at Makezine. I also incorporated some information from Destiny Tool that helped verify my settings.

These links also helped:
* hardwood cutting data from Onsrud
* A great video tutorial from ShopBot

After understanding end mills, I had to get everything in the right format for the machine. I found the OpenSBP reference very helpful, and the command reference also taught me how to understand the resulting G-code.

I summarized my research below:

Table

| Name | SB# | Onsrud Series | Cut | Chip load per leading edge | Flutes | Feed rate (ips) | RPM | Max Cut |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1/4″ Downcut Carbide End Mill | 13507 | 57-910 | 1 x D | 0.005-0.007 | 2 | 3.0-4.2 | 18,000 | |
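The feed rate column follows directly from the chip load, flute count and spindle speed (feed = chip load × flutes × RPM). A quick check of the table row, assuming that is how the ShopBot chart was computed:

```python
def feed_rate_ips(chip_load_in, flutes, rpm):
    """Feed rate in inches per second from chip load (in/tooth), flute count and RPM."""
    inches_per_minute = chip_load_in * flutes * rpm
    return inches_per_minute / 60.0

# Check against the table row: 0.005-0.007 in/tooth, 2 flutes, 18,000 RPM
print(feed_rate_ips(0.005, 2, 18000))   # 3.0 ips
print(feed_rate_ips(0.007, 2, 18000))   # 4.2 ips
```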

You can see the final product here:



Satisfiability modulo theories and their relevance to cyber-security

Cybersecurity and cryptanalysis is a field filled with logic puzzles, math and numerical techniques. One of the most interesting technical areas I’ve worked in goes by the name of satisfiability modulo theories (SMT) and their associated solvers. This post provides a layman’s introduction to SMT and its applications to computer science and the modern practice of learning how to break code.

Some background theory

Satisfiability modulo theories problems are a type of constraint-satisfaction problem that arises in many places, from software and hardware verification to static program analysis and graph problems. They apply wherever logical formulas can describe a system’s states and their associated transformations. If you look under the hood of most tools used today for computer security, you will find they are based on mathematical logic as the calculus of computation. The most common constraint-satisfaction problem is propositional satisfiability (commonly called SAT), which aims to decide whether a formula composed of Boolean variables, formed using logical connectives, can be made true by choosing true/false values for its variables. In this sense, those familiar with integer programming will find a lot of similarities with SAT. SAT has been widely used in verification, artificial intelligence and many other areas.

As powerful as SAT is, what if instead of Boolean constraints we use arithmetic in a more general way to build our constraints? Often constraints are best described as linear relationships among integer or real variables. In order to understand and rigorously treat the domains involved, a background theory is combined with propositional satisfiability to arrive at satisfiability modulo theories (SMT).

The satisfiability modulo theories problem is a decision problem for logical formulas with respect to combinations of background theories expressed in classical first-order logic with equality. An SMT solver decides the satisfiability of propositionally complex formulas in theories such as arithmetic and uninterpreted functions with equality. SMT solving has numerous applications in automated theorem proving, in hardware and software verification, and in scheduling and planning problems. SMT can be thought of as a form of the constraint satisfaction problem and thus a certain formalized approach to constraint programming.

The solvers developed for SMT have proven very useful in situations where linear constraints and other richer constraints are required, with artificial intelligence and verification often presented as exemplars. An SMT solver can solve a SAT problem, but not vice-versa. SMT solvers draw on some of the most fundamental areas of computer science, as well as a century of symbolic logic. They combine the problem of Boolean satisfiability with domain theories (such as those studied in convex optimization and term-manipulating symbolic systems). Implementing them draws on decision procedures, the completeness and incompleteness of logical theories, and complexity theory.

The process of SMT solving is a procedure for finding a satisfying assignment for a quantifier-free formula $F$ with predicates drawn from a background theory $T$, or alternatively for showing that no such assignment exists. An assignment to all the variables that satisfies these constraints is a model, $M$. The formula $F$ is satisfied when it evaluates to $\text{true}$ under $M$ and the background theory $T$. In this sense, $M$ entails $F$ under theory $T$, which is commonly written $M \vDash_T F$. If theory $T$ is not decidable, then the underlying SMT problem is undecidable and no solver can exist: decidability means that, given a conjunction of constraints in $T$, there is a procedure of finitely many steps that can test for the existence of a satisfying assignment.

Ok, that is a lot of jargon. What is this good for?

SMT solvers have been used since the 1970s, albeit in very specific contexts, most commonly theorem proving (cf. ACL2 for some examples). More recently, SMT solvers have been helpful in test-case generation, model-based testing and static program analysis. In order to make this more concrete, let’s consider one of the most well-known operations research problems: job-shop scheduling.

Suppose there are $n$ jobs to complete, each composed of $m$ tasks with different durations, one per machine across $m$ machines. The start of a task can be delayed indefinitely, but a task can’t be stopped once it has started. There are two kinds of constraints: precedence and resource constraints. Precedence specifies that, within a job, one task has to happen before the next; the resource constraint specifies that no two tasks requiring the same machine can execute at the same time. Given a total maximum time $max$ and the duration of each task, the problem is to decide whether a schedule exists in which the end time of every task is less than or equal to $max$ units of time. The duration of the $j$th task of job $i$ is $d_{i,j}$ and that task starts at $t_{i,j}$.

I’ve solved this kind of problem before with heuristics such as simulated annealing, but you can encode it in SMT using the theory of linear arithmetic. First, you have to encode the precedence constraint:

$$ t_{i,j+1} \geq t_{i,j} + d_{i,j} $$

This states that the start time of task $j+1$ must be greater than or equal to the start time of task $j$ plus its duration. The resource constraint ensures that jobs don’t overlap. Between job $i$ and job $i'$ this constraint says:

$$ (t_{i,j} \geq t_{i',j}+d_{i',j}) \vee (t_{i',j} \geq t_{i,j} + d_{i,j}) $$

Lastly, each start time must be non-negative, $ t_{i,1} \geq 0 $, and the end time of the last task must be less than or equal to $max$, i.e. $t_{i,m} + d_{i,m} \leq max$. Together, these constraints form a logical formula that combines logical connectives (conjunction, disjunction and negation) with atomic formulas in the form of linear arithmetic inequalities. This is the SMT formula, and the solution is a mapping from the variables $t_{i,j}$ to values that make the formula $\text{true}$.
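To show what this looks like in practice, here is a sketch of the same encoding fed to Z3’s Python bindings for a toy instance; the two jobs, two machines, durations and deadline are made-up numbers, not part of the original example.

```python
# Job-shop scheduling as linear arithmetic over Z3 (pip install z3-solver).
# Toy instance: two jobs, two machines, and task j of each job runs on machine j.
from z3 import Int, Or, Solver, sat

durations = {(1, 1): 2, (1, 2): 1,   # durations d[i, j] for task j of job i (made up)
             (2, 1): 3, (2, 2): 1}
max_time = 8
jobs, tasks = (1, 2), (1, 2)

t = {(i, j): Int(f"t_{i}_{j}") for i in jobs for j in tasks}   # start times

s = Solver()
for i in jobs:
    s.add(t[i, 1] >= 0)                              # start times are non-negative
    s.add(t[i, 2] >= t[i, 1] + durations[i, 1])      # precedence within a job
    s.add(t[i, 2] + durations[i, 2] <= max_time)     # finish by the deadline
for j in tasks:                                      # resource: machine j runs one task at a time
    s.add(Or(t[1, j] >= t[2, j] + durations[2, j],
             t[2, j] >= t[1, j] + durations[1, j]))

if s.check() == sat:
    print(s.model())    # one feasible schedule, e.g. job 1 on machine 1 first, then job 2
```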

So how is this relevant to software security?

Since logical formulas can describe program states and the transformations between them, SMT has proven very useful for analyzing, verifying and testing programs. In theory, if we tried every possible input to a computer program, and we could observe and understand every resultant behavior, we would know with certainty all possible vulnerabilities in a software program. The challenge of using formal methods to verify (or exploit) software is to approach this certainty in a reasonable amount of time, and this generally distills down to clever ways to reduce the state space.

For example, consider dynamic symbolic execution. In computational mathematics, algebraic or symbolic computation refers to the study and development of algorithms and software for manipulating mathematical expressions and other mathematical objects. This contrasts with scientific computing, which is usually based on numerical computation with approximate floating-point numbers; symbolic computation emphasizes exact computation with expressions containing variables that are manipulated as symbols.

The software that performs symbolic calculations is called a computer algebra system. At the beginning of computer algebra, circa 1970, when common algorithms were translated into computer code, they turned out to be highly inefficient. This motivated the application of classical algebra in order to make it effective and to discover efficient algorithms. For example, Euclid’s algorithm had been known for centuries to compute polynomial greatest common divisors, but directly coding this algorithm turned out to be inefficient for polynomials over infinite fields.

Computer algebra is widely used to design the formulas that are used in numerical programs. It is also used for complete scientific computations, when purely numerical methods fail, like in public key cryptography or for some classes of non-linear problems.

To understand some of the challenges of symbolic computation, consider basic associative operations like addition and multiplication. The standard way to deal with associativity is to consider that addition and multiplication have an arbitrary number of operands, that is, $a + b + c$ is represented as $"+"(a, b, c)$. Thus $a + (b + c)$ and $(a + b) + c$ are both simplified to $"+"(a, b, c)$. However, what about subtraction, say $(a - b + c)$? The simplest solution is to rewrite systematically, so it becomes $(a + (-1)\cdot b + c)$. In other words, in the internal representation of the expressions, there is no subtraction, division or unary minus outside the representation of the numbers. Newspeak for mathematical operations!
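To see this representation in a real computer algebra system, here is a small sketch using SymPy (my choice of CAS; the post doesn’t name one):

```python
# How a computer algebra system stores these expressions internally
# (pip install sympy). srepr() prints the exact internal tree.
from sympy import symbols, srepr

a, b, c = symbols("a b c")

print(srepr(a + (b + c)))   # a single flattened n-ary Add over a, b, c
print(srepr((a + b) + c))   # the same flattened Add -- associativity is built in
print(srepr(a - b + c))     # no subtraction node: b appears as Mul(Integer(-1), Symbol('b'))
```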

A number of tools used in industry are based on symbolic execution (cf. CUTE, KLEE, DART, etc.). What these programs have in common is that they collect explored program paths as formulas and use solvers to identify new test inputs that can guide execution into new branches. SMT applies well to this problem because, instead of the random walks of fuzz testing, "white-box" fuzzing combines symbolic execution with conventional fuzz testing to expose the interactions of the system under test. Of course, directed search can be much more efficient than random search.

However, as helpful as white-box testing is in finding subtle security-critical bugs, it doesn’t guarantee that programs are free of all the possible errors. This is where model checking helps out. Model checking seeks to automatically check for the absence of categories of errors. The fundamental idea is to explore all possible executions using a finite and sufficiently small abstraction of the program state space. I often think of this as pruning away the state spaces that don’t matter.

For example, consider the statement $a = a + 1$ and a Boolean variable $b$ that stands for the predicate $a == a_{old}$. The abstraction is essentially a relation between the current and new values of $b$. SMT solvers are used to compute the relation by proving theorems, as in:

$$ a == a_{old} \rightarrow a+1 \neq a_{old} $$

Proving this implication is equivalent to checking the unsatisfiability of its negation, $ a == a_{old} \wedge a + 1 == a_{old} $.

The theorem says that if the current value of $b$ is $\text{true}$, then after executing the statement $a = a + 1$, the value of $b$ will be $\text{false}$. Now, if $b$ is $\text{false}$, then neither of these conjectures is valid:

$$
a \neq a_{old} \rightarrow a + 1 == a_{old}
$$
or
$$
a \neq a_{old} \rightarrow a + 1 \neq a_{old}
$$
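These three proof obligations can be handed to an SMT solver directly. A minimal sketch with Z3’s Python bindings (my choice of solver, not one named in the post): a conjecture is valid exactly when its negation is unsatisfiable.

```python
# Check the three conjectures above with Z3 (pip install z3-solver).
from z3 import Int, Implies, Not, Solver, unsat

a, a_old = Int("a"), Int("a_old")

def valid(conjecture):
    s = Solver()
    s.add(Not(conjecture))      # search for a counter-example
    return s.check() == unsat   # no counter-example means the conjecture is valid

print(valid(Implies(a == a_old, a + 1 != a_old)))   # True:  b goes from true to false
print(valid(Implies(a != a_old, a + 1 == a_old)))   # False: Z3 finds a counter-example
print(valid(Implies(a != a_old, a + 1 != a_old)))   # False: so b becomes * (unknown) when b was false
```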

In practice, the SMT solver produces a model for the negation of each failed conjecture. In this sense, the model is a counter-example to the conjecture, and when the current value of $b$ is false, nothing can be said about its value after the execution of the statement. The end result of these three proof attempts is then used to replace the statement $a = a + 1$ by:

 if b then
   b = false;
 else
   b = *;
 end

A finite-state model checker can now be used on the Boolean program and will establish that $b$ is always $\text{true}$ when control reaches this statement, verifying that calls to lock() are balanced with calls to unlock() in the original program:

do {
 lock ();
 old_count = count;
 request = GetNextRequest();
 if (request != NULL) {
  unlock();
  ProcessRequest(request);
  count = count + 1;
 }
}
while (old_count != count);
unlock();

becomes:

do {
 lock ();
 b = true;
 request = GetNextRequest();
 if (request != NULL) {
   unlock();
   ProcessRequest(request);
   if (b) b = false; else b = *;
 }
}
while (!b);
unlock();

Application to static analysis

Static analysis tools work much like white-box fuzzing or directed search, checking the feasibility of program paths as they go. However, static analysis never requires actually running the program and can therefore analyze software libraries and utilities without instantiating all the details of their implementation. SMT solvers apply to static analysis because they accurately capture the semantics of most basic operations used by mainstream programming languages. While this fits nicely for functional programming languages, it isn’t always a perfect fit for languages such as Java, C#, and C/C++, which all use fixed-width bit-vectors to represent values of type int. In that case, the accurate theory for int is two’s-complement modular arithmetic. Assuming a bit-width of 32, the maximal positive 32-bit integer is $2^{31}-1$ and the smallest negative 32-bit integer is $-2^{31}$. If both low and high are $2^{30}$, then low + high evaluates to $2^{31}$, which is treated as the negative number $-2^{31}$. The presumed assertion 0 ≤ mid < high therefore does not hold. Fortunately, several modern SMT solvers support the theory of bit-vectors, accurately capturing the semantics of modular arithmetic.

Let’s look at an example from a binary search algorithm:

int binary_search(int[] arr, int low, int high, int key) {
  assert (low > high || 0 <= low < high);
  while (low <= high) {
    // Find middle value
    int mid = (low + high)/2;
    assert (0 <= mid < high);
    int val = arr[mid];
    // Refine range
    if (key == val) return mid;
    if (val > key) low = mid+1;
    else high = mid-1;
  }
  return -1;
}
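An SMT solver with bit-vector support finds this overflow automatically. A minimal sketch with Z3’s Python bindings (my choice of solver; the 32-bit width matches the discussion above):

```python
# Ask whether mid = (low + high)/2 can go negative even when 0 <= low <= high,
# using 32-bit signed (two's-complement) arithmetic. pip install z3-solver.
from z3 import BitVec, Solver

low, high = BitVec("low", 32), BitVec("high", 32)
mid = (low + high) / 2                     # 32-bit signed arithmetic: the sum wraps on overflow

s = Solver()
s.add(low >= 0, high >= 0, low <= high)    # the precondition the code relies on
s.add(mid < 0)                             # ...and yet the midpoint comes out negative
print(s.check())                           # sat
print(s.model())                           # e.g. low = high = 2**30, so low + high wraps negative
```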

 

Summary

SMT solvers combine SAT reasoning with specialized theory solvers either to find a feasible solution to a set of constraints or to prove that no such solution exists. Linear programming (LP) solvers, by contrast, are designed to find feasible solutions that are optimal with respect to some objective function. Both are powerful tools that can be incredibly helpful for solving hard, practical problems in computer science.

One of the applications I follow closely is symbolic-execution based analysis and testing. Perhaps the most famous commercial tool that uses dynamic symbolic execution (aka concolic testing) is the SAGE tool from Microsoft. The KLEE and S2E tools (both of which are open-source tools, and use the STP constraint solver) are widely used in many companies including HP Fortify, NVIDIA, and IBM. Increasingly these technologies are being used by many security companies and hackers alike to find security vulnerabilities.

 

 


Basement Framing with the Shopbot

Framing around bulkheads is painful. It is hard to get everything straight and aligned, and I found the ShopBot to be very helpful. There were three problems I was trying to solve: (1) getting multiple corners straight across 30 feet, (2) having nearly no time to do the work, and (3) basic pine framing would sag over a 28″ run.

In fairness, the cuts did take a lot of time (about 2.5 hours of cutting), but I could do other work while the ShopBot milled out the pieces. I also had several hours of prep and installation, so I’m definitely slower than a skilled carpenter would be, but probably faster than I would have been framing it by hand. Plus, the result is straighter and more accurate. I especially need that, because my lack of skill means I don’t have the bag of tricks to deal with non-straight surfaces.

First, Autodesk Revit makes drawing ducts easy as part of an overall project model. The problem was that, the way the ducts were situated, the team working on the basement couldn’t simply build a frame that ran all the way to the wall because of an awkwardly placed door.

I was able to make a quick drawing in the model and cut the frames out on the ShopBot. They only had to be aligned vertically, which was easy to do with the help of a laser level.

[Image: second-ducts-v4]

These were easy to cut out while I was also making some parts for my daughter’s school project.

[Photo: 20160613_210346]


DIY Roulette Wheel and Probability for Kids

My daughter had to make a “probability game” for her fifth-grade math class. She has been learning JavaScript and digital design, so we worked together to design and build a roulette wheel for her class.

First, she drew out a series of sketches and we talked them over. She wanted a 2-foot-diameter wheel in 0.5-inch-thick plywood, but after some quick calculations we decided on a $1/4$-inch-thick wheel with an 18″ diameter. I had to talk her into adding pegs and a skateboard ball bearing from Amazon for $2. The bearing’s inner diameter is $1/4$ inch, so I also bought a package of dowels for $5 to make the pegs. I also bought a half sheet of plywood (of which I used about a third) and some hardware from Home Depot.

She wanted 10 sections with combinations of the following outcomes: Small, Large and Tiny prizes as well as two event outcomes: Spin Again and Lose a Spin. Each student would have at most three turns. We had the following frequencies (out of 10):

| Outcome | Frequency |
| --- | --- |
| Small | 3 |
| Large | 2 |
| Tiny | 1 |
| Lose a Spin | 3 |
| Spin Again | 1 |
This led to a ~~fun~~ (frustrating) discussion of Monte Carlo code, conditional probabilities and cumulative probabilities. Good job, teacher! We got to answer questions like:

  • What is the probability of getting a large prize in a game (three spins)?
  • What is the probability you get no prize?
  • What is the expected number of spins?

She really threw the math for a loop with the Spin Again and Lose a Spin options. We had to talk about systems with a random number of trials. My favorite part was exposing her to true randomness. She was convinced the wheel was biased because she got three larges in a row. I had to teach her that true random behavior was more unbalanced than her intuition might lead her to believe.

Understanding a problem like this is all about the state space. There are four possible outcomes: three different prizes or no prize. To explain the effect the spin skips have on the outcomes, I had to make the diagram below. Each column represents one of the three spins, each circle represents a terminal outcome and each rectangle represents the result of a spin.

[Diagram: Drawing1]

From this, we can compute the probabilities for each of the 17 outcomes (L = Lose a Spin, S = Spin Again; $P_L$, $P_S$, $P_T$ = Large, Small and Tiny prizes):

| # | Spin 1 | Spin 2 | Spin 3 | Prob |
| --- | --- | --- | --- | --- |
| 1 | $P_L$ | | | 0.200 |
| 2 | $P_S$ | | | 0.300 |
| 3 | $P_T$ | | | 0.100 |
| 4 | L | $P_L$ | | 0.060 |
| 5 | L | $P_S$ | | 0.090 |
| 6 | L | $P_T$ | | 0.030 |
| 7 | L | L | | 0.090 |
| 8 | L | S | | 0.030 |
| 9 | S | $P_L$ | | 0.020 |
| 10 | S | $P_S$ | | 0.030 |
| 11 | S | $P_T$ | | 0.010 |
| 12 | S | L | | 0.030 |
| 13 | S | S | $P_L$ | 0.002 |
| 14 | S | S | $P_S$ | 0.003 |
| 15 | S | S | $P_T$ | 0.001 |
| 16 | S | S | L | 0.003 |
| 17 | S | S | S | 0.001 |

I would love to find a more elegant solution, but the strange movements of the state-space left me with little structure I could exploit.
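That said, the 17-row table can at least be ground out mechanically with a short recursive walk of the state space, as in the sketch below. It encodes my reading of the diagram: a prize ends the game, Lose a Spin burns the current turn and the next one, Spin Again simply uses up a turn, and nobody spins more than three times.

```python
# Enumerate every terminal path of the spin game exactly, with fractions.
from fractions import Fraction

WHEEL = {                      # one spin: outcome -> probability (frequencies out of 10)
    "Large": Fraction(2, 10),
    "Small": Fraction(3, 10),
    "Tiny":  Fraction(1, 10),
    "Lose":  Fraction(3, 10),
    "Again": Fraction(1, 10),
}

def outcomes(turns_left=3, path=(), prob=Fraction(1)):
    """Yield (path, probability) for every way the game can end."""
    if turns_left <= 0:
        yield path, prob
        return
    for result, p in WHEEL.items():
        if result in ("Large", "Small", "Tiny"):
            yield path + (result,), prob * p                                  # a prize ends the game
        elif result == "Lose":
            yield from outcomes(turns_left - 2, path + ("Lose",), prob * p)   # lose this turn and the next
        else:
            yield from outcomes(turns_left - 1, path + ("Again",), prob * p)  # just spin on the next turn

paths = list(outcomes())
print(len(paths), sum(p for _, p in paths))   # 17 terminal outcomes, probabilities sum to 1
for prize in ("Large", "Small", "Tiny"):
    print(prize, float(sum(p for path, p in paths if path[-1] == prize)))
# Large 0.282, Small 0.423, Tiny 0.141 -- leaving 0.154 for no prize
```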

We can add these up to get the event probabilities and (her homework) the expected number of each prize she needs to bring when 20 students play the game:

| Outcome | Probability | Expected Value | Ceiling |
| --- | --- | --- | --- |
| $P_L$ | 28% | 5.64 | 6 |
| $P_S$ | 42% | 8.44 | 9 |
| $P_T$ | 14% | 2.82 | 3 |
| NP | 16% | 3.10 | 4 |

We can also get the probabilities for the number of spins:

| Spins | Probability |
| --- | --- |
| One | 0.600 |
| Two | 0.390 |
| Three | 0.010 |

Simulation

When the probability gets hard . . . simulate, and let the law of large numbers work this out.
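The post doesn’t include the simulation script itself, so here is a minimal sketch of the kind of Monte Carlo run it describes, under the same rules as the enumeration above. With a million trials, the frequencies settle close to the exact values.

```python
# Monte Carlo simulation of the spin game (same rules as the enumeration sketch).
import random
from collections import Counter

SLOTS = ["Large"] * 2 + ["Small"] * 3 + ["Tiny"] * 1 + ["Lose"] * 3 + ["Again"] * 1  # 10 equal slots

def play(turns=3):
    """One game: spin until a prize comes up or the turns run out."""
    while turns > 0:
        result = random.choice(SLOTS)
        if result in ("Large", "Small", "Tiny"):
            return result
        turns -= 2 if result == "Lose" else 1   # "Lose" also forfeits the next turn
    return "No prize"

trials = 1_000_000
counts = Counter(play() for _ in range(trials))
for outcome, n in counts.most_common():
    print(f"{outcome:9s} {n / trials:.3f}")
# Roughly: Small 0.423, Large 0.282, No prize 0.154, Tiny 0.141
```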

This demonstrated that the probability of each outcome was:

| Outcome | Probability | Expected Value | Ceiling |
| --- | --- | --- | --- |
| $P_L$ | 28% | 5.64 | 6 |
| $P_S$ | 42% | 8.46 | 9 |
| $P_T$ | 14% | 2.82 | 3 |
| NP | 15% | 3.08 | 4 |

Design

So I took her designs and helped her write the following code to draw the wheel in Adobe Illustrator. It didn’t take long to write, because I had written similar code to make a series of clocks for my 5-year-old to teach him how to tell time. Generating the design with code mattered because we must have tried 10 different iterations of the game.

Which produced this Adobe Illustrator file that I could laser-cut:

[Image: spinner2]

From here, I designed a basic structure in Fusion 360. I cut the base and frame from $1/2$ inch birch plywood with a $1/4$ inch downcut endmill on a ShopBot.

A render:

[Render: the wheel v19]

And a design:

[Design screenshot: 2016-06-09 (1)]

If you want the Fusion 360 file, request it in the comments and I’ll post it.

Please let me know if you have any questions and I’ll share my design. Next up? We are going to print a new wheel to decide who washes the dishes! Kids get double the frequency.
