Operator precedence parsers are very simple on the surface. So don't feel in the least bit intimidated, because by the time you've read through this I hope to have you walk away with a solid foundation on how to write your very own operator precedence parser. The goal is to understand how to solve the problem of operator precedence parsing, and not necessarily to write your own parser. Learning how the problem can be solved is the most important thing to take away from this article.

He starts with an introduction to the concepts behind "operator precedence" including processing order and grouping. He also mentions infix and postfix (RPN) notations for handling different formats of equations. He used the "Shunting-yard Algorithm" and how it relates to handling the different parts of the equation, one at a time, in the correct order. He rest of the post is dedicated to the details of the execution in the tool, including code examples and the tokenization of the strings passed into it.

]]>One of the biggest reasons to refactor code is to eliminate duplication. It is pretty easy to introduce duplication in our code either unintentionally or because we don't know how to prevent or get rid of it. [...] I've found that there are three basic types of duplication that we can eliminate from our code that successfully build on each other.

He describes the three types - data, type and algorithm - and gives some code snippets showing how they present themselves and simple solutions of how to resolve them. There's also a quick mention of a "combined attack" when more than one form of duplication shows up at once. He suggests a to help find the "edges" of the duplication:

I've also found the key to eliminating duplication is sometimes to first exaggerate it. Often I will purposely take two methods that I know have some duplication and make them look even more duplicated in order to be able to clearly see where the duplication lies.]]>

In the last post we had a simple stepping algorithm, and a gradient descent implementation, for fitting a line to a set of points with one variable and one 'outcome'. As I mentioned though, it's fairly straightforward to extend that to multiple variables, and even to curves, rather than just straight lines. For this example I've reorganised the code slightly into a class to make life a little easier, but the main changes are just the hypothesis and learn functions.

He restructures the learning method to make it easier to reuse and includes a "scale data" method to compensate for irregularities in the data and compute the variance.

]]>There are a lot of problems that fall under predicting these types of continuous values based on limited inputs - for example: given the air pressure, how much rain will there be, given the qualifying times, how quick will the fastest lap be in the race. By taking a bunch of existing data and fitting a line, we will be able to make a prediction easily - and often reasonably correctly.

He defines two pieces of information, the intercept and the gradient, and how they relate to minimize the "square error" that can come from getting the square root of your values based on the difference between an actual and predicted value. Based on a sample data set, he comes up with these results, showing the trend line for the points given. He points out a few issues with the method and corrects them with a few tweaks to his original algorithm.

]]>I'd like to share a topic today which was re-introduced to me by a lightningtalk of a colleague of mine. His talk was on the "big O notation". The big O notation is a tool you can use to express the order of complexity of an algorithm. It is useful because it lets you express the order of complexity of an algorithm without taking a lot of time profiling or researching the underlying algorithm. In other words: it gives you a quick way to gain an understanding of what might be wrong (or right) with a specific algorithm.

He introduces the notation as the result of a series of steps needed to solve a problem (ex. 2+2 is less complex than 2+2+2). He illustrates with PHP examples that show adding complexity to a class, adding sets of numbers, looping to find needles in haystacks and finding duplicates in an array of strings. Each of these expand on the theory and show more complexity as the article progresses.

]]>When drawing a path on a map (for instance, the directions from point A to point B) it is important to consider the limitations of the device you're drawing the path on. In this article, I will show you how to reduce the number of points in a path so the path can be displayed with minimal loss of quality on devices such as iPhone or Android-powered devices that may struggle with an extremely large set of points.

Using the GTFS service's data, he's been creating maps for an iPhone application. Naturally, a path with 700 points would take a lot of resources to render. Using the Douglas-Peucker algorithm he can reduce it down to a much more manageable 70 points. He explains the algorithm briefly and hows how to implement it in PHP with three classes - ShapePoint, Shape and ShapeReducer. The resulting reduced dataset is then passed directly over to a Google Maps for plotting.

]]>Let me show you one sorting algorithm, perhaps the most known of all them - the quick sort, implemented both on PHP and JavaScript. Although the code look similar between both languages, there are few differences, that show the importance of the syntax knowledge!

Most of the post is taken up with the two code examples - the PHP one sorting arrays with a for loop and a call to array_merge and the Javascript side pushing values into an array and calling "concat" on it to get the right values.

]]>If you [override the default session handlers], unless you want to entrust PHP's core to do it, one thing you will have to take care of is generating unique session ids to send as a cookie to your users, allowing the session to persist. Other common use cases for such unique hashes is to generate CSRF tokens to insert in forms or URLs, and finally authentication tokens for email validation or such.

He talks about how we, as humans, aren't very good at figuring out true randomness and that hashing the information only adds to the problem. He mentions how some of the random functions in PHP aren't all that random and that there's a better way to really generate good values. He's come up with a solution (his "generateUniqueId" function) that tries to generate entropy from OpenSSL or from the COM extension or from the "/dev/urandom" on unix-based systems. It's then hashed and sent back out the other side for easy use.

]]>Programming is such a dynamic action that you often find yourself having to replace an algorithmn all together. It will be much easier to do if the current algorithmn is an easy one already. [...] Make sure you decompose your algorithmns as much as you can and use many small methods for it.

His example replaces multiple if statements to search through an array with a simple in_array statement, returning the selected array index from there.

]]>A sad poem of an algorithm where solitude brought excessive use of cpu cycles and memory allocation for redundant data (it copied over and over again the same image till all memory was filled with it)

This definitely belongs in the "random PHP-related things" area, but it's still a fun little read. Check it out - just try to not get too despondent.

]]>