MySQL Optimization – A Little Tip

I see the topic of MySQL optimization come up frequently in discussions. Understandably so, given that it doesn’t take that much data for a poorly optimized query to become the bottleneck in an otherwise well thought out process.

Some simple suggestions

Well I don’t consider myself an expert, I do have experience with working with larger datasets and there are a couple of things that I always do to keep queries performing well.

Optimize Queries with EXPLAIN

Explain is your friend, get to know it well. If you take the time to read thru the Explain documentation on the MySQL site, you will find some valuable information, some of which is hilighted below.

Optimizing joins

Single sweep what?

MySQL resolves all joins using a single-sweep multi-join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on. When all tables are processed, MySQL outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. The next row is read from this table and the process continues with the next table.

Why is this important? Imagine a main table – tableA – with 80,000 rows of data. This table has a corresponding n:n table that maps entries in tableA with a locations table. A query could be written as:

SELECT tableA.*, locations.location from tableA
Left Join tableA2locations on
tableA2locations.tableA_id = tableA.id
Left Join locations on
tableA2locations.location_id = locations.id

Keeping the above quote in mind, MySQL will read a row from the first table and join the corresponding data from the joined tables for that row and then sweep thru the rest of the data, joining as it goes along.

This leads us into the following section.

Number of rows needed to execute a query

You can get a good indication of how good a join is by taking the product of the values in the rows column of the EXPLAIN output. This should tell you roughly how many rows MySQL must examine to execute the query.

where locations.location = 'sometown'

Keeping the above quote in mind, MySQL will read a row from the first table and join the corresponding data from the joined tables for that row and then sweep thru the rest of the data, joining as it goes along.

From the above, you can determine that for a query on tables that have not been properly indexed, a join can quickly become unwieldy when dealing simply with three tables with records in the thousands (1000*1000*1000 = a slow query). See HackMySQL for a good example of this.

Reducing the number of rows needed to execute a query

So beyond indexing properly for joins, you can still end up with a query that runs in a way that causes a bottleneck.

Taking our example from above, imagine that we use a where clause that limits the tableA selection to half (tableA.foo = 'bar' below):

SELECT tableA.*, locations.location from tableA
Left Join tableA2locations on
tableA2locations.tableA_id = tableA.id
Left Join locations on
tableA2locations.location_id = locations.id

This starts us out with 40,000 rows of tableA data to examine. If there are a further 2000 rows from tableA2locations, thats 800,000 rows of data. Not astronomical, but significant. If this was a 3 or 4 table join, things could get ugly. What to do? The answer may be obvious to some: select first with the most limiting table:

SELECT tableA.*, locations.location from locations
Left Join tableA2locations on
tableA2locations.location_id = locations.id
Left Join tableA on
tableA2locations.tableA_id = tableA.id

This starts us out with 1 selection from the locations table, then 2000 from tableA2locations. If the join between tableA2locations and tableA is indexed correctly, we are then left with an index join based on ID, rather then having to initially select 40,000 rows from tableA as in the previous example.

When I first started programming, it made sense to me to select from the main table (tableA) and join the lookups. But once you add some data to the mix and start to play with Explain, you quickly realize that selecting from the limiting table can make your server’s life a little easier.

For further reading on the topic, I always send people to HackMySQL when they ask, so for more tips and tricks, be sure to have a read thru the optimize section of that site.

where locations.location = 'sometown' and tableA.foo = 'bar'
where locations.location = 'sometown' and tableA.foo = 'bar'
Blog
  • Recent Posts

  • Categories

  • Twitter

  • advertisement

Contact Us