• Jesse Van Rooy

  • Software Engineer

Identifying and optimizing crucial bottlenecks

When building web applications, a great deal of optimization often occurs to speed up the flow of information from and to the user. The possible ways of speeding up this information flow are as varied as web applications themselves. In this article we will be focusing on the way a data model can be optimized to fit the specific use cases for which this data model is intended.

It’s no secret these optimizations are always specific to a use case. They have a positive effect on the use case they are intended for, but also have a negative effect on other use cases (in the case of an index this is an increased insert/update workload). Evaluating which use cases create the biggest bottlenecks and getting an understanding of how the data model is used in a production environment is crucial to making the right tradeoffs in order to acquire the desired performance.

In this post I will briefly explain the steps that I usually take to identify the crucial bottlenecks and how to optimize them.

Use case

To show off some of the steps I go through I will use a simple demo application for reference. I will be using a simple Spring boot application that uses JPA/Hibernate to connect to a MySQL database. In order to render some HTML quickly I will be using Thymeleaf templates. We will also sprinkle on some Lombok annotations to save on time.

Let’s take a very basic use case where we have a User entity.

@Getter
@Setter
@Entity
@Builder
@NoArgsConstructor
@AllArgsConstructor
@Table(name = "users")
public class User {

    @Id
    @GeneratedValue
    private Long id;

    @Length(max = 64)
    @Column(name = "display_name")
    private String displayName;

    @Length(max = 16)
    @Column(name = "first_name")
    private String firstName;

    @Length(max = 16)
    @Column(name = "last_name")
    private String lastName;

    @Length(max = 64)
    @Column(name = "email")
    private String email;

} 


With a simple JPA repository with two paged query methods.

public interface UserRepository extends JpaRepository<User, Long> {

    Page<User> findAllByDisplayName(String displayName, Pageable pageable);

    Page<User> findAllByFirstNameOrLastName(String firstName, String lastName, Pageable pageable);

}


We create a simple Crud service for our User entity.

public interface UserService {

    Page<User> listAllUsers(Pageable pageable);

    Page<User> listUsersWithDisplayName(String displayName, Pageable pageable);

    Page<User> listUsersWithFirstNameOrLastName(String name, Pageable pageable);

}  
@Service
@RequiredArgsConstructor
public class UserServiceImpl implements UserService {

    private final UserRepository userRepository;

    @Override
    public Page<User> listAllUsers(Pageable pageable) {
        return userRepository.findAll(pageable);
    }

    @Override
    public Page<User> listUsersWithDisplayName(String displayName, Pageable pageable) {
        return userRepository.findAllByDisplayName(displayName, pageable);
    }

    @Override
    public Page<User> listUsersWithFirstNameOrLastName(String name, Pageable pageable) {
        return userRepository.findAllByFirstNameOrLastName(name, name, pageable);
    }

}

We'll also create an installer that runs after application setup. This installer will randomly generate 10000 users for our example.

@Component
@RequiredArgsConstructor
public class DummyUserInstaller {

    public static final String[] FIRST_NAMES = {
            "James",
            "David",
            "Christopher",
            "George",
            "Ronald",
            "John",
            "Richard"
    };

    public static final String[] LAST_NAMES = {
            "Smith",
            "Johnson",
            "Williams",
            "Jones",
            "Brown",
            "Davis",
            "Miller",
            "Wilson"
    };

    public static final String[] EMAIL_PROVIDERS = {
            "gmail.com",
            "hotmail.com",
            "outlook.com",
            "yahoo.com"
    };
    public static final int RANDOM_USER_COUNT = 10000;

    private final UserRepository userRepository;

    @PostConstruct
    public void installUsers() {
        userRepository.deleteAll();
        Random random = new Random();
        for (int i = 0; i < RANDOM_USER_COUNT; i++) {
            userRepository.save(createRandomUser(random));
        }
    }

    private User createRandomUser(Random random) {
        String randomFirstName = FIRST_NAMES[random.nextInt(FIRST_NAMES.length)];
        String randomLastName = LAST_NAMES[random.nextInt(LAST_NAMES.length)];
        return User.builder()
                .displayName(String.format("%s %s", randomFirstName, randomLastName))
                .firstName(randomFirstName)
                .lastName(randomLastName)
                .email(String.format("%s_%s%d@%s", randomFirstName, randomLastName,
                        random.nextInt(100), EMAIL_PROVIDERS[random.nextInt(EMAIL_PROVIDERS.length)]))
                .build();
    }

}


And in a controller we'll register an endpoint in order to get a simple user search functionality. This endpoint has three optional request parameters: page & size in order to implement pagination and one to be able to search on a display name. If no display name request parameter is given all results will be returned.

@Controller
@RequiredArgsConstructor
public class UserController {

    public static final int DEFAULT_PAGE = 1;
    public static final int DEFAULT_PAGE_SIZE = 20;

    private final UserService userService;

    @GetMapping("/")
    public String overview(@RequestParam(name = "displayName", required = false) String displayName,
                           @RequestParam(name = "page", required = false) Integer page,
                           @RequestParam(name = "size", required = false) Integer size,
                           Model model
    ) {
        if (displayName == null) {
            model.addAttribute("users", userService.listAllUsers(toPageRequest(page, size)));
        } else {
            model.addAttribute("users", userService.listUsersWithDisplayName(displayName, toPageRequest(page, size)));
        }
        return "user-overview";
    }

    private PageRequest toPageRequest(Integer page, Integer size) {
        if (page == null) {
            page = DEFAULT_PAGE;
        }
        if (size == null) {
            size = DEFAULT_PAGE_SIZE;
        }
        return PageRequest.of(page, size);
    }

} 


We also create a simple Thymeleaf template to display our users.

<!DOCTYPE html>
<html xmlns:th="http://www.w3.org/1999/xhtml" lang="en">
<head>
    <meta charset="UTF-8">
    <title>Use case</title>
</head>
<body>
<form action="/" method="GET">
    <label for="displayName">Display name : </label>
    <input type="text" id="displayName" name="displayName">
    <button type="submit">Search</button>
</form>
<table>
    <thead>
    <tr>
        <th>Display name</th>
        <th>First name</th>
        <th>Last name</th>
        <th>Email</th>
    </tr>
    </thead>
    <tbody>
    <tr th:each="user : ${users}">
        <td th:text="${user.displayName}"></td>
        <td th:text="${user.firstName}"></td>
        <td th:text="${user.lastName}"></td>
        <td th:text="${user.email}"></td>
    </tr>
    </tbody>
</table>
</body>
</html>

If we run this simple application and open http://localhost:8080 in the web browser and query one of the random user, eg. ‘James Williams’, we can see that the page load time lies between 100 – 200 milliseconds. We will compare this page load time to the load time after adding our performance improvements to see the actual result of the improvement on the user experience.

starting page load times

How to identify them

A good indicator of bottlenecks in a data model are queries that are performed very often and appear to be slower than necessary. There are a number of tools and methods that can be used to identify ‘slow queries’. I will briefly describe two ways of identifying slow queries: using JetProfiler and using the explain plan.

Tools like JetProfiler give you an idea of which queries are performing poorly by measuring the real-time load on a database. As seen in the screenshot below a sample is taken from the performed queries on a database and the retrieved queries are given a rating based on how impactful they are on the database.
JetProfiler rating

JetProfiler also provides an explanation on why a query is impactful on the database. An example of this is shown in the screenshot below.

JetProfiler explanation

Another way of getting an understanding on how impactful a query can be on the database is to retrieve an explain plan for the given query. This will give you some information on what kind of search operation the database will have to perform in order to return the correct result of the given query. An explain plan is acquired like this:

explain select * from users where display_name = 'James Wilson' 

explain plan

The returned result contains the following attributes:

  • Select type: The type of select that was used to perform this query
  • Table: The table that was used to get the information
  • Type: The type of join that was used
  • Possible keys: All indexes that were available to choose from while performing this query
  • Key: The index that was chosen to use while performing this query
  • Key length: The length of the chosen key
  • Ref: Which columns or constants were used to compare against the chosen index given in the ‘key’ return value
  • Rows: The amount of rows that were believed to be looked at in order to return the result while performing the query(lower is better)
  • Extra: Extra information on how the given query was performed.


Both the result of the JetProfiler explanation and the explain statement come to the same conclusion: fetching the users with a certain display name requires the database to look over all the inserted records in order to determine which records match the requirements of the query. This is not a very efficient way of performing a query, and checking all users to see if one of them matches a given string will become problem if we would have millions of users in our system, especially if this query is performed by many users at the same time.

How to add them

Now that we have identified a slow query in our application, we can try to optimize this query for the use cases it’s intended for. To optimize this query we will add a simple index on the user table for the display_name field. We can add an index in a couple of different ways:
- With a create index statement like the one displayed below:

Create index ix_display_name on users(display_name) 

- Or using JPA/Hibernate with some Java persistence annotations. By filling in the indexes attribute on the table annotation we can create an index by its name and a list of columns that need to be indexed.

@Table(name = "users", indexes = @Index(name = "ix_display_name", columnList = "display_name"))

When we add an index on the users table for the display name column, we can see that our index is now being used. We can see that the amount of rows that are checked with the same query goes down drastically.

 A similar result is shown in JetProfiler when we record the given queries again:

JetProfiler Explanation

 If we now load the page on our browser again and search for the same random user, we can see that the page load speed is now between 30 – 50 milliseconds, a significant improvement.

better page load times

Conclusion

So that’s two different ways of identifying slow queries in an application: quickly find the slowest queries in your application with a tool like JetProfiler, or find out how much impact a specific query has on the database using the explain plan. I also showed and tested a rather simple solution to improve the performance of this specific query by adding a use case specific index, but that is obviously very dependent on the case.

Verwante Artikels