9.6.2021

HowTo Query Optimization

Approach:

  • Identify the slow query
  • Translate the query into plain SQL (or have it translated for you)
    • Hibernate: enable SQL logging
    • take it from the query column of the pg_stat_activity view
  • Ask the database for its execution plan
EXPLAIN <insert sql query here>
  • Analyse the plan
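The capture step can be sketched as follows; the pg_stat_activity query is a minimal example (pg_stat_activity is a built-in PostgreSQL view), and for Hibernate the usual switches are the hibernate.show_sql property or the org.hibernate.SQL log category.

```sql
-- Show currently running statements and how long they have been running:
SELECT pid,
       now() - query_start AS runtime,
       state,
       query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;

-- Then ask for the plan of the suspicious statement:
EXPLAIN <insert sql query here>
```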

Example:

Limit  (cost=38737.32..38737.55 rows=91 width=456)
  ->  Sort  (cost=38737.32..38737.55 rows=91 width=456)
        Sort Key: consultant0_.partner
        ->  HashAggregate  (cost=38733.45..38734.36 rows=91 width=456)
              Group Key: consultant0_.partner, consultant0_.id, consultant0_.base_division, consultant0_.celum_asset_id, consultant0_.celum_signature_id, consultant0_.city, consultant0_.country, consultant0_.created_at, consultant0_.fax_number, consultant0_.images_requested_at, consultant0_.language, consultant0_.last_modified_at, consultant0_.name_first, consultant0_.name_last, consultant0_.partner_url, consultant0_.photo_image_url, consultant0_.post_code, consultant0_.salutation, consultant0_.signature_image_url, consultant0_.smtp_addr, consultant0_.street_housenum, consultant0_.tel_number, consultant0_.tel_number_mob, consultant0_.tenant, consultant0_.title
              ->  Hash Right Join  (cost=74.76..24724.33 rows=224146 width=456)
                    Hash Cond: (responsibi1_.consultant_id = consultant0_.id)
                    ->  Seq Scan on responsibility responsibi1_  (cost=0.00..22747.10 rows=714310 width=32)
                    ->  Hash  (cost=73.62..73.62 rows=91 width=456)
                          ->  Seq Scan on consultant consultant0_  (cost=0.00..73.62 rows=91 width=456)
                                Filter: (tenant = 'DE'::text)

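A rough reconstruction of the SQL behind this plan, inferred from the aliases and node types (the exact projection and limit value are assumptions; the HashAggregate over all columns is how PostgreSQL executes Hibernate's DISTINCT):

```sql
SELECT DISTINCT consultant0_.*
FROM responsibility responsibi1_
RIGHT JOIN consultant consultant0_
        ON responsibi1_.consultant_id = consultant0_.id
WHERE consultant0_.tenant = 'DE'
ORDER BY consultant0_.partner
LIMIT <n>;  -- actual limit not shown in the plan
```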
What is the expensive part of this query? The Seq Scan over ~714,310 rows of responsibility, which feeds the Hash Right Join (cost=74.76..24724.33).

Optimized version
Limit  (cost=399.40..403.65 rows=68 width=456)
  ->  Unique  (cost=399.40..403.65 rows=68 width=456)
        ->  Sort  (cost=399.40..399.57 rows=68 width=456)
              Sort Key: consultant0_.id, consultant0_.base_division, consultant0_.celum_asset_id, consultant0_.celum_signature_id, consultant0_.city, consultant0_.country, consultant0_.created_at, consultant0_.fax_number, consultant0_.images_requested_at, consultant0_.language, consultant0_.last_modified_at, consultant0_.name_first, consultant0_.name_last, consultant0_.partner, consultant0_.partner_url, consultant0_.photo_image_url, consultant0_.post_code, consultant0_.salutation, consultant0_.signature_image_url, consultant0_.smtp_addr, consultant0_.street_housenum, consultant0_.tel_number, consultant0_.tel_number_mob, consultant0_.title
              ->  Nested Loop Semi Join  (cost=0.42..397.33 rows=68 width=456)
                    ->  Seq Scan on consultant consultant0_  (cost=0.00..73.62 rows=91 width=456)
                          Filter: (tenant = 'DE'::text)
                    ->  Index Scan using idx_responsibilty_consultant_id on responsibility r  (cost=0.42..799.83 rows=447 width=16)
                          Index Cond: (consultant_id = consultant0_.id)
                          Filter: (((consultant_function)::text = 'MANAGER'::text) AND (variety = 'CORN'::variety_type))

The optimization reduced the query's runtime from ~2 s to ~350 ms.
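The optimized plan replaces the full table scan of responsibility with an Index Scan. A minimal sketch of the index it relies on; the index name is taken verbatim from the plan above, while the column list is an assumption derived from the Index Cond:

```sql
-- Assumed definition matching "Index Cond: (consultant_id = consultant0_.id)"
CREATE INDEX idx_responsibilty_consultant_id
    ON responsibility (consultant_id);
```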

Things to consider:

  • There is no "general" solution for all performance problems
  • Always analyse plans on the production system (statistics, plans, etc. may differ depending on the actual data)
  • Full table scans are expensive most of the time
  • The cost of a query does not always mean a long runtime; it is rather an indicator that the query might take long
  • Find the part of the query that increases the cost the most (in our example, the -> Hash Right Join (cost=74.76..24724.33 rows=224146 width=456))
  • Maybe there is an option to reduce the cost by introducing an index?
  • Ask @sijakubo.newcubator
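Since estimated cost and actual runtime can diverge, it helps to compare them directly. EXPLAIN ANALYZE actually executes the statement and reports real timings next to the planner's estimates; a sketch using the consultant table from the example above:

```sql
-- Executes the query and prints actual row counts and timings
-- alongside the estimated costs:
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM consultant
WHERE tenant = 'DE';
```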
Simon

Software Developer
