9.6.2021
HowTo Query Optimization
Approach:
- Identify the slow query
- Translate the query into plain SQL (or have it translated for you):
  - enable Hibernate SQL logging, or
  - read it from the pg_stat_activity system view:
    select query from pg_stat_activity
- Ask the database for its execution plan:
  EXPLAIN <insert sql query here>
- Analyse the plan
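The last two steps can be sketched in SQL (a minimal sketch; the placeholder query and the `state` filter are assumptions, adapt them to your case):

```sql
-- Fetch the SQL text of statements the server is currently running.
-- pg_stat_activity is a PostgreSQL system view; filtering on state
-- hides idle sessions so only active queries remain.
SELECT pid, state, query
FROM pg_stat_activity
WHERE state = 'active';

-- Ask PostgreSQL for the execution plan of the slow query.
-- Plain EXPLAIN only estimates; EXPLAIN ANALYZE actually executes the
-- statement and adds real timings, so use it with care on writes.
-- The query below is just a placeholder.
EXPLAIN ANALYZE
SELECT * FROM consultant WHERE tenant = 'DE';
```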
Example:
Limit  (cost=38737.32..38737.55 rows=91 width=456)
  ->  Sort  (cost=38737.32..38737.55 rows=91 width=456)
        Sort Key: consultant0_.partner
        ->  HashAggregate  (cost=38733.45..38734.36 rows=91 width=456)
              Group Key: consultant0_.partner, consultant0_.id, consultant0_.base_division, consultant0_.celum_asset_id, consultant0_.celum_signature_id, consultant0_.city, consultant0_.country, consultant0_.created_at, consultant0_.fax_number, consultant0_.images_requested_at, consultant0_.language, consultant0_.last_modified_at, consultant0_.name_first, consultant0_.name_last, consultant0_.partner_url, consultant0_.photo_image_url, consultant0_.post_code, consultant0_.salutation, consultant0_.signature_image_url, consultant0_.smtp_addr, consultant0_.street_housenum, consultant0_.tel_number, consultant0_.tel_number_mob, consultant0_.tenant, consultant0_.title
              ->  Hash Right Join  (cost=74.76..24724.33 rows=224146 width=456)
                    Hash Cond: (responsibi1_.consultant_id = consultant0_.id)
                    ->  Seq Scan on responsibility responsibi1_  (cost=0.00..22747.10 rows=714310 width=32)
                    ->  Hash  (cost=73.62..73.62 rows=91 width=456)
                          ->  Seq Scan on consultant consultant0_  (cost=0.00..73.62 rows=91 width=456)
                                Filter: (tenant = 'DE'::text)
What is the expensive part of this query?
Optimized version
Limit  (cost=399.40..403.65 rows=68 width=456)
  ->  Unique  (cost=399.40..403.65 rows=68 width=456)
        ->  Sort  (cost=399.40..399.57 rows=68 width=456)
              Sort Key: consultant0_.id, consultant0_.base_division, consultant0_.celum_asset_id, consultant0_.celum_signature_id, consultant0_.city, consultant0_.country, consultant0_.created_at, consultant0_.fax_number, consultant0_.images_requested_at, consultant0_.language, consultant0_.last_modified_at, consultant0_.name_first, consultant0_.name_last, consultant0_.partner, consultant0_.partner_url, consultant0_.photo_image_url, consultant0_.post_code, consultant0_.salutation, consultant0_.signature_image_url, consultant0_.smtp_addr, consultant0_.street_housenum, consultant0_.tel_number, consultant0_.tel_number_mob, consultant0_.title
              ->  Nested Loop Semi Join  (cost=0.42..397.33 rows=68 width=456)
                    ->  Seq Scan on consultant consultant0_  (cost=0.00..73.62 rows=91 width=456)
                          Filter: (tenant = 'DE'::text)
                    ->  Index Scan using idx_responsibilty_consultant_id on responsibility r  (cost=0.42..799.83 rows=447 width=16)
                          Index Cond: (consultant_id = consultant0_.id)
                          Filter: (((consultant_function)::text = 'MANAGER'::text) AND (variety = 'CORN'::variety_type))
The optimization reduced the runtime of the query from ~2 s to ~350 ms.
Things to consider:
- There is no "general" solution to all performance problems
- Always analyse plans on the production system (statistics, plans, etc. might differ based on the actual data)
- Full table scans are expensive most of the time
- The cost of a query does not always mean a long runtime; it is more of an indicator that the query might take long
- Find the part of the query that increases the cost the most (in our example, the
  ->  Hash Right Join  (cost=74.76..24724.33 rows=224146 width=456)
  )
- Maybe there is the option to reduce the cost by introducing an index?
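In the example above, the inner side of the semi join matches responsibility rows on consultant_id and then filters on consultant_function and variety, so a composite index over those columns might help (a sketch only; the index name is made up, the column names are taken from the plan and should be checked against the real schema, and whether the index pays off depends on the actual data):

```sql
-- Hypothetical composite index covering the join condition and both
-- filter columns from the optimized plan, so the filter can be
-- evaluated from the index instead of the heap.
CREATE INDEX idx_responsibility_consultant_function_variety
    ON responsibility (consultant_id, consultant_function, variety);
```

After creating an index, re-run EXPLAIN to confirm the planner actually uses it; an index the planner ignores only adds write overhead.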
- Ask @sijakubo.newcubator