Datadog Beginner’s Guide: Monitoring vs Observability Explained with Step-by-Step Setup

Introduction

In recent years, IT systems have become increasingly complex due to the rise of microservices, containers, and cloud environments. In such environments, traditional “monitoring” alone is no longer sufficient to identify the root causes of incidents or detect early warning signs. This is where Observability comes into play.

Observability is an approach that enables deep understanding of complex systems by integrating metrics, logs, and traces. Among the various tools available, Datadog stands out as a leading cloud-based observability platform. It supports a wide range of targets—servers, cloud services, containers, and applications—and is popular for its ease of setup, even for beginners.

This article provides a beginner-friendly explanation of the differences between monitoring and observability, along with step-by-step guidance on how to set up and start using Datadog. If you’re planning to introduce Datadog into your home lab or production system, this guide will serve as a helpful reference.

What Is Observability?

Observability is a concept that enables a deeper understanding of a system’s internal state by collecting and analyzing diverse types of data in an integrated manner.
By clarifying the differences between observability and monitoring, we can better grasp its true value and purpose.

Monitoring

Monitoring is a mechanism for collecting data to understand what is happening within a running system.
Its primary purpose is problem detection — it gathers and visualizes predefined metrics and logs based on known issues or expected scenarios, and sends alerts when certain conditions are met.

Typical monitoring features in Datadog include:

  • Infrastructure Monitoring
    Tracks CPU, memory, disk, and network usage; monitors system availability and running processes.
  • Alerts
    Sends notifications when thresholds are exceeded (e.g., send an email alert when CPU usage exceeds 90%).
  • Dashboards
    Visualizes metrics in real time for continuous system insight.
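As a concrete illustration of the CPU alert example above, here is a minimal Python sketch that builds a metric monitor definition for Datadog's Monitors API (`POST /api/v1/monitor`). Everything specific in it is an assumption for illustration: the ap1 API host (matching the `DD_SITE` used later in this article), the `host:appserver` tag, using `system.cpu.user` as the "CPU usage" metric, the 5-minute averaging window and the email address. The request is only sent when both `DD_API_KEY` and `DD_APP_KEY` are set. You can create the same monitor from the console instead.

```python
import json
import os
import urllib.request

# Assumption: the ap1 site, matching DD_SITE="ap1.datadoghq.com" in this article.
API_BASE = "https://api.ap1.datadoghq.com"


def build_cpu_monitor(host: str, threshold: float, email: str) -> dict:
    """Return a 'metric alert' monitor body that fires when average CPU exceeds threshold."""
    # Illustrative choice: system.cpu.user stands in for "CPU usage".
    query = f"avg(last_5m):avg:system.cpu.user{{host:{host}}} > {threshold}"
    return {
        "name": f"High CPU on {host}",
        "type": "metric alert",
        "query": query,
        # An "@address" mention in the message sends the notification by email.
        "message": f"CPU usage on {host} is above {threshold}%. @{email}",
        "options": {"thresholds": {"critical": threshold}},
    }


def create_monitor(body: dict, api_key: str, app_key: str) -> dict:
    """POST the monitor definition; the Monitors API needs an API key and an application key."""
    req = urllib.request.Request(
        f"{API_BASE}/api/v1/monitor",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical host name and address; replace with your own.
    monitor = build_cpu_monitor("appserver", 90, "ops@example.com")
    print(json.dumps(monitor, indent=2))
    api_key, app_key = os.environ.get("DD_API_KEY"), os.environ.get("DD_APP_KEY")
    if api_key and app_key:
        print(create_monitor(monitor, api_key, app_key))
```

The threshold in `query` and `options.thresholds.critical` must agree. That is why both values come from the same parameter.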

Observability

Observability, on the other hand, is an approach that helps you understand not only “what is happening” in your system but also “why it is happening” by leveraging the data that has been collected.

For example, when CPU usage exceeds a threshold, traditional monitoring can tell you that usage has spiked—but not why. With observability, you can go further and determine:

  • Which application process is consuming CPU resources
  • Which SQL queries are consuming database resources
  • Which logs are correlated with the event

This enables much faster and more accurate root cause analysis.

Observability is built upon three key pillars:

  • Metrics
    Quantitative, time-series data that tracks the state of your environment — such as CPU or memory usage, response time, and request counts.
  • Logs
    Records of events output by the system, such as access logs, error logs, and security logs. Logs help identify what happened and when, with timestamps playing a critical role.
  • Traces
    Data that tracks how an application processes each request over time. Traces visualize end-to-end request flows across microservices or multiple components, revealing where time is spent and where errors occur within the system.
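To make the "why" concrete, here is a toy Python sketch of how the three pillars can be joined. A slow trace is linked to its log lines through a shared trace ID, and to metric samples through its time window. All records are hypothetical. In practice Datadog performs this correlation for you (for example, by injecting trace IDs into application logs), so this only illustrates the idea.

```python
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    name: str
    start: float      # seconds since epoch
    duration: float   # seconds


@dataclass
class LogLine:
    trace_id: str
    timestamp: float
    message: str


def correlate(span, logs, metrics, slack=5.0):
    """Return logs sharing the span's trace ID, plus the (timestamp, value) metric
    samples that fall inside the span's time window widened by `slack` seconds."""
    end = span.start + span.duration
    related_logs = [line for line in logs if line.trace_id == span.trace_id]
    related_metrics = [(ts, v) for ts, v in metrics
                       if span.start - slack <= ts <= end + slack]
    return related_logs, related_metrics


# Hypothetical data: one slow request, two log lines, CPU samples every 10 s.
slow = Span("abc123", "GET /quizapp", start=1000.0, duration=4.2)
logs = [
    LogLine("abc123", 1001.5, "query on table questions took 3.9s"),
    LogLine("zzz999", 1002.0, "healthcheck ok"),
]
cpu = [(990.0, 35.0), (1000.0, 92.0), (1010.0, 40.0)]

related_logs, related_cpu = correlate(slow, logs, cpu)
print([line.message for line in related_logs])
print(related_cpu)
```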

Comparison of Leading Observability Platforms

Datadog is one of the most well-known observability platforms, traditionally recognized as part of the “big three” alongside Dynatrace and New Relic.
In recent years, however, Grafana Labs has also gained significant recognition and is becoming an increasingly prominent player in the observability space.

Reference: 2025 Gartner® Magic Quadrant™ for Observability Platforms

The table below compares four products: Datadog, Dynatrace, New Relic, and Grafana Labs.

| Category | Datadog | Dynatrace | New Relic | Grafana Labs |
|---|---|---|---|---|
| Overview | Rapidly growing SaaS-native platform | Full-stack APM for enterprises | Long-established APM pioneer | Visualization leader originating from open source |
| Deployment Model | SaaS only | SaaS; on-prem (Dynatrace Managed) | SaaS only | OSS (free); SaaS (Grafana Cloud); Enterprise |
| Strengths | Easy to deploy; excellent cloud integration; rich dashboards | Automatic discovery and dependency mapping; proven performance in large-scale systems; high reliability | Simple, unified UI; usage-based pricing suited for small deployments | Strong for Kubernetes and microservices monitoring; backed by the OSS community |
| AI / Automation | Simple alert optimization; log pattern analysis | Simple alert optimization; log pattern analysis | Limited automation | Weak automation; extensibility depends on user configuration |
| Ease of Implementation | Very easy (visualization available immediately after agent setup) | Complex (requires design, PoC, and vendor assistance) | Intuitive UI and easy onboarding | OSS requires expertise to set up; SaaS is relatively easy |
| Target Users | Startups to mid-sized cloud businesses | Large enterprises in finance, telecom, etc. | SaaS providers, small to mid-sized companies | OSS-oriented engineering teams and Kubernetes users |
| Pricing Model | Usage-based (by data volume) | Per-host plus feature-based licensing | Usage-based (by users and data volume) | OSS: free; SaaS/Enterprise: subscription |
| UI/UX | Highly visual and user-friendly; rich dashboards | Information-dense and complex but powerful | Simple and beginner-friendly | Highly customizable but requires design skills |
| Market Perception | Representative of cloud-native platforms, rapidly expanding | Highly trusted in enterprise markets | Established APM vendor, but growth has slowed | Rapidly emerging from OSS roots, seen as a potential "fourth major player" |

How to Set Up and Use Datadog

From here, we’ll explain how to install and start using Datadog.
Datadog offers a 14-day free trial, so it’s easy to get started and explore its features firsthand.
You can sign up for the free trial here: https://www.datadoghq.com/free-datadog-trial

Infrastructure Monitoring

Let’s start with the basics — infrastructure monitoring.
You can visualize key metrics such as CPU usage and memory utilization.
Log in to the Datadog console and follow the steps below.

Install the Agent

To enable infrastructure monitoring, you need to install the Datadog Agent on the target server.
From the Datadog console, navigate to Integration > Install Agents and follow the installation instructions.

Select the platform on which you want to install the Agent.

Although it’s not directly related to infrastructure monitoring, enable Application Performance Monitoring (APM) as it will be needed later.

Click Select API Key.

Choose your API Key.
Then click Use API Key.

Note: The API key is the authentication credential used to send data to Datadog’s cloud service or to perform operations via the API.
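The API key authenticates everything the Agent sends, so it is worth confirming that it is valid and keeping it out of shared screenshots and transcripts. Below is a minimal Python sketch that calls Datadog's key-validation endpoint (`GET /api/v1/validate`). It assumes the ap1 site used in this article and reads the key from the `DD_API_KEY` environment variable rather than hard-coding it. Treating HTTP 403 as "invalid key" is an assumption about how rejections are reported.

```python
import json
import os
import urllib.error
import urllib.request

# Assumption: the ap1 site (DD_SITE="ap1.datadoghq.com"); adjust for your region.
VALIDATE_URL = "https://api.ap1.datadoghq.com/api/v1/validate"


def build_validate_request(api_key: str) -> urllib.request.Request:
    """Build a GET request asking Datadog whether api_key is valid."""
    return urllib.request.Request(VALIDATE_URL, headers={"DD-API-KEY": api_key})


def is_valid(api_key: str) -> bool:
    """Return True if Datadog answers {"valid": true}.

    Assumption: a rejected key comes back as HTTP 403 (Forbidden); other
    HTTP errors are re-raised so network or site problems are not hidden.
    """
    try:
        with urllib.request.urlopen(build_validate_request(api_key), timeout=10) as resp:
            return bool(json.load(resp).get("valid"))
    except urllib.error.HTTPError as err:
        if err.code == 403:
            return False
        raise


if __name__ == "__main__":
    key = os.environ.get("DD_API_KEY")
    if key:
        print("API key valid:", is_valid(key))
    else:
        print("Set DD_API_KEY to check your key.")
```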

The console then displays an installation command with your API key already embedded.

Run this command on the server where you want to install the Agent.

[root@appserver ~]# DD_API_KEY=52bb3bd609fXXXXXXXXXXXXXXXXXXXX \
DD_SITE="ap1.datadoghq.com" \
DD_APM_INSTRUMENTATION_ENABLED=host \
DD_APM_INSTRUMENTATION_LIBRARIES=java:1,python:3,js:5,php:1,dotnet:3,ruby:2 \
bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 82305  100 82305    0     0   604k      0 --:--:-- --:--:-- --:--:--  604k

* Datadog Agent 7 install script v1.40.0

/usr/bin/systemctl

* Installing YUM sources for Datadog

Cache was expired
43 files removed

  Installing package(s): datadog-agent

Last metadata expiration check: 0:01:54 ago on Fri 19 Sep 2025 06:41:35 AM JST.
Dependencies resolved.
================================================================================
 Package               Architecture   Version             Repository       Size
================================================================================
Installing:
 datadog-agent         x86_64         1:7.70.2-1          datadog         154 M

Transaction Summary
================================================================================
Install  1 Package

Total download size: 154 M
Installed size: 154 M
Downloading Packages:
datadog-agent-7.70.2-1.x86_64.rpm                18 MB/s | 154 MB     00:08
--------------------------------------------------------------------------------
Total                                            18 MB/s | 154 MB     00:08
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1
  Running scriptlet: datadog-agent-1:7.70.2-1.x86_64                        1/1
Failed to stop datadog-agent-process.service: Unit datadog-agent-process.service not loaded.
Failed to stop datadog-agent-sysprobe.service: Unit datadog-agent-sysprobe.service not loaded.
Failed to stop datadog-agent-trace.service: Unit datadog-agent-trace.service not loaded.
Failed to stop datadog-agent-security.service: Unit datadog-agent-security.service not loaded.
Failed to stop datadog-agent.service: Unit datadog-agent.service not loaded.
Failed to disable unit: Unit file datadog-agent-process.service does not exist.
Failed to disable unit: Unit file datadog-agent-sysprobe.service does not exist.
Failed to disable unit: Unit file datadog-agent-trace.service does not exist.
Failed to disable unit: Unit file datadog-agent-security.service does not exist.
Failed to disable unit: Unit file datadog-agent.service does not exist.

  Installing       : datadog-agent-1:7.70.2-1.x86_64                        1/1
  Running scriptlet: datadog-agent-1:7.70.2-1.x86_64                        1/1
Creating file: '/opt/datadog-agent/.post_python_installed_packages.txt'
File '/opt/datadog-agent/.diff_python_installed_packages.txt' not found.

  Verifying        : datadog-agent-1:7.70.2-1.x86_64                        1/1

Installed:
  datadog-agent-1:7.70.2-1.x86_64

Complete!

* Adding your API key to the Datadog Agent configuration: /etc/datadog-agent/datadog.yaml


* Setting SITE in the Datadog Agent configuration: /etc/datadog-agent/datadog.yaml

/usr/bin/systemctl
* Starting the Datadog Agent...

  Your Datadog Agent is running and functioning properly.
  It will continue to run in the background and submit metrics to Datadog.
  If you ever want to stop the Datadog Agent, run:

       systemctl stop datadog-agent

  And to run it again run:

       systemctl start datadog-agent

[root@appserver ~]#

If the Agent’s status is shown as active (running) as below, the installation was successful.

[root@appserver jmx.d]# systemctl status datadog-agent
 datadog-agent.service - Datadog Agent
     Loaded: loaded (/usr/lib/systemd/system/datadog-agent.service; enabled; preset: disabled)
     Active: active (running) since Sun 2025-09-21 06:32:15 JST; 5s ago
   Main PID: 3897554 (agent)
      Tasks: 9 (limit: 23168)
     Memory: 33.2M
        CPU: 4.844s
     CGroup: /system.slice/datadog-agent.service
             3897554 /opt/datadog-agent/bin/agent/agent run -p /opt/datadog-agent/run/agent.pid

Sep 21 06:32:19 appserver agent[3897554]: 2025-09-21 06:32:19 JST | CORE | INFO | (comp/forwarder/defaultforwa>
Sep 21 06:32:19 appserver agent[3897554]: 2025-09-21 06:32:19 JST | CORE | INFO | (comp/core/ipc/impl@v0.70.2/>
Sep 21 06:32:19 appserver agent[3897554]: 2025-09-21 06:32:19 JST | CORE | INFO | (comp/core/ipc/impl@v0.70.2/>
Sep 21 06:32:19 appserver agent[3897554]: 2025-09-21 06:32:19 JST | CORE | INFO | (pkg/process/metadata/worklo>
Sep 21 06:32:20 appserver agent[3897554]: 2025-09-21 06:32:20 JST | CORE | INFO | (comp/core/tagger/impl/tagge>
Sep 21 06:32:21 appserver agent[3897554]: 2025-09-21 06:32:21 JST | CORE | INFO | (comp/forwarder/defaultforwa>
Sep 21 06:32:21 appserver agent[3897554]: 2025-09-21 06:32:21 JST | CORE | INFO | (pkg/aggregator/demultiplexe>
Sep 21 06:32:21 appserver agent[3897554]: 2025-09-21 06:32:21 JST | CORE | INFO | (pkg/aggregator/demultiplexe>
Sep 21 06:32:21 appserver agent[3897554]: 2025-09-21 06:32:21 JST | CORE | INFO | (pkg/aggregator/time_sampler>
Sep 21 06:32:21 appserver agent[3897554]: 2025-09-21 06:32:21 JST | CORE | INFO | (pkg/runtime/runtime.go:28 i>
Sep 21 06:32:21 appserver agent[3897554]: 2025-09-21 06:32:21 JST | CORE | INFO | (comp/logs/agent/agentimpl/a>

You should be able to confirm that the Agent has been installed, as shown below.

Next, let’s check the dashboard.

Select Host Metrics.

If the metrics for the server where the Agent was installed are displayed as shown below, the setup was successful.
You can check various resource information such as CPU, memory, disk, and network.

Collecting Metrics from Various Middleware

This section explains the steps to collect metrics from various middleware.

Collecting PostgreSQL Metrics

Let’s go through the steps to collect metrics from PostgreSQL as an example.
Click Integrations > Integrations.

Click + ADD for the target middleware.

Click Install Integration.

Click Postgres – Metrics.

The PostgreSQL metrics should be displayed as shown below.

Collecting JVM Metrics

Next, let’s check the JVM metrics. These are essential indicators for the stable operation of Java applications, so it is recommended to configure them as part of your monitoring.

First, install the Java integration in the same way as you did for PostgreSQL above.

Next, enable JMX (Java Management Extensions) on the Tomcat side. The procedure is described in a separate article, so please refer to the following:
How to Enable JMX in Tomcat: SSL Setup, Authentication, and Monitoring Tool Integration

Then, modify the Datadog Agent configuration. A sample conf.yaml file is provided below for your reference.

[root@appserver ~]# vi /etc/datadog-agent/conf.d/jmx.d/conf.yaml
[root@appserver ~]# cat /etc/datadog-agent/conf.d/jmx.d/conf.yaml
instances:

  - host: localhost
    port: 9010  # Tomcat JMX port
    name: tomcat
    collect_default_metrics: false

    conf:
      # Full GC (Old Generation)
      - include:
          domain: java.lang
          bean: type=GarbageCollector,name=G1\ Old\ Generation
          attribute:
            - CollectionCount
            - CollectionTime
          metric_prefix: jvm.gc.old

      # Young GC (Eden)
      - include:
          domain: java.lang
          bean: type=GarbageCollector,name=G1\ Young\ Generation
          attribute:
            - CollectionCount
            - CollectionTime
          metric_prefix: jvm.gc.new

      # Heap New (Eden space)
      - include:
          domain: java.lang
          bean: type=MemoryPool,name=G1\ Eden\ Space
          attribute:
            - Usage.used
            - Usage.max
          metric_prefix: jvm.memory.new

      # Heap Old
      - include:
          domain: java.lang
          bean: type=MemoryPool,name=G1\ Old\ Gen
          attribute:
            - Usage.used
            - Usage.max
          metric_prefix: jvm.memory.old

      # Metaspace (Java8+)
      - include:
          domain: java.lang
          bean: type=MemoryPool,name=Metaspace
          attribute:
            - Usage.used
            - Usage.max
          metric_prefix: jvm.memory.metaspace

      # Thread metrics
      - include:
          domain: java.lang
          bean: type=Threading
          attribute:
            - ThreadCount
            - PeakThreadCount
            - DaemonThreadCount
          metric_prefix: jvm.threads

      # Class loading metrics
      - include:
          domain: java.lang
          bean: type=ClassLoading
          attribute:
            - LoadedClassCount
            - UnloadedClassCount
          metric_prefix: jvm.classes

      # CPU usage metrics
      - include:
          domain: java.lang
          bean: type=OperatingSystem
          attribute:
            - ProcessCpuLoad
            - SystemCpuLoad
          metric_prefix: jvm.cpu

      # Tomcat connector threads
      - include:
          domain: Catalina
          bean: type=ThreadPool,name=http-nio-8080
          attribute:
            - currentThreadCount
            - currentThreadsBusy
            - maxThreads
          metric_prefix: tomcat.threads

      # Tomcat session metrics
      - include:
          domain: Catalina
          bean: type=Manager,context=/,host=localhost
          attribute:
            - activeSessions
            - sessionCounter
          metric_prefix: tomcat.sessions
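The configuration above collects raw values: `Usage.used` and `Usage.max` for the memory pools, and busy versus maximum threads for the Tomcat connector. What you usually alert on is the ratio between them. The small Python sketch below shows that arithmetic on hypothetical sample values. According to the Java `MemoryUsage` documentation, `max` is reported as -1 when it is undefined, and this can happen for some memory pools. In that case the helper returns `None` instead of a misleading percentage.

```python
from typing import Optional


def utilization_pct(used: float, maximum: float) -> Optional[float]:
    """Return used/maximum as a percentage rounded to 0.1, or None if maximum is undefined.

    java.lang.management.MemoryUsage reports max = -1 when the limit is undefined;
    any non-positive maximum is treated the same way here.
    """
    if maximum <= 0:
        return None
    return round(100.0 * used / maximum, 1)


# Hypothetical samples for the metrics configured above.
old_gen = utilization_pct(used=1_610_612_736, maximum=2_147_483_648)  # bytes
eden = utilization_pct(used=52_428_800, maximum=-1)                   # max undefined
tomcat = utilization_pct(used=180, maximum=200)                       # currentThreadsBusy / maxThreads

print(old_gen, eden, tomcat)
```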

Restart the Datadog Agent.

[root@appserver ~]# systemctl restart datadog-agent
[root@appserver ~]# systemctl status datadog-agent
 datadog-agent.service - Datadog Agent
     Loaded: loaded (/usr/lib/systemd/system/datadog-agent.service; enabled; preset: disabled)
     Active: active (running) since Sun 2025-09-21 06:32:15 JST; 5s ago
   Main PID: 3897554 (agent)
      Tasks: 9 (limit: 23168)
     Memory: 33.2M
        CPU: 4.844s
     CGroup: /system.slice/datadog-agent.service
             3897554 /opt/datadog-agent/bin/agent/agent run -p /opt/datadog-agent/run/agent.pid

[root@appserver ~]#

Click JMX Metrics from the dashboard.

If the various JVM-related resources are displayed as shown below, the setup was successful.

Log Monitoring

Agent Configuration

Let’s enable log monitoring.
First, in the Agent configuration, set the logs_enabled parameter to true to activate log monitoring.

[root@appserver ~]# vi /etc/datadog-agent/datadog.yaml
[root@appserver ~]# cat /etc/datadog-agent/datadog.yaml
#########################
## Basic Configuration ##
#########################

## @param api_key - string - required
## @env DD_API_KEY - string - required
## The Datadog API key used by your Agent to submit metrics and events to Datadog.
## Create a new API key here: https://app.datadoghq.com/organization-settings/api-keys .
## Read more about API keys here: https://docs.datadoghq.com/account_management/api-app-keys/#api-keys .
api_key: 52bb3bd609fXXXXXXXXXXXXXXXXXXXX

~Output Truncated~

##################################
## Log collection Configuration ##
##################################

## @param logs_enabled - boolean - optional - default: false
## @env DD_LOGS_ENABLED - boolean - optional - default: false
## Enable Datadog Agent log collection by setting logs_enabled to true.
#
logs_enabled: true

~Output Truncated~

Next, as an example, let’s add configurations to monitor Apache access logs and error logs.

[root@appserver ~]# vi /etc/datadog-agent/conf.d/apache.d/conf.yaml
[root@appserver ~]# cat /etc/datadog-agent/conf.d/apache.d/conf.yaml
logs:
  # Apache access logs
  - type: file
    path: /var/log/httpd/access_log
    service: apache
    source: apache
    sourcecategory: http_access

  # Apache error logs
  - type: file
    path: /var/log/httpd/error_log
    service: apache
    source: apache
    sourcecategory: http_error
    # Optional: only collect WARN, ERROR, CRIT, ALERT, EMERG
    log_processing_rules:
      - type: include_at_match
        name: only_errors
        pattern: "(warn|error|crit|alert|emerg)"

[root@appserver ~]#
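To see which error-log lines the `include_at_match` rule above would keep, here is a Python sketch that applies the same pattern with `re.search`. It assumes that `include_at_match` keeps a log whenever the pattern matches anywhere in the line. For a simple lowercase alternation like this one, Python's `re` and the Agent's Go regex engine should behave the same. The two error lines are copied from the `error_log` output shown in this article. The notice line is hypothetical.

```python
import re

# Same pattern as the only_errors rule in apache.d/conf.yaml (case-sensitive).
ONLY_ERRORS = re.compile(r"(warn|error|crit|alert|emerg)")


def include_at_match(lines, pattern=ONLY_ERRORS):
    """Keep lines where the pattern matches anywhere, mimicking include_at_match."""
    return [line for line in lines if pattern.search(line)]


sample = [
    # Copied from the error_log output in this article.
    "[Sun Sep 21 05:51:18.192859 2025] [proxy_http:error] [pid 3803544:tid 3803707] "
    "[client 106.72.183.131:5181] AH01114: HTTP: failed to make connection to backend: localhost",
    "[Sun Sep 21 05:51:18.500838 2025] [proxy:error] [pid 3806244:tid 3806274] "
    "(111)Connection refused: AH00957: http: attempt to connect to 127.0.0.1:8080 "
    "(localhost:8080) failed",
    # Hypothetical notice-level line that the rule should drop.
    "[Sun Sep 21 05:50:00.000000 2025] [mpm_event:notice] [pid 1000:tid 1000] "
    "AH00489: Apache/2.4 configured -- resuming normal operations",
]

kept = include_at_match(sample)
print(len(kept), "of", len(sample), "lines kept")
```

The match is unanchored and case-sensitive. A notice-level line whose message happens to contain a word such as "error" would therefore also be kept, while an uppercase "ERROR" would not match.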

Next, grant permissions so that the Datadog Agent can access the above logs.

[root@appserver ~]# setfacl -m u:dd-agent:x /var/log/httpd
[root@appserver ~]# setfacl -m u:dd-agent:r /var/log/httpd/access_log
[root@appserver ~]# setfacl -m u:dd-agent:r /var/log/httpd/error_log

Check whether the dd-agent user can access the logs.

[root@appserver ~]# sudo -u dd-agent tail -n 3 /var/log/httpd/access_log
147.185.132.103 - - [22/Sep/2025:06:24:17 +0900] "\x16\x03\x01" 400 226 "-" "-"
45.135.193.100 - - [22/Sep/2025:06:27:05 +0900] "GET / HTTP/1.1" 302 - "-" "-"
91.224.92.34 - - [22/Sep/2025:06:55:10 +0900] "\x16\x03\x01\x05\xa8\x01" 400 226 "-" "-"
[root@appserver ~]#
[root@appserver ~]# sudo -u dd-agent tail -n 3 /var/log/httpd/error_log
[Sun Sep 21 05:51:18.192859 2025] [proxy_http:error] [pid 3803544:tid 3803707] [client 106.72.183.131:5181] AH01114: HTTP: failed to make connection to backend: localhost
[Sun Sep 21 05:51:18.500838 2025] [proxy:error] [pid 3806244:tid 3806274] (111)Connection refused: AH00957: http: attempt to connect to 127.0.0.1:8080 (localhost:8080) failed
[Sun Sep 21 05:51:18.501112 2025] [proxy_http:error] [pid 3806244:tid 3806274] [client 106.72.183.131:25662] AH01114: HTTP: failed to make connection to backend: localhost, referer: https://quiz.eeengineer.com/quizapp
[root@appserver ~]#

After completing the above configuration changes, restart the Datadog Agent.
This completes the setup.

Collecting Information via Log Monitoring

Next, let’s check the information that can be collected through log monitoring from the Datadog console.
Click Logs > Explorer.

Click Get Started.

On the next screen, click Get Started again.

Access Apache to generate logs. If the logs appear in the Datadog console as shown below, the setup was successful.

Click the log you want to view in detail.
Then, click Metrics.

The dashboard displays various metrics such as CPU usage, memory usage, and disk I/O.
This allows you to check the server resources and system performance during the time periods when logs were generated.

APM (Application Performance Monitoring)

Agent Configuration

Let’s enable APM (Application Performance Monitoring), a key feature of observability products.
First, make sure the enabled parameter under apm_config in the Datadog configuration file is set to true. As the comments in datadog.yaml show, its default is already true, but setting it explicitly makes the intent clear.

[root@appserver ~]# vi /etc/datadog-agent/datadog.yaml
[root@appserver ~]# cat /etc/datadog-agent/datadog.yaml
~Output Truncated~

####################################
## Trace Collection Configuration ##
####################################

## @param apm_config - custom object - optional
## Enter specific configurations for your trace collection.
## Uncomment this parameter and the one below to enable them.
## See https://docs.datadoghq.com/agent/apm/
#
apm_config:

  ## @param enabled - boolean - optional - default: true
  ## @env DD_APM_ENABLED - boolean - optional - default: true
  ## Set to true to enable the APM Agent.
  #
  enabled: true

  ## @param env - string - optional - default: none
  ## @env DD_APM_ENV - string - optional - default: none
  ## The environment tag that Traces should be tagged with.
  ## If not set the value will be inherited, in order, from the top level

~Output Truncated~

[root@appserver ~]# systemctl restart datadog-agent
[root@appserver ~]#

Next, deploy the dd-java-agent, the Java tracing agent provided by Datadog.

[root@appserver ~]# DD_AGENT_VERSION=latest
curl -L https://dtdg.co/latest-java-tracer -o /etc/datadog-agent/dd-java-agent.jar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   145  100   145    0     0    814      0 --:--:-- --:--:-- --:--:--   814
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 31.2M  100 31.2M    0     0  15.4M      0  0:00:02  0:00:02 --:--:-- 23.8M
[root@appserver ~]#
[root@appserver ~]# ll /etc/datadog-agent/dd-java-agent.jar
-rw-r--r-- 1 root root 32729838 Sep 23 22:23 /etc/datadog-agent/dd-java-agent.jar
[root@appserver ~]#

To start this agent when Tomcat launches, add the following configuration to setenv.sh.

[root@appserver ~]# vi /opt/tomcat/bin/setenv.sh
[root@appserver ~]# cat /opt/tomcat/bin/setenv.sh
~Output Truncated~

# Datadog APM Java Tracer settings
CATALINA_OPTS="$CATALINA_OPTS \
-javaagent:/etc/datadog-agent/dd-java-agent.jar \
-Ddd.service=spring-boot-app \
-Ddd.env=production \
-Ddd.version=1.0.0 \
-Ddd.agent.host=localhost \
-Ddd.trace.enabled=true"
[root@appserver ~]#
[root@appserver ~]# systemctl restart tomcat
[root@appserver ~]# 

This completes the configuration.

Collecting Information via APM

Now, let’s check the items that can be collected via APM from the Datadog console.
Click APM > Services.

Select the application for which you want to collect information.

Click Performance.
As shown below, you should be able to view performance information such as the number of requests, errors, and latency.

Click Relationships.
This view maps the dependencies between the service and the components it interacts with, which helps you see where a problem originates.
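The request, error, and latency figures on the Performance tab are aggregates over individual requests. As a rough illustration of the arithmetic, this Python sketch computes an error rate (treating HTTP 5xx responses as errors) and a nearest-rank p95 latency from hypothetical request records. Datadog's own percentile calculation may differ, so treat this as a conceptual model rather than a way to reproduce the numbers in the console.

```python
import math


def error_rate(statuses):
    """Share of requests with an HTTP 5xx status, as a percentage."""
    errors = sum(1 for s in statuses if 500 <= s <= 599)
    return 100.0 * errors / len(statuses)


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data <= it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical sample of 20 requests: (HTTP status, latency in ms).
requests = [(200, 40 + i) for i in range(18)] + [(500, 900), (200, 1200)]
statuses = [s for s, _ in requests]
latencies = [ms for _, ms in requests]

print(f"requests={len(requests)} error_rate={error_rate(statuses):.1f}% "
      f"p95={percentile_nearest_rank(latencies, 95)}ms")
```

A single slow outlier (1200 ms) does not move p95 here, but a second one would. This is why percentile latency is usually watched alongside the average.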

Now, let’s take a look at Traces, the core feature of APM.
Click APM > Traces.

As shown below, you can view summary information for each request. Select one trace to see the details.

You can check detailed information such as which internal process is taking time, where SQL queries are being executed, and how long they take.
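A trace view can answer "which internal process is taking time" by attributing to each span its own duration minus the time spent in its child spans (its self time). This Python sketch shows that calculation on a hypothetical three-span trace: a servlet request calls a Spring handler, which runs one PostgreSQL query. The span names and timings are illustrative and are not taken from the application in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    name: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)


def self_times(span: Span, out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Map span name -> self time (duration minus the durations of its direct children).

    Assumes span names are unique and children run sequentially inside the parent;
    overlapping (parallel) children would need interval merging instead of a sum.
    """
    if out is None:
        out = {}
    child_total = sum(c.duration_ms for c in span.children)
    out[span.name] = span.duration_ms - child_total
    for child in span.children:
        self_times(child, out)
    return out


# Hypothetical trace: a 1200 ms request, most of it spent in a single SQL query.
trace = Span("servlet.request", 1200, [
    Span("spring.handler", 1150, [
        Span("postgresql.query", 1000),
    ]),
])

times = self_times(trace)
slowest = max(times, key=times.get)
print(times, "->", slowest)
```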

Click Metrics. You can view various metrics at the time the process was executed. For example, you can immediately see whether garbage collection occurred at the same time as a delayed response.

By clicking SQL Queries, you can view the specific details of the SQL operations.

Database Monitoring

This section explains how to enable database monitoring, which is part of the APM functionality, for a PostgreSQL database.

Adding Configuration to Collect Statistics

By specifying pg_stat_statements in shared_preload_libraries in postgresql.conf, you can collect SQL execution statistics for PostgreSQL.
After adding this configuration, restart PostgreSQL.

[root@appserver ~]# vi /var/lib/pgsql/data/postgresql.conf
[root@appserver ~]# cat /var/lib/pgsql/data/postgresql.conf
...omitted...
# Collecting performance statistics setting
shared_preload_libraries = 'pg_stat_statements'

# Enable collection of execution statistics
track_activities = on

# Track statistics for tables and indexes
track_counts = on

# Record timing information for I/O operations
track_io_timing = on

# Log sample queries (useful for detecting slow queries)
log_min_duration_statement = 1000   # Log SQL statements that take longer than 1 second
[root@appserver ~]#
[root@appserver ~]# systemctl restart postgresql
[root@appserver ~]#
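These settings take effect only after the restart. If the statistics described below do not appear, checking the file is a good first step. The Python sketch below verifies that pg_stat_statements is preloaded and that the tracking options are on. It assumes the simple `name = value  # comment` layout shown above. It does not handle `include` directives or settings applied with `ALTER SYSTEM`, and it strips comments naively, ignoring `#` inside quotes. Repeated settings follow PostgreSQL's rule that the last one wins. The file path and the list of required settings mirror this article. The defaults used for missing entries are PostgreSQL's documented defaults: track_activities and track_counts are on, and track_io_timing is off.

```python
import os
import sys

# PostgreSQL defaults for the settings this article relies on.
DEFAULTS = {"track_activities": "on", "track_counts": "on", "track_io_timing": "off"}


def parse_conf(text):
    """Parse 'name = value  # comment' lines; later duplicates override earlier ones."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive: ignores '#' inside quotes
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        settings[name.lower()] = value.strip("'\"")
    return settings


def problems(settings):
    """Return human-readable problems; an empty list means the check passed."""
    found = []
    libs = [lib.strip() for lib in settings.get("shared_preload_libraries", "").split(",")]
    if "pg_stat_statements" not in libs:
        found.append("shared_preload_libraries does not include pg_stat_statements")
    for name, default in DEFAULTS.items():
        if settings.get(name, default).lower() != "on":
            found.append(f"{name} is not set to on")
    return found


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/pgsql/data/postgresql.conf"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            print("\n".join(problems(parse_conf(fh.read()))) or "OK")
    else:
        print(f"{path} not found")
```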

Creating a User for PostgreSQL Monitoring

To monitor PostgreSQL performance and query activity with Datadog, you need to create a dedicated monitoring user and grant the necessary permissions.
First, log in to the PostgreSQL database and create a user for Datadog PostgreSQL monitoring.

postgres=# create user datadog with password 'yourpassword';
CREATE ROLE
postgres=#

Create a dedicated schema named datadog for Datadog's objects.

postgres=# CREATE SCHEMA datadog;
CREATE SCHEMA
postgres=#

Grant permissions so that the monitoring user datadog can access objects within the schema.

postgres=# GRANT USAGE ON SCHEMA datadog TO datadog;
GRANT
postgres=#

Grant the role for viewing statistical information.

postgres=# GRANT pg_monitor TO datadog;
GRANT
postgres=#

Enable the extension for query performance analysis.

postgres=# CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION
postgres=#

Create a function that allows the Datadog user to retrieve the execution plan of a specified SQL statement in JSON format.

postgres=# CREATE OR REPLACE FUNCTION datadog.explain_statement(
   l_query TEXT,
   OUT explain JSON
)
RETURNS SETOF JSON AS
$$
DECLARE
curs REFCURSOR;
plan JSON;

BEGIN
   OPEN curs FOR EXECUTE pg_catalog.concat('EXPLAIN (FORMAT JSON) ', l_query);
   FETCH curs INTO plan;
   CLOSE curs;
   RETURN QUERY SELECT plan;
END;
$$
LANGUAGE 'plpgsql'
RETURNS NULL ON NULL INPUT
SECURITY DEFINER;
CREATE FUNCTION
quiz=#

This completes the setup.
You can now verify that statistical information can be collected. If records can be retrieved as shown below, the setup was successful.

postgres-# \q
[postgres@quiz ~]$
[postgres@quiz ~]$ psql -h localhost -U datadog -d postgres -A -c "select * from pg_stat_database limit 1;"
Password for user datadog:
datid|datname|numbackends|xact_commit|xact_rollback|blks_read|blks_hit|tup_returned|tup_fetched|tup_inserted|tup_updated|tup_deleted|conflicts|temp_files|temp_bytes|deadlocks|checksum_failures|checksum_last_failure|blk_read_time|blk_write_time|session_time|active_time|idle_in_transaction_time|sessions|sessions_abandoned|sessions_fatal|sessions_killed|stats_reset
0||0|0|0|72|530420|201511|118897|3|0|0|0|0|0|0|||0|0|0|0|0|0|0|0|0|
(1 row)
[postgres@quiz ~]$
[postgres@quiz ~]$ psql -h localhost -U datadog -d postgres -A -c "select * from pg_stat_activity limit 1;"
Password for user datadog:
datid|datname|pid|leader_pid|usesysid|usename|application_name|client_addr|client_hostname|client_port|backend_start|xact_start|query_start|state_change|wait_event_type|wait_event|state|backend_xid|backend_xmin|query_id|query|backend_type
||216421||||||||2025-10-19 06:30:17.392103+09||||Activity|AutoVacuumMain||||||autovacuum launcher
(1 row)
[postgres@quiz ~]$

Collecting Information via Database Monitoring

Now, let’s check the Database Monitoring screen from the Datadog console.
Click APM > Database Monitoring.

Select the host you want to monitor.

You should be able to view basic information such as query throughput and execution time.

You can also check information like Top Queries.

By selecting Query Metrics, you can view more detailed information.

Summary

In this article, we introduced the basics of observability and the concrete steps for implementing it using Datadog. The key points are summarized below:

Understanding Observability

  • Monitoring is a mechanism to understand what is happening.
  • Observability is a mechanism to analyze why it is happening.
  • By leveraging metrics, logs, and traces together, root cause analysis can be performed more quickly.

Datadog Installation and Basic Setup

  • Install the Datadog Agent on your servers.
  • Visualize infrastructure metrics such as CPU, memory, and disk usage.
  • Collect metrics from middleware and JVM as well.

Log Monitoring

  • Collect and visualize logs from Apache and other sources.
  • Combining logs with metrics provides a deeper understanding of system status.

APM (Application Performance Monitoring)

  • Deploy the dd-java-agent for Java applications.
  • Track processing time and SQL execution per request.
  • Useful for identifying bottlenecks and performance issues.

Database Monitoring

  • Collect performance statistics from PostgreSQL.
  • Create a dedicated Datadog user with proper permissions to visualize query activity and execution times.
  • Makes SQL performance analysis easier.

By leveraging Datadog, even complex systems can be visualized and analyzed smoothly, helping with early detection of issues and performance improvement. It is recommended to start with a free trial and gradually enable the features you need.
