Transformed EFD¶
The Transformed EFD Dataset¶
Purpose and Context¶
This guide explains how to query the Transformed Engineering and Facilities Database (EFD) dataset. This dataset is part of the Vera C. Rubin Observatory’s Consolidated Database (ConsDB), which combines information from multiple sources into a single format. The Transformed EFD tables contain summarized EFD telemetry, providing context for scientific analysis, such as instrument health and environmental conditions during observations.
Data Sources and Time Windows¶
The data originates from the EFD, a time-series database that stores high-frequency telemetry from observatory systems. The data in these tables is not the raw stream but has been aggregated over two time windows defined by the LSST Butler system:
By Exposure: Data is summarized over the
timespanof a single raw exposure.By Visit: Data is summarized over the
timespanof a processed visit.
This link to Butler records ensures that every data point can be tied to a specific observational event.
Connecting to the Database¶
Note
Connection Methods: The connection process may vary depending on your infrastructure or access method (direct database connection, TAP services, web interfaces, etc.). For the sake of understanding the data structure and query patterns, this documentation uses PostgreSQL syntax examples throughout.
Identifying the Correct Schema¶
Data is organized into instrument-specific schemas to keep datasets separate and organized. This is important for writing correct queries, as you must specify the correct schema for your instrument.
Use the following mapping from instrument name to schema name when constructing queries:
LATISS:
efd_latissLSSTCam:
efd_lsstcamLSSTComCam:
efd_lsstcomcam
All queries must prepend the schema name to the table name. For example, to select data from the exposure_efd table for the LSSTCam instrument, the correct syntax is:
SELECT * FROM efd_lsstcam.exposure_efd;
Understanding the Data Configuration¶
The transformed EFD data structure is determined by configuration files that define which telemetry topics are processed and how they are transformed. All instruments use the same data model structure, but the specific columns and properties available depend on the instrument’s configuration.
Key points about the data configuration:
Consistent Structure: All instruments (LATISS, LSSTCam, LSSTComCam) use the same four table types with identical schemas
Instrument-Specific Content: The actual columns and properties available depend on the telemetry topics configured for each instrument
Configuration-Driven: The data you see in the database is determined by YAML configuration files that specify which EFD topics to process
Standardized Transformations: Common transformation functions (mean, max, min, standard deviation, etc.) are applied consistently across instruments
To understand what data is available for a specific instrument, you can examine the configuration files or query the database schema directly.
Understanding the Database Schema¶
The Four Core Tables¶
The processing pipeline populates four tables for each instrument schema, generated from a master configuration to ensure consistency. The four tables are:
exposure_efd: Contains pivoted (columnar) data, with one row per exposure.visit1_efd: Contains pivoted data, with one row per visit.exposure_efd_unpivoted: Contains unpivoted (key-value) data related to exposures.visit1_efd_unpivoted: Contains unpivoted (key-value) data related to visits.
The Pivoted vs. Unpivoted Data Model¶
The database uses both pivoted and unpivoted tables. This design is a practical choice to manage the large amount of telemetry data. The pivoted tables are optimized for ease of use and performance for common queries. The unpivoted tables provide a scalable solution for metrics that involve a large number of individual data points, such as readings from an array of sensors.
The unpivoted tables were created to handle cases where there were too many columns to include in a single table. Instead of creating hundreds of columns for every sensor, these metrics are stored in a key-value structure.
Pivoted Tables (
exposure_efd,visit1_efd): These tables are the default choice for most analysis and contain the most common metrics. Each metric has its own column (e.g.,mt_dome_temperature_mean). This structure is intuitive for writing queries.Note
Science Data Models:
Unpivoted Tables (
exposure_efd_unpivoted,visit1_efd_unpivoted): These tables contain data from a large array of similar sensors or any metric that generates too many values to be stored as separate columns, such as vibration sensor data or actuator forces. Data is stored in a key-value format using three main columns:property: Corresponds to the base name of the measurement (e.g.,mt_vibration_data_mean).field: Identifies the specific sensor or data point (e.g.,accelerationPSDX10).value: The measured value for that field and property.
This model is flexible for storing large sets of related data but requires more complex queries to analyze.
Core Identifying Columns¶
These columns serve as primary keys and are used to identify records and join data between tables.
Important
Indexed Columns:
Standard tables (
exposure_efd,visit1_efd): Onlyday_obsandseq_numare indexedUnpivoted tables (
exposure_efd_unpivoted,visit1_efd_unpivoted):day_obs,seq_num,exposure_id,property, andfieldare indexed
For optimal query performance, always use these indexed columns in your WHERE clauses and JOIN conditions.
day_obs(integer): The date of observation inYYYYMMDDformat (e.g.,20250806). Primary key column.seq_num(integer): The sequence number of the observation within that day. Primary key column.exposure_id(long): The unique Butler identifier for the exposure. Has a unique constraint in standard tables but is indexed in unpivoted tables.visit_id(long): The unique Butler identifier for the visit. Has a unique constraint in standard tables but is indexed in unpivoted tables.created_at(timestamp): The timestamp indicating when the row was generated. Not indexed.
Data Availability and Processing¶
Understanding when and how data becomes available is important for effective querying:
Processing Schedule: Data is processed in regular intervals, typically every few minutes
Data Latency: There may be a delay of 5-15 minutes between when observations occur and when transformed data becomes available
Missing Data: Some exposures or visits may not have transformed EFD data if: - The EFD telemetry was not available during the observation - The transformation process encountered errors - The observation occurred before the transformation service was active
Data Completeness: Not all telemetry topics may be available for every observation, depending on instrument configuration and operational status
To check data availability for a specific time period, you can query the count of records for your time range of interest using the indexed day_obs column.
Querying Transformed Data: Practical Examples¶
Basic Pivoted Queries¶
Example 1: Retrieve environmental data for a specific night
This query fetches environmental conditions for all LSSTCam exposures on a given night.
SELECT
day_obs,
seq_num,
exposure_id,
mt_dome_temperature_mean,
mt_salindex110_sonic_temperature_mean,
mt_salindex110_wind_speed_0_mean
FROM
efd_lsstcam.exposure_efd
WHERE
day_obs = 20250806
ORDER BY
seq_num
LIMIT 100;
Example 2: Find visits with good seeing conditions
This query finds all visits for the LSSTCam instrument where the DIMM seeing was better than 0.7 arcseconds.
SELECT
day_obs,
seq_num,
visit_id,
mt_dimm_fwhm_mean,
mt_dome_temperature_mean
FROM
efd_lsstcam.visit1_efd
WHERE
day_obs = 20250806
AND mt_dimm_fwhm_mean < 1.0
ORDER BY
seq_num
LIMIT 50;
Basic Unpivoted Queries¶
Querying unpivoted tables requires filtering on the property and field columns to isolate the desired metric. The property column corresponds to the name of a transformation, while the field column corresponds to a specific sensor or measurement source.
Example 1: Find a single unpivoted value
This query retrieves the mean vibration for a specific accelerometer sensor (accelerationPSDX10) during a given LSSTCam exposure.
SELECT
value
FROM
efd_lsstcam.exposure_efd_unpivoted
WHERE
day_obs = 20250806
AND seq_num = 55
AND property = 'mt_vibration_data_mean'
AND field = 'accelerationPSDX10'
LIMIT 1;
Example 2: Pivoting unpivoted data in a query
The following query retrieves multiple vibration sensor readings for a single exposure and displays them as if they were in separate columns, effectively “pivoting” the data within the SQL query itself.
SELECT
day_obs,
seq_num,
exposure_id,
MAX(CASE WHEN field = 'accelerationPSDX0' THEN value END) AS psdx0,
MAX(CASE WHEN field = 'accelerationPSDX1' THEN value END) AS psdx1,
MAX(CASE WHEN field = 'accelerationPSDY0' THEN value END) AS psdy0,
MAX(CASE WHEN field = 'accelerationPSDY1' THEN value END) AS psdy1
FROM
efd_lsstcam.exposure_efd_unpivoted
WHERE
day_obs = 20250806
AND seq_num = 55
AND property = 'mt_vibration_data_mean'
GROUP BY
day_obs, seq_num, exposure_id
LIMIT 1;
Joining Tables¶
Example 1: Combine standard metrics with specific sensor readings for a single exposure
This query joins exposure_efd with exposure_efd_unpivoted to show the dome temperature alongside specific vibration sensor readings for an LSSTCam exposure.
SELECT
piv.day_obs,
piv.seq_num,
piv.exposure_id,
piv.mt_dome_temperature_mean,
unpiv.field AS sensor_name,
unpiv.value AS vibration_psd
FROM
efd_lsstcam.exposure_efd AS piv
JOIN
efd_lsstcam.exposure_efd_unpivoted AS unpiv
ON piv.day_obs = unpiv.day_obs AND piv.seq_num = unpiv.seq_num
WHERE
piv.day_obs = 20250806
AND piv.seq_num = 55
AND unpiv.property = 'mt_vibration_data_mean'
LIMIT 10;
Example 2: Combine and Pivot Data for a Specific Day
This query combines data for all exposures on a specific day and uses conditional aggregation to pivot the unpivoted sensor data. The MAX(CASE WHEN... END) expressions transform the key-value pairs from the unpivoted table into distinct columns, creating a “wide” table format that is easier to analyze.
SELECT
piv.day_obs,
piv.seq_num,
piv.exposure_id,
piv.mt_pointing_mount_position_ra_mean,
MAX(CASE WHEN unpiv.field = 'accelerationPSDX0' THEN unpiv.value END) AS vibration_psd_x0,
MAX(CASE WHEN unpiv.field = 'accelerationPSDX1' THEN unpiv.value END) AS vibration_psd_x1,
MAX(CASE WHEN unpiv.field = 'accelerationPSDY0' THEN unpiv.value END) AS vibration_psd_y0,
MAX(CASE WHEN unpiv.field = 'accelerationPSDY1' THEN unpiv.value END) AS vibration_psd_y1
FROM
efd_lsstcam.exposure_efd AS piv
JOIN
efd_lsstcam.exposure_efd_unpivoted AS unpiv
ON piv.day_obs = unpiv.day_obs AND piv.seq_num = unpiv.seq_num
WHERE
piv.day_obs = 20250806
AND unpiv.property = 'mt_vibration_data_mean'
GROUP BY
piv.day_obs, piv.seq_num, piv.exposure_id, piv.mt_pointing_mount_position_ra_mean
ORDER BY
piv.seq_num
LIMIT 50;
Advanced Query Examples¶
Example 1: Environmental Conditions Over Time Range
This query retrieves environmental data for a specific time range using indexed columns:
SELECT
day_obs,
seq_num,
exposure_id,
mt_dome_temperature_mean,
mt_dimm_fwhm_mean,
mt_salindex110_sonic_temperature_mean
FROM
efd_lsstcam.exposure_efd
WHERE
day_obs BETWEEN 20250801 AND 20250807
ORDER BY
day_obs, seq_num
LIMIT 200;
Example 2: Data Completeness Analysis
This query shows how to check data completeness for instrument metrics:
SELECT
day_obs,
COUNT(*) as total_exposures,
COUNT(mt_dome_temperature_mean) as dome_temp_measurements,
COUNT(mt_dimm_fwhm_mean) as seeing_measurements,
COUNT(mt_salindex110_sonic_temperature_mean) as sonic_temp_measurements
FROM
efd_lsstcam.exposure_efd
WHERE
day_obs = 20250806
GROUP BY
day_obs;
Example 3: Mount Performance Analysis
Analyze mount encoder performance and pointing accuracy:
SELECT
day_obs,
seq_num,
exposure_id,
mt_azimuth_encoder_absolute_position_0_rms_jitter,
mt_elevation_encoder_absolute_position_0_rms_jitter,
mt_pointing_mount_position_ra_mean,
mt_pointing_mount_position_ra_stddev
FROM
efd_lsstcam.exposure_efd
WHERE
day_obs = 20250806
AND mt_azimuth_encoder_absolute_position_0_rms_jitter IS NOT NULL
ORDER BY
seq_num
LIMIT 100;
Example 4: Wind and Seeing Conditions
Query for good observing conditions:
SELECT
day_obs,
seq_num,
exposure_id,
mt_dimm_fwhm_mean,
mt_salindex110_wind_speed_0_mean,
mt_salindex110_sonic_temperature_mean
FROM
efd_lsstcam.exposure_efd
WHERE
day_obs = 20250806
AND mt_dimm_fwhm_mean < 0.8
AND mt_salindex110_wind_speed_0_mean < 10.0
ORDER BY
seq_num
LIMIT 50;
Example 5: Unpivoted Data Analysis
Query unpivoted data using the indexed columns for efficient performance:
SELECT
day_obs,
seq_num,
exposure_id,
property,
field,
value
FROM
efd_lsstcam.exposure_efd_unpivoted
WHERE
day_obs = 20250806
AND property = 'mt_vibration_data_mean'
AND field IN ('accelerationPSDX0', 'accelerationPSDX1', 'accelerationPSDY0')
ORDER BY
seq_num, field
LIMIT 100;
Unpivoted Tables Corresponding Data Model¶
These transformations produce multiple values per exposure or visit. The results are stored in the exposure_efd_unpivoted and visit1_efd_unpivoted tables. Each row contains a property, a field (the specific sensor), and a value.
Property Name |
Data Type |
IVOA Unit |
Description |
IVOA UCD |
|---|---|---|---|---|
|
float |
m**2.s**-4.Hz**-1 |
Mean acceleration PSD in the sensor direction. Fields: |
meta.ucd;stat.mean |
|
float |
m**2.s**-4.Hz**-1 |
Standard deviation of the acceleration PSD in the sensor direction. Fields: |
stat.stdev |
|
float |
m**2.s**-4.Hz**-1 |
Maximum acceleration PSD in the sensor direction. Fields: |
meta.ucd;stat.max |
|
float |
m**2.s**-4.Hz**-1 |
Minimum acceleration PSD in the sensor direction. Fields: |
meta.ucd;stat.min |
|
float |
um |
z4-z22 terms of annular Zernike polynomials. Fields: |
|
|
float |
um |
Aggregated degree of freedom in the control algorithm. The unit of angle-related elements is arcsec instead of micron. Fields: |
|
|
float |
um |
Calculated degree of freedom in the last visit. The unit of angle-related elements is arcsec instead of micron. Fields: |
|
|
float |
N |
Actuator force in z direction. Fields: |
|
|
float |
N |
Actuator force in z direction. Fields: |
|
|
float |
N |
Mean forces in Z axis. Fields: |
meta.ucd;stat.mean |
|
float |
N |
Mean forces in Z axis. Fields: |
meta.ucd;stat.mean |
|
float |
N |
Mean forces in Z axis. Fields: |
meta.ucd;stat.mean |
|
float |
N |
Mean forces in Z axis. Fields: |
meta.ucd;stat.mean |
|
float |
N |
Mean force applied by SAL command or script for each actuator in sequence. Fields: |
meta.ucd;stat.mean |
|
float |
N |
Mean gravity component (F_e + F_0 + F_a + F_f) of look-up table (LUT) force for each actuator in sequence. Fields: |
meta.ucd;stat.mean |
|
float |
N |
Mean temperature component (T_u + T_x + T_y + T_r) of look-up table (LUT) force for each actuator in sequence. Fields: |
meta.ucd;stat.mean |
|
float |
z4-z22 terms of Noll Zernike polynomial indices. Fields: |
Common Query Scenarios¶
This section provides complete end-to-end examples for typical analysis scenarios.
Scenario 1: Environmental Conditions Analysis¶
Goal: Analyze environmental conditions (temperature, seeing, wind) for a specific night to understand observing quality.
Complete workflow:
-- Get environmental conditions for all exposures on a specific night
SELECT
day_obs,
seq_num,
exposure_id,
mt_dome_temperature_mean,
mt_dimm_fwhm_mean,
mt_salindex110_wind_speed_0_mean,
mt_salindex110_sonic_temperature_mean
FROM efd_lsstcam.exposure_efd
WHERE day_obs = 20250806
ORDER BY seq_num;
-- Find exposures with good seeing conditions
SELECT
day_obs,
seq_num,
exposure_id,
mt_dimm_fwhm_mean,
mt_dome_temperature_mean
FROM efd_lsstcam.exposure_efd
WHERE day_obs = 20250806
AND mt_dimm_fwhm_mean < 0.8
ORDER BY mt_dimm_fwhm_mean;
-- Summary statistics for the night
SELECT
COUNT(*) as total_exposures,
AVG(mt_dome_temperature_mean) as avg_temp,
AVG(mt_dimm_fwhm_mean) as avg_seeing,
MIN(mt_dimm_fwhm_mean) as best_seeing,
MAX(mt_dimm_fwhm_mean) as worst_seeing
FROM efd_lsstcam.exposure_efd
WHERE day_obs = 20250806
AND mt_dimm_fwhm_mean IS NOT NULL;
Scenario 2: Mount Performance Analysis¶
Goal: Analyze telescope mount performance by examining encoder jitter and pointing accuracy.
Complete workflow:
-- Get mount encoder performance for a specific day
SELECT
day_obs,
seq_num,
exposure_id,
mt_azimuth_encoder_absolute_position_0_rms_jitter,
mt_elevation_encoder_absolute_position_0_rms_jitter,
mt_pointing_mount_position_ra_mean,
mt_pointing_mount_position_ra_stddev
FROM efd_lsstcam.exposure_efd
WHERE day_obs = 20250806
AND mt_azimuth_encoder_absolute_position_0_rms_jitter IS NOT NULL
ORDER BY seq_num;
-- Find exposures with high jitter
SELECT
day_obs,
seq_num,
exposure_id,
mt_azimuth_encoder_absolute_position_0_rms_jitter,
mt_elevation_encoder_absolute_position_0_rms_jitter
FROM efd_lsstcam.exposure_efd
WHERE day_obs = 20250806
AND (mt_azimuth_encoder_absolute_position_0_rms_jitter > 0.01
OR mt_elevation_encoder_absolute_position_0_rms_jitter > 0.01)
ORDER BY mt_azimuth_encoder_absolute_position_0_rms_jitter DESC;
Scenario 3: Vibration Sensor Analysis¶
Goal: Analyze vibration data from multiple sensors stored in unpivoted format.
Complete workflow:
-- Get vibration data for a specific exposure
SELECT
day_obs,
seq_num,
exposure_id,
property,
field,
value
FROM efd_lsstcam.exposure_efd_unpivoted
WHERE day_obs = 20250806
AND seq_num = 55
AND property = 'mt_vibration_data_mean'
ORDER BY field;
-- Pivot vibration data for easier analysis
SELECT
day_obs,
seq_num,
exposure_id,
MAX(CASE WHEN field = 'accelerationPSDX0' THEN value END) AS psdx0,
MAX(CASE WHEN field = 'accelerationPSDX1' THEN value END) AS psdx1,
MAX(CASE WHEN field = 'accelerationPSDY0' THEN value END) AS psdy0,
MAX(CASE WHEN field = 'accelerationPSDY1' THEN value END) AS psdy1
FROM efd_lsstcam.exposure_efd_unpivoted
WHERE day_obs = 20250806
AND property = 'mt_vibration_data_mean'
GROUP BY day_obs, seq_num, exposure_id
ORDER BY seq_num;
-- Combine vibration data with environmental conditions
SELECT
piv.day_obs,
piv.seq_num,
piv.mt_dome_temperature_mean,
unpiv.field AS sensor_name,
unpiv.value AS vibration_psd
FROM efd_lsstcam.exposure_efd AS piv
JOIN efd_lsstcam.exposure_efd_unpivoted AS unpiv
ON piv.day_obs = unpiv.day_obs AND piv.seq_num = unpiv.seq_num
WHERE piv.day_obs = 20250806
AND unpiv.property = 'mt_vibration_data_mean'
ORDER BY piv.seq_num, unpiv.field;
Scenario 4: Data Completeness Check¶
Goal: Verify data availability and completeness for a time period.
Complete workflow:
-- Check data availability for a week
SELECT
day_obs,
COUNT(*) as total_exposures,
COUNT(mt_dome_temperature_mean) as dome_temp_measurements,
COUNT(mt_dimm_fwhm_mean) as seeing_measurements,
COUNT(mt_salindex110_wind_speed_0_mean) as wind_measurements
FROM efd_lsstcam.exposure_efd
WHERE day_obs BETWEEN 20250801 AND 20250807
GROUP BY day_obs
ORDER BY day_obs;
-- Check for missing data patterns
SELECT
day_obs,
COUNT(*) as total_exposures,
COUNT(mt_dome_temperature_mean) as temp_data,
ROUND(COUNT(mt_dome_temperature_mean)::numeric / COUNT(*) * 100, 2) as temp_coverage_pct
FROM efd_lsstcam.exposure_efd
WHERE day_obs BETWEEN 20250801 AND 20250807
GROUP BY day_obs
HAVING COUNT(mt_dome_temperature_mean) < COUNT(*) * 0.9
ORDER BY day_obs;
-- Check unpivoted data availability
SELECT
day_obs,
COUNT(DISTINCT seq_num) as exposures_with_unpivoted_data,
COUNT(DISTINCT property) as unique_properties,
COUNT(*) as total_unpivoted_records
FROM efd_lsstcam.exposure_efd_unpivoted
WHERE day_obs = 20250806
GROUP BY day_obs;
Troubleshooting Common Issues¶
No Data Returned for Expected Time Range
If your query returns no results for a time period when you expect data:
Check Data Availability: Query using indexed columns to see if data exists for your time range
Verify Schema Name: Ensure you’re using the correct schema name for your instrument
Check for NULL Values: Some columns may be NULL if telemetry wasn’t available
Missing Columns or Properties
If you’re looking for specific telemetry data that doesn’t appear:
Check Schema Files: For pivoted tables, refer to the Science Data Models linked in the “Pivoted vs. Unpivoted Data Model” section above, which list all available columns for each instrument.
Check Unpivoted Properties: For unpivoted tables, refer to the “Unpivoted Tables Corresponding Data Model” section in this documentation, which lists all available properties.
Check Unpivoted Tables: Some data may be stored in unpivoted format:
SELECT
DISTINCT property
FROM
efd_lsstcam.exposure_efd_unpivoted
WHERE
day_obs = 20250806
LIMIT 20;
Performance Issues with Large Queries
For queries over long time periods:
Use Indexed Columns: Always include
day_obsandseq_numfilters to use the primary key indexFor Unpivoted Tables: Use
day_obs,seq_num,property, andfieldfor optimal performance (exposure_idis not indexed)Limit Result Sets: Use
LIMITclauses for exploratory queriesAvoid Non-Indexed Filters: Filtering on
exposure_idor other non-indexed columns can be slow for large datasetsUse Composite Keys: When joining tables, use
day_obsandseq_numtogether instead ofexposure_id
Understanding NULL Values
NULL values in the data typically indicate:
No Telemetry Available: The EFD topic wasn’t available during the observation
Transformation Failed: The transformation function couldn’t process the data
Configuration Issue: The telemetry topic isn’t configured for the instrument
Use IS NOT NULL filters when you need complete data:
SELECT
day_obs,
seq_num,
exposure_id,
mt_dome_temperature_mean
FROM
efd_lsstcam.exposure_efd
WHERE
day_obs = 20250806
AND mt_dome_temperature_mean IS NOT NULL
LIMIT 50;