Skip to content

Conversation

susan-shu-c
Copy link
Member

@susan-shu-c susan-shu-c commented Sep 18, 2025

1. What does this PR do?

2. Which ECS fields are affected/introduced?

Field Type Description /Usage
gen_ai.system_instructions flattened The system message or instructions provided to the GenAI model separately from the chat history.
gen_ai.input.messages nested The chat history provided to the model as an input.
gen_ai.output.messages nested Messages returned by the model where each message represents a specific model response (choice, candidate).
gen_ai.tool.definitions nested The list of source system tool definitions available to the GenAI agent or model.
gen_ai.tool.call.arguments flattened Parameters passed to the tool call.
gen_ai.tool.call.result flattened The result returned by the tool call (if any and if execution was successful).

Changes based on OTel:

3. Why is this change necessary?

4. Have you added/updated documentation?

YES / NO / N/A

5. Have you built ECS and committed any newly generated files?

YES / NO

6. Have you run the ECS validation tests locally?

YES / NO

7. Anything else for the reviewers?

Looking for feedback

[Edit: see comment]

For the fields where it would be more useful to keep the associations and have more cases for searching, I changed the field type to nested, and for those that don't need the associations and probably don't need nested searching, I changed them to flattened.

For most of the fields, they are lists of .json objects, or .json objects. For fields whose content could be very long (input.messages, output.messages), I have proposed that they are the flattened type due to costs.

via docs for nested type:

When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with key and value fields. Instead, consider using the flattened data type, which maps an entire object as a single field and allows for simple searches over its contents. Nested documents and queries are typically expensive, so using the flattened data type for this use case is a better option.

Though as I am not a subject matter expert on the field types and efficiency, looking for additional feedback or comments.


Commit Message

Copy link

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

Copy link

Documentation changes preview: https://docs-v3-preview.elastic.dev/elastic/ecs/pull/2532/reference/

Copy link

github-actions bot commented Sep 18, 2025

Copy link
Contributor

@trisch-me trisch-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as stage 2 is a final stage, please update all examples and generate all fields

@susan-shu-c
Copy link
Member Author

@flash1293 thanks a lot for explaining the tradeoffs. For the fields where it would be more useful to keep the associations and have more cases for searching, I changed the field type to nested, and for those that don't need the associations and probably don't need nested searching, I changed them to flattened. Really appreciate the help.

@flash1293
Copy link

Thanks @susan-shu-c - side note, do we have cases of existing nested and flattened ECS fields already?

@susan-shu-c
Copy link
Member Author

Hi @flash1293 there are a few:

Flattened:
elf.exports
log.syslog.structured_data

Nested:
elf.sections
threat.enrichments

@susan-shu-c
Copy link
Member Author

susan-shu-c commented Oct 10, 2025

Thanks for the comments all. I've updated the following:

  1. Cleaned up the rfcs/text/0052-gen_ai-additional-fields.md file to be on par with rfcs/text/0052/gen_ai.yaml, cleaned up unused comments.
  2. Updated schemas/gen_ai.yml which Michael said the published schema will be taken from

However, when I try to run make clean generate experimental I am getting some unexpected behavior, where many .md files in docs is being deleted, trying to resolve with @mjwolf

Also getting a failure in tests

  File "/code/ecs/scripts/generators/otel.py", line 156, in __set_stability
    otel['stability'] = self.attributes[get_otel_attribute_name(field_details, otel)]['stability']
                        ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'gen_ai.output.messages'
make: *** [generator] Error 1

OTel reference: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/#gen-ai-output-messages

@Mikaayenson
Copy link

Migrating a slack thought to the issue for posterity. Here are a couple things we should consider before pushing this forward.

  1. The default setting for limiting nested fields on indices ( index.mapping.nested_fields.limit ) is 50 . If customers try to create a new index with a higher limit, they will receive the following error: Settings [index.mapping.nested_fields.limit,index.mapping.nested_objects.limit] are not available when running in serverless mode. It's a serverless limitation that can't be overridden without Elastic support involved. ECH its much higher. I think like 10k and can be manually overridden.
  2. IINM nested fields are not visible in Kibana visualizations. If there are any that we would imagine want to be visualized, it would be impacted, so we may want to consider changing them to flattened.

@trisch-me
Copy link
Contributor

@susan-shu-c you error about not finding gen_ai.tool.call.arguments is because of this field not being released yet. Last known release (which I have merged to ecs today) doesn’t contain this field, i.e. we can’t say it’s an otel: match
As a workaround - we could skip otel definition for ecs fields for those fields that are not released yet but we should make sure it will not be forgotten, i.e. just comment them out for example

As an idea we can just always work against main in otel. There are no things deleted anymore, everything is deprecated.

@susan-shu-c
Copy link
Member Author

hi, for now, I have marked the tool.call[...] fields as OTel related - in v1.37.0 it'd still be under gen_ai.operation.name (roughly speaking) - link

Screenshot 2025-10-17 at 12 05 13 PM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants