
Conversation


@junngo junngo commented Sep 26, 2025

Currently, Treeherder ingests performance data (PERFHERDER_DATA:) by parsing raw logs.
This patch adds support for reading the data from the perfherder-data.json artifact instead.
For now, the existing log parsing and the new JSON ingestion run in parallel to maintain compatibility.

Bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1990742
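As a minimal sketch of the difference between the two ingestion paths (the helper names below are hypothetical, not the patch's actual functions):

import json

PERF_MARKER = "PERFHERDER_DATA: "

def perf_data_from_log(log_lines):
    # Existing path: scan every raw log line for the PERFHERDER_DATA: marker.
    for line in log_lines:
        idx = line.find(PERF_MARKER)
        if idx != -1:
            yield json.loads(line[idx + len(PERF_MARKER):])

def perf_data_from_artifact(artifact_text):
    # New path: the perfherder-data.json artifact is already a complete JSON document.
    return json.loads(artifact_text)

The artifact path avoids streaming and scanning the full log when only the performance data is needed.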

@junngo junngo marked this pull request as draft September 26, 2025 14:18
@gmierz gmierz self-requested a review September 29, 2025 12:27
    return artifact_list


def post_perfherder_artifacts(job_log):
gmierz (Collaborator)

@junngo I think it would be better for us to put this into a separate area. This folder seems to be specifically for parsing logs, but we're parsing JSONs instead. What do you think about having this task defined here in the perf directory? https://github.com/mozilla/treeherder/blob/505ad6b4047f77fc3ecdea63e57881116340d0fb/treeherder/perf/tasks.py

junngo (Contributor Author)

@gmierz Splitting the code out is a great idea. Creating a separate file under the log_parser directory [0] looks good to me. It feels more cohesive to put it there, since the log parsing [1] also lives in that folder.
That's just my opinion, though, so feel free to tell me your preference for the directory location.

[0] https://github.com/mozilla/treeherder/tree/505ad6b4047f77fc3ecdea63e57881116340d0fb/treeherder/log_parser
[1]

with make_request(self.url, stream=True) as response:

junngo (Contributor Author)

I added the new file based on your feedback. It seems more suitable since the JSON artifact isn’t part of the log parsing process :)

existing_replicates = set(
    PerformanceDatumReplicate.objects.filter(
        performance_datum=subtest_datum
    ).values_list("value", flat=True)
)
gmierz (Collaborator)

I'm guessing this is happening because of duplicate ingestion tasks (log, and json). I think we should find a way to default to using the JSON if they exist, and ignore the data we find in the logs. Maybe we could have a list of tests that we start with for testing this out? I'm thinking we could start with these tasks since the data they produce is not useful so any failures won't be problematic: https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=regress&revision=6bd2ea6b9711dc7739d8ee7754b9330b11d0719d&selectedTaskRun=K87CGE6IT1GHl6wD4Skbyw.0

junngo (Contributor Author)

Exactly, log parsing and the JSON ingestion are both active right now, so I handled the duplication.
I'll revert that, add an allowlist, and only call _load_perf_datum for allowlisted tests when needed.
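A hedged sketch of what that gating could look like (the constant and the framework names are illustrative, not taken from the patch):

PERF_JSON_INGEST_ALLOWLIST = {"build_metrics", "mozperftest"}

def should_ingest_from_json(framework_name):
    # Prefer the perfherder-data.json artifact only for allowlisted
    # frameworks; everything else keeps the existing log-parsing path.
    return framework_name in PERF_JSON_INGEST_ALLOWLIST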

@junngo junngo force-pushed the ingest-perfherder-data branch from 34855c7 to 26bc32d Compare September 30, 2025 14:44
@junngo junngo marked this pull request as ready for review September 30, 2025 14:44

junngo commented Oct 1, 2025

ID  Framework            Enabled  Suites
1   talos                true
2   build_metrics        true     compiler warnings, compiler_metrics, decision ...
4   awsy                 true
5   awfy                 false
6   platform_microbench  true
10  raptor               true
11  js-bench             true
12  devtools             true
13  browsertime          true     constant-regression ...
14  vcs                  false
15  mozperftest          true
16  fxrecord             true
17  telemetry            true

This is a list of frameworks I generated locally with Django code (a minimal sketch of the query appears after the links below).
It would be good to gradually move the less critical framework-suite mappings [0] over to the new ingestion, one by one.

[0]
compiler warnings: https://firefoxci.taskcluster-artifacts.net/NE-naCeqSyenKogxu0nD4Q/0/public/build/perfherder-data-building.json
compiler_metrics: https://firefoxci.taskcluster-artifacts.net/P1T_HaXURD-r59ymlz5GWA/0/public/build/perfherder-data-compiler-metrics.json
decision: https://firefoxci.taskcluster-artifacts.net/OKsoq3lARpCjUhwVjqDddA/0/public/perfherder-data-decision.json
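A minimal sketch of how a list like this could be generated in a Django shell, assuming Treeherder's PerformanceFramework and PerformanceSignature models:

from treeherder.perf.models import PerformanceFramework, PerformanceSignature

for fw in PerformanceFramework.objects.order_by("id"):
    # Sample a few distinct suite names per framework for the overview.
    suites = (
        PerformanceSignature.objects.filter(framework=fw)
        .values_list("suite", flat=True)
        .distinct()[:3]
    )
    print(fw.id, fw.name, fw.enabled, ", ".join(suites))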

@junngo junngo left a comment (Contributor Author)

note:

# treeherder/etl/jobs.py
parse_logs.apply_async(queue=queue, args=[job.id, [job_log.id], priority])

I considered splitting the queues, but decided to keep using the existing ones to avoid code duplication and increased complexity.

https://github.com/mozilla/treeherder/pull/8997/files#diff-937b3e21ad52eec5277a7f52f51572348a072addafb88a049f9fe302ae437e76R369

@junngo junngo force-pushed the ingest-perfherder-data branch from 26bc32d to 7ec7ee8 Compare October 7, 2025 12:29

junngo commented Oct 7, 2025

Hi there :) I updated the code.
I didn't modify the existing log parsing feature. Instead, I created a new queue and task for handling the perfherder-data.json artifacts, so logs and perfherder-data.json artifacts are processed on separate queues.
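Roughly, the split looks like this (a hedged sketch; the task name, signature, and body are illustrative, though the perf_ingest queue name matches the entrypoint change below):

from celery import shared_task

@shared_task(name="ingest-perfherder-data", soft_time_limit=600)
def ingest_perfherder_data(job_id, job_log_ids):
    # Fetch each perfherder-data.json artifact and store its contents as
    # performance data; raw logs never pass through this task.
    ...

# Scheduling mirrors _schedule_log_parsing, but targets the dedicated queue:
# ingest_perfherder_data.apply_async(queue="perf_ingest", args=[job_id, job_log_ids])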

@junngo junngo force-pushed the ingest-perfherder-data branch from 7ec7ee8 to b29d246 Compare October 7, 2025 13:58
@gmierz gmierz left a comment (Collaborator)

Great start @junngo! It looks like we're getting close :)


job_log_name = job_log.name.replace("-", "_")
if job_log_name.startswith("perfherder_data"):
    _schedule_perfherder_ingest(job, job_log, result, repository)
gmierz (Collaborator)

Instead of calling the schedule function here, we should call it in the _load_job method similar to where we call the _schedule_log_parsing function.

)

first_exception = None
for job_log in job_logs:
gmierz (Collaborator)

It looks like this is parsing the logs, but this new task should only be responsible for handling the JSON artifacts.

@junngo junngo Oct 8, 2025 (Contributor Author)

Thanks for the review :)
I had understood the purpose of the JobLog table as follows: the job_logs variable is built from the JobLog table, but that table isn't just for raw log parsing. It's a generic per-job reference table that also tracks artifacts like live_backing_log and perfherder-data-artifact.json.
We store references to whatever needs further processing there, and then different Celery queues pick them up and handle them.
I agree the wording around job_logs could be confusing, so I'll rename things to make it clear!
If you have any other feedback or ideas, I'd be happy to hear them.
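For illustration, the idea is that rows like these coexist in the same table (the names and URLs are made up; the status constant matches the JobLog.PENDING used elsewhere in this PR):

from treeherder.model.models import JobLog

# job is an existing Job instance.
JobLog.objects.create(
    job=job,
    name="live_backing_log",
    url="https://example.com/live_backing.log",
    status=JobLog.PENDING,  # picked up by the log-parsing queue
)
JobLog.objects.create(
    job=job,
    name="perfherder-data",
    url="https://example.com/perfherder-data.json",
    status=JobLog.PENDING,  # picked up by the perf_ingest queue
)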

@junngo junngo force-pushed the ingest-perfherder-data branch 2 times, most recently from 69ce6a2 to c319a67 Compare October 9, 2025 11:22

first_exception = None
for job_artifact in job_artifacts:
    job_log_name = job_artifact.name.replace("-", "_")
gmierz (Collaborator)

nit: change this to job_artifact_name


if job_artifact.status not in (JobLog.PENDING, JobLog.FAILED):
    logger.info(
        "Skipping ingest_perfherder_data for job %s since log already processed. Log Status: %s",
gmierz (Collaborator)

nit: "since artifact already processed."

@junngo junngo force-pushed the ingest-perfherder-data branch 2 times, most recently from 3f6fcf6 to 845ee84 Compare October 14, 2025 14:48
@gmierz gmierz left a comment (Collaborator)

Looking a lot better now :) a few questions/minor things below. I think the major thing is where we're checking the should_ingest stuff.

)

log_refs = job_datum.get("log_references", [])
log_refs = [
gmierz (Collaborator)

Do you think we could split this out of log_refs and add some code to handle the JobLog creation for those artifacts below the log ones? e.g.

if perf_refs:
    for artifact in perf_refs:
        ...
    _schedule...

Maybe some of the code could be generalized here too.

junngo (Contributor Author)

Okay, I split log_refs and perfherder_data_references and updated it :)
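A rough sketch of that split (the predicate on the reference name is an assumption; the actual check in the patch may differ):

log_refs, perfherder_data_references = [], []
for ref in job_datum.get("log_references", []):
    if ref.get("name", "").startswith("perfherder-data"):
        perfherder_data_references.append(ref)
    else:
        log_refs.append(ref)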


try:
    serialized_artifacts = serialize_artifact_json_blobs(artifact_list)
    store_job_artifacts(serialized_artifacts)
gmierz (Collaborator)

Could we call store_performance_artifact here directly instead of going through the store_job_artifacts method?

junngo (Contributor Author)

Yes, I now call the store_performance_artifact method directly instead of going through store_job_artifacts.
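For example (a sketch only; it assumes a per-artifact signature like store_performance_artifact(job, artifact), which is worth double-checking against the real function):

for artifact in artifact_list:
    # Assumption: store_performance_artifact accepts one deserialized
    # perfherder blob at a time, skipping the generic artifact router.
    store_performance_artifact(job, artifact)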

@junngo junngo force-pushed the ingest-perfherder-data branch from 845ee84 to cb5b351 Compare October 15, 2025 15:47
@gmierz gmierz left a comment (Collaborator)

r+ changes look great to me now, great work @junngo!

@junngo junngo force-pushed the ingest-perfherder-data branch from cb5b351 to 5b4abc0 Compare October 16, 2025 12:54
Comment on lines +55 to +57
elif [ "$1" == "worker_perf_ingest" ]; then
export REMAP_SIGTERM=SIGQUIT
exec newrelic-admin run-program celery -A treeherder worker --without-gossip --without-mingle --without-heartbeat -Q perf_ingest --concurrency=7
@junngo junngo Oct 16, 2025 (Contributor Author)

@gmierz
In production, we might need to run a worker_perf_ingest process for the perf_ingest queue, but I'm not fully sure whether this entrypoint is actually required.

@gmierz gmierz merged commit af7211c into mozilla:master Oct 20, 2025
6 checks passed