Conversation

sundarshankar89
Collaborator

Changes

What does this PR do?

  • Introduces new Synapse Profiler scripts for in-depth Azure Synapse assessment as part of Lakebridge resources.
  • Creates a YAML pipeline configuration (pipeline_config.yml) to orchestrate data extraction and metric collection across Synapse environments.

Relevant implementation details

Caveats/things to watch out for when reviewing:

Linked issues

Resolves #..

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge ...
  • ... +add your own

Tests

  • manually tested
  • added unit tests
  • added integration tests

sundarshankar89 self-assigned this on Sep 15, 2025
sundarshankar89 requested a review from a team as a code owner on September 15, 2025 07:11
sundarshankar89 added the feat/profiler (Issues related to profilers) label on Sep 15, 2025

github-actions bot commented Sep 15, 2025

✅ 29/29 passed, 2 flaky, 1m19s total

Flaky tests:

  • 🤪 test_transpiles_informatica_with_sparksql (9.932s)
  • 🤪 test_transpile_sql_file (8.51s)

Running from acceptance #2398

Contributor

m-abulazm left a comment

We need a better interface for `execute`.
I am not sure we want to leave it to developers to always initialize the workspace and credentials themselves. This will lead to inconsistencies sooner or later.

I would also split extracting the "metrics" from persisting them, so we need at least two methods: `extract` and `persist`.
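
A minimal sketch of what such a split could look like. Every name here (`ProfilerContext`, `MetricsStep`, `InMemoryStep`) is hypothetical and not part of this PR; it only illustrates the shape of the suggestion, with the workspace and credentials built once and handed to each step.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List

Rows = List[Dict[str, Any]]


@dataclass(frozen=True)
class ProfilerContext:
    """Hypothetical container, created once by the caller instead of by every script."""

    workspace_name: str
    credentials: Dict[str, Any]
    db_path: str


class MetricsStep(ABC):
    """Hypothetical base class: each profiler step implements extract and persist."""

    def __init__(self, context: ProfilerContext) -> None:
        self.context = context

    @abstractmethod
    def extract(self) -> Rows:
        """Fetch metrics from the source. No writes happen here."""

    @abstractmethod
    def persist(self, rows: Rows) -> int:
        """Write already-extracted rows and return how many were written."""

    def execute(self) -> int:
        # The shared entry point only wires the two halves together.
        return self.persist(self.extract())


class InMemoryStep(MetricsStep):
    """Toy implementation for illustration; a real step would query Synapse."""

    def __init__(self, context: ProfilerContext, source: Rows) -> None:
        super().__init__(context)
        self._source = source
        self.stored: Rows = []

    def extract(self) -> Rows:
        # Tag each row with the workspace it came from.
        return [dict(row, workspace=self.context.workspace_name) for row in self._source]

    def persist(self, rows: Rows) -> int:
        self.stored.extend(rows)
        return len(rows)
```

With this shape, `extract` and `persist` can each be unit tested on their own, and scripts no longer construct the workspace or credentials themselves.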

raise FileNotFoundError(f"Credentials file not found at {path}") from e


def create_credential_manager(file_path: Path):
Contributor

What is the difference between this and `databricks.labs.lakebridge.connections.credential_manager.create_credential_manager`?

We should merge them and make the result reusable.

Contributor

+1

return CredentialManager(loader, secret_providers)


def get_sqlpool_reader(
Contributor

`src/databricks/labs/lakebridge/connections/database_manager.py` also implements a Synapse client. What is the difference, and should we merge them?

)


def get_synapse_jdbc_settings(config: dict):
Contributor

This is not used anywhere. Please remove it.

return MetricsQueryClient(credential=DefaultAzureCredential())


def save_resultset_to_db(result, table_name: str, db_path: str, mode: str):
Contributor

Can we move the DuckDB functions to their own file?

return max_column_val


def get_serverless_database_groups(
Contributor

This is used in one place only, so it can live next to its caller.

from sqlalchemy import text


def execute():
Contributor

The long `execute` methods need to be split up; this is required before the missing unit tests can be added.
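
One possible way to split a long `execute`, sketched under assumptions: the table name, column names, and the helpers `build_query` and `to_records` are hypothetical and do not come from this PR. The idea is to keep query construction and row shaping as pure functions, and to inject the I/O so that a test can pass in fakes.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Tuple[Any, ...]
Record = Dict[str, Any]


def build_query(table: str, last_seen: Optional[str]) -> Tuple[str, Tuple[Any, ...]]:
    """Pure function: decide what to fetch. Testable without a connection.

    `table` is assumed to be an internal constant, never user input.
    """
    base = f"SELECT id, started_at FROM {table}"
    if last_seen is None:
        return base, ()
    return base + " WHERE started_at > ?", (last_seen,)


def to_records(rows: List[Row]) -> List[Record]:
    """Pure function: shape raw rows into named records."""
    return [{"id": row[0], "started_at": row[1]} for row in rows]


def execute(
    fetch: Callable[[str, Tuple[Any, ...]], List[Row]],
    save: Callable[[List[Record]], None],
    last_seen: Optional[str] = None,
) -> int:
    """Thin orchestration: all I/O arrives through `fetch` and `save`."""
    sql, params = build_query("sessions", last_seen)
    records = to_records(fetch(sql, params))
    save(records)
    return len(records)
```

A unit test can then call `execute` with a fake `fetch` that returns canned rows and a `save` that appends to a list, without a Synapse workspace or a local database.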

)


def execute():
Contributor

same

from databricks.labs.lakebridge.resources.assessments.synapse.common.profiler_classes import SynapseWorkspace


def execute():
Contributor

same

from sqlalchemy import text


def execute():
Contributor

same

3 participants