From a7fbe6825733836ccc1b2a646388c91ee380704d Mon Sep 17 00:00:00 2001
From: Christopher Menon <16004217+cmenon12@users.noreply.github.com>
Date: Sun, 3 Aug 2025 21:29:20 +0100
Subject: [PATCH 1/3] Fix bullet point formatting in global_search.md

---
 docs/query/global_search.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/query/global_search.md b/docs/query/global_search.md
index a9685d70be..5f0af44171 100644
--- a/docs/query/global_search.md
+++ b/docs/query/global_search.md
@@ -60,7 +60,7 @@ Below are the key parameters of the [GlobalSearch class](https://github.com/micr
 * `reduce_system_prompt`: prompt template used in the `reduce` stage, default template can be found at [reduce_system_prompt](https://github.com/microsoft/graphrag/blob/main//graphrag/prompts/query/global_search_reduce_system_prompt.py)
 * `response_type`: free-form text describing the desired response type and format (e.g., `Multiple Paragraphs`, `Multi-Page Report`)
 * `allow_general_knowledge`: setting this to True will include additional instructions to the `reduce_system_prompt` to prompt the LLM to incorporate relevant real-world knowledge outside of the dataset. Note that this may increase hallucinations, but can be useful for certain scenarios. Default is False
-*`general_knowledge_inclusion_prompt`: instruction to add to the `reduce_system_prompt` if `allow_general_knowledge` is enabled. Default instruction can be found at [general_knowledge_instruction](https://github.com/microsoft/graphrag/blob/main//graphrag/prompts/query/global_search_knowledge_system_prompt.py)
+* `general_knowledge_inclusion_prompt`: instruction to add to the `reduce_system_prompt` if `allow_general_knowledge` is enabled. Default instruction can be found at [general_knowledge_instruction](https://github.com/microsoft/graphrag/blob/main//graphrag/prompts/query/global_search_knowledge_system_prompt.py)
 * `max_data_tokens`: token budget for the context data
 * `map_llm_params`: a dictionary of additional parameters (e.g., temperature, max_tokens) to be passed to the LLM call at the `map` stage
 * `reduce_llm_params`: a dictionary of additional parameters (e.g., temperature, max_tokens) to passed to the LLM call at the `reduce` stage
@@ -70,4 +70,4 @@ Below are the key parameters of the [GlobalSearch class](https://github.com/micr
 
 ## How to Use
 
-An example of a global search scenario can be found in the following [notebook](../examples_notebooks/global_search.ipynb).
\ No newline at end of file
+An example of a global search scenario can be found in the following [notebook](../examples_notebooks/global_search.ipynb).

From a9ec9381328bdb8a3e012184f717dff47a3039cd Mon Sep 17 00:00:00 2001
From: Christopher Menon <16004217+cmenon12@users.noreply.github.com>
Date: Sun, 3 Aug 2025 21:40:20 +0100
Subject: [PATCH 2/3] Fix query docs formatting

Fix newlines and trailing whitespace
---
 docs/query/drift_search.md | 1 -
 docs/query/global_search.md | 3 +--
 docs/query/local_search.md | 1 -
 docs/query/multi_index_search.md | 13 ++++++-------
 docs/query/question_generation.md | 1 +
 5 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/docs/query/drift_search.md b/docs/query/drift_search.md
index e6199221ed..8e26700b3c 100644
--- a/docs/query/drift_search.md
+++ b/docs/query/drift_search.md
@@ -14,7 +14,6 @@ DRIFT search (Dynamic Reasoning and Inference with Flexible Traversal) builds up
 
 Figure 1. An entire DRIFT search hierarchy highlighting the three core phases of the DRIFT search process. A (Primer): DRIFT compares the user’s query with the top K most semantically relevant community reports, generating a broad initial answer and follow-up questions to steer further exploration. B (Follow-Up): DRIFT uses local search to refine queries, producing additional intermediate answers and follow-up questions that enhance specificity, guiding the engine towards context-rich information. A glyph on each node in the diagram shows the confidence the algorithm has to continue the query expansion step. C (Output Hierarchy): The final output is a hierarchical structure of questions and answers ranked by relevance, reflecting a balanced mix of global insights and local refinements, making the results adaptable and comprehensive.
 
-
 DRIFT Search introduces a new approach to local search queries by including community information in the search process. This greatly expands the breadth of the query’s starting point and leads to retrieval and usage of a far higher variety of facts in the final answer. This addition expands the GraphRAG query engine by providing a more comprehensive option for local search, which uses community insights to refine a query into detailed follow-up questions.
 
 ## Configuration
diff --git a/docs/query/global_search.md b/docs/query/global_search.md
index 5f0af44171..86d2a20ac0 100644
--- a/docs/query/global_search.md
+++ b/docs/query/global_search.md
@@ -45,11 +45,10 @@ flowchart LR
 
 ```
 
-Given a user query and, optionally, the conversation history, the global search method uses a collection of LLM-generated community reports from a specified level of the graph's community hierarchy as context data to generate response in a map-reduce manner. At the `map` step, community reports are segmented into text chunks of pre-defined size. Each text chunk is then used to produce an intermediate response containing a list of point, each of which is accompanied by a numerical rating indicating the importance of the point. At the `reduce` step, a filtered set of the most important points from the intermediate responses are aggregated and used as the context to generate the final response. 
+Given a user query and, optionally, the conversation history, the global search method uses a collection of LLM-generated community reports from a specified level of the graph's community hierarchy as context data to generate response in a map-reduce manner. At the `map` step, community reports are segmented into text chunks of pre-defined size. Each text chunk is then used to produce an intermediate response containing a list of point, each of which is accompanied by a numerical rating indicating the importance of the point. At the `reduce` step, a filtered set of the most important points from the intermediate responses are aggregated and used as the context to generate the final response.
 
 The quality of the global search’s response can be heavily influenced by the level of the community hierarchy chosen for sourcing community reports. Lower hierarchy levels, with their detailed reports, tend to yield more thorough responses, but may also increase the time and LLM resources needed to generate the final response due to the volume of reports.
 
-
 ## Configuration
 
 Below are the key parameters of the [GlobalSearch class](https://github.com/microsoft/graphrag/blob/main//graphrag/query/structured_search/global_search/search.py):
diff --git a/docs/query/local_search.md b/docs/query/local_search.md
index bf0f43e3ce..cf99e46961 100644
--- a/docs/query/local_search.md
+++ b/docs/query/local_search.md
@@ -59,4 +59,3 @@ Below are the key parameters of the [LocalSearch class](https://github.com/micro
 ## How to Use
 
 An example of a local search scenario can be found in the following [notebook](../examples_notebooks/local_search.ipynb).
-
diff --git a/docs/query/multi_index_search.md b/docs/query/multi_index_search.md
index 6b6ff2b41a..44fc89b9ad 100644
--- a/docs/query/multi_index_search.md
+++ b/docs/query/multi_index_search.md
@@ -2,19 +2,18 @@
 
 ## Multi Dataset Reasoning
 
-GraphRAG takes in unstructured data contained in text documents and uses large languages models to “read” the documents in a targeted fashion and create a knowledge graph. This knowledge graph, or index, contains information about specific entities in the data, how the entities relate to one another, and high-level reports about communities and topics found in the data. Indexes can be searched by users to get meaningful information about the underlying data, including reports with citations that point back to the original unstructured text. 
+GraphRAG takes in unstructured data contained in text documents and uses large languages models to “read” the documents in a targeted fashion and create a knowledge graph. This knowledge graph, or index, contains information about specific entities in the data, how the entities relate to one another, and high-level reports about communities and topics found in the data. Indexes can be searched by users to get meaningful information about the underlying data, including reports with citations that point back to the original unstructured text.
 
-Multi-index search is a new capability that has been added to the GraphRAG python library to query multiple knowledge stores at once. Multi-index search allows for many new search scenarios, including: 
+Multi-index search is a new capability that has been added to the GraphRAG python library to query multiple knowledge stores at once. Multi-index search allows for many new search scenarios, including:
 
 - Combining knowledge from different domains – Many documents contain similar types of entities: person, place, thing. But GraphRAG can be tuned for highly specialized domains, such as science and engineering. With the recent updates to search, GraphRAG can now simultaneously query multiple datasets with completely different schemas and entity definitions.
 
-- Combining knowledge with different access levels – Not all datasets are accessible to all people, even within an organization. Some datasets are publicly available. Some datasets, such as internal financial information or intellectual property, may only be accessible by a small number of employees at a company. Multi-index search allows multiple sources with different access controls to be queried at the same time, creating more nuanced and informative reports. Internal R&D findings can be seamlessly combined with open-source scientific publications. 
+- Combining knowledge with different access levels – Not all datasets are accessible to all people, even within an organization. Some datasets are publicly available. Some datasets, such as internal financial information or intellectual property, may only be accessible by a small number of employees at a company. Multi-index search allows multiple sources with different access controls to be queried at the same time, creating more nuanced and informative reports. Internal R&D findings can be seamlessly combined with open-source scientific publications.
 
-- Combining knowledge in different locations – With multi-index search, indexes do not need to be in the same location or type of storage to be queried. Indexes in the cloud in Azure Storage can be queried at the same time as indexes stored on a personal computer. Multi-index search makes these types of data joins easy and accessible. 
-
-To search across multiple datasets, the underlying contexts from each index, based on the user query, are combined in-memory at query time, saving on computation and allowing the joint querying of indexes that can’t be joined inherently, either do access controls or differing schemas. Multi-index search automatically keeps track of provenance information, so that any references can be traced back to the correct indexes and correct original documents. 
+- Combining knowledge in different locations – With multi-index search, indexes do not need to be in the same location or type of storage to be queried. Indexes in the cloud in Azure Storage can be queried at the same time as indexes stored on a personal computer. Multi-index search makes these types of data joins easy and accessible.
+To search across multiple datasets, the underlying contexts from each index, based on the user query, are combined in-memory at query time, saving on computation and allowing the joint querying of indexes that can’t be joined inherently, either do access controls or differing schemas. Multi-index search automatically keeps track of provenance information, so that any references can be traced back to the correct indexes and correct original documents.
 
 
 ## How to Use
 
-An example of a global search scenario can be found in the following [notebook](../examples_notebooks/multi_index_search.ipynb).
\ No newline at end of file
+An example of a global search scenario can be found in the following [notebook](../examples_notebooks/multi_index_search.ipynb).
diff --git a/docs/query/question_generation.md b/docs/query/question_generation.md
index 525a465499..6a2a69cf13 100644
--- a/docs/query/question_generation.md
+++ b/docs/query/question_generation.md
@@ -5,6 +5,7 @@
 The [question generation](https://github.com/microsoft/graphrag/blob/main//graphrag/query/question_gen/) method combines structured data from the knowledge graph with unstructured data from the input documents to generate candidate questions related to specific entities.
 
 ## Methodology
+
 Given a list of prior user questions, the question generation method uses the same context-building approach employed in [local search](local_search.md) to extract and prioritize relevant structured and unstructured data, including entities, relationships, covariates, community reports and raw text chunks. These data records are then fitted into a single LLM prompt to generate candidate follow-up questions that represent the most important or urgent information content or themes in the data.
 
 ## Configuration

From 72fcf1155de931bf879019f9acf3a3c5d56c495e Mon Sep 17 00:00:00 2001
From: Christopher Menon <16004217+cmenon12@users.noreply.github.com>
Date: Sun, 3 Aug 2025 21:52:37 +0100
Subject: [PATCH 3/3] Generate semversioner patch JSON

---
 .semversioner/next-release/patch-20250803205217783089.json | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 .semversioner/next-release/patch-20250803205217783089.json

diff --git a/.semversioner/next-release/patch-20250803205217783089.json b/.semversioner/next-release/patch-20250803205217783089.json
new file mode 100644
index 0000000000..d30f63fd7a
--- /dev/null
+++ b/.semversioner/next-release/patch-20250803205217783089.json
@@ -0,0 +1,4 @@
+{
+    "type": "patch",
+    "description": "Fix query docs formatting"
+}
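The `global_search.md` hunks above describe a map-reduce dataflow: community reports are packed into text chunks of a pre-defined size, each chunk yields an intermediate list of rated points at the `map` stage, and a filtered set of the highest-rated points becomes the context for the final answer at the `reduce` stage. Below is a minimal, library-agnostic sketch of that flow for readers of this patch series; the `Point` shape, the character-based chunking, and the two LLM callables are illustrative placeholders, not code from the `GlobalSearch` class linked in the docs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Point:
    """One intermediate point produced at the map stage."""

    text: str
    rating: float  # numerical importance rating


def chunk_reports(reports: list[str], max_chars: int = 4000) -> list[str]:
    """Pack community reports into context chunks of a pre-defined size."""
    chunks: list[str] = []
    current = ""
    for report in reports:
        if current and len(current) + len(report) > max_chars:
            chunks.append(current)
            current = ""
        current += report + "\n"
    if current:
        chunks.append(current)
    return chunks


def map_reduce_search(
    query: str,
    community_reports: list[str],
    rate_points: Callable[[str, str], list[Point]],  # map-stage LLM call (placeholder)
    final_answer: Callable[[str, str], str],         # reduce-stage LLM call (placeholder)
    max_points: int = 50,
) -> str:
    # Map: each chunk produces an intermediate response of rated points.
    points: list[Point] = []
    for chunk in chunk_reports(community_reports):
        points.extend(rate_points(query, chunk))
    # Reduce: keep the most important points and use them as the final context.
    top = sorted(points, key=lambda p: p.rating, reverse=True)[:max_points]
    context = "\n".join(p.text for p in top)
    return final_answer(query, context)
```

In the documented class, the comparable knobs are `max_data_tokens` for the context budget and `map_llm_params` / `reduce_llm_params` for the two LLM calls.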