Extract sections from text based on start and end keys.
This class extracts a section of text that begins at any of the start keys and ends just before any of the end keys (the end key itself is not included).
If no start key is found, an empty string ("") is returned.
If start_keys is None, the section runs from the beginning of the text up to the end key (exclusive).
If no end key is found, or end_keys is None, the section runs to the end of the text.
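The start/end rules above can be illustrated with a minimal sketch built on Python's standard `re` module. This is not radreportparser's implementation: the helper name `extract_section`, the joining of keys into one alternation, and the final whitespace strip are assumptions made purely for illustration.

```python
import re

def extract_section(text, start_keys, end_keys, include_start=True):
    """Illustrative stand-in for the documented rules (hypothetical helper)."""
    flags = re.IGNORECASE
    if start_keys is None:
        # No start key: the section begins at the start of the text.
        begin = body_from = 0
    else:
        m = re.search("|".join(f"(?:{k})" for k in start_keys), text, flags)
        if m is None:
            return ""  # no start key found
        begin = m.start() if include_start else m.end()
        body_from = m.end()
    end = len(text)  # default: run to the end of the text
    if end_keys is not None:
        m_end = re.search("|".join(f"(?:{k})" for k in end_keys), text[body_from:], flags)
        if m_end is not None:
            end = body_from + m_end.start()  # stop just before the end key
    # Assumption: surrounding whitespace is trimmed from the result.
    return text[begin:end].strip()

report = "HISTORY: Cough. FINDINGS: Normal. IMPRESSION: Clear."
print(extract_section(report, ["FINDINGS:"], ["IMPRESSION:"]))   # FINDINGS: Normal.
print(extract_section(report, None, ["FINDINGS:"]))              # HISTORY: Cough.
print(extract_section(report, ["IMPRESSION:"], None))            # IMPRESSION: Clear.
print(repr(extract_section(report, ["TECHNIQUE:"], None)))       # ''
```

Each call exercises one rule: a normal start/end pair, a None start key, a None end key, and a start key that never matches.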
match_strategy: Strategy for matching both start and end keys:
- "greedy" (default):
  - Scans the text from left to right
  - Returns the first match found for any pattern in keys
  - Order of patterns in the keys list doesn't matter
  - Faster, but less precise when order matters
- "sequential":
  - Tries each pattern in the keys list in order
  - Returns the first successful match
  - Order of patterns in the keys list matters
  - More precise, but slightly slower
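The difference between the two strategies can be sketched with plain `re` calls. The functions `find_greedy` and `find_sequential` below are hypothetical names, and combining keys into one alternation is only one plausible way to obtain the "greedy" behavior described above; the library may do it differently.

```python
import re

def find_greedy(keys, text, flags=re.IGNORECASE):
    # Scan left to right; the earliest position where any key matches wins.
    combined = "|".join(f"(?:{k})" for k in keys)
    return re.search(combined, text, flags)

def find_sequential(keys, text, flags=re.IGNORECASE):
    # Try keys in list order; the first key that matches anywhere wins.
    for key in keys:
        m = re.search(key, text, flags)
        if m:
            return m
    return None

text = "CONCLUSION: stable. IMPRESSION: no change."
keys = ["IMPRESSION:", "CONCLUSION:"]

greedy = find_greedy(keys, text)
sequential = find_sequential(keys, text)
print(greedy.group())      # CONCLUSION:  (appears first in the text)
print(sequential.group())  # IMPRESSION:  (appears first in the keys list)
```

When the report contains more than one candidate marker, "sequential" lets the caller state a preference through list order, while "greedy" simply takes whichever marker occurs earliest.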
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| start_keys | list[str] \| None | List of possible section start markers, given as regular expressions. If None, the section is extracted from the beginning of the text. | required |
| end_keys | list[str] \| None | List of possible section end markers, given as regular expressions. The end key is not included in the extracted section. If None, the section is extracted until the end of the text. | required |
| include_start_keys | bool | Whether to include the start key in the extracted section. Default is True. | True |
| word_boundary | bool | Whether to wrap word boundaries around the keys. Default is False. | False |
| flags | Union[re.RegexFlag, int] | Regex flags to use in pattern matching. For the "re" backend, these are passed directly to re.compile(). For the "re2" backend, they are converted to re2.Options properties. Default is re.IGNORECASE. | re.IGNORECASE |
| match_strategy | "greedy" or "sequential" | Strategy for matching both start and end keys. | "greedy" |
| backend | "re" or "re2" | Regex backend to use. "re": standard Python regex engine (default). "re2": Google's RE2 engine (must be installed). | "re" |
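To make the effect of `word_boundary` and `flags` concrete, here is a small sketch using the standard `re` module. The helper `compile_key` is hypothetical, and wrapping a key as `\b(?:key)\b` is an assumption about what "wrap word boundary around the keys" could mean, not a statement of how the library builds its patterns.

```python
import re

def compile_key(key, word_boundary=False, flags=re.IGNORECASE):
    # Hypothetical helper: optionally wrap the key in \b anchors, then compile.
    pattern = rf"\b(?:{key})\b" if word_boundary else key
    return re.compile(pattern, flags)

text = "Prefindings were reviewed. Finding: nodule."

loose = compile_key("finding")                        # substring match allowed
strict = compile_key("finding", word_boundary=True)   # whole word only
case_sensitive = compile_key("FINDING", flags=0)      # no re.IGNORECASE

print(loose.search(text).start())    # 3, inside "Prefindings"
print(strict.search(text).start())   # 27, the standalone "Finding"
print(case_sensitive.search(text))   # None, because case must match exactly
```

One caveat under this assumption: in Python's `re`, a trailing `\b` after a non-word character such as ":" only matches when a word character follows immediately, so a key like "FINDINGS:" wrapped this way would not match "FINDINGS: Normal".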
Examples
    from radreportparser import SectionExtractor

    # Create an extractor for finding text between headers
    extractor = SectionExtractor(
        start_keys=["FINDINGS:"],
        end_keys=["IMPRESSION:", "CONCLUSION:"]
    )
    print(extractor)
extract
SectionExtractor.extract(text: str, verbose: bool = True)
Extract a section from the text using configured patterns.
Extracts a section from the text if any of the start_keys matches. If the start_keys match at more than one position, the section beginning at the first match is returned. The matching strategy is controlled by the match_strategy argument passed when initializing SectionExtractor().
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | str | The input text to extract the section from. | required |
| verbose | bool | If True and the start_keys match at more than one position in the text, print a message to standard output. | True |
Returns
| Name | Type | Description |
| --- | --- | --- |
|  | str | The extracted section text. Returns an empty string if the section is not found. |
Examples
    # Create an extractor for finding text
    from radreportparser import SectionExtractor

    extractor = SectionExtractor(
        start_keys=["FINDINGS:"],
        end_keys=["IMPRESSION:"]
    )

    # Extract section from text
    text = "FINDINGS: Normal. IMPRESSION: Clear."
    section = extractor.extract(text)
    print(section)
FINDINGS: Normal.
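The `verbose` parameter concerns texts in which a start key matches at several positions. The sketch below shows, with plain `re`, how such positions could be detected and reported; `start_positions` is a hypothetical helper and the printed message is invented for illustration, not the library's actual output.

```python
import re

def start_positions(start_keys, text, flags=re.IGNORECASE):
    # Collect the offset of every start-key match, scanning left to right.
    combined = "|".join(f"(?:{k})" for k in start_keys)
    return [m.start() for m in re.finditer(combined, text, flags)]

text = "FINDINGS: Normal. IMPRESSION: Clear. FINDINGS: Addendum."
positions = start_positions(["FINDINGS:"], text)
print(positions)  # [0, 37]

if len(positions) > 1:
    # Hypothetical wording; only the first match would start the section.
    print(f"Start key matched at {len(positions)} positions; using the first.")
```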
extract_all
SectionExtractor.extract_all(text: str)
Extract all sections from the text that match the configured patterns.
Extracts one or more sections from the text if any of the start_keys matches. The matching strategy is controlled by the match_strategy argument passed when initializing SectionExtractor().
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | str | The input text to extract sections from. | required |
Returns
| Name | Type | Description |
| --- | --- | --- |
|  | List[str] | List of extracted section texts. Returns an empty list if no sections are found. |
Examples
    # Create an extractor for finding text
    from radreportparser import SectionExtractor

    extractor = SectionExtractor(
        start_keys=["FINDING:"],
        end_keys=["IMPRESSION:"]
    )

    text = '''
    FINDING: First observation
    IMPRESSION: OK
    FINDING: Second observation
    IMPRESSION: Also OK
    '''
    sections = extractor.extract_all(text)
    print(sections)
['FINDING: First observation', 'FINDING: Second observation']
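For readers who want to see the repeated-extraction idea without the library, here is a minimal sketch using only `re`. The function `extract_all_sections` is a hypothetical stand-in: resuming the search after each end key and stripping whitespace are assumptions, and the library's handling of edge cases may differ.

```python
import re

def extract_all_sections(text, start_keys, end_keys, flags=re.IGNORECASE):
    start_re = re.compile("|".join(f"(?:{k})" for k in start_keys), flags)
    end_re = re.compile("|".join(f"(?:{k})" for k in end_keys), flags)
    sections = []
    pos = 0
    while pos <= len(text):
        m_start = start_re.search(text, pos)
        if m_start is None:
            break  # no further start key
        # Look for the end key only after the start key itself.
        m_end = end_re.search(text, m_start.end())
        stop = m_end.start() if m_end else len(text)
        sections.append(text[m_start.start():stop].strip())
        if m_end is None:
            break  # last section ran to the end of the text
        # Resume after the end key; always advance to avoid looping forever.
        pos = max(m_end.end(), pos + 1)
    return sections

text = (
    "FINDING: First observation\n"
    "IMPRESSION: OK\n"
    "FINDING: Second observation\n"
    "IMPRESSION: Also OK"
)
print(extract_all_sections(text, ["FINDING:"], ["IMPRESSION:"]))
# ['FINDING: First observation', 'FINDING: Second observation']
```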