SectionExtractor

SectionExtractor(
    self,
    start_keys: list[str] | None,
    end_keys: list[str] | None,
    include_start_keys: bool = True,
    word_boundary: bool = False,
    flags: Union[re.RegexFlag, int] = re.IGNORECASE,
    match_strategy: Literal['greedy', 'sequential'] = 'greedy',
    backend: Literal['re', 're2'] = 're',
)

Extract sections from text based on start and end keys.

This class extracts sections of text that begin with any of the start keys and end just before any of the end keys (the end key itself is not included).

If start keys are given but none is found, an empty string ("") is returned.
If start_keys is None, the section runs from the beginning of the text up to the end key (exclusive).
If no end key is found, or end_keys is None, the section runs to the end of the text.
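The three rules above can be illustrated with a small standard-library sketch. It is not the radreportparser implementation: `extract_section` is a hypothetical helper written only to mirror the documented behavior, and it assumes the leftmost start key wins and that surrounding whitespace is left untouched.

```python
import re

def extract_section(text, start_keys, end_keys, flags=re.IGNORECASE):
    """Hypothetical helper mirroring the documented edge-case rules."""
    if start_keys is None:
        # No start key: the section begins at the start of the text.
        start = search_from = 0
    else:
        hits = [m for k in start_keys if (m := re.search(k, text, flags))]
        if not hits:
            return ""  # start keys were given but none matched
        first = min(hits, key=lambda m: m.start())
        start, search_from = first.start(), first.end()

    end = len(text)  # default: run to the end of the text
    if end_keys is not None:
        # Only look for an end key after the start key.
        ends = [
            m.start()
            for k in end_keys
            if (m := re.compile(k, flags).search(text, search_from))
        ]
        if ends:
            end = min(ends)
    return text[start:end]

report = "HISTORY: cough. FINDINGS: Normal. IMPRESSION: Clear."
print(repr(extract_section(report, ["TECHNIQUE:"], ["IMPRESSION:"])))  # ''
print(repr(extract_section(report, None, ["FINDINGS:"])))    # 'HISTORY: cough. '
print(repr(extract_section(report, ["IMPRESSION:"], None)))  # 'IMPRESSION: Clear.'
print(repr(extract_section(report, ["FINDINGS:"], ["CONCLUSION:"])))
# 'FINDINGS: Normal. IMPRESSION: Clear.'
```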

match_strategy: Strategy for matching both start and end keys:

“greedy” (default):
- Scans text from left to right
- Returns first match found for any pattern in keys
- Order of patterns in keys list doesn’t matter
- Faster but less precise when order matters
“sequential”:
- Tries each pattern in keys list in order
- Returns first successful match
- Order of patterns in keys list matters
- More precise but slightly slower
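
The difference between the two strategies is easiest to see when the keys appear in the text in a different order than in the list. The following is a minimal sketch using only the standard `re` module; `find_greedy` and `find_sequential` are hypothetical names chosen for illustration and are not part of the radreportparser API.

```python
import re

def find_greedy(text, keys, flags=re.IGNORECASE):
    """Return the leftmost match of any key; list order is irrelevant."""
    matches = [m for k in keys if (m := re.search(k, text, flags))]
    return min(matches, key=lambda m: m.start(), default=None)

def find_sequential(text, keys, flags=re.IGNORECASE):
    """Try keys in list order; return the match of the first key that hits."""
    for k in keys:
        m = re.search(k, text, flags)
        if m:
            return m
    return None

text = "IMPRESSION: stable. CONCLUSION: no change."
keys = ["CONCLUSION:", "IMPRESSION:"]  # list order differs from text order

greedy_hit = find_greedy(text, keys).group()
sequential_hit = find_sequential(text, keys).group()
print(greedy_hit)      # IMPRESSION:
print(sequential_hit)  # CONCLUSION:
```

Under this reading, "sequential" lets the caller rank keys by preference, while "greedy" always picks whichever key occurs earliest in the text.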

Parameters

Name	Type	Description	Default
start_keys	list[str] \| None	List of possible section start markers, given as regular expressions. If None, the section is extracted from the beginning of the text.	required
end_keys	list[str] \| None	List of possible section end markers, given as regular expressions. The matched end key is not included in the extracted section. If None, the section is extracted until the end of the text.	required
include_start_keys	bool	Whether to include the start key in the extracted section. Default is True.	`True`
word_boundary	bool	Whether to wrap the keys in word boundaries. Default is False.	`False`
flags	Union[re.RegexFlag, int]	Regex flags to use in pattern matching. For the "re" backend, these are passed directly to re.compile(). For the "re2" backend, these are converted to re2.Options properties. Default is `re.IGNORECASE`.	`re.IGNORECASE`
match_strategy	(greedy, sequential)	Strategy for matching both start and end keys.	`"greedy"`
backend	(re, re2)	Regex backend to use: "re", the standard Python regex engine (default), or "re2", Google's RE2 engine (must be installed).	`"re"`
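
To show what `word_boundary` may change in practice, here is a rough sketch with the standard `re` module. It assumes that "wrapping" means adding `\b` on both sides of each key; the helper `wrap_word_boundary` is hypothetical and the library's actual pattern construction may differ.

```python
import re

def wrap_word_boundary(key):
    # Assumption: wrapping means placing \b on both sides of the pattern.
    return rf"\b(?:{key})\b"

text = "No pneumothorax. Thorax: unremarkable."

# Without word boundaries the key matches inside "pneumothorax".
plain = re.search("thorax", text, re.IGNORECASE)
# With word boundaries only the standalone word matches.
bounded = re.search(wrap_word_boundary("thorax"), text, re.IGNORECASE)

print(plain.start(), plain.group())      # 9 thorax
print(bounded.start(), bounded.group())  # 17 Thorax
```

Under the same assumption, a key that ends in a non-word character such as `:` would only match when a word character follows it immediately, because `\b` requires a word character on exactly one side. That is worth checking before enabling the option for keys like "FINDINGS:".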

Examples

from radreportparser import SectionExtractor
# Create an extractor for finding text between headers
extractor = SectionExtractor(
    start_keys=["FINDINGS:"],
    end_keys=["IMPRESSION:", "CONCLUSION:"]
)
print(extractor)

SectionExtractor(start_keys=['FINDINGS:'], end_keys=['IMPRESSION:', 'CONCLUSION:'], include_start_keys=True, word_boundary=False, flags=re.IGNORECASE, match_strategy='greedy', backend='re')

Methods

Name	Description
extract	Extract a section from the text using configured patterns.
extract_all	Extract all sections from the text that match the configured patterns.

extract

SectionExtractor.extract(text: str, verbose: bool = True)

Extract a section from the text using configured patterns.

Extracts a section from the text if any of start_keys matches. If start_keys match at more than one position, the section from the first match is returned. The matching strategy is controlled by the match_strategy argument passed when constructing SectionExtractor.

Parameters

Name	Type	Description	Default
text	str	The input text to extract section from.	required
verbose	bool	If `True` and `start_keys` match at more than one position in `text`, print a message to standard output.	`True`

Returns

Name	Type	Description
	str	The extracted section text. Returns empty string if section not found.

Examples

# Create an extractor for finding text
from radreportparser import SectionExtractor
extractor = SectionExtractor(
    start_keys=["FINDINGS:"],
    end_keys=["IMPRESSION:"]
)
# Extract section from text
text = "FINDINGS: Normal. IMPRESSION: Clear."
section = extractor.extract(text)
print(section)

FINDINGS: Normal.

extract_all

SectionExtractor.extract_all(text: str)

Extract all sections from the text that match the configured patterns.

Extracts one or more sections from the text if any of start_keys matches. The matching strategy is controlled by the match_strategy argument passed when constructing SectionExtractor.

Parameters

Name	Type	Description	Default
text	str	The input text to extract sections from	required

Returns

Name	Type	Description
	List[str]	List of extracted section texts. Returns empty list if no sections found.

Examples

# Create an extractor for finding text
from radreportparser import SectionExtractor
extractor = SectionExtractor(
    start_keys=["FINDING:"],
    end_keys=["IMPRESSION:"]
)
text = '''
FINDING: First observation
IMPRESSION: OK
FINDING: Second observation
IMPRESSION: Also OK
'''
sections = extractor.extract_all(text)
print(sections)

['FINDING: First observation', 'FINDING: Second observation']
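
For intuition about how repeated sections might be collected, the loop below reproduces the output above with the standard `re` module alone. It is only a sketch: `extract_all_sketch` is a hypothetical function, it joins the keys into one alternation (a leftmost-match reading of the "greedy" strategy), and stripping surrounding whitespace is an assumption made to match the printed result.

```python
import re

def extract_all_sketch(text, start_keys, end_keys, flags=re.IGNORECASE):
    """Collect every start-to-end span, scanning left to right."""
    start_re = re.compile("|".join(f"(?:{k})" for k in start_keys), flags)
    end_re = re.compile("|".join(f"(?:{k})" for k in end_keys), flags)
    sections, pos = [], 0
    while (start := start_re.search(text, pos)) is not None:
        # Look for the end key only after the current start key.
        end = end_re.search(text, start.end())
        stop = end.start() if end else len(text)
        sections.append(text[start.start():stop].strip())
        if end is None:
            break  # last section ran to the end of the text
        # Resume after the end key; always advance to avoid looping forever.
        pos = max(end.end(), start.start() + 1)
    return sections

text = """
FINDING: First observation
IMPRESSION: OK
FINDING: Second observation
IMPRESSION: Also OK
"""
sections = extract_all_sketch(text, ["FINDING:"], ["IMPRESSION:"])
print(sections)
# ['FINDING: First observation', 'FINDING: Second observation']
```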