3GPP TS 38.331 is long and technical, so we cannot pass the whole document to the model. Retrieval is the step that finds the most relevant parts of the spec for a given question.
Libraries used
rank_bm25 (BM25Okapi): This is the main retrieval algorithm. It ranks text chunks by keyword relevance.
re (regular expressions): Used for text cleaning and tokenization.
json: Reads the extracted output from the PDF step and writes the retrieval results.
dataclasses (@dataclass): Provides a clean structure for chunk objects.
typing (List, Dict, Any, Optional, Tuple): Type hints for cleaner code and fewer bugs.
argparse: Runs retrieval from the command line with flags.
bs4 (BeautifulSoup): Used in adaptive retrieval to extract text from HTML pages.
io.BytesIO: Used to read PDF bytes directly from memory.
What the retriever does
Takes a user query
Scores all chunks using BM25
Returns the top k chunks with metadata so we can trace back to the spec
Inputs
extracted_data.json from the text extraction step. It contains page-level text and metadata and is the main input to retrieval.
Chunking and metadata
We chunk the extracted text into overlapping word windows. The example run below uses 550 words per chunk with an 80-word overlap.
Each chunk keeps:
chunk id
page range
chunk text
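The chunking step could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Chunk dataclass, the chunk_pages function, and the assumed page record shape ({"page": ..., "text": ...}) are hypothetical names chosen for the example. The defaults mirror the words_per_chunk and overlap_words values shown in the result example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Chunk:
    chunk_id: str
    page_start: int
    page_end: int
    text: str


def chunk_pages(pages: List[Dict], words_per_chunk: int = 550,
                overlap_words: int = 80) -> List[Chunk]:
    """Slide a fixed-size word window over the page texts, keeping page provenance.

    Assumes each page is a dict with hypothetical keys "page" (int) and "text" (str).
    """
    if not 0 <= overlap_words < words_per_chunk:
        raise ValueError("overlap_words must be smaller than words_per_chunk")

    # Flatten to (word, page_number) pairs so every window knows its page range.
    tagged = [(word, p["page"]) for p in pages for word in p["text"].split()]

    chunks: List[Chunk] = []
    step = words_per_chunk - overlap_words
    for start in range(0, len(tagged), step):
        window = tagged[start:start + words_per_chunk]
        chunks.append(Chunk(
            chunk_id=f"chunk_{len(chunks):06d}",
            page_start=window[0][1],
            page_end=window[-1][1],
            text=" ".join(word for word, _ in window),
        ))
        if start + words_per_chunk >= len(tagged):
            break  # this window already reached the end of the document
    return chunks
```

Tagging every word with its page number before windowing is one simple way to recover the page range of a chunk that straddles a page boundary.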
Retrieval method
We use BM25 as our baseline retriever because technical specifications use consistent keywords and terminology, which makes keyword matching effective. For each query, BM25 returns the top-ranked chunks along with their scores.
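To make the scoring concrete, here is a small from-scratch sketch of BM25 ranking using only the standard library. It is not the rank_bm25 implementation the pipeline relies on: the tokenize and bm25_rank functions, the k1 and b defaults, and the always-positive IDF variant are assumptions made for illustration, so absolute scores will differ from BM25Okapi.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and extract terms, keeping hyphens/underscores inside a term."""
    return re.findall(r"[a-z0-9]+(?:[-_][a-z0-9]+)*", text.lower())


def bm25_rank(query: str, docs: List[str], top_k: int = 5,
              k1: float = 1.5, b: float = 0.75) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs for the top_k documents, best first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)

    # Document frequency: in how many chunks does each term occur?
    df = Counter(term for toks in tokenized for term in set(toks))

    query_terms = tokenize(query)
    scores = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Assumed IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long chunks need more hits for the same score.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((idx, score))

    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```

Keeping identifiers such as RRC_INACTIVE or sk-Counter as single tokens is a design choice in this sketch; it lets exact field names from the spec match as whole terms.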
Output format
For each of the top k chunks, we store:
rank
chunk id
page range
BM25 score
preview text
full chunk text
This output is then fed into the generation step.
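The record layout listed above could be assembled along these lines. The field names follow the result example, while the build_results and save_results functions, the dict-based chunk shape, and the 500-character preview length are assumptions made for this sketch.

```python
import json
from typing import Dict, List, Tuple


def build_results(query: str, chunks: List[Dict], ranked: List[Tuple[int, float]],
                  top_k: int = 5, preview_chars: int = 500) -> Dict:
    """Turn (chunk_index, score) pairs, best first, into the output structure."""
    results = []
    for rank, (idx, score) in enumerate(ranked[:top_k], start=1):
        chunk = chunks[idx]
        text = chunk["text"]
        # Truncate long chunks for the preview (assumed length, not from the source).
        preview = text if len(text) <= preview_chars else text[:preview_chars] + "..."
        results.append({
            "rank": rank,
            "chunk_id": chunk["chunk_id"],
            "score": score,
            "page_start": chunk["page_start"],
            "page_end": chunk["page_end"],
            "text_preview": preview,
            "text": text,
        })
    return {"query": query, "top_k": top_k, "results": results}


def save_results(payload: Dict, path: str) -> None:
    """Write the payload as UTF-8 JSON for the generation step to read."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False, indent=2)
```

Storing both the preview and the full text keeps the file easy to skim by eye while still giving the generation step the complete chunk.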
Next steps
Add two-pass retrieval later as a comparison.
Result Example:
{
"query": "How does the network initiate the paging procedure?",
"top_k": 5,
"prefetch_k": 80,
"keyword": "paging,initiation",
"words_per_chunk": 550,
"overlap_words": 80,
"results": [
{
"rank": 1,
"chunk_id": "chunk_000072",
"score": 19.769371703048908,
"page_start": 81,
"page_end": 82,
"text_preview": "with an SCG with an sk-Counter even when no DRB is setup using the secondary key (SK ) in order to allow the configuration of SRB3. The network can also provide the UE with an sk-Counter, even if no gNB SCG is configured, when using SN terminated MCG bearers. 5.3.2 Paging 5.3.2.1 General !\" #$%&'E) !\"#$%# Figure 5.3.2.1-1: Paging The purpose of this procedure is: - to transmit paging information to a UE in RRC_IDLE or RRC_INACTIVE. - to transmit paging information for a L2 U2N Remote UE in RRC_I...",
"text": "with an SCG with an sk-Counter even when no DRB is setup using the secondary key (SK ) in order to allow the configuration of SRB3. The network can also provide the UE with an sk-Counter, even if no gNB SCG is configured, when using SN terminated MCG bearers. 5.3.2 Paging 5.3.2.1 General !\" #$%&'E) !\"#$%# Figure 5.3.2.1-1: Paging The purpose of this procedure is: - to transmit paging information to a UE in RRC_IDLE or RRC_INACTIVE. - to transmit paging information for a L2 U2N Remote UE in RRC_IDLE or RRC_INACTIVE to its serving L2 U2N Relay UE in any RRC state. 5.3.2.2 Initiation The network initiates the paging procedure by transmitting the Paging message at the UE's paging occasion as specified in TS 38.304 [20]. The network may address multiple UEs within a Paging message by including one PagingRecord for each UE. The network may also include one or multiple TMGI(s) in the Paging message to page UEs for specific MBS multicast session(s). 5.3.2.3 Reception of the Paging message by the UE or PagingRecord by the L2 U2N Remote UE Upon receiving the Paging message by the UE or receiving PagingRecord from its connected parent L2 U2N Relay UE by a L2 U2N Remote UE, the UE shall: 1> if in RRC_IDLE, for each of the PagingRecord, if any, included in the Paging message, or 1> if in RRC_IDLE, for the PagingRecord, if any, included in the UuMessageTransferSidelink message received from the connected parent L2 U2N Relay UE: 2> if the ue-Identity included in the PagingRecord matches the UE identity allocated by upper layers: 3> if upper layers indicate the support of paging cause: 4> forward the ue-Identity, accessType (if present) and paging cause (if determined) to the upper layers; 3> else: 4> forward the ue-Identity and accessType (if present) to the upper layers; NOTE 1: If the parent L2 U2N Relay UE supports the MUSIM feature, it can forward the paging cause to the connected L2 U2N Remote UE or to the child UE. 
1> if in RRC_INACTIVE, for each of the PagingRecord, if any, included in the Paging message, or 3GPP Release 19 82 3GPP TS 38.331 V19.1.0 (2025-12) 1> if in RRC_INACTIVE, for the PagingRecord, if any, included in the UuMessageTransferSidelink message received from the connected parent L2 U2N Relay UE: 2> if the ue-Identity included in the PagingRecord matches the UE's stored fullI-RNTI: 3> if the UE is configured by upper layers with Access Identity 1: 4> initiate the RRC connection resumption procedure according to 5.3.13 with resumeCause set to mpsPriorityAccess; 3> else if the UE is configured by upper layers with Access Identity 2: 4> initiate the RRC connection resumption procedure according to 5.3.13 with resumeCause set to mcsPriorityAccess; 3> else if the UE is configured by upper layers with one or more Access Identities equal to 11-15: 4> initiate the RRC connection resumption procedure according to 5.3.13 with resumeCause set to highPriorityAccess; 3> else if mt-SDT indication was included in the Paging message and if the conditions for initiating SDT for a resume procedure initiated in response to RAN paging according to 5.3.13.1b are fulfilled: 4> if pagingGroupList was not included in the Paging message; or 4> if pagingGroupList was included in the Paging message but the UE has not joined any"
},
{
"rank": 2,
"chunk_id": "chunk_000073",
"score": 19.350720655498968,
"page_start": 82,
"page_end": 83,
"text_preview": "Access Identities equal to 11-15: 4> initiate the RRC connection resumption procedure according to 5.3.13 with resumeCause set to highPriorityAccess; 3> else if mt-SDT indication was included in the Paging message and if the conditions for initiating SDT for a resume procedure initiated in response to RAN paging according to 5.3.13.1b are fulfilled: 4> if pagingGroupList was not included in the Paging message; or 4> if pagingGroupList was included in the Paging message but the UE has not joined ...",
"text": "Access Identities equal to 11-15: 4> initiate the RRC connection resumption procedure according to 5.3.13 with resumeCause set to highPriorityAccess; 3> else if mt-SDT indication was included in the Paging message and if the conditions for initiating SDT for a resume procedure initiated in response to RAN paging according to 5.3.13.1b are fulfilled: 4> if pagingGroupList was not included in the Paging message; or 4> if pagingGroupList was included in the Paging message but the UE has not joined any MBS session(s) indicated by the TMGI(s) included in the pagingGroupList; or 4> if pagingGroupList was included in the Paging message, all the MBS session(s) indicated by the TMGI(s) included in the pagingGroupList that the UE has joined are configured to be received in RRC_INACTIVE, and inactiveReceptionAllowed was included for all these MBS session(s): 5> initiate the RRC connection resumption procedure according to 5.3.13 with resumeCause set to mt-SDT; NOTE 1a: If a UE receives a Paging message including mt-SDT indication and inactiveReceptionAllowed indications for all the multicast session(s) the UE has joined and the UE initiates RRC connection resume, the UE starts monitoring the corresponding G-RNTI(s), if configured, and if multicast MCCH is present, the UE starts monitoring the Multicast MCCH-RNTI and acquires the MBSMulticastConfiguration message on multicast MCCH. 4> else: 5> initiate the RRC connection resumption procedure according to 5.3.13 with resumeCause set to mt-Access; 3> else: 4> initiate the RRC connection resumption procedure according to 5.3.13 with resumeCause set to mtAccess; NOTE 2: If both conditions for initiating MT-SDT and MO-SDT according to 5.3.13.1b are fulfilled, UE may initiate RRC connection resumption procedure for MT-SDT or MO-SDT based on implementation. NOTE 3: A MUSIM UE may not initiate the RRC connection resumption procedure, e.g. 
when it decides not to respond to the Paging message due to UE implementation constraints as specified in TS 24.501 [23]. 2> else if the ue-Identity included in the PagingRecord matches the UE identity allocated by upper layers: 3> if upper layers indicate the support of paging cause: 4> forward the ue-Identity, accessType (if present) and paging cause (if determined) to the upper layers; 3> else: 4> forward the ue-Identity and accessType (if present) to the upper layers; 3GPP Release 19 83 3GPP TS 38.331 V19.1.0 (2025-12) 3> perform the actions upon going to RRC_IDLE as specified in 5.3.11 with release cause 'other'; 1> if in RRC_IDLE, for each TMGI included in pagingGroupList, if any, included in the Paging message: 2> if the UE has joined an MBS session indicated by the TMGI included in the pagingGroupList: 3> forward the TMGI to the upper layers; 1> if in RRC_INACTIVE and the UE has joined one or more MBS session(s) indicated by the TMGI(s) included in the pagingGroupList: 2> if PagingRecordList is not included in the Paging message; or 2> if none of the ue-Identity included in any of the PagingRecord matches the UE identity allocated by upper layers or the UE's stored fullI-RNTI: 3> if the UE is not configured to receive multicast in RRC_INACTIVE for at least one of the MBS sessions indicated by the TMGI(s) included in pagingGroupList that the UE has joined; or 3> if inactiveReceptionAllowed is not included for at least one of the MBS sessions indicated by the TMGI(s) included in pagingGroupList that the UE has joined: 4> initiate"
},
{
"rank": 3,
"chunk_id": "chunk_000428",
"score": 18.52757140063668,
"page_start": 466,
"page_end": 468,
"text_preview": "L2 U2U Relay UE shall: 1> if the RemoteUEInformationSidelink includes the sl-DestinationIdentityRemoteUE: 2> consider the end-to-end PC5 connection release for the end-to-end PC5 connection between the L2 U2U Remote UE and the peer L2 U2U Remote UE identified by sl-DestinationIdentityRemoteUE; 2> initiate the end-to-end PC5 connection failure/release related actions as specified in 5.8.9.3b; 3GPP Release 19 467 3GPP TS 38.331 V19.1.0 (2025-12) 5.8.9.9 Uu message transfer in sidelink 5.8.9.9.1 Ge...",
"text": "L2 U2U Relay UE shall: 1> if the RemoteUEInformationSidelink includes the sl-DestinationIdentityRemoteUE: 2> consider the end-to-end PC5 connection release for the end-to-end PC5 connection between the L2 U2U Remote UE and the peer L2 U2U Remote UE identified by sl-DestinationIdentityRemoteUE; 2> initiate the end-to-end PC5 connection failure/release related actions as specified in 5.8.9.3b; 3GPP Release 19 467 3GPP TS 38.331 V19.1.0 (2025-12) 5.8.9.9 Uu message transfer in sidelink 5.8.9.9.1 General !\"#$%\"&'E !\")*+&'E !\"#$%%&'$()&*%+$),-.$/-*M Figure 5.8.9.9.1-1: Uu message transfer in sidelink The purpose of this procedure is to transfer Paging message and System Information from the L2 U2N Relay UE to the L2 U2N Remote UE (in case of single hop) or to the Child UE (in case of multi hop) in RRC_IDLE/RRC_INACTIVE. 5.8.9.9.2 Actions related to transmission of UuMessageTransferSidelink message The L2 U2N Relay UE initiates the Uu message transfer procedure when at least one of the following conditions is met: 1> upon receiving Paging message related to the connected L2 U2N Remote UE or the Child UE from network or Parent relay UE (including Paging message within RRCReconfiguration message); 1> upon acquisition of the SIB(s) requested by the connected L2 U2N Remote UE or by the Child UE (as indicated in sl-RequestedSIB-List in the RemoteUEInformationSidelink) or upon receiving the updated SIB(s) from network or Parent relay UE which has been requested by the connected L2 U2N Remote UE or by the Child UE; 1> upon acquisition of the posSIB(s) requested by the connected L2 U2N Remote UE or by the Child UE (as indicated in sl-RequestedPosSIB-List in the RemoteUEInformationSidelink) or upon receiving the updated posSIB(s) from network or Parent relay UE which have been requested by the connected L2 U2N Remote UE or by the Child UE; 1> upon unsolicited SIB1 forwarding to the connected L2 U2N Remote UE or by the Child UE or upon receiving the updated SIB1 from 
network or Parent relay UE; For each associated L2 U2N Remote UE or for each associated Child UE, the L2 U2N Relay UE shall set the contents of UuMessageTransferSidelink message as follows: 1> include sl-PagingDelivery if the Paging message received from network or Parent relay UE containing the ueIdentity of the L2 U2N Remote UE; 1> include sl-SIB1-Delivery if any of the conditions for initiating Uu message transfer procedure related to SIB1 are met; 1> include sl-SystemInformationDelivery if any of the conditions for initiating Uu message transfer procedure related to System Information are met; 1> submit the UuMessageTransferSidelink message to lower layers for transmission. NOTE: The L2 U2N Relay UE may perform unsolicited forwarding of SIB1 to the L2 U2N Remote UE or to the Child UE based on UE implementation. A L2 U2N Remote UE configured with MP does not apply the SIB1 received from the L2 U2N Relay UE on the indirect path, if any. 5.8.9.9.3 Reception of the UuMessageTransferSidelink by the L2 U2N Remote UE Upon receiving the UuMessageTransferSidelink message, the L2 U2N Remote UE shall: 1> if sl-PagingDelivery is included: 2> perform the paging reception procedure as specified in clause 5.3.2.3; 1> if sl-SystemInformationDelivery and/or sl-SIB1-Delivery is included: 2> perform the actions specified in clause 5.2.2.4. 3GPP Release 19 468 3GPP TS 38.331 V19.1.0 (2025-12) 5.8.9.9.4 Reception of the UuMessageTransferSidelink by the L2 Intermediate U2N Relay UE Upon receiving the UuMessageTransferSidelink message from the"
},
{
"rank": 4,
"chunk_id": "chunk_000464",
"score": 18.454507064821385,
"page_start": 501,
"page_end": 502,
"text_preview": "services in RRC_INACTIVE. 5.10.2.2 Initiation If configured to receive MBS multicast services in RRC_INACTIVE, a UE applies the multicast MCCH information acquisition procedure for PTM configuration update and upon selection to a new cell (i.e., different from the cell where the UE was configured to receive multicast in RRC_CONNECTED) providing SIB24 or reselection to a cell providing SIB24 (except in case the UE is aware that the multicast sessions that the UE has joined are not available for R...",
"text": "services in RRC_INACTIVE. 5.10.2.2 Initiation If configured to receive MBS multicast services in RRC_INACTIVE, a UE applies the multicast MCCH information acquisition procedure for PTM configuration update and upon selection to a new cell (i.e., different from the cell where the UE was configured to receive multicast in RRC_CONNECTED) providing SIB24 or reselection to a cell providing SIB24 (except in case the UE is aware that the multicast sessions that the UE has joined are not available for RRC_INACTIVE in the new cell) and may apply the multicast MCCH information acquisition procedure upon receiving GroupPaging in the paging message. A UE that is receiving MBS multicast data in RRC_INACTIVE shall apply the multicast MCCH information acquisition procedure upon receiving a notification that the multicast MCCH information has changed. NOTE: It is up to UE implementation how to address a possibility of the UE missing a multicast MCCH change notification. Unless explicitly stated otherwise in the procedural specification, the multicast MCCH information acquisition procedure overwrites any stored multicast MCCH information as specified in 5.10.1.1, i.e. delta configuration is not applicable for multicast MCCH information and the UE discontinues using a field if it is absent in multicast MCCH information. 
3GPP Release 19 502 3GPP TS 38.331 V19.1.0 (2025-12) 5.10.2.3 Multicast MCCH information acquisition by the UE A UE configured to receive an MBS multicast service in RRC_INACTIVE shall: 1> if the procedure is triggered by a multicast MCCH information change notification: 2> start acquiring the MBSMulticastConfiguration message on multicast MCCH in the concerned cell from the slot in which the change notification was received; 1> if the UE selects a new cell (i.e., different from the cell where the UE was configured to receive multicast in RRC_CONNECTED) providing SIB24 or reselects a cell providing SIB24; or 1> if the procedure is triggered by the reception of GroupPaging in the paging message in 5.3.2.3; or 1> if the UE receives RRCRelease configuring the UE to receive MBS multicast in RRC_INACTIVE which does not include PTM configuration for at least one multicast session for which the UE is not indicated to stop monitoring the G-RNTI: 2> acquire the MBSMulticastConfiguration message on multicast MCCH in the concerned cell at the next repetition period. 5.10.2.4 Actions upon reception of the MBSMulticastConfiguration message No UE requirements related to the contents of the MBSMulticastConfiguration message apply other than those specified elsewhere, e.g., within the corresponding field descriptions. 5.10.3 MRB configuration 5.10.3.1 General The multicast MRB configuration procedure is used by the UE in RRC_INACTIVE state to configure PDCP, RLC, MAC entities and the physical layer upon PTM configuration update and moving to a cell providing SIB24. The UE may perform multicast MRB modification or release/establishment when PTM configuration is updated via multicast MCCH or when it moves to a cell where the PDCP COUNT of the corresponding multicast MRB is not synchronized within the RNA. The UE may perform multicast MRB modification when it moves to a cell where the PDCP COUNT of the corresponding multicast MRB is synchronized within the RNA. 
The UE resets MAC upon cell selection or reselection. NOTE: How to perform modification of a multicast MRB which is already configured in the UE is left to UE implementation. Upon moving to a cell where the PDCP COUNT of a multicast MRB is not"
},
{
"rank": 5,
"chunk_id": "chunk_000196",
"score": 17.954056393260363,
"page_start": 220,
"page_end": 221,
"text_preview": "connection resume is initiated at the L2 U2N Relay UE upon reception of a message from a L2 U2N Remote UE via SL-RLC0 or SL-RLC1, or upon reception of RemoteUEInformationSidelink message containing the connectionForMP). The interaction with NAS is left to UE implementation. 5.3.13.1b Conditions for initiating SDT When requesting lower layers to check the conditions for initiating SDT, RRC indicates to lower layers whether the resume procedure is initiated for mobile originated or mobile terminat...",
"text": "connection resume is initiated at the L2 U2N Relay UE upon reception of a message from a L2 U2N Remote UE via SL-RLC0 or SL-RLC1, or upon reception of RemoteUEInformationSidelink message containing the connectionForMP). The interaction with NAS is left to UE implementation. 5.3.13.1b Conditions for initiating SDT When requesting lower layers to check the conditions for initiating SDT, RRC indicates to lower layers whether the resume procedure is initiated for mobile originated or mobile terminated case. A UE in RRC_INACTIVE initiates the resume procedure for SDT when all of the following conditions are fulfilled: 1> for the resume procedure initiated by the upper layers (i.e. mobile originated case): 2> SIB1 includes sdt-ConfigCommon; and 2> sdt-Config is configured; and 2> all the pending data in UL is mapped to the radio bearers configured for SDT; and 2> for an (e)RedCap UE when RedCap-specific initial downlink BWP includes no CD-SSB, ncd-SSBRedCapInitialBWP-SDT is configured; and 2> lower layers indicate that conditions for initiating MO-SDT as specified in TS 38.321 [3] are fulfilled. 1> for the resume procedure initiated in response to RAN paging (i.e. mobile terminated case): 2> lower layers indicate that conditions for initiating MT-SDT as specified in TS 38.321 [3] are fulfilled. NOTE: How the UE determines that all pending data in UL is mapped to radio bearers configured for SDT is left to UE implementation. 
5.3.13.1c Void 5.3.13.1d Conditions for resuming RRC connection for multicast reception In RRC_INACTIVE state, if configured with MBS multicast reception in RRC_INACTIVE, the UE shall: 1> if the RRC connection resume procedure is triggered for multicast reception at reception of SIB1, as specified in 5.2.2.4.2; or 1> if the RRC connection resume procedure is triggered for multicast reception at reception of Paging message, as specified in 5.3.2.3; or 1> if the PTM configuration is not available on the multicast MCCH in the new cell after cell selection (i.e., different from the cell where the UE was configured to receive multicast in RRC_CONNECTED) or in the cell after cell reselection for at least one multicast session that the UE has joined and for which the UE is not indicated to stop monitoring the G-RNTI; or 1> if mbs-NeighbourCellList included in MBSMulticastConfiguration acquired in the previous cell indicates that at least one multicast session that the UE has joined and for which the UE is not indicated to stop monitoring the GRNTI, is not provided for RRC_INACTIVE in the current serving cell; or 1> if either the measured RSRP or RSRQ for serving cell as specified in TS 38.304 [20] is below the corresponding threshold indicated by thresholdIndex for a multicast session that the UE has joined and for which the UE is not indicated to stop monitoring the G-RNTI: 3GPP Release 19 221 3GPP TS 38.331 V19.1.0 (2025-12) 2> initiate RRC connection resume procedure as specified in 5.3.13.2 with resumeCause set as below: 3> if the UE is configured by upper layers with Access Identity 1: 4> set resumeCause to mps-PriorityAccess; 3> else if the UE is configured by upper layers with Access Identity 2: 4> set resumeCause to mcs-PriorityAccess; 3> else if the UE is configured by upper layers with one or more Access Identities equal to 11-15: 4> set resumeCause to highPriorityAccess; 3> else: 4> set resumeCause to mt-Access. 5.3.13.2 Initiation The UE"
}
]
}