2.1. Weight dimension and value per HS6

Index(['bk_transaction_id', 'si_transaction_id', 'revision_date', 'carrier', 'requester', 'requester_street', 'requester_city', 'requester_state', 'requester_postal_code', 'requester_country', 'shipper', 'shipper_street', 'shipper_city', 'shipper_state', 'shipper_postal_code', 'shipper_country', 'forwarder', 'forwarder_street', 'forwarder_city', 'forwarder_state', 'forwarder_postal_code', 'forwarder_country', 'consignee', 'consignee_street', 'consignee_city', 'consignee_state', 'consignee_postal_code', 'consignee_country', 'contract_party', 'contract_party_street', 'contract_party_city', 'contract_party_state', 'contract_party_postal_code', 'contract_party_country', 'notify_party', 'notify_party_street', 'notify_party_city', 'notify_party_state', 'notify_party_postal_code', 'notify_party_country', 'second_notify_party', 'second_notify_party_street', 'second_notify_party_city', 'second_notify_party_state', 'second_notify_party_postal_code', 'second_notify_party_country', 'third_notify_party', 'third_notify_party_street', 'third_notify_party_city', 'third_notify_party_state', 'third_notify_party_postal_code', 'third_notify_party_country', 'move_type', 'pol_vessel_name', 'pol_conveyance_number', 'pol_city_unlocode', 'pol_city_name', 'pol_country_name', 'pol_etd', 'pod_vessel_name', 'pod_conveyance_number', 'pod_city_unlocode', 'pod_city_name', 'pod_country_name', 'pod_eta', 'plor_city_unlocode', 'plor_city_name', 'plor_country_name', 'plor_etd', 'plod_city_unlocode', 'plod_city_name', 'plod_country_name', 'plod_eta', 'line_item_id', 'package_count', 'cargo_description', 'harmonized_system_code', 'ncm_code', 'schedule_b', 'hazmat_undg_number', 'hazmat_imo_class', 'weight_kg', 'allocated_container_count', 'allocated_teu_count'], dtype='object')

Now, it is possible to filter rows according to certain values in column 'weight_kg' and showing the remaining rows.

filter = df1['weight_kg'] < 100000

df1.loc[filter]

This will return a smaller data frame in terms of number of rows.

Another possibility, it is to filter rows according to certain values in column 'weight_kg' and show the distribution for column 'weight_kg'.

weight = df1.loc[filter]['weight_kg']

weight.hist()

This will produce the following graph:

A very useful library is fitter that automatically searches over a certain range of probability distribution and tries to find the one that best fits the data. In the next section, we will see how to install and use it.

The previous complete code is available in the following link:

https://colab.research.google.com/drive/1Xo-2dWDgL-gmDJH3QmB6b4YMlntgQqtu?usp=sharing

Page updated

Google Sites

Report abuse