tdub.data
A module for handling our data.
Class Summary
Region | A simple enum class for easily using region information.
SampleInfo | Describes a sample’s attributes given its name.
Function Summary
as_region | Convert input to Region.
avoids_for | Get the features to avoid for the given region.
branches_from | Get a list of branches from a data source.
categorize_branches | Categorize branches into separate lists.
features_for | Get the feature list for a region.
quick_files | Get a dictionary connecting sample processes to file lists.
selection_as_numexpr | Get the numexpr selection string from an arbitrary selection.
selection_as_root | Get the ROOT selection string from an arbitrary selection.
selection_branches | Construct the minimal set of branches required for a selection.
selection_for | Get the selection for a given region.
Reference
- class tdub.data.Region(value)
A simple enum class for easily using region information.
- r1j1b
Label for our 1j1b region.
- r2j1b
Label for our 2j1b region.
- r2j2b
Label for our 2j2b region.
Examples
Using this enum for grabbing the 2j2b region from a set of files:
>>> from tdub.data import Region, selection_for
>>> from tdub.frames import iterative_selection
>>> df = iterative_selection(files, selection_for(Region.r2j2b))
- static from_str(s)
Get enum value for the given string.
This function supports three ways to define a region: prefixed with “r”, prefixed with “reg”, or no prefix at all. For example, Region.r2j2b can be retrieved like so:
Region.from_str("r2j2b")
Region.from_str("reg2j2b")
Region.from_str("2j2b")
- Parameters
s (str) – String representation of the desired region
- Returns
Enum version
- Return type
Region
Examples
>>> from tdub.data import Region
>>> Region.from_str("1j1b")
<Region.r1j1b: 0>
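The three accepted spellings can be handled with a small amount of prefix stripping. The following is a minimal sketch of that idea, not tdub's actual implementation: the `Region` class here is a stand-in that copies the documented members and values, and `region_from_str` is a hypothetical helper name.

```python
from enum import Enum


class Region(Enum):
    """Stand-in enum mirroring the three documented members."""

    r1j1b = 0
    r2j1b = 1
    r2j2b = 2


def region_from_str(s: str) -> Region:
    """Accept "r2j2b", "reg2j2b", or "2j2b" (hypothetical helper)."""
    # Check the longer prefix first so "reg2j2b" is not cut down to "eg2j2b".
    for prefix in ("reg", "r"):
        if s.startswith(prefix):
            s = s[len(prefix):]
            break
    # Enum lookup by member name; raises KeyError for unknown regions.
    return Region["r" + s]
```

All three spellings of a region resolve to the same member, and an unknown string fails loudly with a `KeyError` instead of returning a default.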
- class tdub.data.SampleInfo(input_file)
Describes a sample’s attributes given its name.
- Parameters
input_file (str) – File stem containing the necessary groups to parse.
Examples
>>> from tdub.data import SampleInfo
>>> sampinfo = SampleInfo("ttbar_410472_AFII_MC16d_nominal.root")
>>> sampinfo.phy_process
ttbar
>>> sampinfo.dsid
410472
>>> sampinfo.sim_type
AFII
>>> sampinfo.campaign
MC16d
>>> sampinfo.tree
nominal
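One way to pull these attributes out of a file stem is a regular expression with named groups. The sketch below assumes the naming pattern seen in the example (process, six-digit DSID, simulation type, campaign, tree). The pattern, the `ParsedSample` class, and `parse_sample` are illustrative names, not part of tdub, and data files without a numeric DSID are not handled.

```python
import re
from dataclasses import dataclass

# Assumed layout: <process>_<dsid>_<FS|AFII>_<MC16a|d|e>_<tree>[.root]
_STEM = re.compile(
    r"^(?P<phy_process>\w+)_(?P<dsid>\d{6})_(?P<sim_type>FS|AFII)"
    r"_(?P<campaign>MC16[ade])_(?P<tree>\w+?)(?:\.root)?$"
)


@dataclass
class ParsedSample:
    phy_process: str
    dsid: int
    sim_type: str
    campaign: str
    tree: str


def parse_sample(input_file: str) -> ParsedSample:
    """Split a file stem into its named groups (hypothetical helper)."""
    match = _STEM.match(input_file)
    if match is None:
        raise ValueError(f"{input_file!r} does not follow the assumed pattern")
    fields = match.groupdict()
    fields["dsid"] = int(fields["dsid"])  # store the DSID as an integer
    return ParsedSample(**fields)
```

Because the process group is greedy and the DSID must be exactly six digits between underscores, process names that contain underscores (such as `tW_DR`) are still captured whole.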
- tdub.data.as_region(region)
Convert input to Region.
Meant to be similar to the numpy.asarray() function.
- Parameters
region (str or Region) – The region, either already a Region or given as a str.
- Returns
Region representation.
- Return type
Region
Examples
>>> from tdub.data import as_region, Region
>>> as_region("r2j1b")
<Region.r2j1b: 1>
>>> as_region(Region.r2j2b)
<Region.r2j2b: 2>
- tdub.data.avoids_for(region)
Get the features to avoid for the given region.
See the tdub.config module for the definition of the variables to avoid (and how to modify them).
- Parameters
region (str or tdub.data.Region) – Region to get the associated avoided branches for.
- Returns
Features to avoid for the region.
- Return type
list(str)
Examples
>>> from tdub.data import avoids_for, Region
>>> avoids_for(Region.r2j1b)
['HT_jet1jet2', 'deltaR_lep1lep2_jet1jet2met', 'mass_lep2jet1', 'pT_jet2']
>>> avoids_for("2j2b")
['deltaR_jet1_jet2']
- tdub.data.branches_from(source, tree='WtLoop_nominal', ignore_weights=False)
Get a list of branches from a data source.
If the source is a list of files, only the first file is parsed.
- Parameters
source (str, list(str), os.PathLike, list(os.PathLike), or uproot File/Tree) – What to parse to get the branch information.
tree (str) – Name of the tree to get branches from
ignore_weights (bool) – Flag to ignore all branches starting with weight_.
- Returns
Branches from the source.
- Return type
list(str)
- Raises
TypeError – If source can’t be used to find a list of branches.
Examples
>>> from tdub.data import branches_from
>>> branches_from("/path/to/file.root", ignore_weights=True)
["pT_lep1", "pT_lep2"]
>>> branches_from("/path/to/file.root")
["pT_lep1", "pT_lep2", "weight_nominal", "weight_tptrw"]
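Two of the documented behaviors can be illustrated without opening a ROOT file: taking only the first entry when a list is given, and dropping branches that start with `weight_`. This is a sketch under those stated rules only. `first_source` and `drop_weights` are hypothetical helpers, and uproot File/Tree inputs are deliberately left out.

```python
import os


def first_source(source):
    """Return the single path that would be parsed (hypothetical helper)."""
    if isinstance(source, (str, os.PathLike)):
        return os.fspath(source)
    if isinstance(source, (list, tuple)) and len(source) > 0:
        # Only the first file of a list is inspected.
        return os.fspath(source[0])
    raise TypeError(f"cannot find branches from {type(source).__name__}")


def drop_weights(branches):
    """Mimic ignore_weights=True: remove names starting with 'weight_'."""
    return [name for name in branches if not name.startswith("weight_")]
```

Raising `TypeError` for unsupported inputs mirrors the documented exception, although the exact conditions tdub checks are not shown here.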
- tdub.data.categorize_branches(source)
Categorize branches into separate lists.
The categories:
kinematics: for kinematic features (used for classifiers)
weights: for any branch that starts or ends with weight
meta: for meta information (final state information)
- Parameters
source (list(str)) – Complete list of branches to be categorized.
- Returns
Dictionary connecting categories to their associated list of branches.
- Return type
dict(str, list(str))
Examples
>>> from tdub.data import categorize_branches, branches_from
>>> branches = ["pT_lep1", "pT_lep2", "weight_nominal", "weight_sys_jvt", "reg2j2b"]
>>> cated = categorize_branches(branches)
>>> cated["weights"]
['weight_sys_jvt', 'weight_nominal']
>>> cated["meta"]
['reg2j2b']
>>> cated["kinematics"]
['pT_lep1', 'pT_lep2']
Using a ROOT file:
>>> from pathlib import PosixPath
>>> root_file = PosixPath("/path/to/file.root")
>>> cated = categorize_branches(branches_from(root_file))
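The three categories amount to a simple classification pass over the branch names. The sketch below implements the documented weight rule (starts or ends with `weight`). The set of meta names is an assumption made for illustration, since the source does not list which branches count as meta, and `categorize` is a hypothetical name.

```python
from typing import Dict, Iterable, List

# Assumption: illustrative meta names only; the real list is not documented here.
META_NAMES = {"reg1j1b", "reg2j1b", "reg2j2b", "OS"}


def categorize(branches: Iterable[str]) -> Dict[str, List[str]]:
    """Sort branch names into kinematics, weights, and meta lists."""
    cats: Dict[str, List[str]] = {"kinematics": [], "weights": [], "meta": []}
    for name in branches:
        if name.startswith("weight") or name.endswith("weight"):
            cats["weights"].append(name)
        elif name in META_NAMES:
            cats["meta"].append(name)
        else:
            # Anything that is neither a weight nor meta is a kinematic feature.
            cats["kinematics"].append(name)
    return cats
```

This sketch keeps each list in input order, which may differ from the ordering tdub returns.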
- tdub.data.features_for(region)
Get the feature list for a region.
See the tdub.config module for the definitions of the feature lists (and how to modify them).
- Parameters
region (str or tdub.data.Region) – Region as a string or enum entry. Using "ALL" returns a list of unique features from all regions.
- Returns
Features for that region (or all regions).
- Return type
list(str)
Examples
>>> from pprint import pprint
>>> from tdub.data import features_for
>>> pprint(features_for("reg2j1b"))
['mass_lep1jet1',
 'mass_lep1jet2',
 'mass_lep2jet1',
 'mass_lep2jet2',
 'pT_jet2',
 'pTsys_lep1lep2jet1jet2met',
 'psuedoContTagBin_jet1',
 'psuedoContTagBin_jet2']
- tdub.data.quick_files(datapath, campaign=None, tree='nominal')
Get a dictionary connecting sample processes to file lists.
The lists of files are sorted alphabetically. These types of samples are currently tested:
tW_DR (410648, 410649 full sim)
tW_DR_AFII (410648, 410649 fast sim)
tW_DR_PS (411038, 411039 fast sim)
tW_DR_inc (410646, 410647 full sim)
tW_DR_inc_AFII (410646, 410647 fast sim)
tW_DS (410656, 410657 full sim)
tW_DS_inc (410654, 410655 full sim)
ttbar (410472 full sim)
ttbar_AFII (410472 fast sim)
ttbar_PS (410558 fast sim)
ttbar_PS713 (411234 fast sim)
ttbar_hdamp (410482 fast sim)
ttbar_inc (410470 full sim)
ttbar_inc_AFII (410470 fast sim)
Diboson
Zjets
MCNP
Data
- Parameters
datapath (str or os.PathLike) – Path where all of the ROOT files live.
campaign (str, optional) – Enforce a single campaign (“MC16a”, “MC16d”, or “MC16e”).
tree (str) – Upstream AnalysisTop ntuple tree.
- Returns
The dictionary of processes and their associated files.
- Return type
dict(str, list(str))
Examples
>>> from pprint import pprint
>>> from tdub.data import quick_files
>>> qf = quick_files("/path/to/some/files")  # has 410472 ttbar samples
>>> pprint(qf["ttbar"])
['/path/to/some/files/ttbar_410472_FS_MC16a_nominal.root',
 '/path/to/some/files/ttbar_410472_FS_MC16d_nominal.root',
 '/path/to/some/files/ttbar_410472_FS_MC16e_nominal.root']
>>> qf = quick_files("/path/to/some/files", campaign="MC16d")
>>> pprint(qf["tW_DR"])
['/path/to/some/files/tW_DR_410648_FS_MC16d_nominal.root',
 '/path/to/some/files/tW_DR_410649_FS_MC16d_nominal.root']
>>> qf = quick_files("/path/to/some/files", campaign="MC16a")
>>> pprint(qf["Data"])
['/path/to/some/files/Data15_data15_Data_Data_nominal.root',
 '/path/to/some/files/Data16_data16_Data_Data_nominal.root']
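The core idea, grouping file paths by process and sorting each group, can be sketched on a plain list of paths. This is a simplified illustration, not how tdub builds its dictionary: it keys only on the text before the six-digit DSID, ignores the DSID and simulation-type distinctions in the sample list above, skips data files, and leaves the directory listing to the caller. `group_by_process` and the pattern are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Assumed MC file name layout: <process>_<dsid>_<FS|AFII>_<campaign>_<tree>.root
_NAME = re.compile(
    r"^(?P<process>\w+)_\d{6}_(?:FS|AFII)_(?P<campaign>MC16[ade])_\w+\.root$"
)


def group_by_process(
    paths: Iterable[str], campaign: Optional[str] = None
) -> Dict[str, List[str]]:
    """Group paths by process name, optionally keeping one campaign."""
    groups = defaultdict(list)
    for path in paths:
        name = path.rsplit("/", 1)[-1]  # file name without its directory
        match = _NAME.match(name)
        if match is None:
            continue  # e.g. data files, which this sketch does not handle
        if campaign is not None and match.group("campaign") != campaign:
            continue
        groups[match.group("process")].append(path)
    # Sort each list alphabetically, as the documentation describes.
    return {process: sorted(files) for process, files in groups.items()}
```

A caller could feed it the string paths from a directory listing, for instance the results of `pathlib.Path.glob("*.root")` converted with `str`.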
- tdub.data.selection_as_numexpr(selection)
Get the numexpr selection string from an arbitrary selection.
- Parameters
selection (str) – Selection string in ROOT or numexpr
- Returns
Selection in numexpr format.
- Return type
str
Examples
>>> selection = "reg1j1b == true && OS == true && mass_lep1jet1 < 155"
>>> from tdub.data import selection_as_numexpr
>>> selection_as_numexpr(selection)
'(reg1j1b == True) & (OS == True) & (mass_lep1jet1 < 155)'
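For a flat chain of `&&` terms, the conversion comes down to splitting on `&&`, capitalizing the boolean literals, and wrapping each term in parentheses. The sketch below covers only that simple case (no `||`, no nested parentheses) and is not tdub's implementation. `root_to_numexpr` is a hypothetical name.

```python
import re


def root_to_numexpr(selection: str) -> str:
    """Convert a flat ROOT-style conjunction into numexpr style."""
    converted = []
    for term in selection.split("&&"):
        term = term.strip()
        # Drop one pair of wrapping parentheses so they are not doubled.
        if term.startswith("(") and term.endswith(")") and term.count("(") == 1:
            term = term[1:-1].strip()
        # ROOT writes booleans in lowercase; numexpr expects Python literals.
        term = re.sub(r"\btrue\b", "True", term)
        term = re.sub(r"\bfalse\b", "False", term)
        converted.append(f"({term})")
    return " & ".join(converted)
```

The explicit parentheses matter because `&` binds more tightly than comparison operators in numexpr-style expressions.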
- tdub.data.selection_as_root(selection)
Get the ROOT selection string from an arbitrary selection.
- Parameters
selection (str) – The selection string in ROOT or numexpr
- Returns
The same selection in ROOT format.
- Return type
str
Examples
>>> selection = "(reg1j1b == True) & (OS == True) & (mass_lep1jet1 < 155)"
>>> from tdub.data import selection_as_root
>>> selection_as_root(selection)
'(reg1j1b == true) && (OS == true) && (mass_lep1jet1 < 155)'
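Going in the ROOT direction is mostly token substitution: lowercase the boolean literals and double the logical operators. The following sketch assumes the selection uses `&` and `|` only as logical operators (a bitwise use would be rewritten too) and is not tdub's implementation. `numexpr_to_root` is a hypothetical name.

```python
import re


def numexpr_to_root(selection: str) -> str:
    """Rewrite numexpr-style operators and literals in ROOT style."""
    out = re.sub(r"\bTrue\b", "true", selection)
    out = re.sub(r"\bFalse\b", "false", out)
    # Double single '&' and '|' while leaving existing '&&' and '||' alone.
    out = re.sub(r"(?<!&)&(?!&)", "&&", out)
    out = re.sub(r"(?<!\|)\|(?!\|)", "||", out)
    return out
```

The lookarounds make the function safe to call on a string that is already in ROOT form, which fits the "arbitrary selection" input described above.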
- tdub.data.selection_branches(selection)
Construct the minimal set of branches required for a selection.
- Parameters
selection (str) – Selection string in ROOT or numexpr
- Returns
Necessary branches/variables
- Return type
set(str)
Examples
>>> from tdub.data import selection_branches
>>> selection = "(reg1j1b == True) & (OS == True) & (mass_lep1lep2 > 100)"
>>> selection_branches(selection)
{'OS', 'mass_lep1lep2', 'reg1j1b'}
>>> selection = "reg2j1b == true && OS == true && (mass_lep1jet1 < 155)"
>>> selection_branches(selection)
{'OS', 'mass_lep1jet1', 'reg2j1b'}
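One simple way to recover the branch names from a selection string is to collect every identifier-like token and discard the boolean literals. The sketch below takes that approach as an assumption. It is not necessarily how tdub does it, `branch_names` is a hypothetical name, and a function call such as `abs(x)` would be reported as a branch.

```python
import re

# Tokens that look like identifiers but are boolean literals, not branches.
_NOT_BRANCHES = {"True", "False", "true", "false"}


def branch_names(selection: str) -> set:
    """Collect the identifiers appearing in a ROOT or numexpr selection."""
    # \b keeps the exponent part of numbers such as 1e5 from matching.
    tokens = re.findall(r"\b[A-Za-z_]\w*\b", selection)
    return {token for token in tokens if token not in _NOT_BRANCHES}
```

Returning a set removes duplicates, so a branch used in several terms is only read once.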
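- tdub.data.selection_for(region, additional=None)
Get the selection for a given region.
We have three regions with a default selection (1j1b, 2j1b, and 2j2b); these are the possible argument options (in str or Enum form). See the tdub.config module for the definitions of the selections (and how to modify them).
- Parameters
region (str or tdub.data.Region) – Region to get the selection for.
additional (str, optional) – Additional selection (in ROOT or numexpr form) to combine with the region’s default selection.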
- Returns
Selection string in numexpr format.
- Return type
str
Examples
>>> from tdub.data import Region, selection_for
>>> selection_for(Region.r2j1b)
'(reg2j1b == True) & (OS == True)'
>>> selection_for("reg1j1b")
'(reg1j1b == True) & (OS == True)'
>>> selection_for("2j2b")
'(reg2j2b == True) & (OS == True)'
>>> selection_for("2j2b", additional="minimaxmbl < 155")
'((reg2j2b == True) & (OS == True)) & (minimaxmbl < 155)'
>>> selection_for("2j1b", additional="mass_lep1jetb < 155 && mass_lep2jetb < 155")
'((reg2j1b == True) & (OS == True)) & ((mass_lep1jetb < 155) & (mass_lep2jetb < 155))'
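The outputs above suggest how the additional selection is attached: the region's default selection is wrapped in parentheses and joined to the extra term with `&`. The sketch below reproduces that pattern for an additional selection already written in numexpr form. The dictionary copies the default selections from the documented outputs, and `combined_selection` is a hypothetical name, not tdub's API.

```python
from typing import Optional

# Assumption: default selections copied from the documented example outputs.
DEFAULT_SELECTIONS = {
    "1j1b": "(reg1j1b == True) & (OS == True)",
    "2j1b": "(reg2j1b == True) & (OS == True)",
    "2j2b": "(reg2j2b == True) & (OS == True)",
}


def combined_selection(region: str, additional: Optional[str] = None) -> str:
    """Join a region's default selection with an optional extra term."""
    base = DEFAULT_SELECTIONS[region]
    if additional is None:
        return base
    # Parenthesize both sides so operator precedence cannot change the meaning.
    return f"({base}) & ({additional})"
```

Unlike the documented function, this sketch does not translate a ROOT-style `additional` string; it expects numexpr form.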