Welcome to ir-kit’s documentation!¶
Trec¶
Qrels¶
Functions and classes for dealing with English qrel files. The specification can be viewed at: http://trec.nist.gov/data/qrels_eng/
Usage:
>>> sample_qrels = '1 0 AP880212-0161 0\n1 0 AP880216-0139 1\n1 0 AP880216-0169 0'
>>> qrels = loads(sample_qrels)
>>> qrels.topic
['1', '1', '1']
>>> [x.topic for x in qrels['1']]
['1', '1', '1']
Harry Scells Mar 2017
-
class
irkit.trec.qrels.
Qrel
(topic: str, iteration: int, document_num: str, relevancy: int)¶ A line in a qrels file conforming to the specification at: http://trec.nist.gov/data/qrels_eng/
-
class
irkit.trec.qrels.
Qrels
(qrels: typing.List[irkit.trec.qrels.Qrel])¶ A Python representation of a qrels file. It wraps a list of Qrel objects.
-
dump
(fp: _io.TextIOWrapper) → None¶ Dump the qrels to a file
Parameters: fp – A file pointer opened for writing
-
dumps
() → str¶ Dump the qrels to a string.
Returns: Formatted qrels
-
-
irkit.trec.qrels.
load
(qrels: _io.TextIOWrapper) → irkit.trec.qrels.Qrels¶ Load qrels from a file.
Parameters: qrels – File pointer
Returns: Qrels object
-
irkit.trec.qrels.
loads
(qrels: str) → irkit.trec.qrels.Qrels¶ Load qrels from a string.
Parameters: qrels – Some string representation of qrels
Returns: Qrels object
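The qrels line format described above (topic, iteration, document number, relevancy) can be illustrated with a minimal, self-contained sketch. It does not import ir-kit: the `Qrel` tuple, `parse_qrels` and `format_qrels` below are hypothetical stand-ins whose field names simply follow the documented `Qrel` signature, so the library's actual behaviour may differ.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for irkit.trec.qrels.Qrel (not the library class).
Qrel = namedtuple('Qrel', ['topic', 'iteration', 'document_num', 'relevancy'])


def parse_qrels(text):
    """Parse whitespace-separated qrels lines into a list of Qrel tuples."""
    qrels = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        topic, iteration, document_num, relevancy = line.split()
        qrels.append(Qrel(topic, int(iteration), document_num, int(relevancy)))
    return qrels


def format_qrels(qrels):
    """Serialise Qrel tuples back to one judgement per line."""
    return '\n'.join('{} {} {} {}'.format(*q) for q in qrels)


sample_qrels = '1 0 AP880212-0161 0\n1 0 AP880216-0139 1\n1 0 AP880216-0169 0'
qrels = parse_qrels(sample_qrels)

# Documents judged relevant for any topic.
relevant = [q.document_num for q in qrels if q.relevancy > 0]

# Round trip through an in-memory file object, standing in for a real file.
buffer = io.StringIO()
buffer.write(format_qrels(qrels))
buffer.seek(0)
reloaded = parse_qrels(buffer.read())
```

The round trip through `io.StringIO` mirrors the split between the file-based `load`/`dump` and the string-based `loads`/`dumps` functions documented above.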
Runs¶
Functions and classes for dealing with trec_eval run files. For a description of the run file format, see: http://faculty.washington.edu/levow/courses/ling573_SPR2011/hw/trec_eval_desc.htm
Usage:
>>> sample_run = '''351 0 DOC1 1 100 run-name\n351 0 DOC2 2 50 run-name'''
>>> runs = loads(sample_run)
>>> runs.dumps()
'351 Q0 DOC1 1 100 run-name\n351 Q0 DOC2 2 50 run-name'
>>> [str(run) for run in runs.runs]
['351 Q0 DOC1 1 100 run-name', '351 Q0 DOC2 2 50 run-name']
>>> [str(run) for run in runs['351']]
['351 Q0 DOC1 1 100 run-name', '351 Q0 DOC2 2 50 run-name']
>>> runs.rank
['1', '2']
>>> TrecEvalRuns(runs['351']).rank
['1', '2']
Harry Scells May 2017
-
class
irkit.trec.run.
TrecEvalRun
(topic: str, q: int, doc_id: str, rank: int, score: float, run_id: str)¶ TrecEvalRun is a container class for a line in a trec_eval run file.
-
class
irkit.trec.run.
TrecEvalRuns
(runs: typing.List[irkit.trec.run.TrecEvalRun])¶ TrecEvalRuns is a wrapper around a list of TrecEvalRun objects, each of which is a container for one line in a trec_eval run file. This class contains some convenience functions for dealing with runs, such as getting a list of the runs by topic id or slicing by column.
-
dump
(fp: _io.TextIOWrapper) → None¶ Dump the runs to a file
Parameters: fp – A file pointer
-
dumps
() → str¶ Dump the runs to a string.
Returns: Formatted runs
-
-
irkit.trec.run.
load
(runs: _io.TextIOWrapper) → irkit.trec.run.TrecEvalRuns¶ Load a trec_eval run file.
Parameters: runs – A file pointer containing runs.
Returns: TrecEvalRuns
-
irkit.trec.run.
loads
(runs: str) → irkit.trec.run.TrecEvalRuns¶ Load a trec_eval run file from a string.
Parameters: runs – A string containing runs.
Returns: TrecEvalRuns
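As a rough illustration of what "getting the runs by topic id" and "slicing by column" can mean for a run file, here is a self-contained sketch. It does not use ir-kit: `Run`, `parse_runs` and `by_topic` are hypothetical names, and the `q` column is kept as a string (such as `Q0`) purely as an assumption of this sketch.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for irkit.trec.run.TrecEvalRun (not the library class).
Run = namedtuple('Run', ['topic', 'q', 'doc_id', 'rank', 'score', 'run_id'])


def parse_runs(text):
    """Parse six-column trec_eval run lines into a list of Run tuples."""
    runs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        topic, q, doc_id, rank, score, run_id = line.split()
        runs.append(Run(topic, q, doc_id, int(rank), float(score), run_id))
    return runs


def by_topic(runs):
    """Group runs by topic id, preserving their order within each topic."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run.topic].append(run)
    return dict(grouped)


sample_run = (
    '351 Q0 DOC1 1 100 run-name\n'
    '351 Q0 DOC2 2 50 run-name\n'
    '352 Q0 DOC7 1 12.5 run-name'
)
runs = parse_runs(sample_run)
grouped = by_topic(runs)

# "Slice by column": the rank column for a single topic.
ranks = [run.rank for run in grouped['351']]

# The best-ranked document for every topic.
top_docs = {
    topic: min(topic_runs, key=lambda r: r.rank).doc_id
    for topic, topic_runs in grouped.items()
}
```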
Results¶
Functions and classes for dealing with trec_eval results files.
Harry Scells Mar 2017
-
class
irkit.trec.results.
TrecEvalResults
(run_id: str, results: typing.Dict, queries: typing.Dict)¶ An object that stores the results output by trec_eval.
-
irkit.trec.results.
load
(trec_result_file: _io.TextIOWrapper) → irkit.trec.results.TrecEvalResults¶ Load trec_eval results from a file.
Parameters: trec_result_file – File pointer
Returns: TrecEvalResults object
-
irkit.trec.results.
loads
(trec_results: str) → irkit.trec.results.TrecEvalResults¶ Load trec_eval results from a string.
Parameters: trec_results – Some string representation of trec results
Returns: TrecEvalResults object
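The three constructor arguments of TrecEvalResults (run_id, results, queries) suggest a split between summary measures and per-query measures. The sketch below shows one way such a split could be derived from trec_eval output, whose lines have the form `measure`, `query id`, `value`. The function `parse_trec_eval` and the exact shape of the two dictionaries are assumptions made for illustration, not ir-kit's implementation.

```python
def parse_trec_eval(text):
    """Split trec_eval output into (run_id, summary results, per-query results)."""
    run_id = None
    results = {}   # measure -> value, for summary lines (query id "all")
    queries = {}   # query id -> {measure -> value}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore lines that are not measure/query/value triples
        measure, query_id, value = parts
        if measure == 'runid':
            run_id = value
            continue
        try:
            number = float(value)
        except ValueError:
            continue  # skip non-numeric values
        if query_id == 'all':
            results[measure] = number
        else:
            queries.setdefault(query_id, {})[measure] = number
    return run_id, results, queries


# Per-query lines like these are printed when trec_eval is run with -q.
sample = (
    'map\t351\t0.2500\n'
    'map\t352\t0.7500\n'
    'runid\tall\trun-name\n'
    'num_q\tall\t2\n'
    'map\tall\t0.5000\n'
)
run_id, results, queries = parse_trec_eval(sample)
```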
Query¶
Elasticsearch¶
Functions and classes for dealing with ElasticSearch queries.
Harry Scells Jun 2017
-
class
irkit.query.elasticsearch.
Visitor
(node_name: str)¶ Visitor interface for performing analysis on an ElasticSearch query. Implement the visit method for the node to perform some action. A visitor can either store a result to be returned by the traverse function, or modify the query for the transform function.
-
visit
(node: dict)¶ Implement this method to visit nodes in an ElasticSearch query. When implementing for traverse, place the result in self.result. When implementing for transform, return the transformed node as a dict.
Parameters: node – A node in the ElasticSearch query.
Returns: See description.
-
-
irkit.query.elasticsearch.
transform
(query: dict, visitor: irkit.query.elasticsearch.Visitor) → dict¶ Transform the query using a visitor. Visitors used by this function modify the query in-place and should not return anything. The following example will transform all of the must queries to must_not queries:
>>> class ExampleVisitor(Visitor):
...     def __init__(self, node_name: str):
...         super().__init__(node_name)
...
...     def visit(self, node: dict):
...         node['must_not'] = node[self.node_name]
...         del node[self.node_name]
>>> visitor = ExampleVisitor('must')
>>> transform({'query': {'must': {'match': 'example'}}}, visitor)
{'query': {'must_not': {'match': 'example'}}}
Parameters: - query – An ElasticSearch query.
- visitor – An implemented Visitor class.
Returns: A modified query
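To make the transform idea concrete, the following is a minimal sketch of how a visitor-driven rewrite could walk a nested query. It is not ir-kit's implementation: `RenameVisitor`, `transform_sketch` and `_walk` are hypothetical names, and unlike the documented function this sketch works on a deep copy so the caller's query is left untouched.

```python
import copy


class RenameVisitor:
    """Hypothetical visitor that renames one key wherever it appears."""

    def __init__(self, node_name, new_name):
        self.node_name = node_name
        self.new_name = new_name

    def visit(self, node):
        # Mutate the node in place: move the value to the new key.
        node[self.new_name] = node.pop(self.node_name)


def _walk(node, visitor):
    """Visit children first, then the node itself if it holds the key."""
    if isinstance(node, dict):
        for child in list(node.values()):
            _walk(child, visitor)
        if visitor.node_name in node:
            visitor.visit(node)
    elif isinstance(node, list):
        for child in node:
            _walk(child, visitor)


def transform_sketch(query, visitor):
    """Return a rewritten copy of the query (assumption: copy, not in place)."""
    query = copy.deepcopy(query)
    _walk(query, visitor)
    return query


original = {'query': {'bool': {'must': [{'match': {'title': 'example'}}]}}}
rewritten = transform_sketch(original, RenameVisitor('must', 'must_not'))
```

Copying first trades some memory for safety; dropping the `copy.deepcopy` call gives the in-place behaviour described above.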
-
irkit.query.elasticsearch.
traverse
(query: dict, visitor: irkit.query.elasticsearch.Visitor) → typing.Generic¶ Traverse down the query tree using the specified visitor. Visitors used by this function cannot modify the query. Instead they must store their return value into the result value on the visitor. This is useful for getting statistics about a query. The following example extracts all of the keywords in match queries:
>>> class ExampleVisitor(Visitor):
...     def __init__(self, node_name: str):
...         super().__init__(node_name)
...         self.result = []
...
...     def visit(self, node: dict):
...         self.result.append(node[self.node_name])
>>> visitor = ExampleVisitor('match')
>>> traverse({'query': {'must': {'match': 'example'}}}, visitor)
['example']
Parameters: - query – An ElasticSearch query.
- visitor – An implemented Visitor class.
Returns: The value stored in visitor.result.
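A read-only traversal of this kind can be sketched with an explicit stack. The names `CollectVisitor` and `traverse_sketch` are hypothetical and the code is independent of ir-kit; it only illustrates the pattern of storing results on the visitor and returning them at the end.

```python
class CollectVisitor:
    """Hypothetical visitor that collects the value stored under one key."""

    def __init__(self, node_name):
        self.node_name = node_name
        self.result = []

    def visit(self, node):
        self.result.append(node[self.node_name])


def traverse_sketch(query, visitor):
    """Walk the query depth-first without modifying it; return visitor.result."""
    stack = [query]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            if visitor.node_name in node:
                visitor.visit(node)
            # Reverse so that children are popped in their original order.
            stack.extend(reversed(list(node.values())))
        elif isinstance(node, list):
            stack.extend(reversed(node))
    return visitor.result


query = {
    'query': {
        'bool': {
            'must': [
                {'match': {'title': 'a'}},
                {'match': {'body': 'b'}},
            ],
            'should': {'match': {'title': 'c'}},
        }
    }
}
matches = traverse_sketch(query, CollectVisitor('match'))
```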
Plot¶
Various plotting functions.
Harry Scells Mar 2017
-
irkit.plot.trecplot.
pr_curve
(results: typing.List[irkit.trec.results.TrecEvalResults]) → matplotlib.pyplot¶ Create a precision-recall graph from trec_eval results.
Parameters: results – A list of TrecEvalResults objects.
Returns: the matplotlib.pyplot module with the graph drawn on it
-
irkit.plot.trecplot.
topic_ap
(results: typing.List[irkit.trec.results.TrecEvalResults], sort_on_ap=False)¶ Create an average-precision topic visualisation.
Parameters: - results – A list of TrecEvalResults objects.
- sort_on_ap – Whether to sort the visualisation by average precision.
Returns: a matplotlib plt object.
Usage¶
Command line tools:¶
The trecplot tool generates precision-recall curves and plots the average precision per topic:
trecplot --help
Libraries¶
Dealing with trec-related files is done using the trec
package. This package contains classes for dealing with qrel
files, trec run files, and trec result files.