Jason Stitt

Here's how to export your Opensearch index to a table or CSV

Opensearch (or Elasticsearch) is great for querying individual records. As a search system, it’s oriented toward queries looking for the top 10 or even top 100 of something. You can of course set up aggregates and scrolls to analyze data in bulk, but it often feels overly verbose to do ad-hoc.

Sometimes you need to be able to say: give me all of the records that are in here. Usually for some kind of administrative or QA purposes. It may be easier to work with the records in a tabular structure, e.g. in a SQL database, or in a CSV that you can import somewhere and share.

I found out that a Python library called awswrangler makes this pretty easy to do:

import awswrangler as wr

search_client = wr.opensearch.connect(
    host="my.opensearch.host",
    port=80,
)

results = wr.opensearch.search(
    search_client,
    index="my_index",
    search_body={"query": {"term": {"my_type.keyword": "fact"}}},
    size=10000,
)

results.to_csv("my_index.csv")

Behind the scenes, this is wrapping the Opensearch and Pandas libraries and is essentially an easy index-to-dataframe converter. It’s an efficient approach. Obviously it’ll depend on the size of your index and you may end up needing to stream, but I found it a fast way to dump everything.

© 2009-2024 Jason Stitt. These are my personal views and don't represent any past or present employer.