Required for this example:
Wikipedia download (warning: it is a 9.9 GB file that extracts to about 42 GB)
Spring Data (Great Blog / Examples on Spring Data: Petri Kainulainen's blog)
All the code and unit tests for this post are in my blog GitHub repo.
When setting up Solr from scratch, have a look at Solr's wiki or documentation; it is pretty good. There is also an example of importing Wikipedia here; I started with that and made some minor modifications.
For this specific example the Solr config needed (
For this example (and in the config files below), I used the following paths:
Solr home: /Development/Solr
Index / Data: /Development/Data/solr_data/wikipedia
Import File: /Development/Data/enwiki-latest-pages-articles.xml
The full import into Solr took about 48 hours on my old 2011 i5 iMac, and the index on my current setup is about 52 GB.
Data Config for the import:
The code for this ended up being quite clean. Spring Data Solr gives you 2 main interfaces, SolrIndexService and SolrCrudRepository. You simply extend / implement these 2, wrap them in a single interface, autowire that from a Spring Java context, and you are good to go.
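To make the "wrap that in a single interface" idea concrete, here is a minimal sketch in plain Java. It does not use Spring Data Solr at all: CrudOperations and SearchOperations are hypothetical stand-ins for the two interfaces mentioned above, WikipediaDocument and its fields are assumptions rather than the actual schema, and the in-memory class only exists so the example runs without a Solr server. The real code is in the GitHub repo.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical document type; the field names are assumptions, not the post's schema.
final class WikipediaDocument {
    final String id;
    final String title;
    final String text;

    WikipediaDocument(String id, String title, String text) {
        this.id = id;
        this.title = title;
        this.text = text;
    }
}

// Stand-in for the CRUD side (the role SolrCrudRepository plays in the post).
interface CrudOperations<T> {
    T save(T entity);
    Optional<T> findById(String id);
    long count();
}

// Stand-in for the custom index / search side.
interface SearchOperations<T> {
    List<T> findByTitleContaining(String term);
}

// The single interface that the rest of the application depends on.
interface WikipediaRepository
        extends CrudOperations<WikipediaDocument>, SearchOperations<WikipediaDocument> {
}

// In-memory implementation, used here only so the sketch runs without Solr.
final class InMemoryWikipediaRepository implements WikipediaRepository {
    private final Map<String, WikipediaDocument> store = new LinkedHashMap<>();

    @Override
    public WikipediaDocument save(WikipediaDocument doc) {
        store.put(doc.id, doc);
        return doc;
    }

    @Override
    public Optional<WikipediaDocument> findById(String id) {
        return Optional.ofNullable(store.get(id));
    }

    @Override
    public long count() {
        return store.size();
    }

    @Override
    public List<WikipediaDocument> findByTitleContaining(String term) {
        // Case-insensitive substring match on the title.
        String needle = term.toLowerCase(Locale.ROOT);
        List<WikipediaDocument> hits = new ArrayList<>();
        for (WikipediaDocument doc : store.values()) {
            if (doc.title.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}

class WikipediaRepositoryDemo {
    public static void main(String[] args) {
        // Callers only see the combined interface, never the implementation.
        WikipediaRepository repo = new InMemoryWikipediaRepository();
        repo.save(new WikipediaDocument("1", "Apache Solr", "Search platform"));
        repo.save(new WikipediaDocument("2", "Spring Framework", "Java framework"));
        System.out.println("documents: " + repo.count());
        System.out.println("hits for 'solr': " + repo.findByTitleContaining("solr").size());
    }
}
```

In a Spring application the in-memory class would go away: the combined interface would be backed by Spring Data Solr and injected with @Autowired, so calling code stays the same whichever implementation sits behind it.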
Next thing for me to look at for sourcing data is Spring Social.