i       /       auragem.ddns.net        70
i       /       auragem.ddns.net        70
i                         AuraSearch Features                           /       auragem.ddns.net        70
i       /       auragem.ddns.net        70
i       /       auragem.ddns.net        70
iCurrent State of Features      /       auragem.ddns.net        70
i-------------------------      /       auragem.ddns.net        70
i       /       auragem.ddns.net        70
i* Full Text Search of document metadata, with Porter stemming.	/	auragem.ddns.net	70
i* + and - operators, for marking a term as required or excluded,	/	auragem.ddns.net	70
irespectively.	/	auragem.ddns.net	70
i* Title extraction using the first apparent heading, regardless of	/	auragem.ddns.net	70
iits level.	/	auragem.ddns.net	70
i* Gemsub feed detection.       /       auragem.ddns.net        70
i* Line counts. /       auragem.ddns.net        70
i* Indexed publication dates based on dates in filenames.       /       auragem.ddns.net        70
i* File size information.       /       auragem.ddns.net        70
i* Indexed MP3, Ogg, and FLAC file metadata (ID3, MP4, and Ogg/FLAC).	/	auragem.ddns.net	70
i* Aggregator based on search engine index.     /       auragem.ddns.net        70
i* Wildcards: * and ?   /       auragem.ddns.net        70
i* Crawler: robots.txt is followed, including the "Allow", "Disallow",	/	auragem.ddns.net	70
iand "Crawl-Delay" directives. The Gemini Slow Down status code (44)	/	auragem.ddns.net	70
iis also respected.	/	auragem.ddns.net	70
i* Crawler: 2-second delay between requests to pages on the same	/	auragem.ddns.net	70
idomain.	/	auragem.ddns.net	70
i* Parses gemtext, Spartan text, Nex listings, and scrolltext.	/	auragem.ddns.net	70
i* Partial markdown parsing.    /       auragem.ddns.net        70
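i	/	auragem.ddns.net	70
iThe + and - operators and the two wildcards can be sketched in a	/	auragem.ddns.net	70
ifew lines of Python. This is a hypothetical illustration, not	/	auragem.ddns.net	70
iAuraSearch's code: it assumes Lucene-style semantics (a required term	/	auragem.ddns.net	70
imust appear, an excluded term must not, and optional terms only	/	auragem.ddns.net	70
imatter when no term is required), reads * as any run of characters	/	auragem.ddns.net	70
iand ? as exactly one character, and leaves out Porter stemming. The	/	auragem.ddns.net	70
inames tokenize, parse_query, wildcard_regex, term_matches, and	/	auragem.ddns.net	70
imatches are made up for this example.	/	auragem.ddns.net	70
i	/	auragem.ddns.net	70
```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def parse_query(query):
    """Split a query into required (+term), excluded (-term), and optional terms."""
    required, excluded, optional = [], [], []
    for raw in query.lower().split():
        if raw.startswith("+") and len(raw) > 1:
            required.append(raw[1:])
        elif raw.startswith("-") and len(raw) > 1:
            excluded.append(raw[1:])
        else:
            optional.append(raw)
    return required, excluded, optional


def wildcard_regex(term):
    """Compile a term so * matches any run of characters and ? matches
    exactly one character; every other character is matched literally."""
    pieces = []
    for ch in term:
        if ch == "*":
            pieces.append(".*")
        elif ch == "?":
            pieces.append(".")
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces))


def term_matches(term, tokens):
    """True if any document token matches the (possibly wildcarded) term."""
    pattern = wildcard_regex(term)
    return any(pattern.fullmatch(token) for token in tokens)


def matches(query, text):
    """Boolean match: no excluded term may appear and every required term
    must; with no required terms, at least one optional term must appear."""
    tokens = tokenize(text)
    required, excluded, optional = parse_query(query)
    if any(term_matches(term, tokens) for term in excluded):
        return False
    if not all(term_matches(term, tokens) for term in required):
        return False
    return bool(required) or any(term_matches(term, tokens) for term in optional)
```
i	/	auragem.ddns.net	70
iFor example, matches("+search -gopher crawl*", text) is true for a	/	auragem.ddns.net	70
ipage that mentions a search crawler and never mentions gopher.	/	auragem.ddns.net	70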
i       /       auragem.ddns.net        70
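iThe crawler rules above can be combined into a per-host politeness	/	auragem.ddns.net	70
icheck. The following is a hypothetical sketch: PolitenessScheduler	/	auragem.ddns.net	70
iand its 60-second default back-off after a Slow Down (44) response	/	auragem.ddns.net	70
iare assumptions, and only the 2-second minimum comes from the list	/	auragem.ddns.net	70
iabove. It uses Python's standard urllib.robotparser for Allow,	/	auragem.ddns.net	70
iDisallow, and Crawl-delay, assumes robots.txt has already been	/	auragem.ddns.net	70
ifetched for each host, and takes the clock as a parameter so it can	/	auragem.ddns.net	70
ibe tested without sleeping.	/	auragem.ddns.net	70
i	/	auragem.ddns.net	70
```python
import urllib.robotparser
from urllib.parse import urlsplit

MIN_DELAY = 2.0  # seconds between requests to one host, per the list above


class PolitenessScheduler:
    """Hypothetical per-host politeness tracker for a Gemini crawler."""

    def __init__(self, user_agent, clock, slow_down_backoff=60.0):
        self.user_agent = user_agent
        self.clock = clock  # callable returning seconds, e.g. time.monotonic
        self.slow_down_backoff = slow_down_backoff  # assumed value
        self.robots = {}  # host -> parsed robots.txt
        self.next_allowed = {}  # host -> earliest time of the next request

    def load_robots(self, host, robots_txt):
        """Parse a robots.txt body that the crawler has already fetched."""
        parser = urllib.robotparser.RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.robots[host] = parser

    def allowed(self, url):
        """Apply Allow/Disallow; hosts without a loaded robots.txt are allowed."""
        parser = self.robots.get(urlsplit(url).hostname)
        return parser is None or parser.can_fetch(self.user_agent, url)

    def delay_for(self, host):
        """Use Crawl-delay when it is longer than the 2-second minimum."""
        parser = self.robots.get(host)
        crawl_delay = parser.crawl_delay(self.user_agent) if parser else None
        return max(MIN_DELAY, float(crawl_delay or 0))

    def wait_time(self, url):
        """Seconds to wait before this URL's host may be contacted again."""
        host = urlsplit(url).hostname
        return max(0.0, self.next_allowed.get(host, 0.0) - self.clock())

    def record_response(self, url, status):
        """Schedule the next request to this host; back off on 44 SLOW DOWN."""
        host = urlsplit(url).hostname
        delay = self.delay_for(host)
        if status == 44:
            delay = max(delay, self.slow_down_backoff)
        self.next_allowed[host] = self.clock() + delay
```
i	/	auragem.ddns.net	70
iOne design caveat: urllib.robotparser applies the first matching	/	auragem.ddns.net	70
iAllow or Disallow line, while RFC 9309 prefers the longest match, so	/	auragem.ddns.net	70
ia crawler that needs strict Allow handling may want its own matcher.	/	auragem.ddns.net	70
i	/	auragem.ddns.net	70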
iOutdated Features:     /       auragem.ddns.net        70
i* AND, OR, NOT, parentheses grouping, and quotes       /       auragem.ddns.net        70
i* Filters: "TITLE", "URL", "ALBUM", "ARTIST", "ALBUMARTIST",   /       auragem.ddns.net        70
i"COPYRIGHT", "CONTENTTYPE", "LANGUAGE", and "PUBLISHDATE". The syntax  /       auragem.ddns.net        70
iis "field: term". Field names must be in all capital letters.  /       auragem.ddns.net        70
i* Fuzzy Searching by placing ~ after a search term     /       auragem.ddns.net        70
i* Proximity Searching: if you want to search for two words that are    /       auragem.ddns.net        70
iwithin a distance of 10 words of each other, then query with "term_one /       auragem.ddns.net        70
iterm_two"~10   /       auragem.ddns.net        70
i* Range Searching: for searching within ranges of numbers or dates.	/	auragem.ddns.net	70
iCan be combined with filters such as PUBLISHDATE. For example, to	/	auragem.ddns.net	70
ifilter on a publication date range:	/	auragem.ddns.net	70
iPUBLISHDATE:[20220101 TO 20231201]	/	auragem.ddns.net	70
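i	/	auragem.ddns.net	70
iProximity matching such as "term_one term_two"~10 can be approximated	/	auragem.ddns.net	70
iby comparing word positions. This hypothetical sketch follows the	/	auragem.ddns.net	70
idescription above literally (two words at most N positions apart, in	/	auragem.ddns.net	70
ieither order); it is not how Lucene computes slop, and the names	/	auragem.ddns.net	70
iword_positions and within_distance are made up for the example.	/	auragem.ddns.net	70
i	/	auragem.ddns.net	70
```python
import re


def word_positions(text):
    """Map each lowercase word to the sorted list of positions where it occurs."""
    positions = {}
    for index, word in enumerate(re.findall(r"\w+", text.lower())):
        positions.setdefault(word, []).append(index)
    return positions


def within_distance(text, term_one, term_two, max_distance):
    """True if the terms occur at most max_distance words apart, in either order."""
    positions = word_positions(text)
    first = positions.get(term_one.lower(), [])
    second = positions.get(term_two.lower(), [])
    i = j = 0
    while i < len(first) and j < len(second):
        if abs(first[i] - second[j]) <= max_distance:
            return True
        # advance whichever pointer is behind to shrink the gap
        if first[i] < second[j]:
            i += 1
        else:
            j += 1
    return False
```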
i       /       auragem.ddns.net        70
i       /       auragem.ddns.net        70
iFeatures Coming Soon   /       auragem.ddns.net        70
i--------------------   /       auragem.ddns.net        70
i       /       auragem.ddns.net        70
i* PDF and DjVu file metadata indexed	/	auragem.ddns.net	70
i* Image file metadata indexed  /       auragem.ddns.net        70
i* Plain text file full contents indexed        /       auragem.ddns.net        70
i* Backlinks and searching of link text /       auragem.ddns.net        70
i* Page Metadata Lookup /       auragem.ddns.net        70
i* Full Markdown, Tinylog, and Twtxt parsing to get links, titles, and  /       auragem.ddns.net        70
iheading information.   /       auragem.ddns.net        70
i       /       auragem.ddns.net        70
i       /       auragem.ddns.net        70
iHistory        /       auragem.ddns.net        70
i-------        /       auragem.ddns.net        70
i       /       auragem.ddns.net        70
iI started AuraGem Search about 2 years ago under its original name,	/	auragem.ddns.net	70
iPonix Search. It was originally designed as an experiment in how I	/	auragem.ddns.net	70
icould make search results better. The search engine was officially	/	auragem.ddns.net	70
iannounced on 2021-07-01:	/	auragem.ddns.net	70
h2021-07-01 Search Engine & Ponix Capsule Now Open Source (MIT) URL:gemini://auragem.ddns.net/devlog/20210701.gmi       auragem.ddns.net        70
02021-12-05 AuraGem Search Begins Crawling Again        /g/search/devlog/20211205.gmi   auragem.ddns.net        70
i       /       auragem.ddns.net        70
iNote that some of the information in the above posts has been	/	auragem.ddns.net	70
irecently updated to match the current URL and IP address of the	/	auragem.ddns.net	70
icrawler and Gemini capsule.	/	auragem.ddns.net	70
i       /       auragem.ddns.net        70
iOne of the first priorities for AuraSearch was extracting file	/	auragem.ddns.net	70
imetadata from as many files as possible. Audio files were among the	/	auragem.ddns.net	70
ifirst to get this feature. PDFs and DjVu files were supposed to be	/	auragem.ddns.net	70
inext, and support for them was added on 2022-07-19, but,	/	auragem.ddns.net	70
iunfortunately, the feature was buggy and never worked. As you can see	/	auragem.ddns.net	70
iin the post below, I chose to go with Keyword Extraction (which was	/	auragem.ddns.net	70
ilater removed and replaced with simple mentions and tags extraction)	/	auragem.ddns.net	70
iinstead of Full Text Searching on page contents. Part of this was to	/	auragem.ddns.net	70
isave space, and part of it was to respect copyright. However, I am	/	auragem.ddns.net	70
irethinking this approach now that the Stats page can determine how	/	auragem.ddns.net	70
ilarge the text-only portion of geminispace is (no more than 5 GB in	/	auragem.ddns.net	70
itotal).	/	auragem.ddns.net	70
h2022-07-19 AuraGem Search Engine Update        URL:gemini://auragem.ddns.net/devlog/20220719.gmi       auragem.ddns.net        70
1Stats Page     /g/search/stats/        auragem.ddns.net        70
i       /       auragem.ddns.net        70
iIn the article above, you can see that I started to play with the	/	auragem.ddns.net	70
inotion of different types of searches. I think this idea remains	/	auragem.ddns.net	70
iimportant today:	/	auragem.ddns.net	70
i> Another problem that the above process would not catch are names and /       auragem.ddns.net        70
i> proper nouns. These are often very important words that people would /       auragem.ddns.net        70
i> want to search for (e.g. Mathematics, C++, Celine Dion, FTS). I do not       /       auragem.ddns.net        70
i> have an easy method for this atm.    /       auragem.ddns.net        70
i       /       auragem.ddns.net        70
iThe next update on 2022-07-21 added Full Text Searching of link and    /       auragem.ddns.net        70
ifile metadata, which drastically improved the speed of searches. Yes,  /       auragem.ddns.net        70
ithis came with stemming because my database's FTS uses Lucene++.       /       auragem.ddns.net        70
h2022-07-21 AuraGem Search Update       URL:gemini://auragem.ddns.net/devlog/20220720_search.gmi        auragem.ddns.net        70
i       /       auragem.ddns.net        70
iNot long after, I wrote an article about FTS, ranking systems, and	/	auragem.ddns.net	70
isome of the problems that search engines have to handle:	/	auragem.ddns.net	70
h2022-07-22 Search Engine Ranking Systems Are Being Left Unquestioned   URL:gemini://auragem.ddns.net/devlog/20220722.gmi       auragem.ddns.net        70
i       /       auragem.ddns.net        70
iThe most important part of this article, however, is its recognition	/	auragem.ddns.net	70
iof how people search:	/	auragem.ddns.net	70
i> This also introduces the argument that the ranking systems are really        /       auragem.ddns.net        70
i> only important for underspecified queries (broad queries), so the    /       auragem.ddns.net        70
i> emphasis on the problems with ranking algorithms is unwarranted. This        /       auragem.ddns.net        70
i> argument hardly makes sense when the majority of searches that people        /       auragem.ddns.net        70
i> make are broad. I would also argue that broad searches are most used /       auragem.ddns.net        70
i> for *discovering* pages, not for getting to a specific page. However,        /       auragem.ddns.net        70
i> ranking based on popularity prioritizes what it thinks people would  /       auragem.ddns.net        70
i> want, which is more suited for specific searches using broad queries,        /       auragem.ddns.net        70
i> at the expense of discovery of broad topics. Broad discovery using   /       auragem.ddns.net        70
i> broad topic queries and specific searches using proper-noun queries or       /       auragem.ddns.net        70
i> very specific queries are both much better ways of dealing with      /       auragem.ddns.net        70
i> searches without relying on popularity.      /       auragem.ddns.net        70
i       /       auragem.ddns.net        70
iWhen making a search engine, one must balance the search results	/	auragem.ddns.net	70
ibetween discovery (broadness) and exact matches (exactness). Relevance	/	auragem.ddns.net	70
iapplies to both, but matters more for discovery. I continue to think	/	auragem.ddns.net	70
ithat link analysis assumes that people want exact matches of pages	/	auragem.ddns.net	70
iwhen using broad queries. For example, if someone types in "search	/	auragem.ddns.net	70
iengine", a PageRank system would put the most popular search engine	/	auragem.ddns.net	70
iat the top, along with popular articles about search engines,	/	auragem.ddns.net	70
iassuming that the person wanted that specific search engine, when it	/	auragem.ddns.net	70
iis more likely that they wanted a collection of search engines.	/	auragem.ddns.net	70
iInstead, my approach is to return broad, relevant, discovery-based	/	auragem.ddns.net	70
iresults for broad queries, and exact pages for exact queries.	/	auragem.ddns.net	70
i       /       auragem.ddns.net        70
iExact queries include words from titles, domain names, capsule names,	/	auragem.ddns.net	70
iand service names: mainly proper nouns, or a specific combination of	/	auragem.ddns.net	70
iwords that matches a page's information. Broad queries, by contrast,	/	auragem.ddns.net	70
iuse category names and common nouns.	/	auragem.ddns.net	70
i       /       auragem.ddns.net        70
iWhen I type "Station", I want an exact match for Station itself.       /       auragem.ddns.net        70
iHowever, when I type "social network", I want search results that give /       auragem.ddns.net        70
ia very broad set of capsules that are social networks. I believe that  /       auragem.ddns.net        70
ithis is how most people would use search engines, especially if they   /       auragem.ddns.net        70
ido not rely much on filtering, and this is the exact methodology that  /       auragem.ddns.net        70
iI use in my article analyzing Gemini's search engines:	/	auragem.ddns.net	70
h2022-08-07 Gemini Search Results Study, Part 1 URL:gemini://auragem.ddns.net/devlog/20220807.gmi       auragem.ddns.net        70
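i	/	auragem.ddns.net	70
iThe exact-versus-broad split can be illustrated with a toy	/	auragem.ddns.net	70
idispatcher. This is a hypothetical sketch, not AuraSearch's ranking:	/	auragem.ddns.net	70
iPage, normalize, and search are made-up names, an exact query is	/	auragem.ddns.net	70
iassumed to be one that equals a page title or capsule name, and	/	auragem.ddns.net	70
ibroad results are ordered only by how many query words they share	/	auragem.ddns.net	70
iwith a page's title and tags, with no popularity signal and no	/	auragem.ddns.net	70
istemming.	/	auragem.ddns.net	70
i	/	auragem.ddns.net	70
```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    title: str
    capsule: str
    tags: list = field(default_factory=list)


def normalize(text):
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())


def search(query, pages):
    """Exact queries return the pages they name; broad queries return
    every page sharing a word with the query, ranked by overlap only."""
    wanted = normalize(query)
    exact = [p for p in pages if wanted in (normalize(p.title), normalize(p.capsule))]
    if exact:
        return exact

    words = set(wanted.split())

    def overlap(page):
        text = normalize(page.title + " " + " ".join(page.tags))
        return len(words & set(text.split()))

    broad = [p for p in pages if overlap(p) > 0]
    # sorted() is stable even with reverse=True, so ties keep input order
    return sorted(broad, key=overlap, reverse=True)
```
i	/	auragem.ddns.net	70
iWith a hypothetical Station page tagged "social network", searching	/	auragem.ddns.net	70
ifor "Station" returns only that page, while "social network" returns	/	auragem.ddns.net	70
ievery page sharing a word with the query, Station included.	/	auragem.ddns.net	70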