i / auragem.ddns.net 70
i / auragem.ddns.net 70
i AuraSearch Features / auragem.ddns.net 70
i / auragem.ddns.net 70
i / auragem.ddns.net 70
iCurrent State of Features / auragem.ddns.net 70
i------------------------- / auragem.ddns.net 70
i / auragem.ddns.net 70
i* Full Text Search of document metadata, with porter stemming. / auragem.ddns.net 70
i* + and - operators, to mark a term as required or excluded, / auragem.ddns.net 70
irespectively. / auragem.ddns.net 70
i* Title extraction using first apparent heading, regardless of its / auragem.ddns.net 70
ilevel. / auragem.ddns.net 70
i* Gemsub feed detection. / auragem.ddns.net 70
i* Line counts. / auragem.ddns.net 70
i* Indexed publication dates based on dates in filenames. / auragem.ddns.net 70
i* File size information. / auragem.ddns.net 70
i* Indexed MP3, Ogg, and FLAC file metadata (ID3, MP4, and Ogg/FLAC). / auragem.ddns.net 70
i* Aggregator based on search engine index. / auragem.ddns.net 70
i* Wildcards: * and ? / auragem.ddns.net 70
i* Crawler: robots.txt is obeyed, including "Allow", "Disallow", and / auragem.ddns.net 70
i"Crawl-Delay" directives. The Slow Down gemini status code (44) is / auragem.ddns.net 70
ialso honored. / auragem.ddns.net 70
i* Crawler: 2 second delay between crawling of pages on the same / auragem.ddns.net 70
idomain. / auragem.ddns.net 70
i* Parses gemtext, spartan text, nex listings, scrolltext. / auragem.ddns.net 70
i* Partial markdown parsing. / auragem.ddns.net 70
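i / auragem.ddns.net 70
iAs a rough illustration of the title and publication date features / auragem.ddns.net 70
iabove, here is a minimal Python sketch. It is not AuraSearch's actual / auragem.ddns.net 70
icode: the function names, the gemtext-style heading rule, and the / auragem.ddns.net 70
iYYYYMMDD or YYYY-MM-DD filename patterns are all assumptions. / auragem.ddns.net 70
i / auragem.ddns.net 70
```python
import re
from datetime import date
from typing import Optional

# Hypothetical helpers; names and patterns are illustrative assumptions.
FENCE = "`" * 3  # gemtext preformatted toggle
HEADING = re.compile(r"^#{1,3}\s*([^#\s].*?)\s*$")
FILENAME_DATE = re.compile(r"(\d{4})-?(\d{2})-?(\d{2})")


def extract_title(gemtext: str) -> Optional[str]:
    """Return the text of the first heading line, whatever its level."""
    in_preformatted = False
    for line in gemtext.splitlines():
        if line.startswith(FENCE):
            # Heading-like lines inside preformatted blocks are skipped.
            in_preformatted = not in_preformatted
            continue
        if in_preformatted:
            continue
        match = HEADING.match(line)
        if match:
            return match.group(1)
    return None


def date_from_filename(path: str) -> Optional[date]:
    """Return the first valid date found in the last path segment."""
    filename = path.rstrip("/").rsplit("/", 1)[-1]
    for m in FILENAME_DATE.finditer(filename):
        try:
            return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        except ValueError:
            # Eight digits that do not form a real date; keep looking.
            continue
    return None
```
i / auragem.ddns.net 70
iA first-match rule like this picks a level-2 heading as the title if / auragem.ddns.net 70
iit appears before any level-1 heading, which is what "regardless of / auragem.ddns.net 70
iits level" suggests. / auragem.ddns.net 70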
i / auragem.ddns.net 70
iOutdated Features: / auragem.ddns.net 70
i* AND, OR, NOT, parentheses grouping, and quotes / auragem.ddns.net 70
i* Filters: "TITLE", "URL", "ALBUM", "ARTIST", "ALBUMARTIST", / auragem.ddns.net 70
i"COPYRIGHT", "CONTENTTYPE", "LANGUAGE", and "PUBLISHDATE". The syntax / auragem.ddns.net 70
iis "field: term". Field names must be in all capital letters. / auragem.ddns.net 70
i* Fuzzy Searching by placing ~ after a search term / auragem.ddns.net 70
i* Proximity Searching: if you want to search for two words that are / auragem.ddns.net 70
iwithin a distance of 10 words of each other, then query with "term_one / auragem.ddns.net 70
iterm_two"~10 / auragem.ddns.net 70
i* Range Searching: For searching in ranges of numbers or dates. Can / auragem.ddns.net 70
ibe used with filters, like the PUBLISHDATE filter. An example of / auragem.ddns.net 70
ifiltering based on a publication date range would be, / auragem.ddns.net 70
iPUBLISHDATE:[20220101 to 20231201] / auragem.ddns.net 70
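i / auragem.ddns.net 70
iA minimal Python sketch of how a range filter such as the PUBLISHDATE / auragem.ddns.net 70
iexample above could be parsed and applied. The regular expression, / auragem.ddns.net 70
ithe function names, and the inclusive bounds are assumptions made for / auragem.ddns.net 70
iillustration; they are not taken from AuraSearch or Lucene++. / auragem.ddns.net 70
i / auragem.ddns.net 70
```python
import re
from datetime import date, datetime
from typing import Tuple

# Assumed syntax: FIELD:[YYYYMMDD to YYYYMMDD], field name in capitals.
RANGE_FILTER = re.compile(r"^([A-Z]+):\[(\d{8})\s+(?:to|TO)\s+(\d{8})\]$")


def _to_date(text: str) -> date:
    """Convert a YYYYMMDD string to a date (raises ValueError if invalid)."""
    return datetime.strptime(text, "%Y%m%d").date()


def parse_range_filter(text: str) -> Tuple[str, date, date]:
    """Split a range filter into (field, start, end)."""
    match = RANGE_FILTER.match(text.strip())
    if match is None:
        raise ValueError(f"not a range filter: {text!r}")
    field, start, end = match.groups()
    return field, _to_date(start), _to_date(end)


def in_range(published: date, start: date, end: date) -> bool:
    """Inclusive range check (an assumption, not confirmed behavior)."""
    return start <= published <= end
```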
i / auragem.ddns.net 70
i / auragem.ddns.net 70
iFeatures Coming Soon / auragem.ddns.net 70
i-------------------- / auragem.ddns.net 70
i / auragem.ddns.net 70
i* PDF and Djvu file metadata indexed / auragem.ddns.net 70
i* Image file metadata indexed / auragem.ddns.net 70
i* Plain text file full contents indexed / auragem.ddns.net 70
i* Backlinks and searching of link text / auragem.ddns.net 70
i* Page Metadata Lookup / auragem.ddns.net 70
i* Full Markdown, Tinylog, and Twtxt parsing to get links, titles, and / auragem.ddns.net 70
iheading information. / auragem.ddns.net 70
i / auragem.ddns.net 70
i / auragem.ddns.net 70
iHistory / auragem.ddns.net 70
i------- / auragem.ddns.net 70
i / auragem.ddns.net 70
iAuraGem Search is a search engine that I started about 2 years ago / auragem.ddns.net 70
iunder its original name, Ponix Search. I originally designed it to / auragem.ddns.net 70
iexperiment with ways of making search results better. The search / auragem.ddns.net 70
iengine was officially announced on 2021-07-01: / auragem.ddns.net 70
h2021-07-01 Search Engine & Ponix Capsule Now Open Source (MIT) URL:gemini://auragem.ddns.net/devlog/20210701.gmi auragem.ddns.net 70
02021-12-05 AuraGem Search Begins Crawling Again /g/search/devlog/20211205.gmi auragem.ddns.net 70
i / auragem.ddns.net 70
iNote that some of the information in the above posts has recently / auragem.ddns.net 70
ibeen updated to match the current URL and IP address of the crawler / auragem.ddns.net 70
iand gemini capsule. / auragem.ddns.net 70
i / auragem.ddns.net 70
iOne of the first priorities with AuraSearch was to have extraction of / auragem.ddns.net 70
ifile metadata for as many files as possible. Audio files were one of / auragem.ddns.net 70
ithe first to get this feature. PDFs and Djvu files were supposed to be / auragem.ddns.net 70
inext, and support was added for them on 2022-07-19, but the feature / auragem.ddns.net 70
iwas buggy and never worked, unfortunately. As you can see in the post / auragem.ddns.net 70
ibelow, I chose to go with Keyword Extraction (which was later removed / auragem.ddns.net 70
iand replaced with simple mentions and tags extraction) instead of Full / auragem.ddns.net 70
iText Searching on page contents. Part of this was to save space, and / auragem.ddns.net 70
ipart of it was to respect copyright. However, I am rethinking this / auragem.ddns.net 70
iapproach now that the Stats page can determine how large the text-only / auragem.ddns.net 70
iportion of geminispace is (no more than 5GB total). / auragem.ddns.net 70
h2022-07-19 AuraGem Search Engine Update URL:gemini://auragem.ddns.net/devlog/20220719.gmi auragem.ddns.net 70
1Stats Page /g/search/stats/ auragem.ddns.net 70
i / auragem.ddns.net 70
iIn the above article, you can see that I start to play with the / auragem.ddns.net 70
inotion of different types of searches. I think this idea remains / auragem.ddns.net 70
iimportant today: / auragem.ddns.net 70
i> Another problem that the above process would not catch are names and / auragem.ddns.net 70
i> proper nouns. These are often very important words that people would / auragem.ddns.net 70
i> want to search for (e.g. Mathematics, C++, Celine Dion, FTS). I do not / auragem.ddns.net 70
i> have an easy method for this atm. / auragem.ddns.net 70
i / auragem.ddns.net 70
iThe next update on 2022-07-21 added Full Text Searching of link and / auragem.ddns.net 70
ifile metadata, which drastically improved the speed of searches. Yes, / auragem.ddns.net 70
ithis came with stemming because my database's FTS uses Lucene++. / auragem.ddns.net 70
h2022-07-21 AuraGem Search Update URL:gemini://auragem.ddns.net/devlog/20220720_search.gmi auragem.ddns.net 70
i / auragem.ddns.net 70
iNot long after, I wrote an article about FTS, ranking systems, and / auragem.ddns.net 70
isome of the problems that Search Engines have to handle: / auragem.ddns.net 70
h2022-07-22 Search Engine Ranking Systems Are Being Left Unquestioned URL:gemini://auragem.ddns.net/devlog/20220722.gmi auragem.ddns.net 70
i / auragem.ddns.net 70
iThe most important portion of this article, however, is recognizing / auragem.ddns.net 70
ihow people do searches: / auragem.ddns.net 70
i> This also introduces the argument that the ranking systems are really / auragem.ddns.net 70
i> only important for underspecified queries (broad queries), so the / auragem.ddns.net 70
i> emphasis on the problems with ranking algorithms is unwarranted. This / auragem.ddns.net 70
i> argument hardly makes sense when the majority of searches that people / auragem.ddns.net 70
i> make are broad. I would also argue that broad searches are most used / auragem.ddns.net 70
i> for *discovering* pages, not for getting to a specific page. However, / auragem.ddns.net 70
i> ranking based on popularity prioritizes what it thinks people would / auragem.ddns.net 70
i> want, which is more suited for specific searches using broad queries, / auragem.ddns.net 70
i> at the expense of discovery of broad topics. Broad discovery using / auragem.ddns.net 70
i> broad topic queries and specific searches using proper-noun queries or / auragem.ddns.net 70
i> very specific queries are both much better ways of dealing with / auragem.ddns.net 70
i> searches without relying on popularity. / auragem.ddns.net 70
i / auragem.ddns.net 70
iWhen making a search engine, one must balance the search results / auragem.ddns.net 70
ibetween discovery (broadness) and exact matches (exactness). Relevancy / auragem.ddns.net 70
iapplies to both of these, but is more important for discovery. I / auragem.ddns.net 70
icontinue to think that link analysis assumes that people want exact / auragem.ddns.net 70
imatches of pages while using broad queries. For example, if someone / auragem.ddns.net 70
itypes in "search engine", a PageRank system would put the most popular / auragem.ddns.net 70
isearch engine at the top along with popular articles about search / auragem.ddns.net 70
iengines, assuming that the person wanted that specific search engine, / auragem.ddns.net 70
iwhen it's more likely they wanted a collection of search engines. / auragem.ddns.net 70
iRather, my approach is to return broad, relevant, discovery-based / auragem.ddns.net 70
iresults for broad queries, and exact pages for exact queries. / auragem.ddns.net 70
i / auragem.ddns.net 70
iExact queries include words from titles, domain names, capsule names, / auragem.ddns.net 70
iand service names: mainly proper nouns, or a specific combination of / auragem.ddns.net 70
iwords that matches the page information. Broad queries, however, use / auragem.ddns.net 70
icategory names and common nouns. / auragem.ddns.net 70
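i / auragem.ddns.net 70
iThe distinction can be sketched in Python as a lookup against names / auragem.ddns.net 70
ithe index already knows. This is only an assumed heuristic to make / auragem.ddns.net 70
ithe idea concrete, not how AuraSearch decides: the function names and / auragem.ddns.net 70
ithe known_names list (titles, domains, capsule and service names) are / auragem.ddns.net 70
ihypothetical. / auragem.ddns.net 70
i / auragem.ddns.net 70
```python
from typing import Iterable, Set


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons are forgiving."""
    return " ".join(text.lower().split())


def classify_query(query: str, known_names: Iterable[str]) -> str:
    """Return "exact" if the query equals a known name, else "broad".

    known_names is a hypothetical list of indexed titles, domain names,
    capsule names, and service names.
    """
    names: Set[str] = {normalize(name) for name in known_names}
    return "exact" if normalize(query) in names else "broad"
```
i / auragem.ddns.net 70
iA query classified as "exact" would put the matching page first, / auragem.ddns.net 70
iwhile a "broad" one would return a wide set of relevant pages. / auragem.ddns.net 70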
i / auragem.ddns.net 70
iWhen I type "Station", I want an exact match for Station itself. / auragem.ddns.net 70
iHowever, when I type "social network", I want search results that give / auragem.ddns.net 70
ia very broad set of capsules that are social networks. I believe that / auragem.ddns.net 70
ithis is how most people would use search engines, especially if they / auragem.ddns.net 70
ido not rely much on filtering, and this is the exact methodology that / auragem.ddns.net 70
iI use for my article analyzing gemini's search engines: / auragem.ddns.net 70
h2022-08-07 Gemini Search Results Study, Part 1 URL:gemini://auragem.ddns.net/devlog/20220807.gmi auragem.ddns.net 70