Add new SentenceTransformer model
- README.md +68 -65
- model.safetensors +1 -1
    	
README.md CHANGED
    
@@ -16,50 +16,53 @@ tags:
 16  16   - loss:MultipleNegativesSymmetricRankingLoss
 17  17   base_model: Alibaba-NLP/gte-modernbert-base
 18  18   widget:
 19     - - source_sentence:
 20     -
     19 + - source_sentence: 'See Precambrian time scale # Proposed Geologic timeline for another
     20 +     set of periods 4600 -- 541 MYA .'
 21  21     sentences:
 22     -   - In
 23     -
 24     -   - In
 25     -     the
 26     -
 27     -
 28     -
 29     -
     22 +   - In 2014 election , Biju Janata Dal candidate Tathagat Satapathy Bharatiya Janata
     23 +     party candidate Rudra Narayan Pany defeated with a margin of 1.37,340 votes .
     24 +   - In Scotland , the Strathclyde Partnership for Transport , formerly known as Strathclyde
     25 +     Passenger Transport Executive , comprises the former Strathclyde region , which
     26 +     includes the urban area around Glasgow .
     27 +   - 'See Precambrian Time Scale # Proposed Geological Timeline for another set of
     28 +     periods of 4600 -- 541 MYA .'
     29 + - source_sentence: It is also 5 kilometers northeast of Tamaqua , 27 miles south of
     30 +     Allentown and 9 miles northwest of Hazleton .
 30  31     sentences:
 31     -   -
 32     -
 33     -
 34     -
 35     -
 36     -
 37     -
 38     -
     32 +   - In 1948 he moved to Massachusetts , and eventually settled in Vermont .
     33 +   - Suddenly I remembered that I was a New Zealander , I caught the first plane home
     34 +     and came back .
     35 +   - It is also 5 miles northeast of Tamaqua , 27 miles south of Allentown , and 9
     36 +     miles northwest of Hazleton .
     37 + - source_sentence: The party has a Member of Parliament , a member of the House of
     38 +     Lords , three members of the London Assembly and two Members of the European Parliament
     39 +     .
 39  40     sentences:
 40     -   - The
 41     -
 42     -   -
 43     -
 44     -
 45     - -
 46     -     ,
     41 +   - The party has one Member of Parliament , one member of the House of Lords , three
     42 +     Members of the London Assembly and two Members of the European Parliament .
     43 +   - Grapsid crabs dominate in Australia , Malaysia and Panama , while gastropods Cerithidea
     44 +     scalariformis and Melampus coeffeus are important seed predators in Florida mangroves
     45 +     .
     46 +   - Music Story is a music service website and international music data provider that
     47 +     curates , aggregates and analyses metadata for digital music services .
     48 + - source_sentence: 'The play received two 1969 Tony Award nominations : Best Actress
     49 +     in a Play ( Michael Annals ) and Best Costume Design ( Charlotte Rae ) .'
 47  50     sentences:
 48     -   -
 49     -
 50     -   -
 51     -
 52     -   -
 53     -
 54     - - source_sentence:
 55     -
     51 +   - Ravishanker is a fellow of the International Statistical Institute and an elected
     52 +     member of the American Statistical Association .
     53 +   - 'In 1969 , the play received two Tony - Award nominations : Best Actress in a
     54 +     Theatre Play ( Michael Annals ) and Best Costume Design ( Charlotte Rae ) .'
     55 +   - AMD and Nvidia both have proprietary methods of scaling , CrossFireX for AMD ,
     56 +     and SLI for Nvidia .
     57 + - source_sentence: He was a close friend of Ángel Cabrera and is a cousin of golfer
     58 +     Tony Croatto .
 56  59     sentences:
 57     -   -
 58     -
 59     -   -
 60     -
 61     -   -
 62     -
     60 +   - He was a close friend of Ángel Cabrera , and is a cousin of golfer Tony Croatto
     61 +     .
     62 +   - Eugenijus Bartulis ( born December 7 , 1949 in Kaunas ) is a Lithuanian Roman
     63 +     Catholic priest , and Bishop of Šiauliai .
     64 +   - UWIRE also distributes its members content to professional media outlets , including
     65 +     Yahoo , CNN and CBS News .
 63  66   datasets:
 64  67   - redis/langcache-sentencepairs-v1
 65  68   pipeline_tag: sentence-similarity
@@ -159,9 +162,9 @@ from sentence_transformers import SentenceTransformer
159 162   model = SentenceTransformer("redis/langcache-embed-v3")
160 163   # Run inference
161 164   sentences = [
162     -     '
163     -     '
164     -     '
    165 +     'He was a close friend of Ángel Cabrera and is a cousin of golfer Tony Croatto .',
    166 +     'He was a close friend of Ángel Cabrera , and is a cousin of golfer Tony Croatto .',
    167 +     'UWIRE also distributes its members content to professional media outlets , including Yahoo , CNN and CBS News .',
165 168   ]
166 169   embeddings = model.encode(sentences)
167 170   print(embeddings.shape)
@@ -170,9 +173,9 @@ print(embeddings.shape)
170 173   # Get the similarity scores for the embeddings
171 174   similarities = model.similarity(embeddings, embeddings)
172 175   print(similarities)
173     - # tensor([[0.
174     - #         [0.
175     - #         [0.
    176 + # tensor([[0.9922, 0.9922, 0.5352],
    177 + #         [0.9922, 0.9961, 0.5391],
    178 + #         [0.5352, 0.5391, 1.0000]], dtype=torch.bfloat16)
176 179   ```
177 180
178 181   <!--
@@ -238,19 +241,19 @@ You can finetune this model on your own dataset.
238 241   #### LangCache Sentence Pairs (all)
239 242
240 243   * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
241     - * Size:
    244 + * Size: 26,850 training samples
242 245   * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
243 246   * Approximate statistics based on the first 1000 samples:
244     -   |         | sentence1                                                                         | sentence2                                                                         | label
245     -
246     -   | type    | string                                                                            | string                                                                            | int
247     -   | details | <ul><li>min: 8 tokens</li><li>mean: 27.
    247 +   |         | sentence1                                                                         | sentence2                                                                         | label                        |
    248 +   |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-----------------------------|
    249 +   | type    | string                                                                            | string                                                                            | int                          |
    250 +   | details | <ul><li>min: 8 tokens</li><li>mean: 27.35 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>1: 100.00%</li></ul> |
248 251   * Samples:
249     -   | sentence1
250     -
251     -   | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code>
252     -   | <code>
253     -   | <code>
    252 +   | sentence1                                                                                                             | sentence2                                                                                                                      | label          |
    253 +   |:----------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------|:---------------|
    254 +   | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code>  | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code>            | <code>1</code> |
    255 +   | <code>After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall .</code> | <code>Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall .</code> | <code>1</code> |
    256 +   | <code>The 12F was officially homologated on August 21 , 1929 and exhibited at the Paris Salon in 1930 .</code>        | <code>The 12F was officially homologated on 21 August 1929 and displayed at the 1930 Paris Salon .</code>                      | <code>1</code> |
254 257   * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
255 258     ```json
256 259     {
@@ -265,19 +268,19 @@ You can finetune this model on your own dataset.
265 268   #### LangCache Sentence Pairs (all)
266 269
267 270   * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
268     - * Size:
    271 + * Size: 26,850 evaluation samples
269 272   * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
270 273   * Approximate statistics based on the first 1000 samples:
271     -   |         | sentence1                                                                         | sentence2                                                                         | label
272     -
273     -   | type    | string                                                                            | string                                                                            | int
274     -   | details | <ul><li>min: 8 tokens</li><li>mean: 27.
    274 +   |         | sentence1                                                                         | sentence2                                                                         | label                        |
    275 +   |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-----------------------------|
    276 +   | type    | string                                                                            | string                                                                            | int                          |
    277 +   | details | <ul><li>min: 8 tokens</li><li>mean: 27.35 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>1: 100.00%</li></ul> |
275 278   * Samples:
276     -   | sentence1
277     -
278     -   | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code>
279     -   | <code>
280     -   | <code>
    279 +   | sentence1                                                                                                             | sentence2                                                                                                                      | label          |
    280 +   |:----------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------|:---------------|
    281 +   | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code>  | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code>            | <code>1</code> |
    282 +   | <code>After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall .</code> | <code>Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall .</code> | <code>1</code> |
    283 +   | <code>The 12F was officially homologated on August 21 , 1929 and exhibited at the Paris Salon in 1930 .</code>        | <code>The 12F was officially homologated on 21 August 1929 and displayed at the 1930 Paris Salon .</code>                      | <code>1</code> |
281 284   * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
282 285     ```json
283 286     {
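
Reviewer note on the loss named in the README hunks above: the statistics table lists every sampled pair with label 1, which fits a loss that takes its negatives from the other pairs in the same batch. The sketch below is a plain-Python illustration of that idea, not the Sentence Transformers implementation. It assumes cosine similarity and a scale of 20.0; the actual parameter block is cut off in this diff, so both are assumptions, and all function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero float vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy_on_diagonal(scores):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(scores):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(s - row_max) for s in row))
        total += log_sum_exp - row[i]
    return total / len(scores)


def symmetric_ranking_loss(anchors, positives, scale=20.0):
    """In-batch ranking loss averaged over both retrieval directions.

    scale=20.0 is an assumed value, not one read from this commit.
    """
    scores = [[scale * cosine_similarity(a, p) for p in positives] for a in anchors]
    transposed = [list(column) for column in zip(*scores)]
    forward = cross_entropy_on_diagonal(scores)       # anchor -> its positive
    backward = cross_entropy_on_diagonal(transposed)  # positive -> its anchor
    return (forward + backward) / 2.0


# Toy 2-dimensional "embeddings" (made-up values, not model output).
toy_anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_ranking_loss(toy_anchors, [[0.9, 0.1], [0.1, 0.9]])
swapped_loss = symmetric_ranking_loss(toy_anchors, [[0.1, 0.9], [0.9, 0.1]])
```

When each anchor is closest to its own positive the loss is near zero, and it grows when the pairing is swapped. Averaging the two directions makes the loss indifferent to which column is treated as the anchor.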
    	
model.safetensors CHANGED
    
@@ -1,3 +1,3 @@
  1   1   version https://git-lfs.github.com/spec/v1
  2     - oid sha256:
      2 + oid sha256:95d02211c4cca89113f9f3e93ed91f5176bf50170faa2cb835f7bfea15bb9dd2
  3   3   size 298041696
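
Reviewer note on the `model.safetensors` change: the `size` field is the same on both sides, so only the `oid` distinguishes the new weights from the old ones. A downloaded file could be checked against the new pointer roughly as follows. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the demo hashes a small in-memory payload rather than the real 298 MB file.

```python
import hashlib
import io

# Pointer text as it appears on the new side of this commit.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:95d02211c4cca89113f9f3e93ed91f5176bf50170faa2cb835f7bfea15bb9dd2
size 298041696
"""


def parse_lfs_pointer(pointer_text):
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def stream_matches_pointer(stream, pointer, chunk_size=1 << 20):
    """Hash a binary stream in chunks and compare size and digest to the pointer."""
    if pointer["algorithm"] != "sha256":
        raise ValueError("only sha256 oids are handled in this sketch")
    hasher = hashlib.sha256()
    total_bytes = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
        total_bytes += len(chunk)
    return total_bytes == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(NEW_POINTER)

# Demo on a tiny payload with a pointer built for it (not the real weights).
demo_payload = b"hello"
demo_pointer = {
    "algorithm": "sha256",
    "digest": hashlib.sha256(demo_payload).hexdigest(),
    "size": len(demo_payload),
}
demo_ok = stream_matches_pointer(io.BytesIO(demo_payload), demo_pointer)
```

For the real file, the same function could be called with `open(path, "rb")` and `pointer`; reading in chunks avoids loading the whole file into memory.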

